[EXP] Cross-dataset validation #74
Hypothesis
Training on one dataset (A) and validating on another (B), and vice versa, will provide a more robust evaluation of model generalization than standard train/test splits within each dataset.
Background & Motivation
My thesis proposal defense revealed an important validation gap: my professor requested an evaluation of how well models trained on one dataset perform on another. This cross-dataset validation approach will test real-world generalization and reveal whether the models are learning dataset-specific patterns rather than generalizable features.
This approach addresses potential data leakage concerns and provides stronger evidence for the robustness of the proposed methods across different data collection environments/scenarios.
Dataset
Methodology
Implement two cross-dataset validation scenarios:
1. Train on dataset A, validate on dataset B.
2. Train on dataset B, validate on dataset A.
For each scenario, train with the same pipeline used in the within-dataset experiments and record the evaluation metrics.
Create visualizations comparing performance across all validation approaches (within-dataset and both cross-dataset directions).
Analyze discrepancies in performance between the validation methods.
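The two scenarios above can be sketched as a single evaluation loop. This is a minimal illustration, assuming scikit-learn as the backend and synthetic stand-in arrays (`X_a`/`y_a`, `X_b`/`y_b`) in place of the real datasets, which must share a feature space:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-in data; in the real experiment these come from datasets A and B.
X_a, y_a = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_b, y_b = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)

results = {}
for name, (X_tr, y_tr, X_te, y_te) in {
    "train_A_test_B": (X_a, y_a, X_b, y_b),
    "train_B_test_A": (X_b, y_b, X_a, y_a),
}.items():
    # Same model and hyperparameters in both directions; only the
    # train/test assignment of the two datasets changes.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, model.predict(X_te))
```

Comparing `results` against the within-dataset scores from the earlier experiments gives the discrepancy analysis described above.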
Parameters & Hyperparameters
Use the same hyperparameters as in the previous experiments for a fair comparison.
For each model type (e.g., Random Forest, SVM, Neural Network), reuse its existing configuration unchanged.
The only modification is the training/validation data-split strategy; model architectures and training code stay the same.
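One way to guarantee identical hyperparameters across runs is a single frozen registry consulted by every scenario. The values below are illustrative placeholders, not the thesis's actual settings:

```python
# Hypothetical hyperparameter registry shared by within-dataset and
# cross-dataset runs, so the split strategy is the only variable.
HYPERPARAMS = {
    "random_forest": {"n_estimators": 100, "max_depth": None, "random_state": 42},
    "svm": {"C": 1.0, "kernel": "rbf"},
    "neural_network": {"hidden_layer_sizes": (64, 32), "max_iter": 500},
}

def get_params(model_name):
    """Return the frozen hyperparameters for a model type.

    Returns a copy so callers cannot accidentally mutate the registry
    between validation scenarios.
    """
    return dict(HYPERPARAMS[model_name])
```

Each scenario then calls `get_params("svm")` (or similar) instead of defining its own settings, making divergence between runs impossible by construction.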
Evaluation Metrics
Notebook Location
notebooks/cross_dataset_validation.ipynb
Dependencies
Additional Notes
This experiment is critical for the thesis defense, as it addresses a specific request from the committee. It will demonstrate the robustness of my approach across datasets collected in different environments.
The implementation can leverage the existing model training pipeline with minimal modifications to the data loading and evaluation procedures. The main code changes will be to the dataset splitting logic rather than model architecture or training.
Expected outcome: Some performance drop in cross-dataset validation is anticipated, but a drop greater than 15-20% would indicate overfitting to dataset-specific patterns and may require revisiting feature engineering.
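The 15-20% threshold above is a relative drop against the within-dataset score. A small sketch of that check, with made-up accuracy numbers to show the arithmetic (not real results):

```python
def relative_drop(within_score, cross_score):
    """Fractional performance drop when moving from within-dataset
    to cross-dataset validation."""
    return (within_score - cross_score) / within_score

# Hypothetical example: 0.90 accuracy within-dataset, 0.72 cross-dataset.
drop = relative_drop(0.90, 0.72)   # (0.90 - 0.72) / 0.90 = 0.20
needs_review = drop > 0.15         # above the 15% threshold, so True here
```

A drop in this range would suggest overfitting to dataset-specific patterns and trigger the feature-engineering review mentioned above.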