[FEAT] Implement k-fold validation #102

Open
opened 2025-07-27 22:08:37 +00:00 by nuluh · 0 comments
nuluh commented 2025-07-27 22:08:37 +00:00 (Migrated from github.com)

Problem Statement

Model evaluation currently relies on a single train/test split, which produces a high-variance performance estimate and can mask overfitting. A more robust evaluation method is needed to assess model generalization.

Proposed Solution

Integrate k-fold cross-validation into the model evaluation pipeline so that performance is averaged over multiple splits. This gives a more reliable estimate of model accuracy and reduces the variance caused by data partitioning. Additionally, add a visualization (e.g., boxplot, line plot, or bar chart) of per-fold metrics (such as accuracy or loss) to show their distribution and variance across the cross-validation folds.
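At its simplest, the averaging described above can be done with scikit-learn's `cross_val_score`. A minimal sketch, using a placeholder dataset and model (the real notebook would substitute its own):

```python
# Minimal sketch: average accuracy over k folds with scikit-learn.
# Dataset and model below are placeholders for the notebook's own.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 -> 5-fold cross-validation; returns one score per fold
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the standard deviation is what makes the estimate more informative than a single split.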

Alternatives Considered

Leave-one-out cross-validation and a simple train/test split were considered, but k-fold offers a better balance between computational cost and robustness.

Component

Jupyter Notebook

Priority

Critical (blocks progress)

Implementation Ideas

  • Use scikit-learn's KFold or StratifiedKFold utilities
  • Refactor current evaluation code to loop over k folds
  • Aggregate results and report mean/variance of model metrics
  • Make the number of folds configurable by the user
  • Use matplotlib or seaborn to visualize per-fold performance (e.g., accuracy, F1 score), such as with boxplots or line plots for interpretability

Expected Benefits

This will provide a more reliable and generalizable estimate of model performance, making the thesis results stronger and more credible. Visualization will help to quickly interpret how stable the model is across folds and identify any outlier behavior.

Additional Context

K-fold validation is a standard ML practice and would be beneficial for comparing different models or feature sets. Visualizing the results will add clarity to the thesis and strengthen the analysis.


Reference: nuluh/thesis#102