[FEAT] Implement k-fold validation #102
Problem Statement
Currently, model evaluation relies on a single train/test split, which can produce performance estimates that depend heavily on one particular partition and may be overly optimistic or unreliable. A more robust evaluation method is needed to assess model generalization.
Proposed Solution
Integrate k-fold cross-validation into the model evaluation pipeline so that performance is averaged over multiple splits, giving a more reliable estimate of model accuracy and reducing the variance introduced by a single data partition. Additionally, add a visualization (e.g., boxplot, line plot, or bar chart) of the performance metrics (such as accuracy or loss) for each fold, highlighting their distribution and variance across the cross-validation process.
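A minimal sketch of the proposed evaluation step, assuming a scikit-learn-compatible estimator; the model, dataset, and metric below are placeholders, not the project's actual pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Stand-in dataset and model for illustration only.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold CV with shuffling; the fold count and metric would be configurable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("per-fold accuracy:", np.round(scores, 3))
print(f"mean ± std: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the mean together with the standard deviation across folds is what gives the "more reliable estimate" the proposal asks for.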
Alternatives Considered
Alternatives include leave-one-out cross-validation or a simple train/test split, but k-fold offers a good balance between computational efficiency and robustness.
Component
Jupyter Notebook
Priority
Critical (blocks progress)
Implementation Ideas
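One way the fold-level visualization could look in the notebook (a sketch with placeholder scores; in practice the array would come from `cross_val_score` or a manual `KFold` loop):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-fold accuracies, used only to illustrate the plot.
scores = np.array([0.82, 0.85, 0.79, 0.88, 0.84])

fig, ax = plt.subplots(figsize=(4, 4))
ax.boxplot(scores)                                  # distribution across folds
ax.scatter(np.ones_like(scores), scores, zorder=3)  # individual fold scores
ax.set_xticks([1], labels=["5-fold CV"])
ax.set_ylabel("Accuracy")
ax.set_title("Cross-validation accuracy per fold")
fig.savefig("cv_scores.png", dpi=150)
```

Overlaying the individual fold scores on the boxplot makes outlier folds immediately visible, which directly supports the stability analysis mentioned under Expected Benefits.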
Expected Benefits
This will provide a more reliable and generalizable estimate of model performance, making the thesis results stronger and more credible. Visualization will help to quickly interpret how stable the model is across folds and identify any outlier behavior.
Additional Context
K-fold validation is a standard ML practice and would be beneficial for comparing different models or feature sets. Visualizing the results will add clarity to the thesis and strengthen the analysis.