Exp/74 exp cross dataset validation #107

Merged
nuluh merged 3 commits from exp/74-exp-cross-dataset-validation into dev 2025-08-28 05:09:16 +00:00
nuluh commented 2025-08-28 05:07:58 +00:00 (Migrated from github.com)

This pull request makes several improvements and refactorings to the `code/notebooks/stft.ipynb` notebook, focusing on code clarity, reproducibility, and enhanced data visualization and evaluation for STFT and machine learning workflows. The main changes include standardizing sensor naming, improving type annotations, refactoring plotting and evaluation code, and adding new sections for AU data testing.

Key changes:

Data structure and naming consistency

  • Standardized sensor directory naming from `sensor1`/`sensor2` to `sensorA`/`sensorB` in `output_dirs` for clarity and consistency.
  • Updated type annotations for `ready_data1a` and `ready_data2a` to explicitly specify lists of `pd.DataFrame` for better code readability and type safety.
  • Updated variable names for labels (`y`, `y1`, `y2`) to avoid overwriting and clarify which dataset each label corresponds to.
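The renamed directories and annotations described above could look like this minimal sketch (the directory paths and label values are placeholders; only the variable names come from the diff):

```python
from pathlib import Path
from typing import Dict, List

import pandas as pd

# Renamed output directories: sensorA/sensorB instead of sensor1/sensor2
output_dirs: Dict[str, Path] = {
    "sensorA": Path("output/sensorA"),
    "sensorB": Path("output/sensorB"),
}

# Explicit element types make the intended contents clear to readers and tools
ready_data1a: List[pd.DataFrame] = []
ready_data2a: List[pd.DataFrame] = []

# Distinct label variables per dataset avoid accidental overwriting
y1: List[int] = [0, 1, 0]  # labels for dataset 1 (placeholder values)
y2: List[int] = [1, 1, 0]  # labels for dataset 2 (placeholder values)
```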

Refactoring and code quality

  • Refactored the multiprocessing code to pass only the range of damage cases, simplifying the `pool.map` call and likely moving the fixed arguments inside the worker function.
  • Used the dedicated pandas rename method on the columns of `X1b` and `X2b` for clarity and maintainability.

Visualization and plotting improvements

  • Introduced a reusable `preview_stft` function for visualizing STFT data with configurable axis ticks and improved labeling, replacing inline plotting code.
  • Enhanced confusion matrix visualization by plotting both Sensor A and Sensor B matrices side by side, customizing colorbars, and improving figure export quality for publication.
  • Added code to save STFT plots as images for documentation and reproducibility.
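A hedged sketch of what such a plotting helper might look like: only the name `preview_stft` comes from the diff, while the signature, the dummy data, and the DPI choice are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def preview_stft(Zxx, f, t, title="STFT magnitude", n_ticks=5, save_path=None):
    """Plot an STFT magnitude matrix with a configurable number of axis ticks."""
    fig, ax = plt.subplots(figsize=(6, 4))
    im = ax.pcolormesh(t, f, np.abs(Zxx), shading="auto")
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    ax.set_title(title)
    ax.set_xticks(np.linspace(t.min(), t.max(), n_ticks))
    ax.set_yticks(np.linspace(f.min(), f.max(), n_ticks))
    fig.colorbar(im, ax=ax)
    if save_path is not None:
        # High DPI and tight bounding box for publication-quality export
        fig.savefig(save_path, dpi=300, bbox_inches="tight")
    return fig, ax

# Usage with dummy data standing in for a real STFT result
f = np.linspace(0, 500, 129)
t = np.linspace(0, 1, 50)
Zxx = np.random.default_rng(0).random((129, 50))
fig, ax = preview_stft(Zxx, f, t)
```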

Machine learning evaluation enhancements

  • Refactored evaluation code to compute classification reports and export them as formatted LaTeX tables, improving reporting and documentation.
  • Added timing code to measure and report SVM prediction (inference) time over multiple runs, supporting reproducibility and performance analysis.

New functionality

  • Added a new section for testing with AU data, including code to load and preprocess the data and compute STFTs for selected columns, extending the notebook's applicability.
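The loading-and-STFT step can be sketched in a self-contained way using synthetic stand-in signals, since the AU files themselves are not part of this summary (`fs`, the column names, and `nperseg` are assumptions):

```python
import numpy as np
import pandas as pd
from scipy.signal import stft

fs = 1000  # assumed sampling rate

# Dummy stand-in for the AU dataset; the notebook loads the real data from disk
df = pd.DataFrame({
    "ch1": np.sin(2 * np.pi * 50 * np.arange(fs) / fs),
    "ch2": np.random.default_rng(0).normal(size=fs),
})

# Compute an STFT magnitude per selected column
stfts = {}
for col in ["ch1", "ch2"]:
    f, t, Zxx = stft(df[col].to_numpy(), fs=fs, nperseg=128)
    stfts[col] = np.abs(Zxx)
```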

These changes collectively improve the notebook's usability, maintainability, and reproducibility for STFT analysis and machine learning evaluation.
