* wip: add function to create stratified train-test split from STFT data
* feat(src): implement working function for dataset B to create ready data from STFT files (`stft_files`) and add setup.py for package configuration
* feat(notebook): Update variable names for clarity, remove unused imports, and streamline data processing. Implement data concatenation using pandas concat for efficiency. Add validation steps for Dataset B and improve model training consistency across sensors.
* fix(.gitignore): add rule to ignore egg-info directories and ensure proper formatting
* docs(README): add instructions for running stft.ipynb notebook
* feat(notebook): Add evaluation metrics and confusion matrix visualizations for model predictions on Dataset B. Remove commented-out code and integrate data preparation using create_ready_data function.
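The stratified train-test split mentioned in the first entry can be illustrated with a minimal standard-library sketch. The commits do not show the actual implementation (which may well rely on scikit-learn), so the function name `stratified_split`, its parameters, and the toy labels below are all assumptions made for illustration:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) so that every
    class contributes roughly the same fraction of samples to the test set."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded for reproducible splits
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one test sample per class; a class with a single
        # sample therefore ends up in the test set only.
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for per-file damage classes (illustrative only).
labels = [0] * 10 + [1] * 10 + [2] * 5
train_idx, test_idx = stratified_split(labels, test_size=0.2)
```

Splitting per class rather than over the whole index range keeps rare classes represented in both subsets, which is the point of stratification.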
---------
Co-authored-by: nuluh <dam.ar@outlook.com>
feat(stft): Implement STFT processing for vibration data with multiprocessing support so that all the data is used for training instead of only `TEST1`
- Modify `build_features` function to support iterative processing across nested directories, enhancing the system's ability to handle larger datasets and varied input structures.
- Replace direct usage of the `FeatureExtractor` class with the `ExtractTimeFeatures` function, which wraps this class to simplify integration and maintenance of the feature extraction process.
- Implement `extract_numbers` function using regex to parse filenames and extract numeric identifiers, which serve as labels when training the SVM.
- Switch output from `.npz` to `.csv` format in `build_features`, offering better compatibility with data analysis tools and readability.
- Update documentation and comments within the code to reflect changes in functionality and usage of the new feature extraction setup.
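As a rough illustration of the regex-based filename parsing described above, here is a minimal sketch. Only the name `extract_numbers` comes from the commit; the signature, the return type, and the example filename are assumptions, and the real function may select or combine the numbers differently:

```python
import re


def extract_numbers(filename):
    """Return every run of digits in the filename as an int, in order.

    Assumed behaviour for illustration; the actual implementation
    is not shown in the commit message.
    """
    return [int(match) for match in re.findall(r"\d+", filename)]


# Hypothetical filename; the real naming scheme is not given in the log.
numbers = extract_numbers("D1_TEST05_03.csv")
```

Converting the matches to `int` drops leading zeros, so `05` and `5` map to the same label.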
Closes #4
- Implement FeatureExtractor class in time_domain_features.py for calculating statistical features from dataset columns.
- Create build_features.py script to automate feature extraction from processed data and save results in a structured format.
- Adjust build_features.py to read processed data, utilize FeatureExtractor, and save feature matrix.
This update supports enhanced analysis capabilities within the thesis-project structure, allowing for more sophisticated data processing and model training stages.
Closes #1
The code changes add a new file `time_domain_features.py` that contains a `FeatureExtractor` class. This class calculates various time domain features for a given dataset. The features include mean, max, peak, peak-to-peak, RMS, variance, standard deviation, power, crest factor, form factor, pulse indicator, margin, kurtosis, and skewness.
The class takes a file path as input and reads the data from a CSV file. It assumes the data to analyze is in the first column. The calculated features are stored in a dictionary.
The commit message suggests that the purpose of the changes is to add a new class for time domain feature extraction.
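The features listed above can be sketched with the standard library alone. This is not the code in `time_domain_features.py`: the function name `time_domain_features` is hypothetical, it takes a list of samples rather than a CSV path, and the formulas follow common textbook conventions (population variance, non-excess kurtosis) that may differ from the ones `FeatureExtractor` uses:

```python
import math


def time_domain_features(x):
    """Compute common time-domain features of a non-empty signal.

    Assumes the signal is not constant zero; otherwise the ratio
    features below would divide by zero.
    """
    n = len(x)
    mean = sum(x) / n
    abs_x = [abs(v) for v in x]
    abs_mean = sum(abs_x) / n
    peak = max(abs_x)                                # largest absolute value
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n        # population variance
    std = math.sqrt(var)
    sqrt_abs_mean = sum(math.sqrt(a) for a in abs_x) / n
    return {
        "mean": mean,
        "max": max(x),
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "rms": rms,
        "variance": var,
        "std": std,
        "power": sum(v * v for v in x) / n,          # mean squared value
        "crest_factor": peak / rms,
        "form_factor": rms / abs_mean,
        "pulse_indicator": peak / abs_mean,
        "margin": peak / sqrt_abs_mean ** 2,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / var ** 2,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
    }


# A tiny alternating signal makes the values easy to check by hand.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Returning a dictionary mirrors the description above, where the calculated features are stored in a dictionary keyed by feature name.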