feat(features): integrate time-domain feature extraction into data pipeline

- Implement the FeatureExtractor class in time_domain_features.py to calculate statistical features from dataset columns.
- Add the build_features.py script, which reads the processed data, runs FeatureExtractor on it, and saves the resulting feature matrix as an .npz file.

This makes time-domain features available to the later data-processing and model-training stages of the thesis project.

Closes #1
nuluh
2024-08-12 19:45:19 +07:00
parent 7d39176e27
commit a401d620eb
2 changed files with 30 additions and 6 deletions

@@ -0,0 +1,21 @@
# src/features/build_features.py
import os
import sys

import numpy as np
import pandas as pd
from time_domain_features import FeatureExtractor


def build_features(input_file, output_file):
    """Extract time-domain features from a processed CSV and save them as .npz."""
    data = pd.read_csv(input_file)
    # Assuming the relevant data is in the first column
    extractor = FeatureExtractor(data.iloc[:, 0].values)
    # features is unpacked as keyword arguments below, so it must be a
    # mapping from feature names to values
    features = extractor.features
    # Save each feature under its own key in the .npz archive
    np.savez(output_file, **features)


if __name__ == "__main__":
    input_path = sys.argv[1]   # e.g. 'data/processed/'
    output_path = sys.argv[2]  # e.g. 'data/features/feature_matrix.npz'
    # Assuming only one file for simplicity; adapt as needed.
    # os.path.join works whether or not input_path ends with a slash.
    build_features(os.path.join(input_path, "processed_data.csv"), output_path)
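
The diff for time_domain_features.py is not shown here, so the interface the script relies on can only be inferred: a class that takes a 1-D array and exposes a `features` mapping that `np.savez` can unpack. The sketch below is a hypothetical stand-in written under that assumption, not the class from this commit. The feature names (`mean`, `std`, `rms`, `peak`, `crest_factor`) are illustrative choices. It also shows how the saved archive could be read back.

```python
import io

import numpy as np


class FeatureExtractor:
    """Hypothetical stand-in; the real class lives in time_domain_features.py."""

    def __init__(self, signal):
        x = np.asarray(signal, dtype=float)
        rms = np.sqrt(np.mean(x ** 2))
        peak = np.max(np.abs(x))
        # A dict of scalars, so it can be unpacked into np.savez(**features).
        # Feature names are assumptions made for this example only.
        self.features = {
            "mean": np.mean(x),
            "std": np.std(x),
            "rms": rms,
            "peak": peak,
            "crest_factor": peak / rms,  # undefined for an all-zero signal
        }


# Toy signal standing in for the first column of processed_data.csv.
signal = np.array([1.0, -1.0, 1.0, -1.0])
features = FeatureExtractor(signal).features

# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
np.savez(buffer, **features)
buffer.seek(0)

loaded = np.load(buffer)
restored = {name: float(loaded[name]) for name in loaded.files}
print(restored)
```

Each feature is stored under its own key, so a later stage can load the archive with `np.load` and pick features by name rather than by column position.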