Implement STFT with Hann Windowing (STFT) #22

Closed
opened 2024-10-18 20:51:39 +00:00 by nuluh · 0 comments
nuluh commented 2024-10-18 20:51:39 +00:00 (Migrated from github.com)

Step 7: Applying Short-Time Fourier Transform (STFT)

The code performs STFT on each signal to extract frequency-domain features.

import numpy as np
import pandas as pd
from scipy.signal import stft
from scipy.signal.windows import hann  # hann now lives in scipy.signal.windows

for i in range(16):
    vibration_data = signal_sensor1[i]  # For sensor 1
    # Define STFT parameters
    window_size = 1024
    hop_size = 512
    window = hann(window_size)
    # Apply STFT
    frequencies, times, Zxx = stft(vibration_data, window=window,
                                   nperseg=window_size,
                                   noverlap=window_size - hop_size)
    # Store the magnitude of the complex STFT coefficients
    stft_data = np.abs(Zxx)
    # Transpose so rows are time windows and columns are frequency bins
    stft_data_transposed = stft_data.T
    df = pd.DataFrame(stft_data_transposed,
                      columns=[f"Freq_{freq:.2f}" for freq in frequencies])
    # Save to CSV
    df.to_csv(f'/kaggle/working/stft_data1_{i+1}.csv', index=False)

Explanation:

  • STFT Parameters:
    • Window Size: 1024 samples.
    • Hop Size: 512 samples (50% overlap).
    • Window Function: Hann window to minimize spectral leakage.
  • Result: The STFT provides a time-frequency representation of the signal, capturing how the frequency content changes over time.

Visual Representation of STFT Output for One Case:

| Time \ Frequency | Freq_0.00 | Freq_0.98 | Freq_1.95 | ... | Freq_N |
|------------------|-----------|-----------|-----------|-----|--------|
| Time_0           | ...       | ...       | ...       | ... | ...    |
| Time_1           | ...       | ...       | ...       | ... | ...    |
| ...              | ...       | ...       | ...       | ... | ...    |
| Time_M           | ...       | ...       | ...       | ... | ...    |
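As a quick sanity check, the transform above can be run on a synthetic signal (a sketch; the sampling rate, tone frequencies, and signal length here are illustrative assumptions, not the thesis data):

```python
import numpy as np
from scipy.signal import stft
from scipy.signal.windows import hann

# Synthetic 1-second test signal sampled at 10 kHz: tones at 500 Hz and 1.5 kHz
fs = 10_000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

window_size, hop_size = 1024, 512
frequencies, times, Zxx = stft(signal, fs=fs, window=hann(window_size),
                               nperseg=window_size,
                               noverlap=window_size - hop_size)
print(frequencies.shape)    # (513,): nperseg // 2 + 1 one-sided frequency bins
print(np.abs(Zxx).T.shape)  # one row per time window, one column per frequency bin
```

The transposed magnitude array has exactly the layout of the table above: time windows down the rows, frequency bins across the columns.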

Step 8: Loading STFT Data from CSV Files

The saved STFT data is loaded back into DataFrames for further processing.

ready_data_signal_sensor1_1 = pd.read_csv("/kaggle/working/stft_data1_1.csv")
# Similarly for other cases and sensor 2

Visual Representation:

| Freq_0.00 | Freq_0.98 | Freq_1.95 | ... | Freq_N |
|-----------|-----------|-----------|-----|--------|
| ...       | ...       | ...       | ... | ...    |
| ...       | ...       | ...       | ... | ...    |
| ...       | ...       | ...       | ... | ...    |

(Each DataFrame represents the STFT features for one case and one sensor.)
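The CSV round trip preserves the magnitude values. A minimal sketch with a dummy 4×3 matrix written to a temporary directory (the matrix, its column names, and the filename are stand-ins for the real per-case files):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Dummy STFT magnitudes: 4 time windows x 3 frequency bins
stft_mag = np.random.default_rng(0).random((4, 3))
df = pd.DataFrame(stft_mag, columns=["Freq_0.00", "Freq_0.98", "Freq_1.95"])

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "stft_data1_1.csv")
    df.to_csv(path, index=False)  # save, as in Step 7
    loaded = pd.read_csv(path)    # load, as in Step 8

print(np.allclose(df.to_numpy(), loaded.to_numpy()))  # True: values survive the round trip
```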


Step 9: Combining STFT DataFrames into Lists

All STFT DataFrames are stored in lists for each sensor.

ready_data1 = [ready_data_signal_sensor1_1, ready_data_signal_sensor1_2, ..., ready_data_signal_sensor1_16]
ready_data2 = [ready_data_signal_sensor2_1, ready_data_signal_sensor2_2, ..., ready_data_signal_sensor2_16]

Step 10: Concatenating DataFrames to Form Feature Matrices

DataFrames from all cases are concatenated to create feature matrices for each sensor.

# For sensor 1
x1 = ready_data1[0]
for i in range(len(ready_data1) - 1):
    x1 = np.concatenate((x1, ready_data1[i + 1]), axis=0)

# For sensor 2
x2 = ready_data2[0]
for i in range(len(ready_data2) - 1):
    x2 = np.concatenate((x2, ready_data2[i + 1]), axis=0)

Resulting Feature Matrix (x1 or x2):

|          | Freq_0.00 | Freq_0.98 | Freq_1.95 | ... | Freq_N |
|----------|-----------|-----------|-----------|-----|--------|
| Sample 1 | ...       | ...       | ...       | ... | ...    |
| Sample 2 | ...       | ...       | ...       | ... | ...    |
| ...      | ...       | ...       | ...       | ... | ...    |
| Sample K | ...       | ...       | ...       | ... | ...    |
  • Samples: Each row corresponds to a time window from the STFT data.
  • Features: Each column corresponds to a frequency bin.
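The same row stacking can be done in one step with pd.concat; a sketch where two small stand-in frames take the place of the 16 loaded DataFrames:

```python
import numpy as np
import pandas as pd

# Two dummy per-case frames stand in for ready_data1
ready_data1 = [pd.DataFrame(np.ones((3, 2)), columns=["Freq_0.00", "Freq_0.98"]),
               pd.DataFrame(np.zeros((2, 2)), columns=["Freq_0.00", "Freq_0.98"])]

# Stack rows from all cases into a single feature matrix
x1 = pd.concat(ready_data1, ignore_index=True).to_numpy()
print(x1.shape)  # (5, 2): 3 + 2 rows, 2 frequency-bin columns
```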

Step 11: Creating Labels for Each Sample

The code assigns labels to each sample based on the case it belongs to.

# Define labels for each case
y_data = list(range(16))  # Labels from 0 to 15 for 16 cases

# Replicate labels to match the number of samples in each case
for i in range(len(y_data)):
    y_data[i] = [y_data[i]] * ready_data1[i].shape[0]
    y_data[i] = np.array(y_data[i])

Visual Representation:

For Case 1:

y_data[0] = [0, 0, 0, ..., 0]  # Length equals number of samples in Case 1

Step 12: Concatenating Labels into a Single Array

All label arrays are concatenated to form a single target vector.

y = y_data[0]
for i in range(len(y_data) - 1):
    y = np.concatenate((y, y_data[i+1]), axis=0)

Resulting Label Vector (y):

[0, 0, ..., 0, 1, 1, ..., 1, ..., 15, 15, ..., 15]
  • The length of y matches the number of samples in the feature matrix (x1 or x2).
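Steps 11 and 12 together can be collapsed into a single np.repeat call; a sketch where two small row counts stand in for ready_data1[i].shape[0]:

```python
import numpy as np

# Rows per case (stand-ins for ready_data1[i].shape[0])
rows_per_case = [3, 2]

# Repeat each case index once per sample in that case
y = np.repeat(np.arange(len(rows_per_case)), rows_per_case)
print(y)  # [0 0 0 1 1]
```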

Step 13: Splitting Data into Training and Testing Sets

The data is split into training and testing sets to evaluate the models.

from sklearn.model_selection import train_test_split

# For sensor 1
x_train1, x_test1, y_train, y_test = train_test_split(x1, y, test_size=0.2, random_state=2)

# For sensor 2
x_train2, x_test2, y_train, y_test = train_test_split(x2, y, test_size=0.2, random_state=2)
  • Training Set: 80% of the data.
  • Testing Set: 20% of the data.
  • random_state=2: Ensures reproducibility.
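A toy split confirms the shapes. It also shows why overwriting y_train/y_test in the second call above is harmless: with the same sample count, test_size, and random_state, train_test_split selects identical row indices each time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, 2 features, 2 classes
x = np.arange(20).reshape(10, 2)
y = np.repeat(np.arange(2), 5)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)

# Same seed and length -> identical row selection on a second call
_, _, y_train2, y_test2 = train_test_split(x, y, test_size=0.2, random_state=2)
print(np.array_equal(y_test, y_test2))  # True
```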

Visual Representation:

| Dataset  | Features (x)         | Labels (y) |
|----------|----------------------|------------|
| Training | x_train1 or x_train2 | y_train    |
| Testing  | x_test1 or x_test2   | y_test     |

Step 14: Data Prepared for Modeling

At this stage, the data is prepared and ready for training machine learning models.

Summary of Prepared Data:

| Dataset  | Sensor 1 Features | Sensor 2 Features | Labels  |
|----------|-------------------|-------------------|---------|
| Training | x_train1          | x_train2          | y_train |
| Testing  | x_test1           | x_test2           | y_test  |
  • Feature Matrices (x_train1, x_test1, x_train2, x_test2): Contain the STFT features.
  • Label Vectors (y_train, y_test): Contain the corresponding case labels.

Data Shapes:

Assuming the total number of samples is N and the number of features is F:

  • x_train1.shape ≈ (0.8 * N, F)
  • x_test1.shape ≈ (0.2 * N, F)
  • y_train.shape ≈ (0.8 * N,)
  • y_test.shape ≈ (0.2 * N,)

(Sample counts are rounded to whole numbers: scikit-learn takes ceil(0.2 * N) test samples and assigns the remainder to training.)

Visual Summary of Data Preparation Steps

  1. Raw Data:

    • Files: 16 CSV files (Cases 1 to 16).
    • DataFrames: df1, df2, ..., df16.
  2. Data Cleaning:

    • Columns Renamed: Standardized to ['sensor 1', 'sensor 2'].
  3. Signal Extraction:

    • Lists: signal_sensor1, signal_sensor2.
  4. Feature Extraction with STFT:

    • STFT Applied: To each signal.
    • Features: Time-frequency representation.
  5. Feature Matrices:

    • DataFrames: ready_data1, ready_data2.
    • Concatenated Matrices: x1, x2.
  6. Label Creation:

    • Labels: y_data for each case.
    • Combined Labels: y.
  7. Data Splitting:

    • Training Data: x_train1, x_train2, y_train.
    • Testing Data: x_test1, x_test2, y_test.

Conclusion

The data preparation process transforms raw vibration signals into a structured format suitable for machine learning:

  • Consistency: Standardized column names and data structures.
  • Feature Richness: STFT provides both time and frequency domain information.
  • Label Alignment: Each feature vector has a corresponding label indicating the bolt condition.
  • Prepared Datasets: Ready for model training and evaluation.

By following these steps, the code ensures that the machine learning models have high-quality data to learn from, which is crucial for accurate bolt loosening detection.


Note: This explanation focuses on the data preparation steps, providing visual representations and detailed descriptions to help you understand how the code transforms raw data into features and labels suitable for machine learning models.

Reference: nuluh/thesis#22