Normalize Dataset by Preprocessing Relative Values Between Two Accelerometer Sensors #15

Closed
opened 2024-08-26 13:45:37 +00:00 by nuluh · 3 comments
nuluh commented 2024-08-26 13:45:37 +00:00 (Migrated from github.com)

The approach suggested by the professor seems logical and could be effective for the SVM model. Dividing the features extracted from both accelerometers essentially normalizes the differences in the vibration characteristics between the two ends of the beam. This can help the model better capture the relative changes in vibration patterns, which are critical for identifying the location of damage.

Here’s why this makes sense:

  • Feature normalization: The division operation can normalize the differences between the two sensors, highlighting the proportional changes, which might be more indicative of damage location than absolute values.
  • Damage localization: If there's damage at a particular point on the beam, the vibration characteristics at the two ends are likely to differ. By analyzing these differences through the divided features, the SVM model may more effectively identify patterns that correlate with damage localization.
  • Training robustness: This approach can make the model less sensitive to variations in absolute values, focusing instead on the relative differences, which might make the model more robust.

When implementing this, ensure that the features chosen are meaningful when divided (e.g., mean, max, variance). Not all features may benefit from this operation, so it might be worth experimenting with different combinations.
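To illustrate the robustness point, here is a toy sketch (with made-up numbers, not real measurements) showing that a common scaling of both sensor readings, e.g. a harder hammer strike exciting both beam ends equally, cancels out in the ratio features while absolute features would shift:

```python
# Toy illustration of the robustness argument (placeholder values).
s1 = {"mean": 10.0, "peak": 5.0}  # sensor 1 features
s2 = {"mean": 5.0, "peak": 2.0}   # sensor 2 features

# Feature-wise ratio: sensor 2 divided by sensor 1.
ratio = {k: s2[k] / s1[k] for k in s1}

# Scale both sensors by the same factor (same excitation change
# at both beam ends); the ratio features are unchanged.
scale = 3.0
ratio_scaled = {k: (scale * s2[k]) / (scale * s1[k]) for k in s1}

print(ratio == ratio_scaled)  # True
```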

nuluh commented 2024-08-26 13:48:53 +00:00 (Migrated from github.com)

Need to preprocess the dataset for damage localization on the beam using both time domain and frequency domain data. The process involves the following steps:

Consistent Testing:

Ensure the hammering tests are conducted in the same position across all trials to maintain consistency in the data collection process.
Feature Extraction and Dataset Construction:

Extract features from both sensors placed on either end of the beam.

  • For each test, divide the features from the second sensor by the corresponding features from the first sensor.
  • Create a dataset with the divided features for each test. For example, if sensor 1 has features {mean: 10, peak: 5} and sensor 2 has features {mean: 5, peak: 2}, the resulting dataset entry has features {delta_mean: 0.5, delta_peak: 0.4}.
  • Repeat this process for 10 tests, resulting in a dataset with 10 rows.
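The steps above can be sketched roughly as follows (the feature values are illustrative placeholders, not real measurements):

```python
# Ten tests, each a pair of feature dicts (sensor 1, sensor 2).
# The numbers are placeholders standing in for extracted features.
tests = [({"mean": 10.0, "peak": 5.0}, {"mean": 5.0, "peak": 2.0})] * 10

rows = []
for s1, s2 in tests:
    # Divide each sensor 2 feature by the matching sensor 1 feature.
    rows.append({f"delta_{k}": s2[k] / s1[k] for k in s1})

print(len(rows))  # one row per test -> 10 rows
```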

Feature Naming Convention:

Use the prefix `delta_` to name the features after division (e.g., `delta_mean`, `delta_peak`, etc.).
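In pandas this naming convention can be applied in one step with `add_prefix` after the division; a small sketch with made-up frames (column names taken from the example tables):

```python
import pandas as pd

# Illustrative per-sensor feature frames; columns align across sensors.
df1 = pd.DataFrame({"mean": [10.0, 11.0], "peak": [5.0, 4.9]})  # sensor 1
df2 = pd.DataFrame({"mean": [5.0, 6.1], "peak": [2.0, 2.1]})    # sensor 2

# Element-wise division, then prefix every column with `delta_`.
df_delta = (df2 / df1).add_prefix("delta_")
print(list(df_delta.columns))  # ['delta_mean', 'delta_peak']
```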

Labeling:

Ensure that each row in the dataset is correctly labeled according to the damage localization criteria.

Example of the Dataset (df.head(5)):

Sensor 1 features

| mean | peak | variance | skewness |
|------|------|----------|----------|
| 10   | 5    | 4.8      | 1.2      |
| 11   | 4.9  | 4.7      | 1.1      |
| 9.5  | 5.1  | 4.9      | 1.3      |
| 10.2 | 5.2  | 4.6      | 1.0      |
| 10   | 5    | 4.8      | 1.2      |

Sensor 2 features

| mean | peak | variance | skewness |
|------|------|----------|----------|
| 5    | 2    | 4.0      | 1.0      |
| 6.1  | 2.1  | 4.3      | 0.9      |
| 4.5  | 2.2  | 4.1      | 1.2      |
| 5.6  | 2.1  | 4.0      | 0.8      |
| 5    | 2    | 4.0      | 1.0      |

After Normalization (Dividing Sensor 2 Features by Sensor 1 Features):

| delta_mean | delta_peak | delta_variance | delta_skewness | Label |
|------------|------------|----------------|----------------|-------|
| 0.50       | 0.40       | 0.83           | 0.83           | 0     |
| 0.55       | 0.43       | 0.91           | 0.82           | 0     |
| 0.47       | 0.43       | 0.84           | 0.92           | 0     |
| 0.55       | 0.40       | 0.87           | 0.80           | 1     |
| 0.50       | 0.40       | 0.83           | 0.83           | 1     |
nuluh commented 2024-08-26 14:01:08 +00:00 (Migrated from github.com)

Need a reference paper for this.

nuluh commented 2024-08-27 05:24:43 +00:00 (Migrated from github.com)

suggested algorithm:

  • rewrite the `build_features()` function by adding an optional sensor-number argument: `build_features(input_dir, sensor=None)`
  • check whether each file carries the specified sensor suffix:

```python
def build_features(input_dir: str, sensor: int = None):
    ...
            for nth_test in os.listdir(nth_damage_path):
                nth_test_path = os.path.join(nth_damage_path, nth_test)
                if sensor is not None:
                    # Skip files that do not have the specified sensor suffix
                    if not nth_test.endswith(f'_{sensor}.csv'):
                        continue
                features = ExtractTimeFeatures(nth_test_path)  # features of one CSV file as a dict
                features['label'] = extract_numbers(nth_test)[0]  # add the label to the dict
                features['filename'] = nth_test  # add the filename to the dict
                all_features.append(features)
```

  • execute the function once per sensor:

```python
data_dir = "../../data/raw"
# Extract features for each sensor
df1 = build_features(data_dir, sensor=1)
df2 = build_features(data_dir, sensor=2)
```

  • perform the division operation on the feature columns of the two DataFrames (the last two columns, `label` and `filename`, are excluded):

```python
df_relative = df2.iloc[:, :-2] / df1.iloc[:, :-2]
```
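The division step drops the metadata columns, so the label has to be re-attached to get a labeled training set. A minimal sketch with illustrative stand-in frames (assuming, as in the steps above, that the last two columns are `label` and `filename` and that the two frames are row-aligned):

```python
import pandas as pd

# Stand-ins for the two per-sensor feature DataFrames; the last two
# columns (label, filename) are metadata, not features to divide.
df1 = pd.DataFrame({"mean": [10.0], "peak": [5.0],
                    "label": [0], "filename": ["test_1_1.csv"]})
df2 = pd.DataFrame({"mean": [5.0], "peak": [2.0],
                    "label": [0], "filename": ["test_1_2.csv"]})

# Divide only the feature columns, prefix them, then carry the label over.
df_relative = (df2.iloc[:, :-2] / df1.iloc[:, :-2]).add_prefix("delta_")
df_relative["label"] = df1["label"].values

print(list(df_relative.columns))  # ['delta_mean', 'delta_peak', 'label']
```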
Reference: nuluh/thesis#15