Normalize Dataset by Preprocessing Relative Values Between Two Accelerometer Sensors #15

Closed
opened 2024-08-26 13:45:37 +00:00 by nuluh · 3 comments
nuluh commented 2024-08-26 13:45:37 +00:00 (Migrated from github.com)

The approach suggested by the professor seems logical and could be effective for the SVM model. Dividing the features extracted from both accelerometers essentially normalizes the differences in the vibration characteristics between the two ends of the beam. This can help the model better capture the relative changes in vibration patterns, which are critical for identifying the location of damage.

Here’s why this makes sense:

  • Feature normalization: The division operation can normalize the differences between the two sensors, highlighting the proportional changes, which might be more indicative of damage location than absolute values.
  • Damage localization: If there's damage at a particular point on the beam, the vibration characteristics at the two ends are likely to differ. By analyzing these differences through the divided features, the SVM model may more effectively identify patterns that correlate with damage localization.
  • Training robustness: This approach can make the model less sensitive to variations in absolute values, focusing instead on the relative differences, which might make the model more robust.

When implementing this, ensure that the features chosen are meaningful when divided (e.g., mean, max, variance). Not all features may benefit from this operation, so it might be worth experimenting with different combinations.
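To illustrate the robustness point, here is a toy sketch (with made-up numbers, not real measurements) showing that a common scaling of both sensor readings, e.g. a harder hammer strike exciting both beam ends equally, cancels out in the ratio features while absolute features would shift:

```python
# Toy illustration of the robustness argument (placeholder values).
s1 = {"mean": 10.0, "peak": 5.0}  # sensor 1 features
s2 = {"mean": 5.0, "peak": 2.0}   # sensor 2 features

# Feature-wise ratio: sensor 2 divided by sensor 1.
ratio = {k: s2[k] / s1[k] for k in s1}

# Scale both sensors by the same factor (same excitation change
# at both beam ends); the ratio features are unchanged.
scale = 3.0
ratio_scaled = {k: (scale * s2[k]) / (scale * s1[k]) for k in s1}

print(ratio == ratio_scaled)  # True
```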

nuluh commented 2024-08-26 13:48:53 +00:00 (Migrated from github.com)

Need to preprocess the dataset for damage localization on the beam using both time domain and frequency domain data. The process involves the following steps:

Consistent Testing:

Ensure the hammering tests are conducted in the same position across all trials to maintain consistency in the data collection process.
Feature Extraction and Dataset Construction:

Extract features from both sensors placed on either end of the beam.

  • For each test, divide the features from the second sensor by the corresponding features from the first sensor.
  • Create a dataset with the divided features for each test. For example, if sensor 1 has features {mean: 10, peak: 5} and sensor 2 has features {mean: 5, peak: 2}, the resulting dataset entry has features {delta_mean: 0.5, delta_peak: 0.4}.
  • Repeat this process for 10 tests, resulting in a dataset with 10 rows.
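The steps above can be sketched roughly as follows (the feature values are illustrative placeholders, not real measurements):

```python
# Ten tests, each a pair of feature dicts (sensor 1, sensor 2).
# The numbers are placeholders standing in for extracted features.
tests = [({"mean": 10.0, "peak": 5.0}, {"mean": 5.0, "peak": 2.0})] * 10

rows = []
for s1, s2 in tests:
    # Divide each sensor 2 feature by the matching sensor 1 feature.
    rows.append({f"delta_{k}": s2[k] / s1[k] for k in s1})

print(len(rows))  # one row per test -> 10 rows
```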

Feature Naming Convention:

Use the prefix `delta_` to name the features after division (e.g., `delta_mean`, `delta_peak`, etc.).
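In pandas this naming convention can be applied in one step with `add_prefix` after the division; a small sketch with made-up frames (column names taken from the example tables):

```python
import pandas as pd

# Illustrative per-sensor feature frames; columns align across sensors.
df1 = pd.DataFrame({"mean": [10.0, 11.0], "peak": [5.0, 4.9]})  # sensor 1
df2 = pd.DataFrame({"mean": [5.0, 6.1], "peak": [2.0, 2.1]})    # sensor 2

# Element-wise division, then prefix every column with `delta_`.
df_delta = (df2 / df1).add_prefix("delta_")
print(list(df_delta.columns))  # ['delta_mean', 'delta_peak']
```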

Labeling:

Ensure that each row in the dataset is correctly labeled according to the damage localization criteria.

Example of the Dataset (df.head(5)):

Sensor 1 features

| mean | peak | variance | skewness |
|------|------|----------|----------|
| 10   | 5    | 4.8      | 1.2      |
| 11   | 4.9  | 4.7      | 1.1      |
| 9.5  | 5.1  | 4.9      | 1.3      |
| 10.2 | 5.2  | 4.6      | 1.0      |
| 10   | 5    | 4.8      | 1.2      |

Sensor 2 features

| mean | peak | variance | skewness |
|------|------|----------|----------|
| 5    | 2    | 4.0      | 1.0      |
| 6.1  | 2.1  | 4.3      | 0.9      |
| 4.5  | 2.2  | 4.1      | 1.2      |
| 5.6  | 2.1  | 4.0      | 0.8      |
| 5    | 2    | 4.0      | 1.0      |

After Normalization (Dividing Sensor 2 Features by Sensor 1 Features):

| delta_mean | delta_peak | delta_variance | delta_skewness | Label |
|------------|------------|----------------|----------------|-------|
| 0.50       | 0.40       | 0.83           | 0.83           | 0     |
| 0.55       | 0.43       | 0.91           | 0.82           | 0     |
| 0.47       | 0.43       | 0.84           | 0.92           | 0     |
| 0.55       | 0.40       | 0.87           | 0.80           | 1     |
| 0.50       | 0.40       | 0.83           | 0.83           | 1     |
nuluh commented 2024-08-26 14:01:08 +00:00 (Migrated from github.com)

Need a reference paper for this.

nuluh commented 2024-08-27 05:24:43 +00:00 (Migrated from github.com)

suggested algorithm:

  • rewrite the `build_features()` function by adding an optional sensor-number argument: `build_features(input_dir, sensor=None)`
  • check whether each file carries the specified sensor suffix:

```python
def build_features(input_dir: str, sensor: int = None):
    ...
            for nth_test in os.listdir(nth_damage_path):
                nth_test_path = os.path.join(nth_damage_path, nth_test)
                if sensor is not None:
                    # Skip files that do not have the specified sensor suffix
                    if not nth_test.endswith(f'_{sensor}.csv'):
                        continue
                features = ExtractTimeFeatures(nth_test_path)  # features of one CSV file as a dict
                features['label'] = extract_numbers(nth_test)[0]  # add the label to the dict
                features['filename'] = nth_test  # add the filename to the dict
                all_features.append(features)
```

  • execute the function once per sensor:

```python
data_dir = "../../data/raw"
# Extract features for each sensor
df1 = build_features(data_dir, sensor=1)
df2 = build_features(data_dir, sensor=2)
```

  • perform the division operation on the feature columns of the two DataFrames (the last two columns, `label` and `filename`, are excluded):

```python
df_relative = df2.iloc[:, :-2] / df1.iloc[:, :-2]
```
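The division step drops the metadata columns, so the label has to be re-attached to get a labeled training set. A minimal sketch with illustrative stand-in frames (assuming, as in the steps above, that the last two columns are `label` and `filename` and that the two frames are row-aligned):

```python
import pandas as pd

# Stand-ins for the two per-sensor feature DataFrames; the last two
# columns (label, filename) are metadata, not features to divide.
df1 = pd.DataFrame({"mean": [10.0], "peak": [5.0],
                    "label": [0], "filename": ["test_1_1.csv"]})
df2 = pd.DataFrame({"mean": [5.0], "peak": [2.0],
                    "label": [0], "filename": ["test_1_2.csv"]})

# Divide only the feature columns, prefix them, then carry the label over.
df_relative = (df2.iloc[:, :-2] / df1.iloc[:, :-2]).add_prefix("delta_")
df_relative["label"] = df1["label"].values

print(list(df_relative.columns))  # ['delta_mean', 'delta_peak', 'label']
```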
Reference: nuluh/thesis#15