[FEAT] Refactor Training Cell #88

Closed
opened 2025-05-29 14:43:35 +00:00 by nuluh · 1 comment
nuluh commented 2025-05-29 14:43:35 +00:00 (Migrated from github.com)

Problem Statement

The training and evaluation code in the notebook lacks modularity, as evidenced in cell 43 of stft.ipynb: the same training and evaluation logic is repeated for each sensor, which makes the workflow cumbersome and manual comparison inefficient.

Proposed Solution

  1. Create a helper function (e.g., train_and_evaluate_model) that:
    • Receives a model instance along with training and test data.
    • Times the .fit() call.
    • Evaluates accuracy.
    • Returns a dictionary with keys like "model", "accuracy", and "training_time".
  2. Store your models in a dictionary or list so you can loop over them, reducing duplicated code.
  3. Finally, output the results (a list of dictionaries) which can easily be converted to JSON if needed.
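A minimal sketch of such a helper, assuming a scikit-learn-style estimator (the exact signature and the use of `time.perf_counter` are assumptions, not the final implementation):

```python
import time

from sklearn.metrics import accuracy_score


def train_and_evaluate_model(model, x_train, y_train, x_test, y_test):
    """Fit `model`, time the fit, and score it on the test split."""
    start = time.perf_counter()
    model.fit(x_train, y_train)  # timed training step
    training_time = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return {"model": model, "accuracy": accuracy, "training_time": training_time}
```

If the results are to be exported to JSON later, it may be preferable to store the model's name (a string) under `"model"` instead of the estimator object itself.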

Alternatives Considered

I considered leaving the code as-is or writing separate functions for each sensor. However, a single refactored function that accepts the sensor label as a parameter simplifies the code and encourages reusability.

Component

  • Python Source Code
  • ML Model

Priority

  • High (significantly improves workflow)

Implementation Ideas

  • Create a helper function train_and_evaluate_model that encapsulates model fitting, prediction, and accuracy calculation.
  • Ensure the function receives a sensor label (e.g., "sensor1" or "sensor2") and outputs a dictionary with keys like model, sensor, and accuracy.
  • Loop over a dictionary of models for each sensor, call the helper function, and collect the output in a JSON-like structure for easy comparison and visualization.

Expected Benefits

  • Modularity: The helper function train_and_evaluate_model encapsulates timing, training, and evaluation.
  • DRY Principle: Using dictionaries to loop over model definitions avoids repeating code.
  • JSON-like Output: The results are stored in dictionaries (and then nested in a larger dictionary) so the output is easy to export to JSON and use for plotting later.
  • Sensor Index: The sensor is explicitly noted via the sensor label (e.g., "sensor1" or "sensor2") in the output, which clearly associates a given model's performance with a specific sensor.
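Because each result dict stores the model name (a string) rather than the estimator object, the combined structure serialises directly to JSON. A sketch with hypothetical numbers:

```python
import json

# Hypothetical combined results keyed by sensor label
all_results = {
    "sensor1": [{"model": "KNN", "sensor": "sensor1", "accuracy": 97.5}],
    "sensor2": [{"model": "KNN", "sensor": "sensor2", "accuracy": 95.0}],
}

# Serialise for archiving or later plotting
print(json.dumps(all_results, indent=2))
```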

Additional Context

-
nuluh commented 2025-05-29 14:54:20 +00:00 (Migrated from github.com)
```py
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier


def train_and_evaluate_model(model, model_name, sensor_label, x_train, y_train, x_test, y_test):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred) * 100
    return {
        "model": model_name,
        "sensor": sensor_label,
        "accuracy": accuracy
    }

models_sensor1 = {
    "Random Forest": RandomForestClassifier(),
    "Bagged Trees": BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(),
    "XGBoost": XGBClassifier()
}

results_sensor1 = []
for name, model in models_sensor1.items():
    res = train_and_evaluate_model(model, name, "sensor1", x_train1, y_train, x_test1, y_test)
    results_sensor1.append(res)
    print(f"{name} on sensor1: Accuracy = {res['accuracy']:.2f}%")

# ... repeat for the other sensor to build results_sensor2

# Collect both sensors' results in one dict
all_results = {
    "sensor1": results_sensor1,
    "sensor2": results_sensor2
}
```
Reference: nuluh/thesis#88