[FEAT] Refactor Training Cell #88

Closed
opened 2025-05-29 14:43:35 +00:00 by nuluh · 1 comment
nuluh commented 2025-05-29 14:43:35 +00:00 (Migrated from github.com)

Problem Statement

The training and evaluation code in the notebook lacks modularity, as evidenced in cell 43 of stft.ipynb: the same training and evaluation logic is repeated for each sensor, which makes the workflow cumbersome and manual comparison inefficient.

Proposed Solution

  1. Create a helper function (e.g., train_and_evaluate_model) that:
    • Receives a model instance along with training and test data.
    • Times the .fit() call.
    • Evaluates accuracy.
    • Returns a dictionary with keys like "model", "accuracy", and "training_time".
  2. Store your models in a dictionary or list so you can loop over them, reducing duplicated code.
  3. Finally, output the results (a list of dictionaries) which can easily be converted to JSON if needed.
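A minimal sketch of such a helper, assuming a scikit-learn-style estimator (the exact signature and the use of `time.perf_counter` are assumptions, not the final implementation):

```python
import time

from sklearn.metrics import accuracy_score


def train_and_evaluate_model(model, x_train, y_train, x_test, y_test):
    """Fit `model`, time the fit, and score it on the test split."""
    start = time.perf_counter()
    model.fit(x_train, y_train)  # timed training step
    training_time = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return {"model": model, "accuracy": accuracy, "training_time": training_time}
```

If the results are to be exported to JSON later, it may be preferable to store the model's name (a string) under `"model"` instead of the estimator object itself.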

Alternatives Considered

I considered leaving the code as-is or writing separate functions for each sensor. However, a single refactored function that accepts the sensor label as a parameter simplifies the code and encourages reusability.

Component

  • Python Source Code
  • ML Model

Priority

  • High (significantly improves workflow)

Implementation Ideas

  • Create a helper function train_and_evaluate_model that encapsulates model fitting, prediction, and accuracy calculation.
  • Ensure the function receives a sensor label (e.g., "sensor1" or "sensor2") and outputs a dictionary with keys like model, sensor, and accuracy.
  • Loop over a dictionary of models for each sensor, call the helper function, and collect the output in a JSON-like structure for easy comparison and visualization.

Expected Benefits

  • Modularity: The helper function train_and_evaluate_model encapsulates timing, training, and evaluation.
  • DRY Principle: Using dictionaries to loop over model definitions avoids repeating code.
  • JSON-like Output: The results are stored in dictionaries (and then nested in a larger dictionary) so the output is easy to export to JSON and use for plotting later.
  • Sensor Index: The sensor is explicitly noted via the sensor label (e.g., "sensor1" or "sensor2") in the output, which clearly associates a given model's performance with a specific sensor.
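Because each result dict stores the model name (a string) rather than the estimator object, the combined structure serialises directly to JSON. A sketch with hypothetical numbers:

```python
import json

# Hypothetical combined results keyed by sensor label
all_results = {
    "sensor1": [{"model": "KNN", "sensor": "sensor1", "accuracy": 97.5}],
    "sensor2": [{"model": "KNN", "sensor": "sensor2", "accuracy": 95.0}],
}

# Serialise for archiving or later plotting
print(json.dumps(all_results, indent=2))
```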

Additional Context

-
nuluh commented 2025-05-29 14:54:20 +00:00 (Migrated from github.com)
```py
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier


def train_and_evaluate_model(model, model_name, sensor_label, x_train, y_train, x_test, y_test):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred) * 100
    return {
        "model": model_name,
        "sensor": sensor_label,
        "accuracy": accuracy
    }

models_sensor1 = {
    "Random Forest": RandomForestClassifier(),
    "Bagged Trees": BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(),
    "XGBoost": XGBClassifier()
}

results_sensor1 = []
for name, model in models_sensor1.items():
    res = train_and_evaluate_model(model, name, "sensor1", x_train1, y_train, x_test1, y_test)
    results_sensor1.append(res)
    print(f"{name} on sensor1: Accuracy = {res['accuracy']:.2f}%")

# ... repeat for the other sensor to build results_sensor2

# Collect both sensors' results in one dict
all_results = {
    "sensor1": results_sensor1,
    "sensor2": results_sensor2
}
```
Reference: nuluh/thesis#88