Compare commits
stft ... 43-bug-stf
21 Commits

| Author | SHA1 | Date |
|---|---|---|
|  | db2947abdf |  |
|  | 36b36c41ba |  |
|  | 28681017ad |  |
|  | ff64f3a3ab |  |
|  | 58a316d9c8 |  |
|  | 020028eed8 |  |
|  | 35e25ba4c6 |  |
|  | 8ed1437d6d |  |
|  | 96556a1186 |  |
|  | 0a63aab211 |  |
|  | 48ea879863 |  |
|  | 69afdb1ad1 |  |
|  | db2d9299e6 |  |
|  | d5ba1ac0cd |  |
|  | 144f406226 |  |
|  | e6f8820989 |  |
|  | 2de04a6ea6 |  |
|  | 48075a590e |  |
|  | b229967e05 |  |
|  | f41edaaa91 |  |
|  | e9c06e1ac1 |  |
.github/ISSUE_TEMPLATE/bug_report.yml (vendored, new file, 115 lines)
@@ -0,0 +1,115 @@
name: Bug Report
description: Report a bug or unexpected behavior
title: "[BUG] "
labels: ["bug"]
assignees:
  - ${{github.actor}}
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!

  - type: textarea
    id: description
    attributes:
      label: Bug Description
      description: A clear and concise description of what the bug is
      placeholder: When I run the script, it crashes when processing large datasets...
    validations:
      required: true

  - type: textarea
    id: reproduction
    attributes:
      label: Steps to Reproduce
      description: Steps to reproduce the behavior
      placeholder: |
        1. Go to notebook '...'
        2. Run cell #...
        3. See error
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected Behavior
      description: What did you expect to happen?
      placeholder: The analysis should complete successfully and generate the visualization
    validations:
      required: true

  - type: textarea
    id: actual
    attributes:
      label: Actual Behavior
      description: What actually happened?
      placeholder: The script crashes with a memory error after processing 1000 samples
    validations:
      required: true

  - type: textarea
    id: logs
    attributes:
      label: Error Logs
      description: Paste any relevant logs or error messages
      render: shell
      placeholder: |
        Traceback (most recent call last):
          File "script.py", line 42, in <module>
            main()
          File "script.py", line 28, in main
            process_data(data)
        MemoryError: ...
    validations:
      required: false

  - type: dropdown
    id: component
    attributes:
      label: Component
      description: Which part of the thesis project is affected?
      options:
        - LaTeX Document
        - Python Source Code
        - Jupyter Notebook
        - Data Processing
        - ML Model
        - Visualization
        - Build/Environment
    validations:
      required: true

  - type: input
    id: version
    attributes:
      label: Version/Commit
      description: Which version or commit hash are you using?
      placeholder: v0.2.3 or 8d5b9a7
    validations:
      required: true

  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: Information about your environment
      placeholder: |
        - OS: [e.g. Ubuntu 22.04]
        - Python: [e.g. 3.9.5]
        - Relevant packages and versions:
          - numpy: 1.22.3
          - scikit-learn: 1.0.2
          - tensorflow: 2.9.1
    validations:
      required: false

  - type: textarea
    id: additional
    attributes:
      label: Additional Context
      description: Any other context or screenshots about the problem
      placeholder: Add any other context about the problem here...
    validations:
      required: false
.github/ISSUE_TEMPLATE/config.yml (vendored, new file, 12 lines)
@@ -0,0 +1,12 @@
blank_issues_enabled: false
contact_links:
  - name: Documentation
    url: ../docs/README.md
    about: Check the documentation before creating an issue

# Template configurations
templates:
  - name: bug_report.yml
  - name: feature_request.yml
  - name: experiment.yml
  - name: documentation.yml
.github/ISSUE_TEMPLATE/documentation.yml (vendored, new file, 116 lines)
@@ -0,0 +1,116 @@
name: Documentation
description: Improvements or additions to documentation
title: "[DOC] "
labels: ["documentation"]
assignees:
  - ${{github.actor}}
body:
  - type: markdown
    attributes:
      value: |
        Use this template for documentation-related tasks for your thesis project.

  - type: dropdown
    id: doc_type
    attributes:
      label: Documentation Type
      description: What type of documentation is this issue about?
      options:
        - Thesis Chapter/Section
        - Code Documentation
        - Experiment Documentation
        - README/Project Documentation
        - Literature Review
        - Methodology Description
        - Results Analysis
        - API Reference
    validations:
      required: true

  - type: textarea
    id: description
    attributes:
      label: Description
      description: Describe what needs to be documented
      placeholder: Need to document the data preprocessing pipeline including all transformation steps and rationale
    validations:
      required: true

  - type: textarea
    id: current_state
    attributes:
      label: Current State
      description: What's the current state of the documentation (if any)?
      placeholder: Currently there are some comments in the code but no comprehensive documentation of the preprocessing steps
    validations:
      required: false

  - type: textarea
    id: proposed_changes
    attributes:
      label: Proposed Changes
      description: What specific documentation changes do you want to make?
      placeholder: |
        1. Create a dedicated markdown file describing each preprocessing step
        2. Add docstrings to all preprocessing functions
        3. Create a diagram showing the data flow
        4. Document parameter choices and their justification
    validations:
      required: true

  - type: input
    id: location
    attributes:
      label: Documentation Location
      description: Where will this documentation be stored?
      placeholder: docs/data_preprocessing.md or src/preprocessing/README.md
    validations:
      required: true

  - type: dropdown
    id: priority
    attributes:
      label: Priority
      description: How important is this documentation?
      options:
        - Critical (required for thesis)
        - High (important for understanding)
        - Medium (helpful but not urgent)
        - Low (nice to have)
    validations:
      required: true

  - type: dropdown
    id: audience
    attributes:
      label: Target Audience
      description: Who is the primary audience for this documentation?
      options:
        - Thesis Committee/Reviewers
        - Future Self
        - Other Researchers
        - Technical Readers
        - Non-technical Readers
        - Multiple Audiences
    validations:
      required: true

  - type: textarea
    id: references
    attributes:
      label: References
      description: Any papers, documentation or other materials related to this documentation task
      placeholder: |
        - Smith et al. (2022). "Best practices in machine learning documentation"
        - Code in src/preprocessing/normalize.py
    validations:
      required: false

  - type: textarea
    id: notes
    attributes:
      label: Additional Notes
      description: Any other relevant information
      placeholder: This documentation will be referenced in Chapter 3 of the thesis
    validations:
      required: false
.github/ISSUE_TEMPLATE/experiment.yml (vendored, new file, 124 lines)
@@ -0,0 +1,124 @@
# .github/ISSUE_TEMPLATE/experiment.yml
name: Experiment
description: Document a new ML experiment
title: "[EXP] "
labels: ["experiment"]
assignees:
  - ${{github.actor}}
body:
  - type: markdown
    attributes:
      value: |
        Use this template to document a new experiment for your thesis.

  - type: textarea
    id: hypothesis
    attributes:
      label: Hypothesis
      description: What is the hypothesis you're testing with this experiment?
      placeholder: Using a deeper network with residual connections will improve accuracy on the imbalanced dataset without increasing overfitting
    validations:
      required: true

  - type: textarea
    id: background
    attributes:
      label: Background & Motivation
      description: Background context and why this experiment is important
      placeholder: Previous experiments showed promising results but suffered from overfitting. Recent literature suggests that...
    validations:
      required: true

  - type: textarea
    id: dataset
    attributes:
      label: Dataset
      description: What data will you use for this experiment?
      placeholder: |
        - Dataset: MNIST with augmentation
        - Preprocessing: Standardization + random rotation
        - Train/Test Split: 80/20
        - Validation strategy: 5-fold cross-validation
    validations:
      required: true

  - type: textarea
    id: methodology
    attributes:
      label: Methodology
      description: How will you conduct the experiment?
      placeholder: |
        1. Implement ResNet architecture with varying depths (18, 34, 50)
        2. Train with early stopping (patience=10)
        3. Compare against baseline CNN from experiment #23
        4. Analyze learning curves and performance metrics
    validations:
      required: true

  - type: textarea
    id: parameters
    attributes:
      label: Parameters & Hyperparameters
      description: List the key parameters for this experiment
      placeholder: |
        - Learning rate: 0.001 with Adam optimizer
        - Batch size: 64
        - Epochs: Max 100 with early stopping
        - Dropout rate: 0.3
        - L2 regularization: 1e-4
    validations:
      required: true

  - type: textarea
    id: metrics
    attributes:
      label: Evaluation Metrics
      description: How will you evaluate the results?
      placeholder: |
        - Accuracy
        - F1-score (macro-averaged)
        - ROC-AUC
        - Training vs. validation loss curves
        - Inference time
    validations:
      required: true

  - type: input
    id: notebook
    attributes:
      label: Notebook Location
      description: Where will the experiment notebook be stored?
      placeholder: notebooks/experiment_resnet_comparison.ipynb
    validations:
      required: false

  - type: textarea
    id: dependencies
    attributes:
      label: Dependencies
      description: What other issues or tasks does this experiment depend on?
      placeholder: |
        - Depends on issue #42 (Data preprocessing pipeline)
        - Requires completion of issue #51 (Baseline model)
    validations:
      required: false

  - type: textarea
    id: references
    attributes:
      label: References
      description: Any papers, documentation or other materials relevant to this experiment
      placeholder: |
        - He et al. (2016). "Deep Residual Learning for Image Recognition"
        - My previous experiment #23 (baseline CNN)
    validations:
      required: false

  - type: textarea
    id: notes
    attributes:
      label: Additional Notes
      description: Any other relevant information
      placeholder: This experiment may require significant GPU resources. Expected runtime is ~3 hours on Tesla V100.
    validations:
      required: false
.github/ISSUE_TEMPLATE/feature_request.yml (vendored, new file, 99 lines)
@@ -0,0 +1,99 @@
# .github/ISSUE_TEMPLATE/feature_request.yml
name: Feature Request
description: Suggest a new feature or enhancement
title: "[FEAT] "
labels: ["enhancement"]
assignees:
  - ${{github.actor}}
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to propose a new feature!

  - type: textarea
    id: problem
    attributes:
      label: Problem Statement
      description: What problem are you trying to solve with this feature?
      placeholder: I'm frustrated when trying to analyze different model results because I need to manually compare them...
    validations:
      required: true

  - type: textarea
    id: solution
    attributes:
      label: Proposed Solution
      description: Describe the solution you'd like to implement
      placeholder: Create a visualization utility that automatically compares results across multiple models and experiments
    validations:
      required: true

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives Considered
      description: Describe alternatives you've considered
      placeholder: I considered using an external tool, but integrating directly would provide better workflow
    validations:
      required: false

  - type: dropdown
    id: component
    attributes:
      label: Component
      description: Which part of the thesis project would this feature affect?
      options:
        - LaTeX Document
        - Python Source Code
        - Jupyter Notebook
        - Data Processing
        - ML Model
        - Visualization
        - Build/Environment
        - Multiple Components
    validations:
      required: true

  - type: dropdown
    id: priority
    attributes:
      label: Priority
      description: How important is this feature for your thesis progression?
      options:
        - Critical (blocks progress)
        - High (significantly improves workflow)
        - Medium (nice to have)
        - Low (minor improvement)
    validations:
      required: true

  - type: textarea
    id: implementation
    attributes:
      label: Implementation Ideas
      description: Any initial thoughts on how to implement this feature?
      placeholder: |
        - Could use matplotlib's subplot feature
        - Would need to standardize the model output format
        - Should include statistical significance tests
    validations:
      required: false

  - type: textarea
    id: benefits
    attributes:
      label: Expected Benefits
      description: How will this feature benefit your thesis work?
      placeholder: This will save time in analysis and provide more consistent comparisons across experiments
    validations:
      required: true

  - type: textarea
    id: additional
    attributes:
      label: Additional Context
      description: Any other context, screenshots, or reference material
      placeholder: Here's a paper that uses a similar approach...
    validations:
      required: false
LICENSE (7 lines)
@@ -0,0 +1,7 @@
Copyright 2024 Rifqi D. Panuluh

All Rights Reserved.

This repository is for viewing purposes only. No part of this repository, including but not limited to the code, files, and documentation, may be copied, reproduced, modified, or distributed in any form or by any means without the prior written permission of the copyright holder.

Unauthorized use, distribution, or modification of this repository may result in legal action.
README.md (18 lines)
@@ -0,0 +1,18 @@
## Summary

This repository contains the work related to my thesis, which focuses on damage localization prediction. The research explores the application of machine learning techniques to structural health monitoring.

**Note:** This repository does not contain the secondary data used in the analysis. The code is designed to work with data from the [QUGS (Qatar University Grandstand Simulator)](https://www.structuralvibration.com/benchmark/qugs/) dataset, which is not included here.

The repository is private, and access is restricted to those who have been given explicit permission by the owner. Access is provided solely for the purpose of brief review or seeking technical guidance.

## Restrictions

- **No Derivative Works or Cloning:** Any form of copying, cloning, or creating derivative works based on this repository is strictly prohibited.
- **Limited Access:** Use beyond brief review or collaboration is not allowed without prior permission from the owner.

---

All contents of this repository, including the thesis idea, code, and associated data, are copyrighted © 2024 by Rifqi Panuluh. Unauthorized use or duplication is prohibited.

[LICENSE](https://github.com/nuluh/thesis?tab=License-1-ov-file#readme)
File diff suppressed because one or more lines are too long
data/QUGS/convert.py
@@ -1,25 +1,307 @@
import pandas as pd
import os
import re
import sys
import numpy as np
from colorama import Fore, Style, init
from typing import TypedDict, Dict, List
from joblib import load
from pprint import pprint


# class DamageFilesIndices(TypedDict):
#     damage_index: int
#     files: list[int]
OriginalSingleDamageScenarioFilePath = str
DamageScenarioGroupIndex = int
OriginalSingleDamageScenario = pd.DataFrame
SensorIndex = int
VectorColumnIndex = List[SensorIndex]
VectorColumnIndices = List[VectorColumnIndex]
DamageScenarioGroup = List[OriginalSingleDamageScenario]
GroupDataset = List[DamageScenarioGroup]


class DamageFilesIndices(TypedDict):
    damage_index: int
    files: List[str]


def generate_damage_files_index(**kwargs) -> Dict[int, List[str]]:
    prefix: str = kwargs.get("prefix", "zzzAD")
    extension: str = kwargs.get("extension", ".TXT")
    num_damage: int = kwargs.get("num_damage")
    file_index_start: int = kwargs.get("file_index_start")
    col: int = kwargs.get("col")
    base_path: str = kwargs.get("base_path")

    damage_scenarios = {}
    a = file_index_start
    b = col + 1
    for i in range(1, num_damage + 1):
        damage_scenarios[i] = range(a, b)
        a += col
        b += col

    # return damage_scenarios

    x = {}
    for damage, files in damage_scenarios.items():
        x[damage] = []  # Initialize each key with an empty list
        for i, file_index in enumerate(files, start=1):
            if base_path:
                x[damage].append(
                    os.path.normpath(
                        os.path.join(base_path, f"{prefix}{file_index}{extension}")
                    )
                )
                # if not os.path.exists(file_path):
                #     print(Fore.RED + f"File {file_path} does not exist.")
                #     continue
            else:
                x[damage].append(f"{prefix}{file_index}{extension}")
    return x


# file_path = os.path.join(base_path, f"zzz{prefix}D{file_index}.TXT")
# df = pd.read_csv(file_path, sep="\t", skiprows=10)  # Read with explicit column names
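
# Example of the index this function builds (a sketch; uses the defaults above
# with no base_path given, so bare file names are returned):
#   generate_damage_files_index(num_damage=2, file_index_start=1, col=5)
#   -> {1: ["zzzAD1.TXT", ..., "zzzAD5.TXT"],
#       2: ["zzzAD6.TXT", ..., "zzzAD10.TXT"]}
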
class DataProcessor:
    def __init__(self, file_index: Dict[int, List[str]], cache_path: str = None):
        self.file_index = file_index
        if cache_path:
            self.data = load(cache_path)
        else:
            self.data = self._load_all_data()

    def _extract_column_names(self, file_path: str) -> List[str]:
        """
        Extracts column names from the header of the given file.
        Assumes the 11th line (index 10) contains the column names.

        :param file_path: Path to the data file.
        :return: List of column names.
        """
        with open(file_path, "r") as f:
            header_lines = [next(f) for _ in range(12)]

        # Extract column names from the 11th header line
        channel_line = header_lines[10].strip()
        tokens = re.findall(r'"([^"]+)"', channel_line)
        if not channel_line.startswith('"'):
            first_token = channel_line.split()[0]
            tokens = [first_token] + tokens

        return tokens  # Prepend 'Time' column if applicable

    def _load_dataframe(self, file_path: str) -> OriginalSingleDamageScenario:
        """
        Loads a single data file into a pandas DataFrame.

        :param file_path: Path to the data file.
        :return: DataFrame containing the numerical data.
        """
        col_names = self._extract_column_names(file_path)
        df = pd.read_csv(
            file_path, delim_whitespace=True, skiprows=11, header=None, memory_map=True
        )
        df.columns = col_names
        return df

    def _load_all_data(self) -> GroupDataset:
        """
        Loads all data files based on the grouping dictionary and returns a nested list.

        :return: A nested list of DataFrames where the outer index corresponds to group_idx - 1.
        """
        data = []
        # Find the maximum group index to determine the list size
        max_group_idx = max(self.file_index.keys()) if self.file_index else 0

        # Initialize empty lists
        for _ in range(max_group_idx):
            data.append([])

        # Fill the list with data
        for group_idx, file_list in self.file_index.items():
            # Adjust index to be 0-based
            list_idx = group_idx - 1
            data[list_idx] = [self._load_dataframe(file) for file in file_list]

        return data

    def get_group_data(self, group_idx: int) -> List[pd.DataFrame]:
        """
        Returns the list of DataFrames for the given group index.

        :param group_idx: Index of the group (1-based, matching the file index keys).
        :return: List of DataFrames, or an empty list if the index is out of range.
        """
        list_idx = group_idx - 1
        if 0 <= list_idx < len(self.data):
            return self.data[list_idx]
        return []

    def get_column_names(self, group_idx: int, file_idx: int = 0) -> List[str]:
        """
        Returns the column names for the given group and file indices.

        :param group_idx: Index of the group (1-based, matching the file index keys).
        :param file_idx: Index of the file in the group.
        :return: List of column names.
        """
        list_idx = group_idx - 1
        if 0 <= list_idx < len(self.data) and len(self.data[list_idx]) > file_idx:
            return self.data[list_idx][file_idx].columns.tolist()
        return []

    def get_data_info(self):
        """
        Print information about the loaded data structure.
        Adapted for when self.data is a List instead of a Dictionary.
        """
        if isinstance(self.data, list):
            # For each sublist in self.data, get the type names of all elements
            pprint(
                [
                    (
                        [type(item).__name__ for item in sublist]
                        if isinstance(sublist, list)
                        else type(sublist).__name__
                    )
                    for sublist in self.data
                ]
            )
        else:
            pprint(
                {
                    key: [type(df).__name__ for df in value]
                    for key, value in self.data.items()
                }
                if isinstance(self.data, dict)
                else type(self.data).__name__
            )
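
    # Example of what _create_vector_column_index produces (a sketch; assumes the
    # 6-group x 5-file layout used in data/QUGS/test.py with 30 sensor columns):
    #   file 0 -> [0, 5, 10, 15, 20, 25]
    #   file 1 -> [1, 6, 11, 16, 21, 26]
    # i.e. the n-th file selects every 5th sensor column starting at offset n.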
    def _create_vector_column_index(self) -> VectorColumnIndices:
        vector_col_idx: VectorColumnIndices = []
        y = 0
        for data_group in self.data:  # len(data_group[i]) = 5
            for j in data_group:
                c: VectorColumnIndex = []  # column vector c_{j}
                x = 0
                for _ in range(6):  # TODO: range(6) should be dynamic and parameterized
                    c.append(x + y)
                    x += 5
                vector_col_idx.append(c)
                y += 1
        return vector_col_idx

    def create_vector_column(self, overwrite=True) -> List[List[List[pd.DataFrame]]]:
        """
        Create a vector column from the loaded data.

        :param overwrite: Overwrite the original data with vector column-based data.
        """
        idx = self._create_vector_column_index()
        # if overwrite:
        for i in range(len(self.data)):
            for j in range(len(self.data[i])):
                # Get the appropriate indices for slicing from idx
                indices = idx[j]

                # Get the current DataFrame
                df = self.data[i][j]

                # Keep the 'Time' column and select only specified 'Real' columns
                # First, we add 1 to all indices to account for 'Time' being at position 0
                real_indices = [index + 1 for index in indices]

                # Create list with Time column index (0) and the adjusted Real indices
                all_indices = [0] + real_indices

                # Apply the slicing
                self.data[i][j] = df.iloc[:, all_indices]
        # TODO: if !overwrite:

    def create_limited_sensor_vector_column(self, overwrite=True):
        """
        Create a vector column from the loaded data, keeping only the first and
        last sensor of each column vector.

        :param overwrite: Overwrite the original data with vector column-based data.
        """
        idx = self._create_vector_column_index()
        # if overwrite:
        for i in range(len(self.data)):  # damage(s)
            for j in range(len(self.data[i])):  # col(s)
                # Get the appropriate indices for slicing from idx
                indices = idx[j]

                # Get the current DataFrame
                df = self.data[i][j]

                # Keep the 'Time' column and select only specified 'Real' columns
                # First, we add 1 to all indices to account for 'Time' being at position 0
                real_indices = [index + 1 for index in indices]

                # Create list with Time column index (0) and the adjusted Real indices
                all_indices = [0] + [real_indices[0]] + [real_indices[-1]]

                # Apply the slicing
                self.data[i][j] = df.iloc[:, all_indices]
        # TODO: if !overwrite:

    def export_to_csv(self, output_dir: str, file_prefix: str = "DAMAGE"):
        """
        Export the processed data to CSV files in the required folder structure.

        :param output_dir: Directory to save the CSV files.
        :param file_prefix: Prefix for the output filenames.
        """
        for group_idx, group in enumerate(self.data, start=1):
            group_folder = os.path.join(output_dir, f"{file_prefix}_{group_idx}")
            os.makedirs(group_folder, exist_ok=True)
            for test_idx, df in enumerate(group, start=1):
                # Ensure columns are named uniquely if duplicated
                df = df.copy()
                df.columns = ["Time", "Real_0", "Real_1"]  # Rename

                # Export first Real column
                out1 = os.path.join(
                    group_folder, f"{file_prefix}_{group_idx}_TEST{test_idx}_01.csv"
                )
                df[["Time", "Real_0"]].rename(columns={"Real_0": "Real"}).to_csv(
                    out1, index=False
                )

                # Export last Real column
                out2 = os.path.join(
                    group_folder, f"{file_prefix}_{group_idx}_TEST{test_idx}_02.csv"
                )
                df[["Time", "Real_1"]].rename(columns={"Real_1": "Real"}).to_csv(
                    out2, index=False
                )

def create_damage_files(base_path, output_base, prefix):
    # Initialize colorama
    init(autoreset=True)

    # Generate column labels based on expected duplication in input files
    columns = ["Real"] + [
        f"Real.{i}" for i in range(1, 30)
    ]  # Explicitly setting column names

    sensor_end_map = {
        1: "Real.25",
        2: "Real.26",
        3: "Real.27",
        4: "Real.28",
        5: "Real.29",
    }

    # Define the damage scenarios and the corresponding original file indices
    damage_scenarios = {
        1: range(1, 6),  # Damage 1 files from zzzAD1.csv to zzzAD5.csv
        2: range(6, 11),  # Damage 2 files from zzzAD6.csv to zzzAD10.csv
        3: range(11, 16),  # Damage 3 files from zzzAD11.csv to zzzAD15.csv
        4: range(16, 21),  # Damage 4 files from zzzAD16.csv to zzzAD20.csv
        5: range(21, 26),  # Damage 5 files from zzzAD21.csv to zzzAD25.csv
        6: range(26, 31),  # Damage 6 files from zzzAD26.csv to zzzAD30.csv
    }
    damage_pad = len(str(len(damage_scenarios)))
    test_pad = len(str(30))

@@ -27,27 +309,36 @@ def create_damage_files(base_path, output_base, prefix):
    for damage, files in damage_scenarios.items():
        for i, file_index in enumerate(files, start=1):
            # Load original data file
            file_path = os.path.join(base_path, f"zzz{prefix}D{file_index}.TXT")
            df = pd.read_csv(
                file_path, sep="\t", skiprows=10
            )  # Read with explicit column names

            top_sensor = columns[i - 1]
            print(top_sensor, type(top_sensor))
            output_file_1 = os.path.join(
                output_base, f"DAMAGE_{damage}", f"DAMAGE{damage}_TEST{i}_01.csv"
            )
            print(f"Creating {output_file_1} from taking zzz{prefix}D{file_index}.TXT")
            print("Taking datetime column on index 0...")
            print(f"Taking `{top_sensor}`...")
            os.makedirs(os.path.dirname(output_file_1), exist_ok=True)
            df[["Time", top_sensor]].to_csv(output_file_1, index=False)
            print(Fore.GREEN + "Done")

            bottom_sensor = sensor_end_map[i]
            output_file_2 = os.path.join(
                output_base, f"DAMAGE_{damage}", f"DAMAGE{damage}_TEST{i}_02.csv"
            )
            print(f"Creating {output_file_2} from taking zzz{prefix}D{file_index}.TXT")
            print("Taking datetime column on index 0...")
            print(f"Taking `{bottom_sensor}`...")
            os.makedirs(os.path.dirname(output_file_2), exist_ok=True)
            df[["Time", bottom_sensor]].to_csv(output_file_2, index=False)
            print(Fore.GREEN + "Done")
            print("---")


def main():
    if len(sys.argv) < 4:
        print("Usage: python convert.py <base_path> <output_base> <prefix>")

@@ -58,11 +349,12 @@ def main():
    prefix = sys.argv[3]  # Define output directory

    # Create output folders if they don't exist
    # for i in range(1, 7):
    #     os.makedirs(os.path.join(output_base, f'DAMAGE_{i}'), exist_ok=True)

    create_damage_files(base_path, output_base, prefix)
    print(Fore.YELLOW + Style.BRIGHT + "All files have been created successfully.")


if __name__ == "__main__":
    main()
data/QUGS/test.py (new file, 12 lines)
@@ -0,0 +1,12 @@
from convert import *
from joblib import dump, load

a = generate_damage_files_index(
    num_damage=6, file_index_start=1, col=5, base_path="D:/thesis/data/dataset_A"
)
data = DataProcessor(file_index=a)
# data.create_vector_column(overwrite=True)
data.create_limited_sensor_vector_column(overwrite=True)
data.export_to_csv("D:/thesis/data/")
# a = load("D:/cache.joblib")
breakpoint()
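
Since `DataProcessor` accepts a `cache_path` argument (restored via `joblib.load`) and `test.py` already imports `dump`, a caching round-trip appears to be the intended workflow. A minimal sketch, reusing the index `a` from above (the cache file path is hypothetical, echoing the commented-out line in `test.py`):

    from joblib import dump

    # First run: parse the raw TXT files, then persist the nested list of DataFrames.
    data = DataProcessor(file_index=a)
    dump(data.data, "D:/cache.joblib")

    # Later runs: skip parsing entirely and restore the cached structure.
    data = DataProcessor(file_index=a, cache_path="D:/cache.joblib")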
docs/CONTRIBUTING.md (new file, 66 lines)
@@ -0,0 +1,66 @@
This document outlines the process for developing and contributing to my own thesis project. Following these guidelines ensures consistent quality and maintains a clear development history.

## Development Workflow

### 1. Issue Creation
Before working on any new feature, experiment, or bug fix:
- Create a GitHub issue using the appropriate template
- Assign it to myself
- Add relevant labels
- Link it to the project board if applicable

### 2. Branching Strategy
Use the following branch naming convention:
- `feature/<issue-number>-short-description`
- `bugfix/<issue-number>-short-description`
- `experiment/<issue-number>-short-description`
- `doc/<issue-number>-short-description`

Always branch from `main` for new features/experiments.

### 3. Development Process
- Make regular, atomic commits following the commit message template
- Include the issue number in commit messages (e.g., "#42")
- Push changes at the end of each work session

### 4. Code Quality
- Follow PEP 8 guidelines for Python code
- Document functions with docstrings
- Maintain test coverage for custom functions (see the sketch below)
- Keep notebooks clean and well-documented
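
As a minimal illustration of what a test for a custom function could look like (a sketch only; it assumes `generate_damage_files_index` from `data/QUGS/convert.py` is importable and relies on its default `zzzAD*.TXT` naming):

```python
# test_convert.py -- hypothetical test module, runnable with pytest
from convert import generate_damage_files_index


def test_index_groups_files_by_damage_scenario():
    index = generate_damage_files_index(num_damage=2, file_index_start=1, col=5)
    # Two damage scenarios with five files each, no base_path given.
    assert list(index.keys()) == [1, 2]
    assert index[1] == [f"zzzAD{i}.TXT" for i in range(1, 6)]
    assert index[2] == [f"zzzAD{i}.TXT" for i in range(6, 11)]
```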

### 5. Pull Requests
Even working alone, use PRs for significant changes:
- Create a PR from your feature branch to `main`
- Reference the issue(s) it resolves
- Include a summary of changes
- Self-review the PR before merging

### 6. Versioning
Follow semantic versioning:
- Major version: Significant thesis milestones or structural changes
- Minor version: New experiments, features, or chapters
- Patch version: Bug fixes and minor improvements

### 7. Documentation
Update documentation with each significant change:
- Keep README current
- Update function documentation
- Maintain clear experiment descriptions in notebooks
- Record significant decisions and findings

## LaTeX Guidelines
- Use consistent citation style
- Break long sections into multiple files
- Use meaningful label names for cross-references
- Consider using version-control friendly LaTeX practices (one sentence per line)

## Experiment Tracking
For each experiment:
- Create an issue documenting the experiment design
- Reference related papers and previous experiments
- Document parameters and results in the notebook
- Summarize findings in the issue before closing

## Commit Categories
Use the categories defined in the commit template to clearly classify changes.