[FEAT] Implement file status tracking to prevent processing incomplete CSVs #46
Problem Statement
During STFT processing, errors can occur when processing incomplete or corrupted CSV files, particularly when a previous process was interrupted. There's currently no mechanism to track file completion status, which can lead to shape mismatches when attempting operations like pandas.DataFrame.to_csv(mode='a') on partially processed files.
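The failure mode described above can be reproduced in a few lines. This is a hypothetical illustration (the column names and values are invented, not taken from the project): an interrupted run leaves a CSV whose last row is truncated, and appending the next chunk the way `to_csv(mode='a', header=False)` would, without inspecting what is already on disk, fuses the partial row and the new data into one malformed line.

```python
import io
import pandas as pd

# An interrupted run left the last row truncated (missing its final column).
truncated = "t,f0,f1\n0.0,1.0,2.0\n0.1,3.0"

# Appending a new chunk blindly, as to_csv(mode='a', header=False) does,
# concatenates the new rows onto the partial line.
chunk = pd.DataFrame({"t": [0.2], "f0": [5.0], "f1": [6.0]})
appended = truncated + chunk.to_csv(header=False, index=False)

# Reading the result back fails: the fused line has too many fields.
try:
    pd.read_csv(io.StringIO(appended))
except pd.errors.ParserError as exc:
    print("corrupted file:", exc)
```

Exception handling at read time only detects the corruption after the fact; the tracking mechanism proposed below prevents the incomplete file from ever carrying the final `.csv` name.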
Proposed Solution
Implement a file status tracking system that:
- Writes output to a temporary file (.csv.temp) while processing is in progress
- Renames it to the final name (.csv) only after successful completion

Alternatives Considered
While exception handling could catch errors during processing, it's a reactive approach that doesn't prevent the initial processing attempt on incomplete files. Locking mechanisms could also be used but add complexity that may not be necessary for a single-user thesis project.
Component
Python Source Code
Priority
Medium (nice to have)
Implementation Ideas
Create a file status manager class that handles:
Implementation approach:
Example pseudocode:
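A minimal sketch of the temp-file-then-rename idea described under Proposed Solution. The class name `FileStatusManager` and the file paths are illustrative assumptions, not existing code from this repository; `os.replace` is used because it renames atomically on POSIX and Windows.

```python
import os
import tempfile

class FileStatusManager:
    """Illustrative sketch: write to <name>.csv.temp and promote it to
    <name>.csv only on success, so any file carrying the final .csv
    name is known to be complete."""

    def __init__(self, final_path: str):
        self.final_path = final_path
        self.temp_path = final_path + ".temp"

    def __enter__(self):
        # All writes during processing go to the temporary path.
        self._handle = open(self.temp_path, "w", newline="")
        return self._handle

    def __exit__(self, exc_type, exc, tb):
        self._handle.close()
        if exc_type is None:
            # Success: atomically promote the temp file to its final name.
            os.replace(self.temp_path, self.final_path)
        else:
            # Failure or interruption: leave no half-written .csv behind.
            os.remove(self.temp_path)
        return False  # re-raise any exception

# Demo: a successful run promotes the temp file to its final name.
out = os.path.join(tempfile.mkdtemp(), "stft_output.csv")
with FileStatusManager(out) as f:
    f.write("t,f0,f1\n")
print(os.path.exists(out), os.path.exists(out + ".temp"))  # True False
```

With this pattern, a batch run restarted after an interruption can simply skip any `.csv` it finds (known complete) and reprocess anything that only exists as `.csv.temp`.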
Expected Benefits
Additional Context
This feature would be particularly useful during long batch processing operations where interruptions are more likely. It would complement the memory optimization feature (issue #45) by adding another layer of robustness to the processing pipeline.
The implementation should be lightweight and not add significant overhead to the processing time. The focus should be on preventing data corruption and providing clear status indicators.
This feature request outlines a practical approach to preventing errors when processing incomplete files. It's a simple safeguard mechanism that can save time and frustration by making the data processing pipeline more resilient to interruptions.