
We now introduce a simple ``data-augmentation'' scheme across repeated tests. For scenario \(j\) and replicate \(i\), define the sample
\[
\mathbf{c}_{j}^{(i)}
\;=\;
\Bigl[S_{0+j}^{(i)},\,S_{5+j}^{(i)},\,S_{10+j}^{(i)},\,S_{15+j}^{(i)},\,S_{20+j}^{(i)},\,S_{25+j}^{(i)}\Bigr]^{T}
\;\in\mathbb{R}^{6}\!,
\]
where \(S_{k}^{(i)}\) is the time-frequency feature vector (after STFT and log-scaling) of the \(k\)-th sensor in the \(i\)-th replicate of scenario \(j\).
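As a worked instance of this definition (obtained by direct substitution only, not an additional result), setting \(j=2\) gives
\[
\mathbf{c}_{2}^{(i)}
\;=\;
\Bigl[S_{2}^{(i)},\,S_{7}^{(i)},\,S_{12}^{(i)},\,S_{17}^{(i)},\,S_{22}^{(i)},\,S_{27}^{(i)}\Bigr]^{T},
\]
so the sample gathers every fifth sensor index starting from 2. Equivalently, the \(m\)-th entry of \(\mathbf{c}_{j}^{(i)}\) is \(S_{5m+j}^{(i)}\) for \(m=0,\dots,5\); the index \(m\) is introduced here for illustration only.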

For each fixed scenario \(j\), collect the five replicates into the set
\[
\mathcal{D}^{(j)}
=\bigl\{\mathbf{c}_{j}^{(1)},\,\mathbf{c}_{j}^{(2)},\,\mathbf{c}_{j}^{(3)},\,\mathbf{c}_{j}^{(4)},\,\mathbf{c}_{j}^{(5)}\bigr\},
\]
so \(\lvert\mathcal{D}^{(j)}\rvert=5\). Across all six scenarios, the total augmented dataset is
\[
\mathcal{D}
=\bigcup_{j=0}^{5}\mathcal{D}^{(j)}
=\bigl\{\mathbf{c}_{j}^{(i)}: j=0,\dots,5,\;i=1,\dots,5\bigr\},
\]
with \(\lvert\mathcal{D}\rvert = 6 \times 5 = 30\) samples.
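This count follows from summing over the scenarios, assuming the sets \(\mathcal{D}^{(j)}\) are pairwise disjoint (that is, no two scenarios yield an identical sample):
\[
\lvert\mathcal{D}\rvert
=\sum_{j=0}^{5}\lvert\mathcal{D}^{(j)}\rvert
=\sum_{j=0}^{5}5
=30.
\]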

Each \(\mathbf{c}_{j}^{(i)}\) hence represents one ``column-based'' damage sample,
and the collection \(\mathcal{D}\) serves as the input set for subsequent classification.