Václav Tran authored
Milestone
Read Articles and Sources
- Were RNNs All We Needed?
- Coding a Recurrent Neural Network (RNN) from scratch using PyTorch
- PyTorch RNN Class Documentation
- MAPE
Brief Assignment Summary
The project focuses on predicting the parameters of sinusoidal waves using neural networks with the minGRU architecture. The core task is to predict the amplitude (A) and frequency (ω) from discretized sine-wave samples. The waves are defined as y(t) = A sin(ωt) with t ∈ [0, 2π], sampled at 100 points.
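For illustration, the generation step could look like the following minimal sketch. The function name and the uniform sampling of A and ω are assumptions made for the example; the constants match the training configuration listed below.

```python
import torch

def generate_dataset(n_samples=1000, n_points=100, max_amp=10.0,
                     max_freq=10.0, noise_std=0.0):
    """Sample (A, ω) and discretize y(t) = A·sin(ωt) on [0, 2π].
    Uniform parameter sampling is an assumption for this sketch."""
    t = torch.linspace(0, 2 * torch.pi, n_points)        # (n_points,)
    amp = torch.rand(n_samples, 1) * max_amp             # A in (0, MAX_AMP)
    freq = torch.rand(n_samples, 1) * max_freq           # ω in (0, MAX_FREQ)
    waves = amp * torch.sin(freq * t)                    # (n_samples, n_points)
    waves = waves + noise_std * torch.randn_like(waves)  # optional Gaussian noise
    targets = torch.cat([amp, freq], dim=1)              # (n_samples, 2): (A, ω)
    return waves, targets
```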
Current Progress
- Implemented dataset generation of sinusoidal waves
- Implemented a multi-layer minGRU architecture that allows parallel computation of hidden states in log space (see the sketch after this list)
- Trained minGRU with various levels of network depth - using AdamW optimizer and MSE criterion
- So far I have only experimented with varying network depth; experiments with the number of neurons in the hidden layers were also carried out, but unfortunately I did not save those results.
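For reference, below is a minimal sketch of the log-space parallel-scan formulation from "Were RNNs All We Needed?"; the class and helper names are illustrative, and the repository's implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def log_g(x):
    # log of g(x) = x + 0.5 for x >= 0, sigmoid(x) otherwise;
    # g keeps candidate hidden states strictly positive so logs exist.
    return torch.where(x >= 0, (F.relu(x) + 0.5).log(), -F.softplus(-x))

def parallel_scan_log(log_coeffs, log_values):
    # Solves h_t = a_t * h_{t-1} + b_t for all t at once, in log space.
    # log_coeffs: (B, T, H) = log a_t; log_values: (B, T+1, H) = log [h_0, b_1..b_T].
    a_star = F.pad(torch.cumsum(log_coeffs, dim=1), (0, 0, 1, 0))
    log_h = a_star + torch.logcumsumexp(log_values - a_star, dim=1)
    return torch.exp(log_h)[:, 1:]  # drop h_0 -> (B, T, H)

class MinGRU(nn.Module):
    # h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t, with z_t and h̃_t depending
    # only on x_t, so the recurrence is linear in h and admits a parallel scan.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear_z = nn.Linear(input_size, hidden_size)
        self.linear_h = nn.Linear(input_size, hidden_size)

    def forward(self, x, h0):
        # x: (B, T, input_size); h0: (B, 1, hidden_size), strictly positive
        k = self.linear_z(x)
        log_z = -F.softplus(-k)            # log σ(k)  = log z_t
        log_one_minus_z = -F.softplus(k)   # log σ(-k) = log (1 - z_t)
        log_tilde_h = log_g(self.linear_h(x))
        return parallel_scan_log(
            log_one_minus_z,
            torch.cat([log_g(h0), log_z + log_tilde_h], dim=1),
        )
```

Because the gates depend only on x_t, all hidden states can be computed at once with a prefix scan instead of a sequential loop; taking logs turns the cumulative products into cumulative sums, which is numerically stabler over long sequences.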
Results
Training Configuration
The model was trained with the following hyperparameters (a training-loop sketch using these values follows the list):
- Dataset size (N): 1000
- Maximum amplitude (MAX_AMP): 10
- Maximum frequency (MAX_FREQ): 10
- Points per sequence (N_POINTS): 100
- Batch size: 16
- Learning rate: 0.001
- Number of epochs: 100
- Hidden size: 32
- Noise standard deviation: 0
- Train/val split: 0.8 / 0.2
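As a rough illustration, a training run with these settings could look like the sketch below. SineRegressor is a hypothetical wrapper reusing MinGRU and generate_dataset from the earlier sketches; the actual output head in the repository may differ.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SineRegressor(nn.Module):
    # Hypothetical wrapper: stacks MinGRU layers and maps the final
    # hidden state to the two targets (A, ω).
    def __init__(self, hidden_size=32, n_layers=4):
        super().__init__()
        sizes = [1] + [hidden_size] * n_layers
        self.layers = nn.ModuleList(
            MinGRU(i, o) for i, o in zip(sizes[:-1], sizes[1:])
        )
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, x):                   # x: (B, 100) wave samples
        h = x.unsqueeze(-1)                 # (B, T, 1): one scalar per step
        for layer in self.layers:
            h0 = torch.ones(h.size(0), 1, layer.linear_h.out_features)
            h = layer(h, h0)                # positive initial state required
        return self.head(h[:, -1])          # predict (A, ω)

waves, targets = generate_dataset()         # from the earlier sketch
n_train = int(0.8 * len(waves))             # 0.8 / 0.2 train/val split
train_loader = DataLoader(TensorDataset(waves[:n_train], targets[:n_train]),
                          batch_size=16, shuffle=True)

model = SineRegressor(hidden_size=32, n_layers=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(100):
    for batch_waves, batch_targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_waves), batch_targets)
        loss.backward()
        optimizer.step()
```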
So far, the architecture has been evaluated at three network depths:
- 1 layer
- 4 layers
- 8 layers
Current Results
The model was evaluated with three different network depths (1, 4, and 8 layers), with all other hyperparameters held constant. Mean absolute percentage error (MAPE) was used as the evaluation metric (lower is better). Best model selection is based on the arithmetic mean of amplitude and frequency MAPE on the validation set.
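As a sketch, the metric and the selection score could be computed as follows; val_waves and val_targets denote the held-out 20% split from the training sketch, and the eps guard against near-zero targets is an assumption.

```python
import torch

def mape(pred, target, eps=1e-8):
    # Mean absolute percentage error in percent; eps guards near-zero targets.
    return 100.0 * ((pred - target).abs() / (target.abs() + eps)).mean()

val_waves, val_targets = waves[n_train:], targets[n_train:]  # held-out 20%
with torch.no_grad():
    val_pred = model(val_waves)                  # (N_val, 2): predicted (A, ω)

amp_mape = mape(val_pred[:, 0], val_targets[:, 0])
freq_mape = mape(val_pred[:, 1], val_targets[:, 1])
selection_score = 0.5 * (amp_mape + freq_mape)   # lower is better
```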
1 Layer:
- Training Loss: 23.70
- Validation Loss: 15.20
- Amplitude MAPE: Train 1.01% / Val 0.86%
- Frequency MAPE: Train 1.16% / Val 2.81%
4 Layers:
- Training Loss: 0.17
- Validation Loss: 0.10
- Amplitude MAPE: Train 0.10% / Val 0.09%
- Frequency MAPE: Train 0.25% / Val 0.09%
8 Layers:
- Training Loss: 0.17
- Validation Loss: 0.08
- Amplitude MAPE: Train 0.40% / Val 0.05%
- Frequency MAPE: Train 2.55% / Val 0.06%
The 4-layer model achieved the best overall performance, followed by the 8-layer model, which has a substantially higher frequency MAPE on the training set (2.55% vs. 0.25%). Both significantly outperformed the single-layer model (1.84% average validation MAPE). The deeper architectures demonstrated substantially better parameter prediction accuracy.
Project Repository: [Link to repository]
Next steps include:
- Hyperparameter optimization experiments
- Testing robustness to input noise
- Possibly a comparison with other recurrent models (regular GRU, LSTM, RNN)