Homework 02
Hello, I have reviewed your homework, and I like it.
My only comment is about how you handle missing values with the median. (-0.5 p)
- This should be done after `train_test_split`, with medians computed from the train set only, to prevent data leakage from the validation set into the train set.
- You also compute the `age` median from the evaluation set. While this is not invalid, that median can differ from the one computed on the training data. It may confuse your model and lower your accuracy on the evaluation set.
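To illustrate both points, here is a minimal sketch of median imputation done after the split. The toy data, the `fare` and `target` columns, and the split parameters are assumptions made for the example and are not taken from your notebook; only the `age` column and `train_test_split` come from your homework.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the homework dataset.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0, np.nan, 29.0, 54.0, 18.0, np.nan, 63.0],
    "fare": [7.3, 71.3, 8.1, 53.1, 8.5, 13.0, 51.9, 21.1, 11.1, 30.1],
    "target": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
})
# Hypothetical evaluation set (no labels available).
eval_df = pd.DataFrame({"age": [np.nan, 47.0], "fare": [9.0, 26.6]})

X = df[["age", "fare"]]
y = df["target"]

# Split first, so validation rows cannot influence the fill value.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The median is computed from the training rows only.
age_median = X_train["age"].median()

# The same training median fills the gaps in every subset,
# including the evaluation set.
X_train = X_train.assign(age=X_train["age"].fillna(age_median))
X_val = X_val.assign(age=X_val["age"].fillna(age_median))
X_eval = eval_df.assign(age=eval_df["age"].fillna(age_median))
```

Reusing one fill value everywhere keeps the `age` feature on the same footing in training, validation, and evaluation.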
For now, I am giving you 9.5 points.
If you are interested, you can improve your homework and address the comment above.
- You are training the model on approximately 56% of the original training data. While you are correctly working with a train / validation / test split to select the model parameters, best practice is to retrain your model on the full training dataset before predicting on the evaluation set.
- You can try to fill the missing `age` data with `KNNImputer`. (+2 p)
- You can also try to use a KNN classifier. (+1 p)
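The three suggestions above could be combined roughly as follows. This is only a sketch under assumptions: the synthetic data, the second feature, the scaling step, and `n_neighbors=5` are all placeholders chosen for the example, not values from your homework.

```python
import numpy as np
from sklearn.base import clone
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: column 0 plays the role of age.
age = rng.uniform(18, 70, size=60)
other = rng.normal(size=60)
y = (age > 40).astype(int)
X = np.column_stack([age, other])
# Knock out some age values to simulate missing data.
X[rng.choice(60, size=10, replace=False), 0] = np.nan

# Hypothetical evaluation rows, one with a missing age.
X_eval = np.array([[np.nan, 0.2], [55.0, -1.0], [23.0, 0.7]])

# Imputer, scaler, and classifier live in one pipeline, so each step
# is fitted only on the data passed to fit().
model = make_pipeline(
    KNNImputer(n_neighbors=5),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5),
)

# Step 1: use a held-out validation set to judge the configuration.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
val_accuracy = model.fit(X_train, y_train).score(X_val, y_val)

# Step 2: once the configuration is chosen, refit a fresh copy on all
# labelled data before predicting on the evaluation set.
final_model = clone(model).fit(X, y)
eval_predictions = final_model.predict(X_eval)
```

Keeping the imputer inside the pipeline means the refit in step 2 also re-learns the imputation from the full training data, and the evaluation rows are filled using training neighbours only.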
Your real-world accuracy score is 56.96.