Applied Machine Learning - Week 3

Monday the 10th - Friday the 14th of May 2022

Two real-world data cases:
1) Housing price analysis (regression). Initial analysis: HousingAnalysis.ipynb. Data: HousingPrices.csv.
2) Surgery case identification (classification). Initial code for reading the data: HospitalisationAnalysis.ipynb.
     The data is divided into training features X_train.csv, training labels y_train.csv, testing features X_test.csv and testing labels y_test.csv.

Consider these datasets a "training ground" on which you can apply your newly acquired skills at will.

Monday 10th of May (afternoon):
Lectures: Hyperparameters, Overtraining, and Early stopping (TP).
     Both the slides and the associated code can be found on GitHub.
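
     As an illustration of early stopping, here is a minimal sketch in plain Python. It is not taken from the lecture code: the function name, the patience of 3 epochs and the validation losses are all made up for the example. The idea is to stop training once the validation loss has not improved for a given number of epochs, and to keep the model from the best epoch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stopping criterion.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: improving at first, then rising (overtraining).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)
```

     In a real training loop one would typically also save the model weights at the best epoch and restore them after stopping.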

Exercise: Try to optimise the hyperparameters of your model/architecture.
     Note that for NNs the learning rate is particularly important, and that you might want a learning rate scheduler!
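
     The sketch below shows the idea on a toy problem, assuming nothing beyond the Python standard library. A single parameter is fitted by gradient descent, the initial learning rate is scanned on a logarithmic grid, and an exponential-decay schedule stands in for a scheduler. The function `train`, the loss and the grid values are invented for illustration; for a real NN one would use the scheduler of the chosen framework and probably a smarter search than a grid.

```python
import itertools


def train(lr0, gamma, n_steps=50):
    """Minimise the toy loss (w - 3)^2 by gradient descent.

    The learning rate follows an exponential-decay schedule,
    lr_t = lr0 * gamma**t (gamma = 1.0 means a constant learning rate).
    Returns the final loss.
    """
    w = 0.0
    for step in range(n_steps):
        lr = lr0 * gamma**step
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return (w - 3.0) ** 2


# Learning rates are scanned on a logarithmic grid, since the useful range
# typically spans several orders of magnitude.
learning_rates = [1e-3, 1e-2, 1e-1, 1.0]
decay_factors = [1.0, 0.9]

results = {
    (lr0, gamma): train(lr0, gamma)
    for lr0, gamma in itertools.product(learning_rates, decay_factors)
}
best_params = min(results, key=results.get)

for (lr0, gamma), loss in sorted(results.items(), key=lambda item: item[1]):
    print(f"lr0={lr0:<6} gamma={gamma:<4} final loss={loss:.3e}")
print("best:", best_params)
```

     In this toy case a constant learning rate of 1.0 oscillates without converging and a constant 0.001 is too slow, whereas starting at 1.0 and decaying does converge. The loss in your own scan should be evaluated on validation data, not on the training data.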

Wednesday 12th of May (morning):
Lectures: Input feature ranking and Shapley values (TP) and ML method performance overview (TP).

Exercise: Apply the different feature ranking methods to e.g. the Aleph b-jet data, and determine which variables are the important ones (consider first all 9 inputs and then the 6 used).
     For a discussion of feature ranking, you may also want to see this Towards-Data-Science discussion.
     When you feel that you understand feature ranking and SHAP values, feel free to start/work on the initial project.
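
     One of the simplest ranking methods is permutation importance: shuffle one input column at a time and measure how much the model's loss increases. The following is a dependency-free sketch on made-up toy data, not on the Aleph b-jet data; the "model" is a hypothetical stand-in for a trained regressor, and all names are invented for the example. scikit-learn offers the same technique as `sklearn.inspection.permutation_importance`.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends strongly on x0, weakly on x1 and not on x2.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * x0 + 1.0 * x1 for x0, x1, _ in X]


def model(row):
    """Stand-in for a trained model; here it is simply the true relation."""
    return 3.0 * row[0] + 1.0 * row[1]


importances = permutation_importance(model, X, y)
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
print(ranking, importances)
```

     The unused input x2 gets an importance of zero, since shuffling it cannot change the predictions. Note that permutation importance can be misleading when inputs are strongly correlated, which is one reason to compare it with SHAP values.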

Wednesday 12th of May (afternoon):
Lectures: Final projects kickoff (TP). Introduction to clustering and related algorithms (TP).

Exercise: Start by making sure that you either have a group or find collaborators!
     The exercise is to apply clustering algorithms to data of your choice.
     Once you feel comfortable with clustering, you're welcome to work on the initial and/or final project.
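
     As a starting point, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python on two made-up Gaussian blobs. The data, the function name and the deterministic farthest-point initialisation are assumptions made for this example; in practice you would more likely use a library implementation such as `sklearn.cluster.KMeans`, which uses k-means++ initialisation by default.

```python
import math
import random


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data: two well-separated 2D blobs of 50 points each.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print(centroids)
```

     k-means requires the number of clusters k as input and assumes roughly spherical clusters, so it is worth trying other algorithms on your data as well.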

Last updated: 11th of May 2022 by Troels Petersen.