Applied Machine Learning - Week 3
Monday the 10th - Friday the 14th of May 2022
Two real world data cases:
1) Housing price analysis (regression).
Initial analysis: HousingAnalysis.ipynb, and the data itself: HousingPrices.csv.
2) Surgery case identification (classification).
Initial code for reading the data: HospitalisationAnalysis.ipynb.
     The data is divided into training features X_train.csv, training labels y_train.csv, testing features X_test.csv, and testing labels y_test.csv.
Consider these datasets a "training ground" where you can apply your newly acquired skills at will.
Monday 10th of May (afternoon):
Lectures:
Hyperparameters, Overtraining, and Early stopping (TP).
     Both slides and associated code can be found on GitHub.
Exercise: Try to optimise your algorithms with respect to the hyperparameters of your model/architecture.
     Note that for NNs the learning rate in particular is important, and that you might want a learning rate scheduler!
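     As a rough illustration of a learning rate scheduler combined with early stopping, here is a minimal sketch in plain Python. It is not the course code: the toy data (a noisy straight line), the function names (step_decay, mse, train) and all default values are assumptions chosen for the example. The idea is that the learning rate is lowered in steps while training, and training stops once the validation loss has not improved for a number of epochs.

```python
import random

def step_decay(initial_lr, epoch, drop=0.5, every=100):
    """Scheduler: multiply the learning rate by `drop` every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

def mse(data, w, b):
    """Mean squared error of the line y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, valid_data, initial_lr=0.3, max_epochs=1000,
          patience=10, min_delta=1e-6):
    """Fit y = w*x + b by gradient descent, with early stopping.

    Training stops when the validation loss has not improved by more than
    `min_delta` for `patience` epochs. The best parameters seen are returned.
    """
    w, b = 0.0, 0.0
    best = {"val_loss": float("inf"), "w": w, "b": b, "epoch": -1}
    n = len(train_data)
    for epoch in range(max_epochs):
        lr = step_decay(initial_lr, epoch)
        # Gradient of the training MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # Early stopping is judged on data that is NOT used for the update.
        val_loss = mse(valid_data, w, b)
        if val_loss < best["val_loss"] - min_delta:
            best = {"val_loss": val_loss, "w": w, "b": b, "epoch": epoch}
        elif epoch - best["epoch"] >= patience:
            break
    return best

# Toy data (an assumption for this sketch): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)

def make_data(n):
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]

train_data, valid_data = make_data(100), make_data(50)
result = train(train_data, valid_data)
print(result)
```

     The values of initial_lr, drop, every and patience are themselves hyperparameters, so they are natural candidates to vary in the exercise.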
Wednesday 12th of May (morning):
Lectures:
Input feature ranking and Shapley values (TP) and
ML method performance overview (TP).
Exercise: Apply the different feature ranking methods to e.g. the Aleph b-jet data, and determine which variables are the important ones (consider first all 9 inputs and then the 6 used).
     For a discussion of feature ranking, you may also want to see this Towards-Data-Science discussion.
     When you feel that you understand feature ranking and SHAP values, feel free to start or continue working on the initial project.
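     To make the idea behind Shapley values concrete, here is a minimal sketch that computes them exactly for a tiny toy model. It is not the lecture code and not the interface of any SHAP library: the function name shapley_values, the toy model, and the convention of replacing "absent" features by a baseline value are all assumptions made for illustration. The cost grows as 2^n in the number of features, so this brute-force version is only feasible for a handful of inputs.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value
    (a simplifying convention assumed for this sketch).
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real value.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model: one additive term and one interaction term.
def model(v):
    return 3.0 * v[0] + 2.0 * v[1] * v[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

     In this toy case the additive feature receives its full contribution, while the interaction term is shared equally between the two features involved. The values also sum to model(x) - model(baseline), which is a useful sanity check for any Shapley implementation.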
Wednesday 12th of May (afternoon):
Lectures:
Final projects kickoff (TP).
Introduction to clustering and related algorithms (TP).
Exercise: Start by making sure that you either have a group or find collaborators!
     The exercise is to apply clustering algorithms to data of your choice.
     Once you feel comfortable with clustering, you're welcome to work on the initial and/or final project.
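     As a starting point for the clustering exercise, here is a minimal sketch of k-means (one of many clustering algorithms, and not necessarily the one emphasised in the lecture) in plain Python. The function names, the two synthetic blobs, and the random initialisation from data points are assumptions made for this example. In practice you would most likely use a library implementation.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Synthetic data (an assumption for this sketch): two well-separated 2D blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(30)]
blob_b = [(rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)) for _ in range(30)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print(centroids)
```

     Note that k must be chosen by hand, and that the result can depend on the initialisation when clusters are less clearly separated than in this toy case.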
Last updated: 11th of May 2022 by Troels Petersen.