Applied Machine Learning - Week 3
Monday the 10th - Friday the 14th of May 2021
Two real world data cases:
1) Housing price analysis (regression).
Initial analysis
HousingAnalysis.ipynb and the data itself
HousingPrices.csv.
2) Surgery case identification (classification).
Initial code for reading the data
SurgeryAnalysis.ipynb.
     The data is divided into training features
X_train.csv, training labels
y_train.csv, and testing features
X_test.csv (testing labels not given!).
You should consider these datasets as a "training ground", where you can apply your newly acquired skills at will.
Monday 10th of May (afternoon):
Lectures:
Computational scaling and complexity,
k-nearest neighbour algorithm (kNN), and
k-means clustering algorithm (Brian Vinter).
     You are expected to watch these three short lectures
before the session, which will then be an interactive discussion.
     The slides for the latter two videos can be found here:
kNN and k-Means algorithms.
Zoom:
Link to lecture (Recorded!).
     Recording in
Lecture video (303 MB) and
Lecture chat (2 kB).
Exercise: Today's exercise (also described in the videos) consists of applying the kNN and k-Means algorithms to data.
     There are two groups of examples to consider, shown below, but in general you should, on your own initiative, think about and discuss where these algorithms can be applied.
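Before opening the notebooks, the core kNN idea can be sketched in a few lines of plain NumPy (this is an illustrative sketch with a made-up toy dataset, not the implementation in KNN.ipynb):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances to all training points
        nearest = np.argsort(d)[:k]                    # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])        # majority vote
    return np.array(preds)

# Made-up toy data: two well-separated classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.1], [1.05, 0.95]]), k=3))  # -> [0 1]
```

Note that kNN has no training phase; each prediction scans all training points, which is where the computational-scaling discussion from the lecture becomes relevant.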
    
Simple small examples:
     K-Nearest-Neighbour toy example:
KNN.ipynb.
     K-means clustering toy example:
Clustering.ipynb.
     Cancer example:
Cancer.ipynb
along with
breast cancer data.
     Wine example:
Wine.ipynb,
along with
Wine data and
Wine data description.
    
Example on larger (known) data:
     Try to apply the algorithms in
NeighboursAndClusters.ipynb to e.g. the Aleph b-jet data and/or
breast cancer data.
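The k-means step can likewise be sketched in plain NumPy. Below is a minimal Lloyd's-algorithm sketch on made-up two-blob data; the deterministic farthest-point initialisation is a simplification chosen for reproducibility, not the notebook's method:

```python
import numpy as np

def init_centroids(X, k):
    """Deterministic farthest-point initialisation: start from the first point,
    then repeatedly pick the point farthest from the chosen centroids."""
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = init_centroids(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                      # assign each point to its nearest centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):                # centroids stopped moving: converged
            break
        centroids = new
    return labels, centroids

# Made-up data: two Gaussian blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids[np.argsort(centroids[:, 0])])  # centroids should land near (0,0) and (5,5)
```

Unlike kNN, k-means is unsupervised: it never sees labels, and the cluster numbering it returns is arbitrary.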
Wednesday 12th of May (morning):
Lectures:
Input feature ranking and Shapley values (TP),
ML method performance overview (TP).
     See the following note for a few (more)
Shapley Value Calculation Examples.
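To make the definition concrete, the Shapley value can be computed by brute force as the average of each player's marginal contribution over all player orderings. The sketch below uses the classic "glove game" as a made-up example; it is illustrative only and unrelated to the examples in the linked note:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal contribution
    v(S + {p}) - v(S) over all orderings of the players."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / len(orders) for p in players}

# Made-up 3-player "glove game": A holds a left glove, B and C hold right gloves;
# a coalition earns 1 only if it can form a pair (A plus at least one of B, C).
v = lambda S: 1.0 if 'A' in S and ({'B', 'C'} & S) else 0.0
phi = shapley_values(['A', 'B', 'C'], v)
print(phi)  # A gets 2/3, B and C get 1/6 each; the values sum to v of the grand coalition, 1
```

The factorial number of orderings is why exact Shapley values are only feasible for a handful of players (or input features); SHAP exists precisely to approximate this efficiently.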
Zoom:
Link to lecture (Recorded!).
     Recording in
Lecture video I (289 MB),
Lecture video II (44 MB), and
Lecture chat (1 kB).
Exercise: Apply the different feature ranking methods to e.g. the Aleph b-jet data, and determine which variables are the important ones (consider first all 9 inputs and then the 6 used).
     For a discussion of feature ranking, you may also want to see this
Towards-Data-Science discussion.
     When you feel that you understand feature ranking and SHAP values, feel free to start/work on the
Small project.
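One of the feature-ranking methods, permutation importance, can be sketched in plain NumPy: shuffle one feature column at a time and measure how much the model's error grows. The toy regression problem below is made up (not the Aleph data), with one strong, one weak, and one useless feature:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
# Made-up target: feature 0 matters a lot, feature 1 a little, feature 2 not at all
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)           # fit ordinary least squares once

def mse(X_):
    return np.mean((X_ @ coef - y) ** 2)

baseline = mse(X)
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])               # break the feature-target link
    importance.append(mse(Xp) - baseline)              # error increase = importance

print(importance)  # feature 0 should dominate, feature 2 should be near 0
```

The same shuffle-and-remeasure recipe works with any fitted model, which is what makes permutation importance a useful model-agnostic baseline to compare against SHAP.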
Wednesday 12th of May (afternoon):
Lectures:
Population Mixture Models (AA).
Zoom:
Link to lecture (Recorded!).
     Recording in
Lecture video I (80 MB) and
Lecture video II (33 MB).
Exercise: The exercises are contained in the presentation. For example, apply the Expectation-Maximization algorithm to data of your choice.
     Also, towards the end of the session, you're welcome to work on the small project.
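As a starting point for the EM exercise, here is a minimal sketch of the algorithm for a two-component 1D Gaussian mixture in plain NumPy; the data are synthetic, and the initialisation and fixed iteration count are deliberately crude:

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    """EM for a two-component 1D Gaussian mixture: alternate responsibilities
    (E-step) and weighted parameter updates (M-step)."""
    mu = np.array([x.min(), x.max()])                  # crude initialisation from the data range
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        pdf = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi * pdf
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, widths, and mixing fractions
        Nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        pi = Nk / len(x)
    return mu, sigma, pi

# Synthetic data: two well-separated Gaussians
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3, 1, 400), rng.normal(3, 1, 600)])
mu, sigma, pi = em_gmm_1d(x)
print(np.sort(mu))  # recovered means should be near -3 and +3
```

Note the connection to Monday's material: k-means is essentially this algorithm with hard (0/1) responsibilities and fixed, equal widths.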
Example solutions from week 3:
The following are
example solutions and related code, which come with absolutely no warranty, that you may let yourself be inspired by:
NeighboursAndClusters_kNN_kMC.ipynb
Last updated: 15th of May 2021 by Troels Petersen.