Applied Machine Learning - Week 3

Monday the 10th - Friday the 14th of May 2021

Two real world data cases:
1) Housing price analysis (regression). Initial analysis HousingAnalysis.ipynb and the data itself HousingPrices.csv.
2) Surgery case identification (classification). Initial code for reading the data SurgeryAnalysis.ipynb.
     The data is divided into training features X_train.csv, training labels y_train.csv, and testing features X_test.csv (testing labels not given!).
You should consider these datasets as "training ground", where you can apply your newly acquired skills at will.

Monday 10th of May (afternoon):
Lectures: Computational scaling and complexity, k-nearest neighbour algorithm (kNN), and k-means clustering algorithm (Brian Vinter).
     You are expected to watch these three short video lectures before the session, which will then be an interactive discussion.
     The slides for the latter two videos can be found here: kNN and k-Means algorithms.

Zoom:Link to lecture (Recorded!).
     Recording in Lecture video (303 MB) and Lecture chat (2 kB).

Exercise: Today's exercise (also described in the videos) consists of applying the kNN and k-Means algorithms to data.
     There are two groups of examples to consider, shown below, but in general you should, on your own initiative, think about and discuss where these algorithms can be applied.

     Simple small examples:
     K-Nearest-Neighbour toy example: KNN.ipynb.
     K-means clustering toy example: Clustering.ipynb.
     Cancer example: Cancer.ipynb along with breast cancer data.
     Wine example: Wine.ipynb, along with Wine data and Wine data description.
     Example on larger (known) data:
     Try to apply the algorithms in NeighboursAndClusters.ipynb to e.g. the Aleph b-jet data and/or the breast cancer data.
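     As a starting point for this exercise, a minimal sketch of both algorithms using scikit-learn is shown below. It uses scikit-learn's built-in breast cancer dataset as a stand-in for the data files linked above; the choice of k = 5 neighbours and 2 clusters is illustrative, not prescribed by the exercise.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Load the breast cancer data (stand-in for the linked CSV files)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Both methods are distance-based, so standardise the features first
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Supervised: k-nearest neighbours with k = 5
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train)
acc = knn.score(X_test_s, y_test)
print(f"kNN test accuracy: {acc:.3f}")

# Unsupervised: k-means with 2 clusters, ignoring the labels entirely
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_train_s)
clusters = km.predict(X_test_s)
```

     Note that kNN uses the labels while k-means does not; comparing the k-means cluster assignments to the true labels afterwards is a good way to see how much structure the features carry on their own.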


Wednesday 12th of May (morning):
Lectures: Input feature ranking and Shapley values (TP), ML method performance overview (TP).
     See the following note for a few (more) Shapley Value Calculation Examples.

Zoom:Link to lecture (Recorded!).
     Recording in Lecture video I (289 MB), Lecture video II (44 MB), and Lecture chat (1 kB).

Exercise: Apply the different feature ranking methods to e.g. the Aleph b-jet data, and determine which variables are the important ones (consider first all 9 inputs and then the 6 used).
     For a discussion of feature ranking, you may also want to see this Towards-Data-Science discussion.
     When you feel that you understand feature ranking and SHAP values, feel free to start working on the Small project.
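     One feature ranking method from the lecture that is easy to try is permutation importance: shuffle one input feature at a time and measure how much the model's score drops. A minimal sketch with scikit-learn is given below, again using the built-in breast cancer dataset as a stand-in for the Aleph b-jet data; the random forest and its settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time on the test set
# and record the mean drop in accuracy over n_repeats shuffles
perm = permutation_importance(clf, X_test, y_test,
                              n_repeats=10, random_state=42)
perm_rank = np.argsort(perm.importances_mean)[::-1]

# Print the five most important features
for i in perm_rank[:5]:
    print(f"{names[i]:25s}  {perm.importances_mean[i]:.4f}")
```

     Comparing this ranking to the model's built-in (impurity-based) `feature_importances_` is instructive, since the two methods can disagree, especially for correlated inputs.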

Wednesday 12th of May (afternoon):
Lectures: Population Mixture Models (AA).

Zoom:Link to lecture (Recorded!).

     Recording in Lecture video I (80 MB) and Lecture video II (33 MB).

Exercise: The exercises are contained in the presentation. For example, apply the Expectation-Maximization algorithm to data of your choice.
     Also, towards the end of the session, you're welcome to work on the small project.
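     As one way into this exercise, the sketch below fits a two-component Gaussian mixture with scikit-learn's `GaussianMixture`, which runs the Expectation-Maximization algorithm internally. The toy data (two 1D Gaussian populations) is an assumption for illustration; substitute data of your own choice.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two Gaussian populations mixed 30/70
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(3.0, 1.0, 700)]).reshape(-1, 1)

# EM alternates between assigning soft cluster responsibilities (E-step)
# and refitting each Gaussian's mean, width, and weight (M-step)
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

means = gmm.means_.ravel()
weights = gmm.weights_
print("Fitted means:  ", np.sort(means))
print("Fitted weights:", weights)
```

     The fitted means and weights should land close to the values used to generate the mixture; trying a wrong number of components and comparing e.g. `gmm.bic(data)` is a natural follow-up.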

Example solutions from week 3:
The following are example solutions and related code, which come with absolutely no warranty, and by which you may let yourself be inspired:
  • NeighboursAndClusters_kNN_kMC.ipynb

  • Last updated: 15th of May 2021 by Troels Petersen.