Applied Machine Learning - Week 2

Monday the 3rd - Friday the 7th of May 2021

Groups: Here you can find suggested collaboration groups (administered by Zoe).

Monday 3rd of May (afternoon):
Lectures: Introduction to Loss Functions and Stochastic Gradient Descent for both classification and regression (TP).
     Data collection, cleaning, preprocessing, and dimensionality reduction: Data collection and preprocessing (AA).

Zoom: Link to lecture (Recorded!).

Recording in Lecture video I (177 MB), Lecture video II (17 MB), and Lecture video III (30 MB).

Exercise: The exercise is to extract data with queries, combine and clean this data, and finally run a (k)PCA on it!
     A "non-code" notebook is provided with more details of the exercise: StarsGalaxiesAndQuasars_Intro.ipynb.
     Try to separate the classes (stars, galaxies, quasars) and see (quantify!) to what extent your (k)PCA divides the data accordingly!
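
As a rough illustration of "quantifying" the separation, the sketch below runs a linear PCA with NumPy and then measures how often a point lies closest to the centroid of its own class in the PC1-PC2 plane. Everything in it is an assumption made for illustration: the data are synthetic Gaussian blobs (not the SDSS files), the helper names pca and centroid_accuracy are hypothetical, and nearest-centroid accuracy is only one of many possible separation measures. Kernel PCA is not shown; scikit-learn's KernelPCA would be one common route.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples, n_features) onto its leading principal components."""
    # Standardise each feature so that no single column dominates the variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xs @ Vt[:n_components].T, explained[:n_components]

def centroid_accuracy(Z, labels):
    """Fraction of points lying closest to the centroid of their own class."""
    classes = np.unique(labels)
    centroids = np.array([Z[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[np.argmin(dists, axis=1)] == labels))

# Synthetic stand-in for the three classes (assumption: NOT the SDSS data).
rng = np.random.default_rng(42)
n_per_class, n_features = 300, 5
means = np.array([[0, 0, 0, 0, 0], [4, 4, 0, 0, 0], [0, 4, 4, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, n_features)) for m in means])
labels = np.repeat(np.array(["star", "galaxy", "quasar"]), n_per_class)

Z, explained = pca(X, n_components=2)
accuracy = centroid_accuracy(Z, labels)
print(f"Explained variance ratio of PC1, PC2: {explained}")
print(f"Nearest-centroid accuracy in the PC1-PC2 plane: {accuracy:.3f}")
```

An accuracy close to 1/3 would mean the projection does not separate the three classes at all, while a value close to 1 means the classes form nearly disjoint clusters in the projected plane.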

Queries on astro databases: The queries can be run against the SDSS database.
     Below are three SDSS datasets (each with 10000 cases and 2 MB in size), on which you should (eventually) try unsupervised learning:
     Data_Stars.txt, Data_Galaxies.txt, and Data_Quasars.txt.
     Example queries can be found here: ExampleQueries.txt (3 kB).


Wednesday 5th of May (morning):
Lectures: Training, Validation, Test, Cross Validation, and introduction to basic machinery (AA).
     Short introduction to Principal Component Analysis (PCA).

Zoom: Link to lecture (Recorded!).
     Recording in Lecture video I (140 MB) and Lecture video II (35 MB) along with chat (3 kB).

Exercise: Try to apply cross validation in your training. As this is typically applied to smaller datasets, try using 10% of the Aleph b-jet data... and then 1%!
     Also check that you can see the performance on each of the k folds. This also gives you an idea of the variation/uncertainty.
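
A minimal sketch of k-fold cross validation with the score reported per fold is given below. It is written under stated assumptions: the data are synthetic (not the Aleph b-jet data), the classifier is a toy nearest-centroid model chosen only to keep the example short, and the function names (k_fold_indices, fit_centroids, predict, cross_validate) are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)

def fit_centroids(X, y):
    """Toy classifier: store the mean feature vector of each class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    """Assign each sample to the class with the nearest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def cross_validate(X, y, k=5, seed=0):
    """Return the validation accuracy obtained on each of the k folds."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(X), k, rng)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on all folds except fold i, validate on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit_centroids(X[train_idx], y[train_idx])
        scores.append(float(np.mean(predict(model, X[val_idx]) == y[val_idx])))
    return np.array(scores)

# Synthetic two-class data (assumption: a stand-in for a real dataset).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(1.5, 1.0, (200, 3))])
y = np.repeat([0, 1], 200)

scores = cross_validate(X, y, k=5)
for i, s in enumerate(scores, start=1):
    print(f"Fold {i}: accuracy = {s:.3f}")
print(f"Mean = {scores.mean():.3f}, std over folds = {scores.std(ddof=1):.3f}")
```

The spread of the per-fold scores is a rough estimate of the uncertainty on the performance; repeating the exercise on a smaller subsample should make that spread visibly larger.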

Wednesday 5th of May (afternoon):
Lectures: Hyperparameters, Overtraining, and Early stopping (Christian Michelsen).
     Both slides and associated code can be found on GitHub.

Zoom: Link to lecture.
     Recording in Lecture video I (287 MB) and Lecture video II (74 MB).

Exercise: Try to optimise your algorithms with respect to the hyperparameters of your model/architecture.
     Note that for NNs the learning rate in particular is important!
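
To illustrate why the learning rate matters, the sketch below scans a few learning rates for plain stochastic gradient descent on a squared-error loss and compares the resulting validation losses. It is only a toy under stated assumptions: a linear model stands in for a neural network, the data are synthetic, and the function names (sgd_linear_regression, mse) and the grid of learning rates are hypothetical choices.

```python
import numpy as np

def sgd_linear_regression(X, y, learning_rate, n_epochs=20, seed=0):
    """Fit y ~ X @ w + b by minimising the squared-error loss with plain SGD."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        # Visit the training samples one at a time, in a new random order each epoch.
        for i in rng.permutation(len(X)):
            residual = X[i] @ w + b - y[i]
            # Gradient of 0.5 * residual**2 with respect to w and b.
            w -= learning_rate * residual * X[i]
            b -= learning_rate * residual
    return w, b

def mse(X, y, w, b):
    """Mean squared error of the linear model on (X, y)."""
    return float(np.mean((X @ w + b - y) ** 2))

# Synthetic regression data (assumption: made up for illustration).
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=400)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

# Scan the learning rate on a logarithmic grid and score each on validation data.
results = {}
for lr in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]:
    w, b = sgd_linear_regression(X_train, y_train, lr)
    results[lr] = mse(X_val, y_val, w, b)
    print(f"learning rate {lr:g}: validation MSE = {results[lr]:.4f}")

best_lr = min(results, key=results.get)
print(f"Best learning rate in this scan: {best_lr:g}")
```

With a fixed number of epochs, a learning rate that is too small leaves the model far from the minimum, which is why the scan is done on a logarithmic grid and scored on data not used for training.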

Example solutions from week 2:
The following are example solutions and related code, which come with absolutely no warranty, that you may let yourself be inspired by:
  • BjetSelection_PCA_and_kPCA.ipynb PCA example (from Troels).
  • StarsGalaxiesAndQuasars_PCA_and_kPCA.ipynb PCA example (from Rasmus).
    Last updated: 8th of May 2021 by Troels Petersen.