Applied Machine Learning - Week 2
Monday the 3rd - Friday the 7th of May 2021
Groups: Here you can find
suggested collaboration groups (administered by Zoe).
Monday 3rd of May (afternoon):
Lectures: Introduction to
Loss Functions and
Stochastic Gradient Descent for both classification and regression (TP).
     Data collection, cleaning, preprocessing, and dimensionality reduction:
Data collection and preprocessing (AA).
Zoom:
Link to lecture (Recorded!).
Recording in
Lecture video I (177 MB),
Lecture video II (17 MB), and
Lecture video III (30 MB).
Exercise: The exercise is to extract data with queries, combine and clean this data, and finally run a (k)PCA on it!
     A "non-code" notebook is provided with more details of the exercise:
StarsGalaxiesAndQuasars_Intro.ipynb.
     Try to separate the classes (stars, galaxies, quasars) and see (quantify!) to what extent your (k)PCA divides the data accordingly!
Queries on astro databases: The queries can be run against the
SDSS database.
     Below are three SDSS datasets (each with 10000 cases and about 2 MB in size), on which you should (eventually) try unsupervised learning:
    
Data_Stars.txt,
Data_Galaxies.txt, and
Data_Quasars.txt.
     Example queries can be found here:
ExampleQueries.txt (3 kB).
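The (k)PCA step of the exercise above can be sketched roughly as follows. This is a minimal illustration, not the course solution: it uses scikit-learn and a synthetic three-class dataset from make_blobs as a stand-in for stars, galaxies, and quasars, because the column layout of the Data_*.txt files is not described here. The kernel choice, the gamma value, and the use of the silhouette score as the "quantify!" measure are all assumptions; other separation measures would work equally well.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, KernelPCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three classes (assumption: NOT the real SDSS data).
X, y = make_blobs(n_samples=600, centers=3, n_features=5,
                  cluster_std=2.0, random_state=42)

# Standardise the features first, since PCA is sensitive to feature scales.
X_scaled = StandardScaler().fit_transform(X)

# Linear PCA down to two components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Kernel PCA with an RBF kernel (gamma is an illustrative guess, not tuned).
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
X_kpca = kpca.fit_transform(X_scaled)

# Quantify how well the known classes separate in each projection.
# The silhouette score lies in [-1, 1]; higher means better-separated classes.
sil_pca = silhouette_score(X_pca, y)
sil_kpca = silhouette_score(X_kpca, y)

print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print(f"Silhouette score, PCA:  {sil_pca:.3f}")
print(f"Silhouette score, kPCA: {sil_kpca:.3f}")
```

Note that the true labels are only used to score the projection, not to fit it, so the dimensionality reduction itself stays unsupervised.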
Wednesday 5th of May (morning):
Lectures:
Training, Validation, Test, Cross Validation, and introduction to basic machinery (AA).
     Short introduction to
Principal Component Analysis (PCA).
Zoom:
Link to lecture (Recorded!).
     Recording in
Lecture video I (140 MB) and
Lecture video II (35 MB) along with
chat (3 kB).
Exercise: Try to apply cross validation in your training. As this is typically applied to smaller datasets, try using 10% of the Aleph b-jet data... and then 1%!
     Also check that you can see the performance on each of the k folds. This also gives you an idea of the variation/uncertainty.
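A minimal sketch of the cross validation exercise above, assuming scikit-learn. The data here is a synthetic classification set from make_classification standing in for the Aleph b-jet data, and the model (logistic regression), the number of folds (5), and the accuracy metric are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the full dataset (assumption: NOT the Aleph b-jet data).
X, y = make_classification(n_samples=5000, n_features=10,
                           n_informative=5, random_state=42)

# Keep only a random 10% of the data, as suggested in the exercise.
rng = np.random.default_rng(42)
idx = rng.choice(len(X), size=len(X) // 10, replace=False)
X_small, y_small = X[idx], y[idx]

# Stratified k-fold keeps the class balance similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

# One score per fold, so the fold-to-fold variation is visible.
scores = cross_val_score(model, X_small, y_small, cv=cv, scoring="accuracy")

for fold, score in enumerate(scores, start=1):
    print(f"Fold {fold}: accuracy = {score:.3f}")
print(f"Mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

The spread of the per-fold scores gives a rough uncertainty on the performance; repeating this with a 1% subsample should make that spread noticeably larger.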
Wednesday 5th of May (afternoon):
Lectures:
Hyperparameters, Overtraining, and Early stopping (Christian Michelsen).
     Both slides and associated code can be found on
GitHub.
Zoom:
Link to lecture.
     Recording in
Lecture video I (287 MB) and
Lecture video II (74 MB).
Exercise: Try to optimise your algorithms with respect to the hyperparameters of your model/architecture.
     Note that for NNs, the learning rate in particular is important!
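One possible way to approach the hyperparameter exercise above is a small grid search with cross validation. This sketch assumes scikit-learn and uses its MLPClassifier as the NN on synthetic data; the grid values, network sizes, and the use of the built-in early stopping are illustrative assumptions, not the setup from the lecture code on GitHub.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption: replace with your own features/labels).
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=42)

# Hypothetical search grid: the learning rate is scanned on a log scale.
param_grid = {
    "learning_rate_init": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden_layer_sizes": [(16,), (32, 16)],
}

# early_stopping=True holds out part of the training data and stops
# training when the validation score no longer improves.
model = MLPClassifier(max_iter=200, early_stopping=True, random_state=42)

# Every grid point is evaluated with 3-fold cross validation.
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```

A full grid grows quickly with the number of hyperparameters, so for larger searches a random or Bayesian search is often a more economical choice.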
Example solutions from week 2:
The following are
example solutions and related code, which come with absolutely no warranty, and which you may use for inspiration:
BjetSelection_PCA_and_kPCA.ipynb PCA example (from Troels).
StarsGalaxiesAndQuasars_PCA_and_kPCA.ipynb PCA example (from Rasmus).
Last updated: 8th of May 2021 by Troels Petersen.