Big Data Analysis - Week 2

Monday the 27th of April - Friday the 1st of May 2020

Monday 27th of April (afternoon):
Lectures: Data collection, preprocessing, and dimensionality reduction. Data collection and preprocessing (AA).

Zoom: Link to lecture (Recorded!).

Exercise: The exercise is to run a (k)PCA on (a) the b-quark data table, trying to separate the jets, and/or (b) the SDSS data table, trying to separate the classes (see page 28 of the slides above).
     Example queries can be found here: ExampleQueries.txt (3 kB).
     Example query that yields one of the "solutions": SolutionQuery.txt (1 kB).
     Example result of a query: exampleQSO_22GalLat_Zoe.csv (33 MB) (query behind the data file: QuaryBehindDataFile.txt).
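A minimal sketch of the PCA vs. kernel PCA idea, assuming scikit-learn is available. Since the b-quark and SDSS tables are not reproduced here, it uses a synthetic stand-in (two concentric circles from make_circles) that a linear PCA cannot separate but an RBF kernel PCA can; the gamma value and the simple separation measure are illustrative choices, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the exercise tables (assumption): two classes on concentric circles.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=42)

# Standardize each column first; PCA is sensitive to the scale of the features.
X_std = StandardScaler().fit_transform(X)

# Linear PCA: project onto the directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X_std)

# Kernel PCA with an RBF kernel; gamma is a tunable hyperparameter (illustrative value).
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0, random_state=0).fit_transform(X_std)


def separation(z, labels):
    """Absolute difference of the class means divided by the pooled standard deviation."""
    a, b = z[labels == 0], z[labels == 1]
    return abs(a.mean() - b.mean()) / np.sqrt(0.5 * (a.var() + b.var()))


# How well does the first component alone separate the two classes?
sep_pca = separation(X_pca[:, 0], y)
sep_kpca = separation(X_kpca[:, 0], y)
print(f"PCA  separation on 1st component: {sep_pca:.2f}")
print(f"kPCA separation on 1st component: {sep_kpca:.2f}")
```

On real data, X would instead be the numeric feature columns of the b-quark or SDSS table (for example read with pandas), with the labels used only to judge the separation, not to fit the (k)PCA.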


Wednesday 29th of April (morning):
Lectures: Training, Validation, Test, Cross Validation, and introduction to basic machinery (AA).
     Additional slides: ML2020_LossFunctions.pdf (1.6 MB).
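The loss-function slides are not reproduced here; as an assumption about their content, the sketch below shows two standard losses in NumPy: the mean squared error for regression and the binary cross-entropy (log loss) for classification. Note how cross-entropy rewards confident, correct probabilities more strongly than hesitant ones.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, a common loss for regression."""
    return np.mean((y_true - y_pred) ** 2)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities p_pred."""
    p = np.clip(p_pred, eps, 1 - eps)  # clip to avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


y = np.array([1.0, 0.0, 1.0, 1.0])
confident = np.array([0.9, 0.1, 0.8, 0.95])  # confident and correct predictions
hesitant = np.array([0.6, 0.4, 0.55, 0.6])   # correct but hesitant predictions

print("MSE:", mse(y, confident), mse(y, hesitant))
print("BCE:", binary_cross_entropy(y, confident), binary_cross_entropy(y, hesitant))
```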

Zoom: Link to lecture (Recorded!).

Exercise: Try to apply cross validation in your training.
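One possible way to set this up, sketched with scikit-learn on synthetic data (an assumption; substitute your own feature table and labels). A test set is held out first and left untouched, 5-fold stratified cross validation is run on the remaining development set, and only at the end is the model refit and evaluated once on the test set. The choice of GradientBoostingClassifier and ROC AUC as the score is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the course data (assumption): a binary classification problem.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out a test set that is never used while developing the model.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# 5-fold cross validation on the development set: each fold serves once as validation data.
model = GradientBoostingClassifier(random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="roc_auc")
print(f"CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")

# Once all model choices are fixed: refit on the full development set, evaluate once on the test set.
model.fit(X_dev, y_dev)
print("Test accuracy:", model.score(X_test, y_test))
```

The spread of the fold scores gives a rough idea of how much the performance estimate depends on the particular train/validation split.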


Wednesday 29th of April (afternoon):
Lectures: Hyperparameters, Overtraining, and Early stopping (Christian Michelsen).

Zoom: Link to lecture.
     Recordings: Lecture video (119 MB) and Lecture audio (11 MB), along with the chat (28 kB).
     Recordings: Exercise video (114 MB) and Exercise audio (24 MB).

Exercise: Both slides and associated code can be found on GitHub.
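The slides and code on GitHub are the reference for this session; the sketch below is only an independent illustration, assuming scikit-learn and noisy synthetic data. A gradient boosting model is deliberately given too many trees, the validation log loss is computed after every boosting stage, and a simple patience rule picks the stopping point. Training loss keeps falling as trees are added, while validation loss typically flattens or rises again, which is the signature of overtraining. The number of trees thus becomes a hyperparameter chosen on validation data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Noisy synthetic data (assumption), so that overtraining tends to show up within a few hundred trees.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, flip_y=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

# Deliberately many trees; n_estimators, learning_rate and max_depth are the hyperparameters here.
model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1, max_depth=3, random_state=1)
model.fit(X_train, y_train)

# Validation loss after each boosting stage.
val_loss = [log_loss(y_val, p) for p in model.staged_predict_proba(X_val)]

# Early stopping with patience: stop once the loss has not improved for `patience` stages.
patience = 20
best_iter, best_loss, since_best = 0, np.inf, 0
for i, loss in enumerate(val_loss):
    if loss < best_loss:
        best_iter, best_loss, since_best = i, loss, 0
    else:
        since_best += 1
        if since_best >= patience:
            break

print(f"Best number of trees: {best_iter + 1}, validation log loss: {best_loss:.3f}")
```

In practice scikit-learn can do this internally via the n_iter_no_change and validation_fraction parameters of GradientBoostingClassifier; the explicit loop is shown only to make the logic visible.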


Example solutions from week 2:
The following example solutions and related code come with absolutely no warranty, but you are welcome to be inspired by them:
  • ExerciseWeek2PCA_Runi.py PCA example (from Runi).
  • PCAexample_CarlJohnsen.ipynb PCA example (from Carl).
  • kPCAexample.py kernel PCA example (from Carl).
    Last updated: 29th of April 2020 by Troels Petersen.