Applied Machine Learning - Week 2

Monday the 1st - Friday the 5th of May 2023

Monday 1st of May (afternoon):
Lectures: We will start by looking at solutions to the Week 1 exercises, and then apply these methods to other data, this time focusing on regression.
     Initial project start, and discussion of loss values in training and validation.
     If time allows, I'll briefly introduce (or recap) Principal Component Analysis (for reference: recording of the 2021 lecture (17 MB)).
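
As a small illustration of the loss discussion, here is a minimal sketch of tracking training and validation loss per epoch. It assumes a plain linear model trained by gradient descent on hypothetical toy data (not one of the course datasets); the names `train_loss` and `valid_loss` and the learning rate are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data (assumption: NOT one of the course datasets).
X = rng.normal(size=(400, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.7, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=400)

# Simple split: first 300 rows for training, last 100 for validation.
X_train, X_valid = X[:300], X[300:]
y_train, y_valid = y[:300], y[300:]


def mse(X, y, w):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))


w = np.zeros(5)
learning_rate = 0.05  # illustrative value
train_loss, valid_loss = [], []

for epoch in range(100):
    # Gradient of the training MSE with respect to the weights.
    grad = 2.0 / len(y_train) * X_train.T @ (X_train @ w - y_train)
    w -= learning_rate * grad
    # Record both losses after every epoch so they can be compared or plotted.
    train_loss.append(mse(X_train, y_train, w))
    valid_loss.append(mse(X_valid, y_valid, w))

print(f"first epoch: train={train_loss[0]:.3f}  valid={valid_loss[0]:.3f}")
print(f"last epoch:  train={train_loss[-1]:.3f}  valid={valid_loss[-1]:.3f}")
```

Plotting the two lists against the epoch number gives the usual loss curves. For more flexible models, a validation loss that starts rising while the training loss keeps falling is the typical sign of overtraining.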

Exercise: Try to use Tree- and NN-based learning algorithms to do regression on the following two datasets:
     1. Predict jet energy ("energy") and/or jet angle ("cTheta") from the other variables in the Aleph b-jet dataset.
     2. Predict housing prices using the HousingPrices.csv data (49290 entries with 90 features, including missing values).
     The exercises can also be seen written up here, with a few more details.
     Additional slides: ML2023_Example_HousingPrices.pdf


Wednesday 3rd of May (morning):
Lectures: Introduction to Unsupervised Learning: Clustering and Nearest Neighbor algorithms.
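
To make the two lecture topics concrete, here is a minimal sketch using scikit-learn's `KMeans` and `NearestNeighbors`. The three well-separated blobs are hypothetical toy data, and the choices of 3 clusters and 5 neighbours are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: three well-separated 2D blobs (assumption, not course data).
centers = [[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]]
X, true_labels = make_blobs(
    n_samples=300, centers=centers, cluster_std=1.0, random_state=0
)

# Clustering: k-means with k=3 (in real data, k is usually unknown).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Agreement with the generating labels, invariant to how clusters are numbered.
ari = adjusted_rand_score(true_labels, labels)

# Nearest neighbours: for each point, find its 5 closest points in the sample.
# When querying the training points themselves, the first neighbour is the
# point itself at distance 0.
nn = NearestNeighbors(n_neighbors=5).fit(X)
distances, indices = nn.kneighbors(X)

# Distance to the furthest of the 5 neighbours; large values flag isolated points.
kth_distance = distances[:, -1]

print(f"adjusted Rand index: {ari:.3f}")
print(f"median distance to 5th neighbour: {np.median(kth_distance):.3f}")
```

The distance to the k-th neighbour is a simple local density proxy, which is one way nearest-neighbour methods can be used to look for outliers.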

Exercise: The exercise consists of applying dimensionality reduction (v1) (and possibly pre-processing before that) to datasets of increasing complexity:
     Dataset 1: Fisher's famous irises (150 cases with 4 features). Can be obtained through scikit-learn's toy datasets.
     Dataset 2: Aleph b-jets (5000+ cases with 6-9 features, which you know already).
     Dataset 3: The interesting Cosmos2015outlier data (10000 cases with 13 features, from astrophysics).
     Dataset 4: The real Cosmos2015 data (20355 cases with 13 features, from astrophysics).
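
For Dataset 1, a first pass could look like the sketch below. It assumes PCA as the dimensionality reduction method and standardisation as the pre-processing step; the exercise itself leaves both choices open.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load Fisher's iris data from scikit-learn's toy datasets.
iris = load_iris()
X = iris.data        # shape (150, 4)
species = iris.target  # used only for colouring a plot, not for fitting

# Pre-processing: bring every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 4 features to 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("reduced shape:", X_2d.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

A scatter plot of the two columns of `X_2d`, coloured by `species`, shows how well the classes separate in the reduced space. The same steps can then be tried on the larger datasets.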


Wednesday 3rd of May (afternoon):
Lectures: Further work on Unsupervised Learning: Clustering and Nearest Neighbor algorithms on astro data.

Exercise: The exercise consists of preprocessing and applying dimensionality reduction (v2) to the Cosmos2015 data.
     The analysis can also include the Swift Properties and Swift Gamma Ray Bursts datasets.
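
One way to organise the preprocessing is sketched below, under loud assumptions: the data are a hypothetical synthetic catalogue with 13 features on very different scales and a few missing entries (not the Cosmos2015 file), and median imputation, standardisation, and PCA are illustrative choices rather than the prescribed method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Hypothetical catalogue (assumption: NOT the Cosmos2015 data):
# 13 features driven by 3 latent variables, on scales from 0.01 to 1000.
n_rows = 1000
latent = rng.normal(size=(n_rows, 3))
mixing = rng.normal(size=(3, 13))
scales = np.logspace(-2, 3, 13)
X = (latent @ mixing + 0.1 * rng.normal(size=(n_rows, 13))) * scales

# Knock out about 2% of the entries to mimic missing measurements.
X[rng.random(X.shape) < 0.02] = np.nan

# Pipeline without scaling: PCA sees the raw feature scales.
raw = make_pipeline(SimpleImputer(strategy="median"), PCA(n_components=3))

# Pipeline with scaling: every feature contributes with unit variance.
scaled = make_pipeline(
    SimpleImputer(strategy="median"), StandardScaler(), PCA(n_components=3)
)

X_raw = raw.fit_transform(X)
X_scaled = scaled.fit_transform(X)

raw_ratio = raw.named_steps["pca"].explained_variance_ratio_
scaled_ratio = scaled.named_steps["pca"].explained_variance_ratio_

print("explained variance ratio, unscaled:", np.round(raw_ratio, 3))
print("explained variance ratio, scaled:  ", np.round(scaled_ratio, 3))
```

Without scaling, the leading components tend to follow whichever features have the largest numerical range, so comparing the two outputs is a quick check of how much the preprocessing matters. Wrapping the steps in a pipeline keeps the imputation and scaling fitted on the same data as the reduction.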


Last updated: 30th of April 2023 by Troels Petersen.