Applied Machine Learning - Week 2
Monday the 28th of April - Friday the 2nd of May 2025
Monday 28th of April (afternoon):
Lectures: We will look into
Hyperparameters, Overtraining, and Early stopping.
Both the slides and the code used to produce them can be found on
GitHub.
     We will also have a look at the Week 1 exercise solutions,
and possibly discuss
Loss values in training and validation.
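As a minimal illustration of early stopping (a sketch on synthetic stand-in data, not the course's own example), scikit-learn's gradient boosting can halt training once the validation score stops improving:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data (hypothetical, for illustration only).
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)

# Allow up to 500 trees, but stop once the score on a held-out 20%
# validation split has not improved for 10 consecutive rounds.
clf = GradientBoostingClassifier(
    n_estimators=500,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=42,
)
clf.fit(X, y)

# n_estimators_ is the number of trees actually fitted before stopping.
print(clf.n_estimators_)
```

Typically the fit stops well short of the 500-tree budget, which is exactly the overtraining protection early stopping provides.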
Exercises: The main exercise is to run a
Hyperparameter (HP) optimisation on one of your algorithm trainings. Compare "Grad", "Grid", and "Random" search in terms of time and performance.
     As an additional exercise, make a
regression model of e.g. energy or |cTheta| from the other variables in the Aleph b-jet dataset.
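The Grid-vs-Random part of the main exercise can be sketched as below. This uses scikit-learn on synthetic data; the model and search ranges are placeholders for your own algorithm's HPs, and gradient-based ("Grad") search is not covered here since scikit-learn has no built-in for it:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (hypothetical, for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Placeholder search space; substitute your own algorithm's HPs.
param_grid = {"n_estimators": [20, 50, 100], "max_depth": [3, 5, None]}

results = {}
for name, search in [
    # Grid search tries all 3 x 3 = 9 combinations exhaustively.
    ("Grid", GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=3)),
    # Random search samples only 4 of the 9 combinations.
    ("Random", RandomizedSearchCV(RandomForestClassifier(random_state=0),
                                  param_grid, n_iter=4, cv=3,
                                  random_state=0)),
]:
    t0 = time.time()
    search.fit(X, y)
    results[name] = (time.time() - t0, search.best_score_)

for name, (dt, score) in results.items():
    print(f"{name}: {dt:.1f} s, best CV accuracy {score:.3f}")
```

The point of the comparison: random search usually gets close to the grid-search optimum at a fraction of the cost, since it evaluates fewer candidates.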
Wednesday 30th of April (morning):
Lectures:
Input feature ranking and Shapley values, and
an overview of ML method performance.
    
Initial project kick-off with
introduction to this "Kaggle-like" project.
Exercise: Apply the different
feature ranking methods to
e.g. the Aleph b-jet data, and determine which variables are the important ones (consider the 6 variables used).
     As a more real world data case, try the
Housing Price dataset for predicting sales prices (i.e. regression).
     Notice that the data is not as "curated" as the Aleph data, i.e. there are missing values and outliers.
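One of the ranking methods, permutation importance, can be sketched as follows; synthetic data with 6 features (of which only 3 are informative) stands in for the Aleph b-jet variables:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 6 features, of which only 3 carry signal.
X, y = make_classification(n_samples=1000, n_features=6,
                           n_informative=3, n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Shuffle each feature on the test set and measure the drop in score;
# a large drop means the model relied on that feature.
imp = permutation_importance(model, X_test, y_test,
                             n_repeats=10, random_state=1)
ranking = np.argsort(imp.importances_mean)[::-1]
for i in ranking:
    print(f"feature {i}: {imp.importances_mean[i]:.3f}")
```

On real data, run the same procedure on the trained classifier to see which of the 6 Aleph variables matter most.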
Wednesday 30th of April (afternoon):
Lectures:
Final project inspiration, teaser, and dataset discussion.
     Introduction to Unsupervised Learning:
Clustering and Nearest Neighbor algorithms.
Also, I'll briefly introduce/remind you of
Principal Component Analysis.
Exercise: The exercise is to apply
clustering algorithms (and potentially dimensionality reduction) to datasets of increasing size and complexity:
    
Simple small examples:
     1. K-means clustering toy example applied to simulated data:
Clustering.ipynb.
     2. Wine data example (3 classes, 14 features, 178 cases):
Wine.ipynb, along with
Wine data and
Wine data description.
    
Example on larger (known) data:
     Dataset 1: Aleph b-jets (5000+ cases with 6-9 features, which you know already).
     Dataset 2: The interesting
Cosmos2015outlier data (10000 cases with 13 features, from astrophysics).
     Dataset 3: The real
Cosmos2015 data (20355 cases with 13 features, from astrophysics).
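For the Wine example, the same UCI data ships with scikit-learn, so the clustering (with optional PCA reduction) can be sketched as below. Standardising before K-means matters because the algorithm uses Euclidean distance, so features on large scales would otherwise dominate:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The UCI Wine data: 178 cases, 13 numeric features, 3 classes.
X, y = load_wine(return_X_y=True)

# Standardise so all features contribute on a comparable scale.
X_std = StandardScaler().fit_transform(X)

# Optional dimensionality reduction before clustering.
X_pca = PCA(n_components=2).fit_transform(X_std)

# K-means with k = 3, matching the known number of wine cultivars.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_pca)
print(km.labels_[:10])
```

Comparing `km.labels_` against the true classes `y` (e.g. with a confusion matrix) shows how well the unsupervised clusters recover the cultivars; the same recipe scales up to the Aleph and Cosmos2015 datasets.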
Last updated: 25th of April 2025 by Troels Petersen.