Applied Machine Learning - Week 1
Monday the 20th - Friday the 24th of April 2026
Groups: We highly recommend that you work on and discuss the exercises in a group. For the final project you should find a group (administered by Norman).
Reference Data I (Aleph b-quark identification):
In order to learn about ML, we need a nice, simple, scaled, mutually exclusively sampled, numerically sound, unflawed,
possibly large, and perfectly labelled (i.e. simulated) dataset to train and test on, with competitive predictions to compare performance against.
It sounds like an impossibility, but I happen to have the "Aleph b-quark tagging" dataset, which includes a Neural Net prediction (first paper from 1992!);
see bottom of page.
Monday 20th of April (afternoon):
Lectures: Intro to course, outline, groups, and discussion of data and goals (TP).
    
Introduction to AppML Course and
Introduction to Machine Learning (TP).
Exercise: Setup of infrastructure (Python, GitHub, etc.). Test your Python setup with
ML_MethodsDemos.ipynb.
     Getting a feel for the
Curse of Dimensionality, making life in high dimensions a lonely one!
     Inspecting data and making a "human" decision tree for classification:
Code for initial analysis:
BjetSelection_original.ipynb
(classifying with if-statements!)
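A hand-made decision tree of this kind is just a set of nested if-statements on a few input variables. The sketch below is not the content of BjetSelection_original.ipynb: the variable names (`prob_b`, `bqvjet`), the thresholds, and the toy jets are all placeholders chosen for illustration. Check README_AlephBtagData.txt for the actual variables and choose the cuts by inspecting the data.

```python
def classify_jet(jet):
    """Return 1 (b-jet candidate) or 0, using hand-picked cuts.

    `jet` is a dict of input variables. The keys and thresholds are
    illustrative assumptions, not values taken from the course notebook.
    """
    if jet["prob_b"] > 0.20:
        # first cut passed clearly: accept directly
        return 1
    if jet["prob_b"] > 0.05:
        # intermediate region: require support from a second variable
        if jet["bqvjet"] > 0.30:
            return 1
        return 0
    return 0


def accuracy(jets, labels):
    """Fraction of jets for which the hand-made tree matches the label."""
    correct = sum(classify_jet(j) == y for j, y in zip(jets, labels))
    return correct / len(labels)


# Tiny made-up sample, only to show the call pattern.
toy_jets = [
    {"prob_b": 0.50, "bqvjet": 0.10},
    {"prob_b": 0.10, "bqvjet": 0.60},
    {"prob_b": 0.10, "bqvjet": 0.05},
    {"prob_b": 0.01, "bqvjet": 0.90},
]
toy_labels = [1, 1, 0, 1]
print(accuracy(toy_jets, toy_labels))
```

Writing the selection as a function makes it easy to apply the same cuts to every jet in a file and to compare the resulting accuracy with that of a trained model later in the week.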
Wednesday 22nd of April (morning, exceptionally starting at 8:15!):
Lectures: Intro to
Tree-based algorithms,
Stochastic Gradient Descent, and
Training/Validation (TP).
Exercise:
Classification of b-quark jets in Aleph data with
tree-based methods.
     Compare performance to your own Decision Tree and the Aleph NN.
     Additional (reference) data, on classifying stars, galaxies, and quasars:
Data_SDSS.txt (6.3 MB).
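Comparing your own classifier with the Aleph NN requires a common figure of merit computed on the same test jets. One common choice is the area under the ROC curve (AUC), sketched below in plain Python via its rank interpretation: the probability that a randomly chosen signal jet gets a higher score than a randomly chosen background jet. The labels and scores are made up for illustration, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one example of each class")

    # Rank all scores from lowest to highest (1-based ranks).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of scores tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Made-up example: hard 0/1 decisions versus continuous scores.
labels = [0, 0, 1, 1, 1, 0]
my_tree_scores = [0, 1, 1, 1, 0, 0]
reference_scores = [0.1, 0.4, 0.8, 0.7, 0.35, 0.2]
print("my tree:  ", roc_auc(labels, my_tree_scores))
print("reference:", roc_auc(labels, reference_scores))
```

A classifier built from if-statements only outputs 0 or 1, which gives a single point on the ROC curve. The tie handling above lets such hard decisions be compared with continuous scores on the same scale.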
Wednesday 22nd of April (afternoon):
Lectures:
Introduction to NeuralNet-based algorithms and
Loss Functions (TP).
     Additional slides:
ML2026_AppliedML_Top10.pdf
Exercise:
Classification of b-quark jets in Aleph data with
Neural Net-based methods.
     Compare performance to your tree-based method(s) and the Aleph NN.
     Challenge: Given a "large" dataset on b-jets, see how performance improves with data size.
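The challenge above amounts to measuring a learning curve: train the same model on increasingly large training sets and evaluate each fit on one fixed test set. The sketch below shows the procedure with a deliberately trivial stand-in model (a single threshold placed midway between the two class means) on synthetic one-dimensional data. Both the model and the data are assumptions for illustration; in the exercise you would substitute your tree or neural net and the AlephBtag training files of increasing size.

```python
import random


def make_sample(n_per_class, rng):
    """Synthetic 1D data: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    xs += [rng.gauss(2.0, 1.0) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys


def fit_threshold(xs, ys):
    """'Train' the stand-in model: midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return 0.5 * (mean0 + mean1)


def accuracy(threshold, xs, ys):
    """Fraction of points classified correctly (x > threshold means class 1)."""
    hits = sum((x > threshold) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(ys)


def learning_curve(sizes, seed=42):
    """Return (n_train, test accuracy) pairs, one per per-class size."""
    rng = random.Random(seed)
    x_test, y_test = make_sample(5000, rng)  # fixed test set, never trained on
    curve = []
    for n in sizes:
        x_train, y_train = make_sample(n, rng)
        threshold = fit_threshold(x_train, y_train)
        curve.append((2 * n, accuracy(threshold, x_test, y_test)))
    return curve


for n_train, acc in learning_curve([5, 50, 500, 5000]):
    print(f"n_train = {n_train:6d}   test accuracy = {acc:.3f}")
```

A one-parameter model like this saturates after very few training points, whereas flexible models such as boosted trees or neural nets typically keep improving over a much larger range of training sizes. Keeping the test set fixed across all sizes is what makes the points on the curve comparable.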
This simulated data comes from the Aleph experiment at CERN; the task is to determine whether jets of particles from Z boson decays originate from b-quarks or not.
For more details about the data and the variables, please read the
README_AlephBtagData.txt.
Aleph Data (in CSV format):
AlephBtag_MC_train_Nev5000.csv (0.4 MB), and
AlephBtag_MC_train_Nev50000.csv (4.2 MB), and
AlephBtag_MC_train_Nev500000.csv (42 MB), and
AlephBtag_MC_train_Nev5000000.csv (401 MB), and
AlephBtag_MC_test_Nev246390.csv (20 MB).
Aleph Data (in HDF5 format):
AlephBtag_MC_train_Nev5000.h5 (1.5 MB), and
AlephBtag_MC_train_Nev50000.h5 (5.7 MB), and
AlephBtag_MC_train_Nev500000.h5 (48 MB), and
AlephBtag_MC_train_Nev5000000.h5 (450 MB), and
AlephBtag_MC_test_Nev246390.h5 (24 MB).
Aleph Data (in PARQUET format):
AlephBtag_MC_train_Nev5000.parquet.gz (0.15 MB), and
AlephBtag_MC_train_Nev50000.parquet.gz (1.4 MB), and
AlephBtag_MC_train_Nev500000.parquet.gz (14 MB), and
AlephBtag_MC_train_Nev5000000.parquet.gz (129 MB), and
AlephBtag_MC_test_Nev246390.parquet.gz (6.4 MB).
Flawed Aleph Data (in CSV format):
AlephBtag_MC_train_Nev5000_flawed.csv (0.15 MB) and
AlephBtag_MC_train_Nev50000_flawed.csv (1.4 MB).
Alternative (medical) Data (in CSV format):
Medical_Npatients5000.csv (0.15 MB)
Medical_Npatients50000.csv (1.4 MB)
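Before training on any of the CSV files listed above, the table has to be split into input variables and a label. The sketch below does this with Python's standard `csv` module. It assumes comma-separated numeric columns with a header row; the label column name `isb` and the demo columns are assumptions as well, so check README_AlephBtagData.txt for the actual layout. For the larger files, and for the HDF5 and Parquet versions, a dataframe library such as pandas is the more practical choice.

```python
import csv
import os
import tempfile


def load_features_and_labels(path, label_column="isb"):
    """Read a headered CSV and split it into feature rows X and labels y.

    The default label column name is a guess and may differ in the real files.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if label_column not in (reader.fieldnames or []):
            raise ValueError(f"column {label_column!r} not found in {path}")
        feature_names = [c for c in reader.fieldnames if c != label_column]
        X, y = [], []
        for row in reader:
            X.append([float(row[name]) for name in feature_names])
            y.append(int(float(row[label_column])))
    return feature_names, X, y


# Demo on a tiny made-up file, so the sketch runs without downloading data.
demo = "energy,prob_b,isb\n45.1,0.02,0\n38.7,0.61,1\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(demo)
names, X, y = load_features_and_labels(tmp.name)
os.remove(tmp.name)
print(names, X, y)
```

Keeping the label out of the feature list at load time guards against accidentally training on the truth column. The same applies to any stored reference prediction, which should also be excluded from the inputs.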
Last updated: 15th of April 2026 by Troels Petersen.