Applied Machine Learning 2024

"Despite the connotations of machine learning and artificial intelligence as a mysterious and radical departure from traditional approaches, we stress that machine learning has a mathematical formulation that is closely tied to statistics, the calculus of variations, approximation theory, and optimal control theory."
[Introduction to Machine Learning, Particle Data Group (pdg.lbl.org) 2021]

Troels C. Petersen, Lecturer (Associate Prof.), NBI - High Energy Physics, 26 28 37 39, Mac user, petersen@nbi.dk
Daniel Murnane, Teacher (DDSA Fellow), NBI - High Energy Physics, 93 83 89 58, Windows/Linux expert, daniel.murnane@nbi.ku.dk
Arnau Morancho Tarda, Teaching assistant (Ph.D.), NBI - High Energy Physics, +34 629 350 151, Windows expert, arnau.morancho@nbi.ku.dk
Thomas F. M. Spieksma, Teaching assistant (Ph.D.), NBI - Gravitational Physics, +31 610 844 213, Mac expert, thomas.spieksma@nbi.ku.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and application/project in science (7.5 ECTS).
Level: Intended for students at graduate level (4th-5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-14 / 14-17 and Wednesdays 9-10 / 10-12 & 13-14 / 14-17 for lectures/exercises (Week Schedule Group C).
Where (lectures): Mondays: Aud. 10 at HCO. Wednesdays: Lille UP1 at DIKU (except 6th and 8th of May, Aud. 3 at HCO).
Where (exercises): Mondays + Wednesdays: Biocenter 4-0-05 and 4-0-13, see KU Room Schedule plan.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on application and projects.
Text book: References to (the excellent!) Applied Machine Learning by David Forsyth.
Suppl. literature: We (i.e. you) will make extensive use of online ML resources, collected for this course.
Programming: Primarily Python 3.12 with a few packages on top, though this is an individual choice.
Code repository: All code we provide can be found in the AppliedML2024 GitHub repository.
Communication: All announcements will be given through Absalon. To reach me, email is preferable.
Initial Project: Initial project (à la a Kaggle competition) to be submitted Monday the 20th of May.
Final Project: Final project (Exam) presentations on Wednesday the 12th (all day) and Thursday the 13th of June (morning).
Evaluation: Initial project (40%), and final project (60%), evaluated by lecturers following the Danish 7-step scale.


“I am telling you, the world’s first trillionaires are going to come from somebody who masters AI and all its derivatives,
and applies it in ways we never thought of.”
[Mark Cuban (1958), American businessman]


Before course start:
An introduction to the course can be obtained from this ML subject overview and the related film introducing the course subjects (23 min, 1.48 GB).
Specific course information can be found here: ML2024_CourseInformation.pdf
To help us know who you are and optimise the course accordingly, please fill in the course questionnaire.
To test your "Python & Packages" setup, you can try to run ML_MethodsDemos.ipynb (which is also meant to whet your appetite).
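
For a quick check that the basic scientific Python stack is in place, a few imports suffice. The package list below is a typical guess for this course (NumPy, SciPy, Matplotlib, pandas, scikit-learn, PyTorch), not an official requirements list; adjust it to whatever ML_MethodsDemos.ipynb actually imports:

    # Quick sanity check of a typical ML Python stack. The package list is an
    # assumption - adapt it to your own setup and the course notebooks.
    import importlib

    for name in ["numpy", "scipy", "matplotlib", "pandas", "sklearn", "torch"]:
        try:
            module = importlib.import_module(name)
            print(f"{name:12s} OK      version {getattr(module, '__version__', '?')}")
        except ImportError:
            print(f"{name:12s} MISSING - install before the first exercise class")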



Course outline:
Below is the preliminary course outline, subject to possible changes throughout the course.

Week 1 (Introduction to Machine Learning concepts and methods. Tree and Neural Network learning):
Apr 22: 13:15-17:00: Intro to course, outline, groups, and discussion of data and goals. Overview of Machine Learning and techniques (TP).
     Exercise: Inspecting data and making a "human" decision tree and a linear (Fisher) discriminant. Also, setup of Python, GitHub, Slack, etc.
Apr 24: 8:15-12:00: Loss function, Training, Validation, Test, Cross Validation (TP), and Introduction to Tree-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Decision Tree based methods.
Apr 24: 13:15-17:00: Introduction to NeuralNet-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Neural Net based methods (a code sketch follows below).
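
As a flavour of the Week 1 exercises, here is a minimal sketch of tree- and network-based classification with cross validation on a scikit-learn reference data set. The data set, models, and hyperparameters are illustrative assumptions, not the course's own choices:

    # Decision tree vs. neural network classification, compared by 5-fold
    # cross validation on a standard reference data set (illustrative only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "decision tree": DecisionTreeClassifier(max_depth=4, random_state=42),
        # Neural nets train much better on scaled inputs, hence the pipeline.
        "neural net": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(32,),
                                                  max_iter=1000, random_state=42)),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:13s} accuracy: {scores.mean():.3f} +- {scores.std():.3f}")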

Week 2 (Initial project kickoff, Hyperparameter optimisation, Feature Importance, Introduction to unsupervised learning and clustering):
Apr 29: 13:15-17:00: Initial project kickoff. Hyperparameters, Overtraining, and Early stopping (TP).
     Exercise: Hyperparameter optimisation of simple tree and NN algorithms (see the sketch after this week's block).
May 1: 9:15-12:00: Feature Importance calculated using permutations and Shapley values (TP).
     Exercise: Determine feature ranking for reference data sets, and cross check these with actual models.
May 1: 13:15-17:00: Introduction to Unsupervised Learning: Clustering and Nearest Neighbor algorithms (TP).
     Exercise: Try to apply the k-NN (and other) algorithms to reference data sets.
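
Relating to the hyperparameter and feature-importance exercises above, a minimal scikit-learn sketch; the data set, model, and parameter grid are illustrative assumptions:

    # Small hyperparameter grid scan, followed by permutation feature
    # importance on the held-out test set (all choices illustrative).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        {"n_estimators": [50, 200], "max_depth": [3, 6, None]},
                        cv=5)
    grid.fit(X_train, y_train)
    print("Best hyperparameters:", grid.best_params_)

    # Rank features by how much the test score drops when each is shuffled.
    result = permutation_importance(grid.best_estimator_, X_test, y_test,
                                    n_repeats=10, random_state=42)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"feature {i:2d}: importance {result.importances_mean[i]:.4f}")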

Week 3 (Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Time-Series, and Natural Language Processing (NLP)):
May 6: 13:15-17:00: Convolutional Neural Networks (CNNs) and image analysis (DM).
     Exercise: Recognize images (MNIST dataset and insolubles from Greenland ice cores) with a CNN (see the sketch after this week's block).
May 8: 9:15-12:00: Graph Neural Networks (GNNs) and geometric learning (DM).
     Exercise: Work on the classic GNN example data (TBD).
May 8: 13:15-17:00: Time-series, Transformers, and Natural Language Processing (NLP) (DM).
     Exercise: Predict future flight traffic and do NLP on IMDB movie reviews.
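
For the CNN session, a minimal PyTorch sketch of image classification on MNIST; the architecture and training settings are illustrative assumptions, and the course exercises may well use a different framework:

    # A small CNN for MNIST digit classification (illustrative architecture).
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    train_data = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(train_data, batch_size=128, shuffle=True)

    model = nn.Sequential(  # 1x28x28 -> 16x14x14 -> 32x7x7 -> 10 classes
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):  # a couple of epochs suffice for a demo
        for images, labels in loader:
            loss = loss_fn(model(images), labels)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")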

Week 4 (Final Project kickoff, Preprocessing, Anomaly detection, Dimensionality reduction, and Summary):
May 13: 13:15-17:00: Final project kickoff. Discussion of how to work on projects.
     Exercise: Obtaining, plotting, and making plans for the final project data, and discussion of project goals.
May 15: 9:15-12:00: Preprocessing, anomaly detection, and dimensionality reduction (TP).
     Exercise: Clean data and run a (k)PCA, k-Means, and possibly tSNE/UMAP algorithm on reference data sets (see the sketch after this week's block).
May 15: 13:15-17:00: Summary of curriculum so far (TP).
     Exercise: Work on initial project.
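
A minimal sketch of the preprocessing and dimensionality-reduction chain from the Wednesday morning exercise above; the data set and cluster count are illustrative assumptions:

    # Scale features, reduce dimensionality with PCA, then cluster with k-Means.
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X, _ = load_iris(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)   # zero mean, unit variance

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_scaled)
    print("Explained variance ratio:", pca.explained_variance_ratio_)

    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_2d)
    print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])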

Week 5 (Computing and scaling/speed (videos), Generative Adversarial Networks (GANs), Reinforcement Learning, and GPU data analysis):
May 20: 13:15-17:00: No teaching (Whit Monday). Optionally, work on the initial project.
     Bonus (self) study (by video): Infrastructure, Networks, Scaling, and Speed (Brian Vinter). The initial project should be submitted by 22:00!
May 22: 9:15-12:00: Generative Adversarial Networks, Diffusion, and Reinforcement Learning (TP) (see the toy GAN sketch after this week's block).
     Exercise: Work on final project.
May 22: 13:15-17:00: GPU accelerated data analysis - Rapids (Mads Ruben Kristensen, Nvidia - formerly NBI).
     Exercise: Work on final project.
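
To make the GAN idea concrete, a toy PyTorch sketch in which a generator learns to mimic a one-dimensional Gaussian. Everything here is an illustrative assumption, not course material:

    # Toy GAN: the generator learns to mimic samples from N(3, 1).
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) + 3.0     # "real" data from N(3, 1)
        fake = G(torch.randn(64, 8))        # generated data from noise
        # Discriminator update: push real towards 1, fake towards 0.
        loss_D = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(fake.detach()), torch.zeros(64, 1)))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
        # Generator update: try to fool the discriminator (fake towards 1).
        loss_G = bce(D(fake), torch.ones(64, 1))
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    print("Generated mean:", G(torch.randn(1000, 8)).mean().item())  # ~3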

Week 6 (Auto-Encoders (AE), Ethics in ML, and Echo State Networks):
May 27: 13:15-17:00: (Variational) AutoEncoders and (more) anomaly detection (TP) (see the sketch after this week's block).
     Exercise: Work on final project.
May 29: 9:15-12:00: Ethics in the usage of Machine Learning (TP).
     Exercise: Work on final project.
May 29: 13:15-17:00: Echo State Networks (TBD), and an example exam presentation.
     Exercise: Work on final project.
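
For the autoencoder session, a minimal PyTorch sketch of anomaly detection via reconstruction error; the data here are random stand-ins, and the architecture and sizes are illustrative assumptions:

    # Dense autoencoder: samples it reconstructs poorly are anomaly candidates.
    import torch
    import torch.nn as nn

    X = torch.randn(1000, 20)                # stand-in for a real data set
    model = nn.Sequential(                   # 20 -> 3 -> 20 bottleneck
        nn.Linear(20, 3), nn.ReLU(), nn.Linear(3, 20))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(500):                 # train to reproduce the input
        loss = nn.functional.mse_loss(model(X), X)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    with torch.no_grad():                    # rank samples by reconstruction error
        errors = ((model(X) - X) ** 2).mean(dim=1)
    print("Most anomalous samples:", errors.topk(5).indices.tolist())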

Week 7 (Example of CNNs and VAEs at work (on beer and food), Training neural networks (videos)):
Jun 3: 13:15-17:00: Example of CNN at work on beer, and AutoEncoders at work on food (Carl Johnsen).
     Exercise: Work on final project.
Jun 5: 9:15-12:00: Constitution Day (no teaching).
     Exercise: Work on final project.
Jun 5: 13:15-17:00: Constitution Day (no teaching).
     Bonus (self) study: Recipe for training a neural network and associated video on the same subject.

Week 8 (Results on Initial Project, discussion of final project presentations, and Exam):
Jun 10: 13:15-17:00: Results and Feedback on initial project.
     Exercise: Work on final project.
Jun 12: 8:45-12:00: Presentations of final projects (TP, CS, AD, AM, TS, and potentially others!).
Jun 12: 13:15-17:00: Presentations of final projects (continued).
Jun 13: 8:45-12:00: Presentations of final projects (if needed!) (TP, CS, AD, AM, TS, and potentially others!).
Jun 13: 13:15-17:00: Presentations of final projects (if needed!) (continued).




Presentations from previous years


"Big Data is like teenage sex... everyone talks about it, nobody really knows how to do it, everyone else is doing it, so everyone claims they are doing it!"
[Dan Ariely, Professor at Duke University]


Course comments/praise (very biased selection!):
"Best day of my life!" (Pressumably at the University, red.)
[Christian M. Clausen, on the day of final project presentations, 2019]

"Student 1: Damn..."
"Student 2: I was just thinking what a shame you didn't get to see a whole classroom worth of 'damn' faces! But the feeling is there."

[Reaction in Zoom chat, after having explained the capabilities of Reinforcement Learning exemplified by AlphaZero, 2020]
[Fortunately, I got to see the reaction the year before!]

"Troels is the perfect shepherd guiding relatively inexperienced statisticians to machine learning in an approachable and fun way."
[Anon, course evaluation, 2021]

"This course (and Applied Statistics) were among the most useful and insightful courses I have taken in my academic life."
[Petroula Karakosta, 2022]

"I applaud the delivery with hands-on tutorial sessions, supported by overview lectures. The assessments excellently supported the learning with the initial project helping us get over the initial bump, and the group project showing us how to apply ML to our own interests. 5/5 stars!"
[Alice Patig, Ph.D. student at DTU, 2023]


Last updated 13th of March 2024 by Troels Petersen.