Applied Machine Learning 2024

"Despite the connotations of machine learning and artificial intelligence as a mysterious and radical departure from traditional approaches, we stress that machine learning has a mathematical formulation that is closely tied to statistics, the calculus of variations, approximation theory, and optimal control theory."
[Introduction to Machine Learning, Particle Data Group (pdg.lbl.org) 2021]

Troels C. Petersen, Lecturer (Associate Prof.), NBI - High Energy Physics, 26 28 37 39, Mac user, petersen@nbi.dk
Daniel Murnane, Teacher (DDSA Fellow), NBI - High Energy Physics, 93 83 89 58, Windows/Linux expert, daniel.murnane@nbi.ku.dk
Arnau Morancho Tarda, Teaching assistant (Ph.D.), NBI - High Energy Physics, +34 629 350 151, Windows expert, arnau.morancho@nbi.ku.dk
Thomas F. M. Spieksma, Teaching assistant (Ph.D.), NBI - Gravitational Physics, +31 610 844 213, Mac expert, thomas.spieksma@nbi.ku.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and application/project in science (7.5 ECTS).
Level: Intended for students at graduate level (4th-5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-14 / 14-17 and Wednesdays 9-10 / 10-12 & 13-14 / 14-17 for lectures/exercises (Week Schedule Group C).
Where (lectures): Mondays: Aud. 10 at HCO. Wednesdays: Lille UP1 at DIKU (except 6th and 8th of May, Aud. 3 at HCO).
Where (exercises): Mondays + Wednesdays: Biocenter 4-0-05 and 4-0-13, see KU Room Schedule plan.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on application and projects.
Text book: References to (the excellent!) Applied Machine Learning by David Forsyth.
Suppl. literature: We (i.e. you) will make extensive use of online ML resources, collected for this course.
Programming: Primarily Python 3.12 with a few packages on top, though this is an individual choice.
Code repository: All code we provide can be found in the AppliedML2024 GitHub repository.
Communication: All announcements will be given through Absalon. To reach me, email is preferable.
Initial Project: Initial project (à la a Kaggle competition) to be submitted Monday the 20th of May.
Final Project: Final project (Exam) presentations on Wednesday the 12th (all day) and Thursday the 13th of June (morning).
Evaluation: Initial project (40%), and final project (60%), evaluated by lecturers following the Danish 7-step scale.


“I am telling you, the world’s first trillionaires are going to come from somebody who masters AI and all its derivatives,
and applies it in ways we never thought of.”
[Mark Cuban (1958), American businessman]


Before course start:
An introduction to the course can be obtained from this ML subject overview and the related film introducing the course subjects (23 min, 1.48 GB).
Specific course information can be found here: ML2024_CourseInformation.pdf
To help us know who you are and optimise the course accordingly, please fill in the course questionnaire.
To test your "Python & Packages" setup, you can try to run ML_MethodsDemos.ipynb (which is also meant to whet your appetite).
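
For a quick check that the basic scientific Python stack is in place, a few imports suffice. The package list below is a typical guess for this course (NumPy, SciPy, Matplotlib, pandas, scikit-learn, PyTorch), not an official requirements list; adjust it to whatever ML_MethodsDemos.ipynb actually imports:

    # Quick sanity check of a typical ML Python stack. The package list is an
    # assumption - adapt it to your own setup and the course notebooks.
    import importlib

    for name in ["numpy", "scipy", "matplotlib", "pandas", "sklearn", "torch"]:
        try:
            module = importlib.import_module(name)
            print(f"{name:12s} OK      version {getattr(module, '__version__', '?')}")
        except ImportError:
            print(f"{name:12s} MISSING - install before the first exercise class")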



Course outline:
Below is the preliminary course outline, subject to possible changes throughout the course.

Week 1 (Introduction to Machine Learning concepts and methods. Tree and Neural Network learning):
Apr 22: 13:15-17:00: Intro to course, outline, groups, and discussion of data and goals. Overview of Machine Learning and techniques (TP).
     Exercise: Inspecting data and making a "human" decision tree and a linear (Fisher) discriminant. Also, setup of Python, GitHub, Slack, etc.
Apr 24: 8:15-12:00: Loss function, Training, Validation, Test, Cross Validation (TP), and Introduction to Tree-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Decision Tree based methods.
Apr 24: 13:15-17:00: Introduction to NeuralNet-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Neural Net based methods (a code sketch follows below).
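
As a flavour of the Week 1 exercises, here is a minimal sketch of tree- and network-based classification with cross validation on a scikit-learn reference data set. The data set, models, and hyperparameters are illustrative assumptions, not the course's own choices:

    # Decision tree vs. neural network classification, compared by 5-fold
    # cross validation on a standard reference data set (illustrative only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "decision tree": DecisionTreeClassifier(max_depth=4, random_state=42),
        # Neural nets train much better on scaled inputs, hence the pipeline.
        "neural net": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(32,),
                                                  max_iter=1000, random_state=42)),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:13s} accuracy: {scores.mean():.3f} +- {scores.std():.3f}")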

Week 2 (Initial project kickoff, Hyperparameter optimisation, Feature Importance, Introduction to unsupervised learning and clustering):
Apr 29: 13:15-17:00: Initial project kickoff. Hyperparameters, Overtraining, and Early stopping (TP).
     Exercise: Hyperparameter optimisation of simple tree and NN algorithms (see the sketch after this week's block).
May 1: 9:15-12:00: Feature Importance calculated using permutations and Shapley values (TP).
     Exercise: Determine feature ranking for reference data sets, and cross check these with actual models.
May 1: 13:15-17:00: Introduction to Unsupervised Learning: Clustering and Nearest Neighbor algorithms (TP).
     Exercise: Try to apply the k-NN (and other) algorithms to reference data sets.
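
Relating to the hyperparameter and feature-importance exercises above, a minimal scikit-learn sketch; the data set, model, and parameter grid are illustrative assumptions:

    # Small hyperparameter grid scan, followed by permutation feature
    # importance on the held-out test set (all choices illustrative).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        {"n_estimators": [50, 200], "max_depth": [3, 6, None]},
                        cv=5)
    grid.fit(X_train, y_train)
    print("Best hyperparameters:", grid.best_params_)

    # Rank features by how much the test score drops when each is shuffled.
    result = permutation_importance(grid.best_estimator_, X_test, y_test,
                                    n_repeats=10, random_state=42)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"feature {i:2d}: importance {result.importances_mean[i]:.4f}")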

Week 3 (Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Time-Series, and Natural Language Processing (NLP)):
May 6: 13:15-17:00: Convolutional Neural Networks (CNNs) and image analysis (DM).
     Exercise: Recognize images (MNIST dataset and insolubles from Greenland ice cores) with a CNN (see the sketch after this week's block).
May 8: 9:15-12:00: Graph Neural Networks (GNNs) and geometric learning (DM).
     Exercise: Work on the classic GNN example data (TBD).
May 8: 13:15-17:00: Time-series, Transformers, and Natural Language Processing (NLP) (DM).
     Exercise: Predict future flight traffic and do NLP on IMDB movie reviews.
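
For the CNN session, a minimal PyTorch sketch of image classification on MNIST; the architecture and training settings are illustrative assumptions, and the course exercises may well use a different framework:

    # A small CNN for MNIST digit classification (illustrative architecture).
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    train_data = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(train_data, batch_size=128, shuffle=True)

    model = nn.Sequential(  # 1x28x28 -> 16x14x14 -> 32x7x7 -> 10 classes
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):  # a couple of epochs suffice for a demo
        for images, labels in loader:
            loss = loss_fn(model(images), labels)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")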

Week 4 (Final Project kickoff, Preprocessing, Anomaly detection, Dimensionality reduction, and Summary):
May 13: 13:15-17:00: Final project kickoff. Discussion of how to work on projects.
     Exercise: Obtaining, plotting, and making plans for the final project data, and discussion of project goals.
May 15: 9:15-12:00: Preprocessing, anomaly detection, and dimensionality reduction (TP).
     Exercise: Clean data and run a (k)PCA, k-Means, and possibly tSNE/UMAP algorithm on reference data sets (see the sketch after this week's block).
May 15: 13:15-17:00: Summary of curriculum so far (TP).
     Exercise: Work on initial project.
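
A minimal sketch of the preprocessing and dimensionality-reduction chain from the Wednesday morning exercise above; the data set and cluster count are illustrative assumptions:

    # Scale features, reduce dimensionality with PCA, then cluster with k-Means.
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X, _ = load_iris(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)   # zero mean, unit variance

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_scaled)
    print("Explained variance ratio:", pca.explained_variance_ratio_)

    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_2d)
    print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])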

Week 5 (Computing and scaling/speed (videos), Generative Adversarial Networks (GANs), Reinforcement Learning, and GPU data analysis):
May 20: 13:15-17:00: No teaching (Whit Monday). Optionally, work on the initial project.
     Bonus (self) study (by video): Infrastructure, Networks, Scaling, and Speed (Brian Vinter). The initial project should be submitted by 22:00!
May 22: 9:15-12:00: Generative Adversarial Networks, Diffusion, and Reinforcement Learning (TP) (see the toy GAN sketch after this week's block).
     Exercise: Work on final project.
May 22: 13:15-17:00: GPU accelerated data analysis - Rapids (Mads Ruben Kristensen, Nvidia - formerly NBI).
     Exercise: Work on final project.
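
To make the GAN idea concrete, a toy PyTorch sketch in which a generator learns to mimic a one-dimensional Gaussian. Everything here is an illustrative assumption, not course material:

    # Toy GAN: the generator learns to mimic samples from N(3, 1).
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) + 3.0     # "real" data from N(3, 1)
        fake = G(torch.randn(64, 8))        # generated data from noise
        # Discriminator update: push real towards 1, fake towards 0.
        loss_D = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(fake.detach()), torch.zeros(64, 1)))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
        # Generator update: try to fool the discriminator (fake towards 1).
        loss_G = bce(D(fake), torch.ones(64, 1))
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    print("Generated mean:", G(torch.randn(1000, 8)).mean().item())  # ~3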

Week 6 (Auto-Encoders (AE), Ethics in ML, and Echo State Networks):
May 27: 13:15-17:00: (Variational) AutoEncoders and (more) anomaly detection (TP) (see the sketch after this week's block).
     Exercise: Work on final project.
May 29: 9:15-12:00: Ethics in the usage of Machine Learning (TP).
     Exercise: Work on final project.
May 29: 13:15-17:00: Echo State Networks (TBD), and an example exam presentation.
     Exercise: Work on final project.
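
For the autoencoder session, a minimal PyTorch sketch of anomaly detection via reconstruction error; the data here are random stand-ins, and the architecture and sizes are illustrative assumptions:

    # Dense autoencoder: samples it reconstructs poorly are anomaly candidates.
    import torch
    import torch.nn as nn

    X = torch.randn(1000, 20)                # stand-in for a real data set
    model = nn.Sequential(                   # 20 -> 3 -> 20 bottleneck
        nn.Linear(20, 3), nn.ReLU(), nn.Linear(3, 20))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(500):                 # train to reproduce the input
        loss = nn.functional.mse_loss(model(X), X)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    with torch.no_grad():                    # rank samples by reconstruction error
        errors = ((model(X) - X) ** 2).mean(dim=1)
    print("Most anomalous samples:", errors.topk(5).indices.tolist())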

Week 7 (Example of CNNs and VAEs at work (on beer and food), Training neural networks (videos)):
Jun 3: 13:15-17:00: Example of CNN at work on beer, and AutoEncoders at work on food (Carl Johnsen).
     Exercise: Work on final project.
Jun 5: 9:15-12:00: Constitution Day (no teaching).
     Exercise: Work on final project.
Jun 5: 13:15-17:00: Constitution Day (no teaching).
     Bonus (self) study: Recipe for training a neural network and associated video on the same subject.

Week 8 (Results on Initial Project, discussion of final project presentations, and Exam):
Jun 10: 13:15-17:00: Results and Feedback on initial project.
     Exercise: Work on final project.
Jun 12: 8:45-12:00: Presentations of final projects (TP, CS, AD, AM, TS, and potentially others!).
Jun 12: 13:15-17:00: Presentations of final projects (continued).
Jun 13: 8:45-12:00: Presentations of final projects (if needed!) (TP, CS, AD, AM, TS, and potentially others!).
Jun 13: 13:15-17:00: Presentations of final projects (if needed!) (continued).




Presentations from previous years


"Big Data is like teenage sex... everyone talks about it, nobody really knows how to do it, everyone else is doing it, so everyone claims they are doing it!"
[Dan Ariely, Professor at Duke University]


Course comments/praise (very biased selection!):
"Best day of my life!" (Pressumably at the University, red.)
[Christian M. Clausen, on the day of final project presentations, 2019]

"Student 1: Damn..."
"Student 2: I was just thinking what a shame you didn't get to see a whole classroom worth of 'damn' faces! But the feeling is there."

[Reaction in Zoom chat, after having explained the capabilities of Reinforcement Learning exemplified by AlphaZero, 2020]
[Fortunately, I got to see the reaction the year before!]

"Troels is the perfect shepherd guiding relatively inexperienced statisticians to machine learning in an approachable and fun way."
[Anon, course evaluation, 2021]

"This course (and Applied Statistics) were among the most useful and insightful courses I have taken in my academic life."
[Petroula Karakosta, 2022]

"I applaud the delivery with hands-on tutorial sessions, supported by overview lectures. The assessments excellently supported the learning with the initial project helping us get over the initial bump, and the group project showing us how to apply ML to our own interests. 5/5 stars!"
[Alice Patig, Ph.D. student at DTU, 2023]


Last updated 13th of March 2024 by Troels Petersen.