Applied Machine Learning 2023

"As one Google Translate engineer put it: 'when you go from 10,000 training examples to 10 billion training examples, it all starts to work.
Data trumps everything.'"

[Garry Kasparov, Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins]

Troels C. Petersen: Lecturer - Associate Prof., NBI - High Energy Physics. Phone: 35 52 54 42 / 26 28 37 39. Mac user. Email: petersen@nbi.dk
Charles L. Steinhardt: Teacher - Associate Prof., NBI - Cosmology. Phone: 27 51 44 47. Mac user. Email: steinhardt@nbi.ku.dk
Azzurra d'Alessandro: Teaching assistant - Ph.D., NBI - Astrophysics. Phone: 42 45 39 54. Linux expert. Email: azzurra.dalessandro@nbi.ku.dk
Arnau Morancho Tarda: Teaching assistant - Ph.D., NBI - High Energy Physics. Phone: +34 629 350 151. Windows expert. Email: arnau.morancho@nbi.ku.dk
Thomas F. M. Spieksma: Teaching assistant - Ph.D., NBI - Gravitational Physics. Phone: +31 610 844 213. Mac expert. Email: thomas.spieksma@nbi.ku.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and application/project in science (7.5 ECTS).
Level: Intended for students at graduate level (4th-5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-17 and Wednesdays 9-17 (Week Schedule Group C) in Block 4 (24/04-16/06 2023).
Where (lectures): Mondays: Store UP1 at DIKU (except week 2+3, Aud. 1 at August Krogh), Wednesdays: Lille UP1 at DIKU.
Where (exercises): Mondays + Wednesdays: DIKU rooms Bib 4-0-17 and 0v - 3-0-25, see KU Room Schedule plan.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on experience and projects.
Textbook: References to Elements of Statistical Learning II.
Suppl. literature: We (i.e. you) will make extensive use of online ML resources, collected for this course.
Programming: Primarily Python 3.8+ with a few packages on top, though this is an individual choice.
Code repository: All code we provide can be found in the AppliedML2023 GitHub repository.
Communication: All announcements will be given through Absalon. To reach me, email is preferable.
Collaborative tools: For "short coding communication" we have made a course Slack channel (click to join): NbiAppliedML2023.slack.com.
Initial Project: Initial project (a la Kaggle competition) to be submitted Monday the 22nd of May.
Final Project: Final project (Exam) presentations on Wednesday the 14th and Thursday the 15th of June all day (9:00-17:00+).
Evaluation: Initial project (40%), and final project (60%), evaluated by lecturers following the Danish 7-step scale.


Before course start:
An introduction to the course can be found in this ML subject overview and the related film introducing the course subjects (23 min, 1.48 GB).
Specific course information can be found here: ML2023_CourseInformation.pdf
To help us get to know you and optimise the course accordingly, please fill in the course questionnaire.
To test your "Python & Packages" setup, you can try to run ML_MethodsDemos.ipynb (which is also meant to whet your appetite).
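
Before opening the notebook, a quick check of the Python version and a few packages can be sketched as below. This is only an illustration: the package list (ASSUMED_PACKAGES) and the helper names are assumptions, not the course's official requirements, so adjust the list to whatever the notebook actually imports.

```python
import importlib.util
import sys

# Assumed list for illustration only; the course page does not enumerate
# the required packages. Use import names (e.g. "sklearn", not "scikit-learn").
ASSUMED_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]

# The course page states "Python 3.8+".
MIN_PYTHON = (3, 8)


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_packages(names):
    """Return the top-level names that cannot be located on the import path.

    Nothing is actually imported; the import machinery is only asked
    whether it can find each package.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages were found.")
```

If anything is reported missing, install it with your usual package manager and then try running ML_MethodsDemos.ipynb.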

Course exam:
The following is the final exam presentation schedule along with locations, considerations, and some suggestions.
Here you can find evaluation forms (1-10 scale) for the Wednesday presentations and the Thursday presentations, respectively.




Course outline:
Below is the preliminary course outline, subject to possible changes throughout the course.

Week 1 (Introduction to Machine Learning concepts and basic methods):
Apr 24: 13:15-17:00: Intro to course, outline, groups, and discussion of data and goals. Overview of Machine Learning techniques (TP).
     Exercise: Inspecting data and making a "human" decision tree and a linear (Fisher) discriminant. Also, (backup) setup of Python, GitHub, Slack, etc.
Apr 26: 8:15-12:00: Loss function, Training, Validation, Test, Cross Validation (TP), and Introduction to Tree-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Decision Tree based methods.
Apr 26: 13:15-17:00: Introduction to NeuralNet-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Neural Net based methods.

Week 2 (Review, Regression, Preprocessing, Unsupervised learning, and Clustering):
May 1: 13:15-17:00: Discussion of Tree- and NN-based solutions and PCA (TP). Initial project start (with introductions).
     Exercise: Regression (instead of classification), loss function variation, and cross validation, possibly with "harder" dataset.
May 3: 9:15-12:00: Introduction to Unsupervised Learning: Clustering and Nearest Neighbor algorithms (CS).
     Exercise: Try to apply the k-NN (and other) algorithms to reference data sets.
May 3: 13:15-17:00: Preprocessing and dimensionality reduction (CS).
     Exercise: Run a (k)PCA, k-Means, and possibly tSNE/UMAP algorithm on reference data sets.

Week 3 (Hyperparameter optimisation, Feature Importance, Clustering, and Final Project):
May 8: 13:15-17:00: Hyperparameters, Overtraining, and Early stopping (TP).
     Exercise: Hyperparameter optimisation of simple tree and NN algorithms.
May 10: 9:15-12:00: Feature Importance calculated using permutations and Shapley values (TP).
     Exercise: Determine feature ranking for reference data sets, and cross check these with actual models.
May 10: 13:15-17:00: Final projects kickoff.
     Exercise: Clustering along with inspection of final project data and discussion of project goals.

Week 4 (Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Auto-Encoders (AE)):
May 15: 13:15-17:00: Convolutional Neural Networks (CNNs) and image analysis (Daniel Murnane).
     Exercise: Recognize images (MNIST dataset and insolubles from Greenland ice cores) with a CNN.
May 17: 9:15-12:00: Recurrent Neural Networks (RNN), Long Short Term Memory (LSTM) and Natural Language Processing (NLP) (Inar Timiryasov).
     Exercise: Use an LSTM to predict flight traffic and do Natural Language Processing on IMDB movie reviews.
May 17: 13:15-17:00: (Variational) Auto-Encoder and anomaly detection (TP).
     Exercise: Compress images using Auto-Encoder, and cluster latent space with UMAP.

Week 5 (Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), and GPU data analysis):
May 22: 13:15-17:00: Graph Neural Networks (GNNs) - analysing geometric data (TP).
     Exercise: Work on initial project. Initial project should be submitted by 22:00!
May 24: 9:15-12:00: Variational AutoEncoders, Generative Adversarial Networks, and Reinforcement Learning (TP).
     Exercise: Work on final project.
May 24: 13:15-17:00: GPU accelerated data analysis - Rapids (Mads Ruben Kristensen, Nvidia - formerly NBI).
     Exercise: Work on final project.

Week 6 (Computing and scaling/speed (videos), Ethics in ML, and presentation techniques):
May 29: 13:15-17:00: No teaching (Whit Monday). Potentially, work on project.
     Bonus (self) study (by video): Computer infrastructure, Networks, Scaling, and Speed (Brian Vinter).
May 31: 9:15-12:00: Ethics in the usage of Machine Learning (TP).
     Exercise: Work on final project.
May 31: 13:15-17:00: How (not) to give presentations, with a (good) example from last year's exam.
     Exercise: Work on final project.

Week 7 (Training neural networks (videos), Results on Initial Project, Echo State Networks, and Exam example):
Jun 5: 13:15-17:00: No teaching (Constitution Day). Potentially, work on project.
     Bonus (self) study: Recipe for training a neural network and associated video on the same subject.
Jun 7: 9:15-12:00: Results and Feedback on initial project.
     Exercise: Work on final project.
Jun 7: 13:15-17:00: Echo State Networks (James Avery), and example exam presentation.
     Exercise: Work on final project.

Week 8 (CNNs and AutoEncoders at work, and discussion of final project presentations):
Jun 12: 13:15-17:00: Example of CNN at work on beer, and AutoEncoders at work on food (Carl Johnsen).
     Exercise: Work on final project.
Jun 14: 8:45-12:00: Presentations of final projects (TP, CS, AD, AM, TS, and potentially others!).
Jun 14: 13:15-17:00: Presentations of final projects (continued).
Jun 15: 8:45-12:00: Presentations of final projects (TP, CS, AD, AM, TS, and potentially others!).
Jun 15: 13:15-17:00: Presentations of final projects (for as long as needed!) (continued).




Presentations from previous years:
Below you can find the presentations of the final projects given on the 12th of June 2019:
  • Project1_BomberMan.pdf
  • Project2_BoneAge.pdf
  • Project3_SpectralAnalysis.pdf
  • Project4_StockMarketAnalysis.pdf
  • Project5_FindingWallyIn2DImages.pdf
  • Project6_StellarClassificationCNN.pdf
  • Project7_PredictingAgeGenderEthnicity.pdf
  • Project8_UFOSightingDataMining.pdf
  • Project9_ClassificationOfCatsVsDogs.pdf
  • Project10_MulticlassClassificationOfHearBeats.pdf
  • Project11_PredictingAbsorptionEnergies.pdf
  • Project12_PredictingSolarBatteryProperties.pdf
  • Project13_SkinLesionClassification.pdf

    Below you can find the presentations of the final projects given on the 10th of June 2020:
  • FinalProject1_RasmusPeter.pdf
  • FinalProject2_MariaAndyEmilMads_WalmartKaggle.pdf
  • FinalProject3_HelenaKatjaSimonViktoria.pdf
  • FinalProject4_AnnSofieEmyMartaYanet_RetrievalOfSeaSurfaceTemperatures.pdf
  • FinalProject5_ChristopherJoakimNikolaj_PredictingTheCriticalTempOfSuperconductors.pdf
  • FinalProject6_MikkelMikkelAskeAnnaMoust_GNNonIceCubeData.pdf
  • FinalProject7_HaiderRasmusMS_PredictingMusicPublicationYear.pdf
  • FinalProject8_AlbaMirenEdwinFynn_PredictingBloodCellType.pdf
  • FinalProject9_RuniSimoneMarcusJonathan_CalibrationForNewAstroDataForExoPlanetResearch.pdf
  • FinalProject10_SofusKristofferDavidElias_TrickingFaceTracking.pdf
  • FinalProject11_DinaAlineAlbertMichael_WheatDetection.pdf
  • FinalProject12_LaurentOrestisGiorgosCarlos_TweetSentimentExtraction.pdf
  • FinalProject13_SvendJulius_PredictingMusicGenre.pdf
  • FinalProject14_EmilMartiny_NoisyDataOnCells.pdf
  • FinalProject15_NicolasPedersen_IdentificationOfObjectsIn2DImages.pdf

    Below you can find the presentations of the final projects given on the 16th+17th of June 2021:
  • FinalProject1_UlrikSoerenMichalaMarcusAmalie_IceCoreInsoluablesClassification.pdf
  • FinalProject2_NielsBjarne_CreditCardFraudDetection.pdf
  • FinalProject3_AnnaArnauChristopherNeusChrysoula_PowerConsumptionPredictions.pdf
  • FinalProject4_AlexAsgerDanielJohan_BrainTumorMRI.pdf
  • FinalProject5_MortenFrederikNiallLeonJonathanKristian_IdentifyingChemicalsOnMarsWithChemCam.pdf
  • FinalProject6_GeorgiaJunValeriyRebecca_PredictingSongPopularityOnSpotify.pdf
  • FinalProject7_Jonathan_IdentifyingRoadSigns.pdf
  • FinalProject8_DanielEmilKevinGustav_StockMarketAnalysis.pdf
  • FinalProject9_AnnaDanaElloiseHelene_IcebergClassification.pdf
  • FinalProject10_NickTroelsJakobEmil_InsuranceClaimClassification.pdf
  • FinalProject11_LarsLiamMartin_IceCoreInsoluablesClassification.pdf
  • FinalProject12_MarcMathiasRasmusSoeren_CoastalMappingGreenland.pdf
  • FinalProject13_BenatJacobJonasPedro_DeepLearningPhotonics.pdf
  • FinalProject14_AndreaJulian_DetectingCovid19FromChestRadiographs.pdf
  • FinalProject15_EliotSofusMads_ClassifyingMusicalGenresFromAudioSnippets.pdf
  • FinalProject16_MartinSimonJeppeKristineEmmaCamilla_IceCoreInsoluablesClassification.pdf
  • FinalProject17_TobiasKaare_BirdCallClassification.pdf
  • FinalProject18_AliciaAndreasTommaso_PredictingAirplaneWeightAndBalance.pdf
  • FinalProject19_EmilyKatharinaNgaYing_ComaClusterClassification.pdf
  • FinalProject20_BeatrizMarcoVittorioMoritzCarl_IdentifyingKeplerObjects.pdf
  • FinalProject21_MarieMartinKasper_QuickDrawWithGANs.pdf
  • FinalProject22_IoannisLineaKimiSamyMaja_ClassifyingDogBreeds.pdf
  • FinalProject23_KianKianTobiasMia-Louise_IceCoreInsoluablesClassification.pdf
  • FinalProject24_PatrickMartinRuben_WinningFinalPremierLeagueWithML.pdf

    Below you can find the presentations of the final projects given on the 15th+16th of June 2022:
  • Market Making
  • Predicting weather from weather station data
  • Characterising wetland ecosystems from satellite data
  • Zebra Call Type Classification And Clustering
  • Medical Data Analysis
  • Analysis of insoluble images from Peruvian Icecore I
  • Analysis of insoluble images from Peruvian Icecore II
  • IceCore MeltLayer Predictions using a CNN
  • Machine Learning on the OMXC25 Stocks
  • Predicting financial concerns
  • Wine and label project
  • Predicting Pokemons
  • Clustering Tweets
  • Analysis of insoluble images from Peruvian Icecore III
  • Arctic Sea Ice
  • Chess AI
  • NLP for sentence classification
  • NLP on genetics
  • Analysis of insoluble images from Peruvian Icecore IV
  • Classifying and clustering ovarian cancer from DNA sequences
  • CNN in playing game
  • Classification of Heavy Neutral Leptons

    Below you can find the presentations of the final projects given on the 14th+15th of June 2023:
  • FinalProject01 by JieMadsDavidPanagiotis: MetagenomicBinning.
  • FinalProject02 by SophiaKathrineFrederikkeClotilde: ApplyingMLonSimulationsOfMilkyWay.
  • FinalProject03 by CasperMaltePhilipSebastian: BeatTheBookmakers.
  • FinalProject04 by ChristianJakobJonathanRasmusSune: HyperHyperparameterOptimizationOptimization.
  • FinalProject05 by FabrizioOmarRizThomas: PokemonPixelArt.
  • FinalProject06 by JakobBirkMalthe: IdentifyingBirdCalls.
  • FinalProject07 by PukAndreAsgerChristian: SatelliteObservationsMissingDataInterpolation.
  • FinalProject08 by MortenLudvigMichelleEmilie: UsingMLtoReadTheASLAlphabet.
  • FinalProject09 by ArnulfAtheneOliverRune: DisentaglingSpinQubitMeasurementData.
  • FinalProject10 by MarcusMagnusBrageRune: FakeNewsDetection.
  • FinalProject11 by LongMalouSinaWeiyuan: KaggleFaceDetection.
  • FinalProject12 by AliceErlendJonasTonje: MultivariateTimeseriesForecastingOfDanishWeather.
  • FinalProject13 by AntonSimonAdrianMichaelChris: NetworkBasedInertialNavigation.
  • FinalProject14 by ChristineMarieVilma: WARDproject.
  • FinalProject15 by ChenliangGuozhenMiaoSachin: QuasarSpectraAnalysis.
  • FinalProject16 by AntonGustavThomas: ClassificationOfAndRegressionOnAstronomicalObjects.
  • FinalProject17 by FrodeTeisRia: InvestmentStockAnalysisRNN.
  • FinalProject18 by AndreasRasmusShaowenYasaswyYi: RediscoveringHerculaneum.
  • FinalProject19 by ChaytonFrederikJuliusNoahSimon: PredictingPitchesInBaseball.
  • FinalProject20 by JonasMikkelOdysseas: IsolatingAndIdentifyingMusicalInstruments.
  • FinalProject21 by FrederikLukas: UnderstandingOffensiveFootball.
  • FinalProject22 by LeonieJakobJulia: IberianWildfirePrediction.
  • FinalProject23 by ImkeMajbritt: IronmanPredictions.
  • FinalProject24 by IanChristian: FocalGroup.


    "Big Data is like teenage sex... everyone talks about it, nobody really knows how to do it, everyone else is doing it, so everyone claims they are doing it!"
    [Dan Ariely, Professor at Duke University]


    Course comments/praise (very biased selection!):
    "Best day of my life!" (Presumably at the University, ed.)
    [Christian M. Clausen, on the day of final project presentations, 2019]

    "Student 1: Damn..."
    "Student 2: I was just thinking what a shame you didn't get to see a whole classroom worth of 'damn' faces! But the feeling is there."

    [Reaction in Zoom chat, after the capabilities of Reinforcement Learning had been explained, exemplified by AlphaZero, 2020]
    [Fortunately, I got to see the reaction the year before!]

    "Troels is the perfect shepherd guiding relatively inexperienced statisticians to machine learning in an approachable and fun way."
    [Anon, course evaluation, 2021]

    "This course (and Applied Statistics) were among the most useful and insightful courses I have taken in my academic life."
    [Petroula Karakosta, 2022]

    "I applaud the delivery with hands-on tutorial sessions, supported by overview lectures. The assessments excellently supported the learning with the initial project helping us get over the initial bump, and the group project showing us how to apply ML to our own interests. 5/5 stars!"
    [Alice Patig, Ph.D. student at DTU]


    Last updated 20th of June 2023 by Troels Petersen.