Applied Machine Learning 2022

"Big Data is like teenage sex... everyone talks about it, nobody really knows how to do it, everyone else is doing it, so everyone claims they are doing it!"
[Dan Ariely, Professor at Duke University]

Troels C. Petersen: Lecturer - Associate Professor, NBI - High Energy Physics, 35 52 54 42 / 26 28 37 39, petersennbi.dk
Charles L. Steinhardt: Teacher - Associate Professor, NBI - Cosmology, 27 51 44 47, steinhardtnbi.ku.dk
Julius B. Kirkegaard: Teacher - Assistant Professor, NBI - Biocomplexity, 29 72 92 57, julius.kirkegaardnbi.ku.dk
Vadim Rusakov: Teaching assistant - Ph.D., NBI - Cosmology, 50 26 94 50, vadim.rusakovnbi.ku.dk
Rajeeb Sharma: Teaching assistant - Ph.D., NBI - Astrophysics, 71 40 24 15, rajeeb.sharmanbi.ku.dk
Azzurra d'Alessandro: Teaching assistant - Ph.D., NBI - Astrophysics, 42 45 39 54, azzurra.dalessandronbi.ku.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and application/project in science (7.5 ECTS).
Level: Intended for students at graduate level (4th-5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-17 and Wednesdays 9-17 (Week Schedule Group C) in Block 4 (25/04-24/06 2022).
Where (lectures): Mondays: Lille UP1 at DIKU (except weeks 19-21, then Aud3 at HCO), Wednesdays: Aud1 at HCO.
Where (exercises): Mondays: Kursussal 1, 3, and 4A at Zoological Museum.
Wednesdays: DIKU rooms 1-0-26, 1-0-30, and bib 4-0-17 (mornings) and Aud1 at HCO (afternoons), see KU Room Schedule plan.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on experience and projects.
Text book: References to Elements of Statistical Learning II.
Suppl. literature: We (i.e. you) will make extensive use of online ML resources, collected for this course.
Programming: Primarily Python 3.8+ with a few packages on top, though this is an individual choice.
Code repository: All code we provide can be found in the AppliedML2022 GitHub repository.
Communication: All announcements will be given through Absalon. To reach me, email is preferable.
Collaborative tools: For "short coding communication" we have made a course Slack channel: NbiAppliedML2022.slack.com.
Exam: Final project (possibly virtual) presentations on Wednesday the 15th and Thursday the 16th of June all day (9:00-17:00+).
Evaluation: Initial project (40%), and final project (60%), evaluated by lecturers following the Danish 7-step scale.

An introduction to the course is given in this ML subject overview and the related film introducing the subjects (23 min, 1.48 GB).

Further course information can be found here: ML2022_CourseInformation.pdf
A course questionnaire helps us get to know who you are and optimise the course accordingly.
To test your "Python & Packages", you can try out ML_MethodsDemos.ipynb, which is also meant to whet your appetite.
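
Before opening the notebook, a quick check of the Python version and installed packages can save time. The following is a minimal sketch using only the standard library; the package list is an assumption (the course page only says "a few packages on top"), so adjust it to whatever the notebook actually imports.

```python
import importlib.util
import sys

# Hypothetical package list: these names are assumptions for illustration,
# not the official course requirements.
ASSUMED_PACKAGES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def python_is_recent(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.8+:", python_is_recent())
    print("Missing packages:", missing_packages(ASSUMED_PACKAGES) or "none")
```

Any name reported as missing can then be installed with your usual package manager before running ML_MethodsDemos.ipynb.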



Course outline:
Below is the preliminary course outline, subject to changes throughout the course.

Week 1 (Introduction to Machine Learning concepts and basic methods):
Apr 25: 13:15-17:00: Intro to course, outline, groups, and discussion of data and goals. Overview of Machine Learning techniques (TP).
     Exercise: Setup of infrastructure (GitHub, Slack). Inspecting data and making a "human" decision tree and linear (Fisher) discriminant.
Apr 27: 9:15-12:00: Introduction to Tree-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Decision Tree based methods.
Apr 27: 13:15-17:00: Introduction to NeuralNet-based algorithms (TP).
     Exercise: Classification (and regression) on reference data sets with Neural Net based methods.

Week 2 (Loss function, training, validation, unsupervised learning, and preprocessing):
May 2: 13:15-17:00: Loss function, Training, Validation, Test, and Cross Validation (TP). Initial project start.
     Exercise: Try to alter your loss function and apply cross validation in your training.
May 4: 9:15-12:00: Introduction to Unsupervised Learning: Clustering and Nearest Neighbor algorithms (CS).
     Exercise: Try to apply the k-NN (and other) algorithms to reference data sets.
May 4: 13:15-17:00: Preprocessing and dimensionality reduction (CS).
     Exercise: Run a (k)PCA, k-Means, and possibly tSNE/UMAP algorithm on reference data sets.
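
As a warm-up for the cross validation exercise of May 2, the splitting step can be sketched in plain Python. This is an illustrative sketch, not the course's reference code; the function name `k_fold_indices` and its parameters are hypothetical, and in practice a library splitter would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # All other folds together form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for train, validation in k_fold_indices(n_samples=10, k=3):
        print(len(train), "training rows,", len(validation), "validation rows")
```

Each sample is used for validation exactly once, and the model is retrained on the remaining folds in every round.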

Week 3 (Hyper Parameter optimisation, Feature Importance, Clustering, and Final Project):
May 9: 13:15-17:00: Hyperparameters, Overtraining, and Early stopping (TP).
     Exercise: Hyperparameter optimisation of simple tree and NN algorithms.
May 11: 9:15-12:00: Feature Importance calculated using permutations and Shapley values (TP).
     Exercise: Determine feature ranking for reference data sets, and cross check these with actual models.
May 11: 13:15-17:00: Final projects kickoff.
     Exercise: Clustering along with inspection of final project data and discussion of project goals.
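
The permutation-based feature importance listed for May 11 can be illustrated with a small plain-Python sketch: shuffle one feature column at a time and record how much the accuracy drops. The helper names (`accuracy`, `permutation_importance`) and the toy model are assumptions made for illustration; rows are assumed to be lists and the model a function from a row to a label.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows in X for which model(row) equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [row[:col] + [value] + row[col + 1:]
                      for row, value in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: the label equals feature 0, feature 1 is irrelevant.
    X = [[i % 2, (i * 7) % 5] for i in range(20)]
    y = [row[0] for row in X]
    print(permutation_importance(lambda row: row[0], X, y))
```

A feature the model ignores gets an importance of zero, since shuffling it cannot change any prediction.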

Week 4 (Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Auto-Encoders (AE)):
May 16: 13:15-17:00: Convolutional Neural Networks (CNNs) and image analysis (JK).
     Exercise: Recognize images (insolubles from Greenland ice cores) with a CNN.
May 18: 9:15-12:00: Recurrent Neural Networks (RNN), Long Short Term Memory (LSTM) and Natural Language Processing (NLP) (JK).
     Exercise: Use an LSTM to predict flight traffic and do Natural Language Processing on IMDB movie reviews.
May 18: 13:15-17:00: (Variational) Auto-Encoder and anomaly detection (TP).
     Exercise: Compress images using Auto-Encoder, and cluster latent space with UMAP.

Week 5 (Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), and GPU data analysis):
May 23: 13:15-17:00: Graph Neural Networks (GNNs) - analysing geometric data (Rasmus Oersoe/TP).
     Exercise: Work on initial project. The initial project should be submitted by 22:00!
May 25: 9:15-12:00: Variational AutoEncoders, Generative Adversarial Networks, and Reinforcement Learning (TP).
     Exercise: Work on final project.
May 25: 13:15-17:00: GPU accelerated data analysis - Rapids (Mads Ruben Kristensen, Nvidia - formerly NBI).
     Exercise: Work on final project.

Week 6 (CNNs at work and Scientific ML, and Ethics in ML):
May 30: 13:15-17:00: Example of CNN at work on beer, and Scientific ML.
     Exercise: Work on final project.
Jun 1: 9:15-12:00: Echo State Networks (ESN) and anomaly detection (James Avery).
     Exercise: Work on ESNs and/or final project.
Jun 1: 13:15-17:00: Ethics in the usage of Machine Learning (TP).
     Exercise: Work on final project.

Week 7 (Computing and scaling/speed (videos), Results on Initial Project, and Exam example/Q+A):
Jun 6: 13:15-17:00: No teaching (Whit Monday).
     Bonus (self) study (by video): Computer infrastructure, Networks, Scaling, and Speed (Brian Vinter).
Jun 8: 9:15-12:00: Results and Feedback on initial project.
     Exercise: Work on final project.
Jun 8: 13:15-17:00: Example exam presentation followed by Question+Answer session about the exam.
     Exercise: Work on final project.

Week 8 (Echo State Networks (ESNs) and EXAM: Presentations of final project):
Jun 13: 13:15-17:00: Echo State Networks and anomaly detection (James Avery).
     Exercise: Work on Echo State Networks and/or final project.
Jun 15: 8:15-12:00: Presentations of final projects (TP, CS, JK, VR, RS, DA, and potentially others!).
Jun 15: 13:15-17:00: Presentations of final projects (continued).
Jun 16: 8:15-12:00: Presentations of final projects (TP, CS, JK, VR, RS, DA, and potentially others!).
Jun 16: 13:15-17:00: Presentations of final projects (continued, for as long as needed!).


Final Projects/Exam 2022:
An overview of final project groups, subjects, and exam schedule can be found here: ML2022_FinalProjectPresentationSchedule.pdf.
The analysis of the results leading to the grading is explained here: ML2022_FinalProject_CommentsOnGrading.pdf.




Presentations from previous years:
Below you can find the presentations of the final projects given on the 12th of June 2019:
  • Project1_BomberMan.pdf
  • Project2_BoneAge.pdf
  • Project3_SpectralAnalysis.pdf
  • Project4_StockMarketAnalysis.pdf
  • Project5_FindingWallyIn2DImages.pdf
  • Project6_StellarClassificationCNN.pdf
  • Project7_PredictingAgeGenderEthnicity.pdf
  • Project8_UFOSightingDataMining.pdf
  • Project9_ClassificationOfCatsVsDogs.pdf
  • Project10_MulticlassClassificationOfHearBeats.pdf
  • Project11_PredictingAbsorptionEnergies.pdf
  • Project12_PredictingSolarBatteryProperties.pdf
  • Project13_SkinLesionClassification.pdf

Below you can find the presentations of the final projects given on the 10th of June 2020:
  • FinalProject1_RasmusPeter.pdf
  • FinalProject2_MariaAndyEmilMads_WalmartKaggle.pdf
  • FinalProject3_HelenaKatjaSimonViktoria.pdf
  • FinalProject4_AnnSofieEmyMartaYanet_RetrievalOfSeaSurfaceTemperatures.pdf
  • FinalProject5_ChristopherJoakimNikolaj_PredictingTheCriticalTempOfSuperconductors.pdf
  • FinalProject6_MikkelMikkelAskeAnnaMoust_GNNonIceCubeData.pdf
  • FinalProject7_HaiderRasmusMS_PredictingMusicPublicationYear.pdf
  • FinalProject8_AlbaMirenEdwinFynn_PredictingBloodCellType.pdf
  • FinalProject9_RuniSimoneMarcusJonathan_CalibrationForNewAstroDataForExoPlanetResearch.pdf
  • FinalProject10_SofusKristofferDavidElias_TrickingFaceTracking.pdf
  • FinalProject11_DinaAlineAlbertMichael_WheatDetection.pdf
  • FinalProject12_LaurentOrestisGiorgosCarlos_TweetSentimentExtraction.pdf
  • FinalProject13_SvendJulius_PredictingMusicGenre.pdf
  • FinalProject14_EmilMartiny_NoisyDataOnCells.pdf
  • FinalProject15_NicolasPedersen_IdentificationOfObjectsIn2DImages.pdf

Below you can find the presentations of the final projects given on the 16th+17th of June 2021:
  • FinalProject1_UlrikSoerenMichalaMarcusAmalie_IceCoreInsoluablesClassification.pdf
  • FinalProject2_NielsBjarne_CreditCardFraudDetection.pdf
  • FinalProject3_AnnaArnauChristopherNeusChrysoula_PowerConsumptionPredictions.pdf
  • FinalProject4_AlexAsgerDanielJohan_BrainTumorMRI.pdf
  • FinalProject5_MortenFrederikNiallLeonJonathanKristian_IdentifyingChemicalsOnMarsWithChemCam.pdf
  • FinalProject6_GeorgiaJunValeriyRebecca_PredictingSongPopularityOnSpotify.pdf
  • FinalProject7_Jonathan_IdentifyingRoadSigns.pdf
  • FinalProject8_DanielEmilKevinGustav_StockMarketAnalysis.pdf
  • FinalProject9_AnnaDanaElloiseHelene_IcebergClassification.pdf
  • FinalProject10_NickTroelsJakobEmil_InsuranceClaimClassification.pdf
  • FinalProject11_LarsLiamMartin_IceCoreInsoluablesClassification.pdf
  • FinalProject12_MarcMathiasRasmusSoeren_CoastalMappingGreenland.pdf
  • FinalProject13_BenatJacobJonasPedro_DeepLearningPhotonics.pdf
  • FinalProject14_AndreaJulian_DetectingCovid19FromChestRadiographs.pdf
  • FinalProject15_EliotSofusMads_ClassifyingMusicalGenresFromAudioSnippets.pdf
  • FinalProject16_MartinSimonJeppeKristineEmmaCamilla_IceCoreInsoluablesClassification.pdf
  • FinalProject17_TobiasKaare_BirdCallClassification.pdf
  • FinalProject18_AliciaAndreasTommaso_PredictingAirplaneWeightAndBalance.pdf
  • FinalProject19_EmilyKatharinaNgaYing_ComaClusterClassification.pdf
  • FinalProject20_BeatrizMarcoVittorioMoritzCarl_IdentifyingKeplerObjects.pdf
  • FinalProject21_MarieMartinKasper_QuickDrawWithGANs.pdf
  • FinalProject22_IoannisLineaKimiSamyMaja_ClassifyingDogBreeds.pdf
  • FinalProject23_KianKianTobiasMia-Louise_IceCoreInsoluablesClassification.pdf
  • FinalProject24_PatrickMartinRuben_WinningFinalPremierLeagueWithML.pdf

Below you can find the presentations of the final projects given on the 15th+16th of June 2022:
  • Market Making
  • Predicting weather from weather station data
  • Characterising wetland ecosystems from satellite data
  • Zebra Call Type Classification And Clustering
  • Medical Data Analysis
  • Analysis of insoluble images from Peruvian Icecore I
  • Analysis of insoluble images from Peruvian Icecore II
  • IceCore MeltLayer Predictions using a CNN
  • Machine Learning on the OMXC25 Stocks
  • Predicting financial concerns
  • Wine and label project
  • Predicting Pokemons
  • Clustering Tweets
  • Analysis of insoluble images from Peruvian Icecore III
  • Arctic Sea Ice
  • Chess AI
  • NLP for sentence classification
  • NLP on genetics
  • Analysis of insoluble images from Peruvian Icecore IV
  • Classifying and clustering ovarian cancer from DNA sequences
  • CNN in playing game
  • Classification of Heavy Neutral Leptons

    "Some people worry that artificial intelligence will make us feel inferior, but then, anybody in his right mind should have an inferiority complex every time he looks at a flower." [Alan Kay, American computer scientist]


    Course comments/praise (very biased selection!):
    "Best day of my life!" (Presumably at the University, ed.)
    [Christian M. Clausen, on the day of final project presentations, 2019]

    "Student 1: Damn..."
    "Student 2: I was just thinking what a shame you didn't get to see a whole classroom worth of 'damn' faces! But the feeling is there."

    [Reaction in Zoom chat, after having explained the capabilities of Reinforcement Learning exemplified by AlphaZero, 2020]
    [And I got to see the reaction the year before!]

    "Troels is the perfect shepherd guiding relatively inexperienced statisticians to machine learning in an approachable and fun way."
    [Anon, course evaluation, 2021]

    "This course (and Applied Statistics) were among the most useful and insightful courses I have taken in my academic life."
    [Petroula Karakosta, 2022]



    Last updated 26th of April 2022 by Troels Petersen.