Applied Machine Learning 2021

"Big Data is like teenage sex... everyone talks about it, nobody really knows how to do it, everyone else is doing it, so everyone claims they are doing it!"
[Dan Ariely, Professor at Duke University]
  • Troels C. Petersen (Lecturer, Associate Professor), NBI - High Energy Physics, tel. 35 52 54 42 / 26 28 37 39, petersen@nbi.dk
  • Adriano Agnello (Lecturer, Assistant Professor), NBI - Cosmology, tel. 35 33 76 41, adriano.agnello@nbi.ku.dk
  • Carl-Johannes Johnsen (Teaching assistant, Ph.D.), NBI - Computing, tel. 31 44 42 56, cjjohnsen@nbi.ku.dk
  • Zoe Ansari (Teaching assistant, Ph.D.), NBI - Cosmology, tel. 81 92 22 88, zakieh.ansari@nbi.ku.dk
  • Vadim Rusakov (Teaching assistant, Ph.D.), NBI - Cosmology, tel. 50 26 94 50, vadim.rusakov@nbi.ku.dk
  • Rasmus F. Oersoe (Teaching assistant, Master), NBI - High Energy Physics, tel. 40 51 52 31, pcs557@alumni.ku.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and application/project in science (7.5 ECTS).
Level: Intended for students at graduate level (4th--5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-17 and Wednesdays 9-17 (Week Schedule Group C) in Block 4 (26/04-25/06 2021).
Where: To begin with, online only. We will then see how things develop.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on experience and projects.
Text book: References to Elements of Statistical Learning II.
Additional literature: We (i.e. you) will make extensive use of online ML resources, collected for this course.
Programming: Primarily Python 3.6+ with a few packages on top, though this is an individual choice.
Code repository: AppliedML2021 GitHub repository.
Communication: Messages through Absalon, lectures and exercises given live via Zoom.
Collaborative tools: We have made a course Slack channel: NbiAppliedML2021.slack.com, but you're of course welcome to use "anything" at will.
Exam: Final project (possibly virtual) presentations on Wednesday the 16th and Thursday the 17th of June all day (9:00-17:00+).
Evaluation: Small project (40%), and final project (60%), evaluated by lecturers following the Danish 7-step scale.

Further course information can be found here: ML2021_CourseInformation.pdf
Everyone is (highly) encouraged to fill in the course questionnaire, which is used to facilitate student collaboration and group work.
To test your "Python & Packages", you can try out ML_MethodsDemos.ipynb, which is also meant to whet your appetite.

Course exam:
The following is the (assumed) final (V3.0) presentation schedule and guidelines.
Here you find evaluation forms (1-10 scale) for Wednesday presentations and Thursday presentations, respectively.
Here are Zoom links for Wednesday morning session and Wednesday afternoon session (NOT recorded!).
Here are Zoom links for Thursday morning session and Thursday afternoon session (NOT recorded!).





Course outline:
Below is the preliminary course outline, subject to changes throughout the course.

Week 1 (Introduction to Machine Learning concepts and methods):
Apr 26: 13:15-17:00: Intro to course, outline, groups, and discussion of data and goals (TP, AA, VR, ZA, CJ, and RO). Overview of Machine Learning techniques (TP).
     Exercise: Setup of infrastructure (Github, ERDA, Zoom, Slack). Inspecting data and making "human" decision tree.
Apr 28: 9:15-12:00: Introduction to Tree-based algorithms (TP).
     Exercise: Classification of b-quark jets in Aleph data with Tree based methods.
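A minimal sketch of such a tree-based classifier, using scikit-learn's GradientBoostingClassifier on synthetic stand-in data (the actual Aleph b-jet variables are not included here, so make_classification plays their role):

```python
# Boosted decision trees on a synthetic two-class stand-in for
# b-jet vs. light-jet separation (hypothetical features).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of shallow trees, each correcting the previous ones.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

# ROC AUC is the usual figure of merit for this kind of separation.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

Swapping in the real jet variables only changes the data-loading lines; the fit/score pattern stays the same.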
Apr 28: 13:15-17:00: Introduction to NeuralNet-based algorithms (TP).
     Exercise: Classification of b-quark jets in Aleph data with Neural Net based methods.
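The same classification task can be sketched with a small feed-forward neural network; here via scikit-learn's MLPClassifier on synthetic stand-in data (again, the real Aleph variables are assumed, not shown):

```python
# A small fully-connected neural network on synthetic two-class data.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Neural networks are sensitive to feature scales, so standardise first.
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=1)
clf.fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
```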

Week 2 (Data collection, reduction, training, and Hyper Parameter optimisation):
May 3: 13:15-17:00: Data collection, preprocessing, and dimensionality reduction (AA).
     Exercise: Run a (k)PCA on (a) the b-quark data table, and/or (b) the SDSS data table.
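The PCA/kPCA step can be sketched as follows; the breast-cancer dataset bundled with scikit-learn stands in for the b-quark and SDSS tables, which are not included here:

```python
# Linear PCA and non-linear kernel-PCA on a standardised data table.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA

X = StandardScaler().fit_transform(load_breast_cancer().data)

pca = PCA(n_components=2).fit(X)
X_lin = pca.transform(X)                 # linear 2D projection
X_rbf = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)  # kPCA

# Fraction of the total variance captured by the two linear components.
explained = pca.explained_variance_ratio_.sum()
```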
May 5: 9:15-12:00: Training, Validation, Test, Cross Validation, and introduction to basic machinery (AA).
     Exercise: Try to apply cross validation in your training.
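Cross validation in scikit-learn is a one-liner once you have an estimator; a minimal sketch (using a bundled dataset as a stand-in for your own):

```python
# 5-fold cross validation: each fold serves once as the validation set,
# giving five independent performance estimates instead of one.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
mean_acc = scores.mean()  # average accuracy over the five folds
```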
May 5: 13:15-17:00: Hyperparameters, Overtraining, and Early stopping (Christian Michelsen, AA).
     Exercise: Hyperparameter optimisation of simple tree and NN algorithms.
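A minimal hyperparameter-optimisation sketch using an exhaustive grid search with cross validation (the grid values here are illustrative, not recommendations):

```python
# Grid search over a small hyperparameter grid, scored by 3-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [25, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best = search.best_params_  # the best combination found on the grid
```

RandomizedSearchCV follows the same pattern and scales better when the grid grows.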

Week 3 (Clustering and Feature Importance):
May 10: 13:15-17:00: Introduction to Clustering and Nearest Neighbor algorithms (VR). Kickoff of the small project (TP).
     Exercise: Try to apply the k-NN (and other) algorithms to e.g. breast cancer and/or the Aleph b-jet data.
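For the breast-cancer case the data ships with scikit-learn, so a minimal k-NN sketch looks like this:

```python
# k-nearest-neighbours classification of the breast-cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-NN is distance based, so the features must share a common scale.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)
acc = knn.score(scaler.transform(X_test), y_test)
```

Varying n_neighbors (and the distance metric) is the natural first experiment.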
May 12: 9:15-12:00: Feature Importance calculated using permutations and Shapley values (TP).
     Exercise: Determine the ranking of the input features for our "known" datasets, and cross check these with actual models.
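Permutation importance can be sketched directly with scikit-learn (Shapley values need the separate shap package and are not shown here):

```python
# Permutation importance: shuffle one feature at a time and record
# how much the test score drops; a large drop = an important feature.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]  # most important first
```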
May 12: 13:15-17:00: Population Mixture Models (AA).
     Exercise: Apply the Expectation-Maximization algorithm to cluster data of your choice.
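A minimal sketch of EM-based mixture fitting, using scikit-learn's GaussianMixture (which runs Expectation-Maximization internally) on a toy two-component dataset:

```python
# Fit a two-component Gaussian mixture with EM and recover the means.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 1D data drawn from two Gaussians at -2 and +3.
data = np.concatenate([rng.normal(-2, 0.5, 300),
                       rng.normal(3, 1.0, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
means = sorted(gmm.means_.ravel())  # should land near -2 and +3
```

gmm.predict(data) then assigns each point to its most probable component, i.e. the clustering itself.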

Week 4 (Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs)):
May 17: 13:15-17:00: Recurrent Neural Networks (RNNs), Long Short Term Memory (LSTM), and The ImageNet Competition (TP). Also Final projects kickoff.
     Exercise: Predict the next entries in a sine (periodic) and a Mackey-Glass (non-periodic) sequence, and coordination/group discussion of the final project.
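The sequence-prediction setup can be sketched without a full RNN: slide a fixed window over the sequence and train any regressor to predict the next value (here a small MLP as a stand-in for the LSTM):

```python
# Next-value prediction on a sine sequence via sliding windows.
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.arange(0, 60, 0.1)
seq = np.sin(t)

# Each training example: the last `window` values -> the next value.
window = 10
X = np.array([seq[i:i + window] for i in range(len(seq) - window)])
y = seq[window:]

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:-50], y[:-50])          # train on the early part
mse = np.mean((model.predict(X[-50:]) - y[-50:]) ** 2)  # test on the tail
```

An LSTM replaces the fixed window with a learned internal state, which is what makes non-periodic sequences tractable.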
May 19: 9:15-12:00: Convolutional Neural Networks (CNNs) and image analysis (Alexandar Topic).
     Exercise: Recognize images (in this case handwritten numbers) with Convolutional Neural Networks.
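The core operation inside a CNN is a small filter slid across the image; a bare NumPy sketch (a real MNIST model would stack many such filters in a framework like Keras or PyTorch):

```python
# Valid-mode 2D convolution (cross-correlation), the building block of CNNs.
import numpy as np

def conv2d(image, kernel):
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Filter response = elementwise product of the patch and kernel.
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

# A vertical-edge filter applied to a toy 6x6 "image" (dark left, bright right).
image = np.zeros((6, 6)); image[:, 3:] = 1.0
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)
response = conv2d(image, edge_kernel)  # peaks where the edge sits
```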
May 19: 13:15-17:00: Graph Neural Networks (GNNs) - analysing geometric data (Rasmus Oersoe).
     Exercise: Work on Small project.

Week 5 (Computing and scaling, Echo State Networks (ESNs), and ML on Supernova dust):
May 24: 13:15-17:00: No teaching (Whit Monday). The small project should be submitted by 22:00!
     Bonus (self) study (by video): Computer infrastructure, Networks, Scaling, and Speed (Brian Vinter).
May 26: 9:15-12:00: Echo State Networks and anomaly detection (James Avery).
     Exercise: Work on final project.
May 26: 13:15-17:00: Supernova dust detection with Machine Learning (Zoe Ansari).
     Exercise: Work on final project.

Week 6 (Generative Adversarial Networks (GANs), Ethics in ML, and CNNs at work):
May 31: 13:15-17:00: Generative Adversarial Networks (GANs) (TP).
     Exercise: Work on final project.
Jun 2: 9:15-12:00: Ethics and adversarial Machine Learning (TP and AA).
     Exercise: Work on final project.
Jun 2: 13:15-17:00: Using CNNs in beer quality check (Carl-Johannes Johnsen).
     Exercise: Work on final project.

Week 7 (GPU data analysis, t-SNE and UMAP algorithms, course evaluation, and results on Small Project):
Jun 7: 13:15-17:00: GPU accelerated data analysis - Rapids (Mads Ruben Kristensen, Nvidia - formerly NBI)
Jun 9: 9:15-12:00: T-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) (TP).
     Exercise: Work on final project.
Jun 9: 13:15-17:00: Course evaluation. Results and feedback on the small project.
     Exercise: Work on final project.

Week 8 (EXAM: Presentations of final project):
Jun 14: 13:15-17:00: Final project work.
Jun 16: 8:15-12:00: Presentations of final projects (TP, AA, VR, ZA, CJ, and RO).
Jun 16: 13:15-17:00: Presentations of final projects (continued).
Jun 17: 8:15-12:00: Presentations of final projects (TP, AA, VR, ZA, CJ, and RO).
Jun 17: 13:15-17:00: Presentations of final projects (for as long as needed!) (continued).


Final Projects/Exam 2021:
Below you can find the presentations of the final projects given on the 16th of June 2021:
  • FinalProject1_UlrikSoerenMichalaMarcusAmalie_IceCoreInsoluablesClassification.pdf
  • FinalProject2_NielsBjarne_CreditCardFraudDetection.pdf
  • FinalProject3_AnnaArnauChristopherNeusChrysoula_PowerConsumptionPredictions.pdf
  • FinalProject4_AlexAsgerDanielJohan_BrainTumorMRI.pdf
  • FinalProject5_MortenFrederikNiallLeonJonathanKristian_IdentifyingChemicalsOnMarsWithChemCam.pdf
  • FinalProject6_GeorgiaJunValeriyRebecca_PredictingSongPopularityOnSpotify.pdf
  • FinalProject7_Jonathan_IdentifyingRoadSigns.pdf
  • FinalProject8_DanielEmilKevinGustav_StockMarketAnalysis.pdf
  • FinalProject9_AnnaDanaElloiseHelene_IcebergClassification.pdf
  • FinalProject10_NickTroelsJakobEmil_InsuranceClaimClassification.pdf
  • FinalProject11_LarsLiamMartin_IceCoreInsoluablesClassification.pdf
  • FinalProject12_MarcMathiasRasmusSoeren_CoastalMappingGreenland.pdf
  • FinalProject13_BenatJacobJonasPedro_DeepLearningPhotonics.pdf
  • FinalProject14_AndreaJulian_DetectingCovid19FromChestRadiographs.pdf
  • FinalProject15_EliotSofusMads_ClassifyingMusicalGenresFromAudioSnippets.pdf
  • FinalProject16_MartinSimonJeppeKristineEmmaCamilla_IceCoreInsoluablesClassification.pdf
  • FinalProject17_TobiasKaare_BirdCallClassification.pdf
  • FinalProject18_AliciaAndreasTommaso_PredictingAirplaneWeightAndBalance.pdf
  • FinalProject19_EmilyKatharinaNgaYing_ComaClusterClassification.pdf
  • FinalProject20_BeatrizMarcoVittorioMoritzCarl_IdentifyingKeplerObjects.pdf
  • FinalProject21_MarieMartinKasper_QuickDrawWithGANs.pdf
  • FinalProject22_IoannisLineaKimiSamyMaja_ClassifyingDogBreeds.pdf
  • FinalProject23_KianKianTobiasMia-Louise_IceCoreInsoluablesClassification.pdf
  • FinalProject24_PatrickMartinRuben_WinningFinalPremierLeagueWithML.pdf



    Presentations from previous years:
    Below you can find the presentations of the final projects given on the 12th of June 2019:
  • Project1_BomberMan.pdf
  • Project2_BoneAge.pdf
  • Project3_SpectralAnalysis.pdf
  • Project4_StockMarketAnalysis.pdf
  • Project5_FindingWallyIn2DImages.pdf
  • Project6_StellarClassificationCNN.pdf
  • Project7_PredictingAgeGenderEthnicity.pdf
  • Project8_UFOSightingDataMining.pdf
  • Project9_ClassificationOfCatsVsDogs.pdf
  • Project10_MulticlassClassificationOfHearBeats.pdf
  • Project11_PredictingAbsorptionEnergies.pdf
  • Project12_PredictingSolarBatteryProperties.pdf
  • Project13_SkinLesionClassification.pdf

    Below you can find the presentations of the final projects given on 10th of June 2020:
  • FinalProject1_RasmusPeter.pdf
  • FinalProject2_MariaAndyEmilMads_WalmartKaggle.pdf
  • FinalProject3_HelenaKatjaSimonViktoria.pdf
  • FinalProject4_AnnSofieEmyMartaYanet_RetrievalOfSeaSurfaceTemperatures.pdf
  • FinalProject5_ChristopherJoakimNikolaj_PredictingTheCriticalTempOfSuperconductors.pdf
  • FinalProject6_MikkelMikkelAskeAnnaMoust_GNNonIceCubeData.pdf
  • FinalProject7_HaiderRasmusMS_PredictingMusicPublicationYear.pdf
  • FinalProject8_AlbaMirenEdwinFynn_PredictingBloodCellType.pdf
  • FinalProject9_RuniSimoneMarcusJonathan_CalibrationForNewAstroDataForExoPlanetResearch.pdf
  • FinalProject10_SofusKristofferDavidElias_TrickingFaceTracking.pdf
  • FinalProject11_DinaAlineAlbertMichael_WheatDetection.pdf
  • FinalProject12_LaurentOrestisGiorgosCarlos_TweetSentimentExtraction.pdf
  • FinalProject13_SvendJulius_PredictingMusicGenre.pdf
  • FinalProject14_EmilMartiny_NoisyDataOnCells.pdf
  • FinalProject15_NicolasPedersen_IdentificationOfObjectsIn2DImages.pdf

    "Some people worry that artificial intelligence will make us feel inferior, but then, anybody in his right mind should have an inferiority complex every time he looks at a flower." [Alan Kay, American computer scientist]


    Last updated 14th of May 2021 by Troels Petersen.