Applied Machine Learning and Big Data (Block 4, 2019)

"As one Google Translate engineer put it: 'when you go from 10,000 training examples to 10 billion training examples, it all starts to work.
Data trumps everything.'"

[Garry Kasparov, Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins]
  • Troels C. Petersen, Lecturer, Associate Professor, NBI (High Energy Physics): 35 52 54 42 / 26 28 37 39, petersen@nbi.dk
  • Brian Vinter, Lecturer, Professor, NBI (Computing): 35 32 14 21, vinter@nbi.dk
  • Joachim Mathiesen, Lecturer, Associate Professor, NBI (Biocomplexity): 35 32 52 14, mathies@nbi.dk
  • Adriano Agnello, Teacher, Associate Professor, NBI (Cosmology): 35 33 76 41, adriano.agnello@nbi.dk


What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate course on Machine Learning and Big Data usage in science.
Level: Intended for students at graduate level (4th-5th year) and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python).
When: Mondays 13-17 and Wednesdays 8-17 (Week Schedule Group C) in Block 4 (24/04-12/06 2019).
Where: Lectures in Auditorium 1 at AKB; exercises in Room A110 at HCO.
Format: Shorter lectures followed by computer exercises and discussion with emphasis on experience and projects.
Textbook: Selected parts of The Elements of Statistical Learning (2nd edition).
Additional literature: A short, good introduction can be found in Part 1 of Deep Learning (Goodfellow, Bengio and Courville).
Also: Christopher M. Bishop, "Pattern Recognition and Machine Learning".
Language: English (with occasional Danish utterances!). All exercises, problem sets, notes, etc. are in English.
Evaluation: Small project (20%) and final project (80%), graded by the lecturers on the Danish 7-step scale.
Credits: 7.5 ECTS (1/8 of an academic year's work, i.e. 187.5-225 hours, or about 23-28 hours per week over the 8 weeks of the block).
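The workload figures can be checked with a short calculation, assuming the standard 60 ECTS per academic year and the 25-30 hours of work per ECTS credit implied by the stated range:

```latex
\begin{align*}
\tfrac{1}{8} \times 60\ \text{ECTS} &= 7.5\ \text{ECTS} \\
7.5\ \text{ECTS} \times (25\text{--}30)\ \text{h/ECTS} &= 187.5\text{--}225\ \text{h} \\
(187.5\text{--}225\ \text{h}) \,/\, 8\ \text{weeks} &\approx 23.4\text{--}28.1\ \text{h/week}
\end{align*}
```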

Further course information can be found here: ML2019_CourseInformation.pdf



Course outline:
Below is the preliminary course outline, which is subject to change throughout the course.

Week 1 (Introduction, General Concepts, Data):
Apr 24: 9:15-12:00: Intro to course, photos, questionnaire, and discussion of data and goals (TP, Lecture: AKB Aud 3).
Apr 24: 13:15-17:00: Playing with multi-dimensional data (TP). Overview of Machine Learning techniques (TP).
     Slides: ML2019_Example_HousingPrices.pdf (2.8 MB), ML2019_BigDataTop10.pdf (100 kB), and ML2019_PCA_Adriano.pdf (100 kB).
     Data: READMEAlephBtagData.txt (2 kB), AlephBtag_MC_small_v0.csv (2.8 MB), and AlephBtag_MC_small_v2.csv (2.8 MB).
     Code: An example solution with plotting can be found here: BjetSelection.py (11 kB).

Week 2 (Overview of Machine Learning, Pros and Cons):
Apr 29: 13:15-17:00: Dimensionality reduction and introduction to basic machinery (JM, Lecture: AKB Aud 1).
     Slides: NBI_PCA2019.pdf (2.3 MB) and ppt-tsne_alex.pdf (11.0 MB).
     Code and exercise note: BDA_2019_PCA.py (1.7 kB) and BDA_COURSE_2019.pdf (97 kB).
May 1: International Workers' Day, so no teaching!
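As a rough illustration of the dimensionality reduction covered on Apr 29, the sketch below performs a principal component analysis (PCA) using only NumPy's SVD. It is not the course's BDA_2019_PCA.py; the helper name `pca` and the synthetic 3D data are hypothetical, chosen so that almost all variance lies along one direction.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: project rows of X onto the top principal components.

    Returns (projected data, explained-variance ratio of the kept components).
    """
    # Centre each feature so the principal axes pass through the data mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal axes, and
    # S**2 / (n - 1) are the variances along them (largest first).
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    # Coordinates of each sample along the kept axes.
    Z = Xc @ Vt[:n_components].T
    return Z, ratio[:n_components]

# Synthetic example: 500 points in 3D lying mostly along the direction (1, 2, 0.5).
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 2.0, 0.5]]) + 0.05 * rng.normal(size=(500, 3))

Z, ratio = pca(X, n_components=2)
print(Z.shape)  # (500, 2)
print(ratio.round(3))
```

Working on the centred data matrix via SVD avoids forming the covariance matrix explicitly, which is numerically more stable; the first explained-variance ratio should come out close to 1 for this data.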

Week 3 (Decision Trees, Nearest Neighbor techniques, Neural Networks, Support Vector Machines):
May 6: 13:15-17:00: Decision Trees (TP). Small projects start (Lecture: Store UP1).
     Slides: ML2019_IntroToMultiVariateAnalysis.pdf (3.9 MB) and ML2019_BDTs_RandomForests_XGboost.pdf (6.0 MB).
     Data and code for small project: train.h5 (86 MB), test.h5 (84 MB), and ReadingData.py (8 kB).
     Links to the XGBoost paper and to a discussion of XGBoost vs. LightGBM.
May 8: 9:15-12:00: Nearest Neighbor algorithms (BV, Lecture: AKB Aud 3).
May 8: 13:15-17:00: Neural Networks (TP+AA).
     Slides: KNN-Talk.pdf (4.3 MB) and ML2019_NeuralNetworks_DeepLearning.pdf (3.9 MB).
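For the nearest-neighbour session on May 8, a minimal k-nearest-neighbours classifier can be sketched in NumPy as below. The helper name `knn_predict`, the Euclidean distance and the simple majority vote are illustrative assumptions, not the implementation used in the course.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: classify each row of X_test by majority vote
    among its k nearest training points (Euclidean distance)."""
    # Pairwise squared distances, shape (n_test, n_train), via broadcasting.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k)
    # Majority vote per row; on a tie, argmax picks the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])

# Two well-separated toy clusters labelled 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.1], [3.0, 3.1]])

print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

The full distance matrix costs memory proportional to n_test × n_train, so for larger datasets a tree-based index such as a k-d tree is the usual alternative.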
    
Week 4 (Computers and Infrastructure, Kernel Machines):
May 13: 13:15-17:00: Infrastructure: Computers, storage, and networks (BV, Lecture: Store UP1).
May 15: 9:15-12:00: Hyperparameter optimisation (see link at bottom). Small project work (Lecture: AKB Aud 3). Introduction to the final projects.
May 15: 13:15-17:00: Kernel Machines and their use (JM).

Week 5 (Computing and scaling, Hyperparameters):
May 20: 13:15-17:00: Computing, scaling, and GPUs (BV, Lecture: AKB Aud 1). Small project should be submitted!
May 22: 9:15-12:00: Final project work (AA and ??, Lecture: AKB Aud 3).
May 22: 13:15-17:00: Population mixture models (AA). Final project work.

Week 6 (Unstructured data):
May 27: 13:15-17:00: Convolutional Neural Networks and Unstructured data (JM, Lecture: Store UP1).
May 29: 9:15-12:00: Recurrent and Long Short-Term Memory Neural Networks (James Avery, Lecture: AKB Aud 3). Final project work.
May 29: 13:15-17:00: Introduction to Generative (Adversarial) Networks (TBC). Final project work (TP).

Week 7 (Final project work):
Jun 3: 13:15-17:00: Final project work (TP, Lecture: Store UP1).
Jun 5: Constitution Day (DK: Grundlovsdag), so no teaching.

Week 8 (Presentations of final projects):
Jun 10: 13:15-17:00: No teaching (Whit Monday).
Jun 12: 9:15-12:00: Presentations of final projects (TP, JM, AA, BV, Lecture: AKB Aud 3).
Jun 12: 13:15-17:00: Presentations of final projects (cont.) and course evaluation.
     Here you can see the schedule for the day, and here you can find the evaluation form for all the projects (1-10 scale).

Below you can find the presentations of the final projects given on the 12th of June:
  • Project1_BomberMan.pdf
  • Project2_BoneAge.pdf
  • Project3_SpectralAnalysis.pdf
  • Project4_StockMarketAnalysis.pdf
  • Project5_FindingWallyIn2DImages.pdf
  • Project6_StellarClassificationCNN.pdf
  • Project7_PredictingAgeGenderEthnicity.pdf
  • Project8_UFOSightingDataMining.pdf
  • Project9_ClassificationOfCatsVsDogs.pdf
  • Project10_MulticlassClassificationOfHearBeats.pdf
  • Project11_PredictingAbsorptionEnergies.pdf
  • Project12_PredictingSolarBatteryProperties.pdf
  • Project13_SkinLesionClassification.pdf

    "Some people worry that artificial intelligence will make us feel inferior, but then, anybody in his right mind should have an inferiority complex every time he looks at a flower." [Alan Kay, American computer scientist]


    Notes and links:
    In addition to the textbook and other literature, the following notes may be useful during the course:
  • Online course introducing Machine Learning.

    Last updated 12th of March 2019.