Big Data Analysis - Week 1

Monday the 20th - Friday the 24th of April 2020

Monday 20th of April (afternoon):
Lectures: Intro to course, outline, groups, and discussion of data and goals (TP, AA, BV, ZA, CJ). Overview of Machine Learning techniques (TP).
     Additional slides: ML2020_Example_HousingPrices.pdf (2.8 MB), ML2020_BigDataTop10.pdf (100 kB), and ML2020_WhoAreYou.pdf (2.1 MB).

Zoom: Link to lecture.
     Recording in Lecture video (418 MB) and Lecture audio (30 MB) along with Lecture chat (3 kB).

Exercise: Setup of infrastructure (GitHub, ERDA, Zoom, Slack).
     Inspecting data and making a "human" decision tree for classification of b-quark jets in Aleph data.
     Data: READMEAlephBtagData.txt (2 kB), AlephBtag_MC_small_v0.csv (2.8 MB), and AlephBtag_MC_small_v2.csv (2.8 MB).
     An example solution with plotting but POOR result can be found here: BjetSelection_PoorSelection.py.py (11 kB).
     An example solution (getting 11% wrong) in Jupyter Notebook can be found here: BjetSelection.ipynb (12 kB).
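
A "human" decision tree is simply a set of hand-chosen cuts applied one after another. The following is a minimal sketch of that idea, not the course solution: the variable names (tag_prob, lepton_pt, is_b), the cut values, and the toy jets are all hypothetical and generated in the script itself. For the real exercise the rows would come from the Aleph CSV files, and the actual column names are described in the README.

```python
import random


def human_tree(jet):
    """Hand-written decision tree: return 1 (b-jet) or 0 (not a b-jet).

    The variables and cut values are hypothetical, chosen by eye for the toy data.
    """
    if jet["tag_prob"] > 0.6:
        return 1
    if jet["tag_prob"] > 0.3 and jet["lepton_pt"] > 1.0:
        return 1
    return 0


def error_rate(jets, classify):
    """Fraction of jets for which the classifier disagrees with the truth label."""
    wrong = sum(1 for jet in jets if classify(jet) != jet["is_b"])
    return wrong / len(jets)


def make_toy_jets(n, seed=42):
    """Generate synthetic jets (NOT Aleph data); b-jets tend to have higher values."""
    rng = random.Random(seed)
    jets = []
    for _ in range(n):
        is_b = 1 if rng.random() < 0.25 else 0
        tag_prob = min(1.0, max(0.0, rng.gauss(0.7 if is_b else 0.2, 0.2)))
        lepton_pt = abs(rng.gauss(1.5 if is_b else 0.5, 0.7))
        jets.append({"tag_prob": tag_prob, "lepton_pt": lepton_pt, "is_b": is_b})
    return jets


jets = make_toy_jets(2000)
print(f"Fraction classified wrongly: {error_rate(jets, human_tree):.3f}")
```

The fraction of wrongly classified jets is the same figure of merit as the "getting 11% wrong" quoted for the notebook solution, so different sets of cuts can be compared directly.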


Wednesday 22nd of April (morning):
Lectures: Introduction to Tree-based algorithms (TP).

Zoom: Link to lecture.
     Recording in Lecture video (109 MB) and Lecture audio (26 MB) along with Lecture chat (3 kB).
     Recording in Exercise video (116 MB) and Exercise audio (27 MB) along with Exercise chat (16 kB).

Exercise: Classification of b-quark jets in Aleph data with tree-based methods. Compare performance to your own decision tree.
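
To illustrate what a tree-based method does differently from a hand-made tree, the sketch below learns a single cut (a "decision stump", the smallest possible tree) from training data and compares it to a guessed cut. This is plain Python on synthetic one-variable data, not the boosted-tree libraries used in the example solutions; the data generator, the guessed cut of 0.8, and all function names are assumptions made for illustration.

```python
import random


def make_toy_data(n, seed):
    """Synthetic (x, label) pairs, NOT Aleph data: signal sits at higher x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append((x, label))
    return data


def accuracy(data, threshold):
    """Fraction classified correctly by the rule 'x > threshold means signal'."""
    correct = sum(1 for x, label in data if (1 if x > threshold else 0) == label)
    return correct / len(data)


def fit_stump(data):
    """Try every observed x as a threshold and keep the best one on this data."""
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(data, t))


train_set = make_toy_data(500, seed=1)
test_set = make_toy_data(500, seed=2)

hand_cut = 0.8                      # a hypothetical cut chosen "by eye"
learned_cut = fit_stump(train_set)  # cut chosen by the algorithm

print(f"hand cut    {hand_cut:+.2f}: test accuracy {accuracy(test_set, hand_cut):.3f}")
print(f"learned cut {learned_cut:+.2f}: test accuracy {accuracy(test_set, learned_cut):.3f}")
```

Evaluating both cuts on a separate test set, rather than on the data used to choose the cut, keeps the comparison with your own decision tree fair.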


Wednesday 22nd of April (afternoon):
Lectures: Introduction to NeuralNet-based algorithms (TP).

Zoom: Link to lecture.
     Recording in Lecture video (84 MB) and Lecture audio (22 MB) along with Lecture chat (10 kB).
     Recording in Exercise video (103 MB) and Exercise audio (17 MB).

Exercise: Classification of b-quark jets in Aleph data with neural-net-based methods.
     Compare performance to your tree-based method(s).
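
As a rough illustration of the building block behind neural-net-based classifiers, the sketch below trains a single logistic neuron by stochastic gradient descent. It is deliberately not a full network and does not use Keras or scikit-learn; the two input variables, the synthetic data, the learning rate, and the number of epochs are all assumptions chosen for the toy example.

```python
import math
import random


def make_toy_data(n, seed):
    """Synthetic ([x1, x2], label) pairs, NOT Aleph data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        centre = 1.0 if label else -1.0
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Neuron output: sigmoid of the weighted sum of the inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(data, epochs=20, learning_rate=0.1):
    """Stochastic gradient descent on the log-loss, one example at a time."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            # Gradient of the log-loss with respect to the weighted sum.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(data, weights, bias):
    """Fraction classified correctly when cutting the output at 0.5."""
    correct = sum(
        1 for features, label in data
        if (predict(weights, bias, features) > 0.5) == (label == 1)
    )
    return correct / len(data)


train_set = make_toy_data(1000, seed=7)
test_set = make_toy_data(1000, seed=8)
weights, bias = train(train_set)
print(f"Test accuracy of the single neuron: {accuracy(test_set, weights, bias):.3f}")
```

A real network stacks many such neurons in layers; the accuracy on a held-out test set is the number to compare with the tree-based result.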


Example solutions from week 1:
The following example solutions and related code come with absolutely no warranty, but you may let yourself be inspired by them:
  • BjetClassification_TreeBased.py, which is a commented and illustrative AdaBoostClassifier from scikit-learn (from Troels).
  • Sklearn_tree_setup.py, which is a general setup of scikit-learn classifiers, i.e. it does not run an algorithm (from Rasmus).
  • NN-classification+2020.ipynb, which is a neural network model from Keras/TensorFlow (from Zoe).
  • TreeandNN.py, which contains both a tree-based (LightGBM) and an NN-based (scikit-learn) classifier (from Haider).

Last updated: 22nd of April 2020 by Troels Petersen.