Applied Statistics - Week 6

Monday the 2nd - Friday the 6th of January 2023

The following is a description of what we will go through during this week of the course. The chapter references and computer exercises are expected to be read, understood, and solved by the beginning of the following class, where I'll briefly go through the exercise solutions.

General notes, links, and comments:
  • A neat tree-based algorithm is XGBoost, described in the nice XGBoost paper.
  • A faster alternative with roughly the same performance is LightGBM.
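
  Below is a minimal sketch of how such a boosted-tree classifier is typically trained, using the scikit-learn style interface that both packages provide. It assumes xgboost is installed and uses a purely synthetic dataset; none of the names here refer to the course material.

# Minimal sketch (assumes xgboost and scikit-learn are installed).
# LightGBM offers a nearly identical interface via lightgbm.LGBMClassifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Synthetic two-class data, purely for illustration.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A small boosted-tree model; the hyperparameters are placeholders, not tuned values.
clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss")
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")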

    Monday:
    We will use the first week of the new year to consider the theme MultiVariate Analysis (MVA), that is analysis of data with more than one (typically many) variables. To begin with, we will consider the relatively simple linear case, which is described by (Fisher's) Linear Discriminant Analysis (LDA), and then move on to more complex sets of data.
    However, we will start with an introduction to Machine Learning, which are non-linear methods for classification and regression, typically based on algorithms such as Boosted Decision Trees (BDT) and Neural Networks (NN).

    While Fisher's Discriminant is a powerful (and transparent) tool, it is superseded by the more performant Machine Learning algorithms, which this lecture and exercise are meant to whet your appetite for. Note that Machine Learning is not part of the curriculum.
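
    As a small taste of what the exercise will do, here is a minimal sketch (my own example, assuming scikit-learn is installed; it does not use the course's DataSet_ML.txt) comparing a single decision tree and a small neural network on a synthetic two-variable sample:

# Minimal sketch (assumes scikit-learn): a single decision tree and a small
# neural network classifying the same synthetic two-variable sample.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=1)   # toy non-linear data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)
nn   = MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=1).fit(X_train, y_train)

print(f"Decision tree accuracy:  {tree.score(X_test, y_test):.3f}")
print(f"Neural network accuracy: {nn.score(X_test, y_test):.3f}")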

    Reading:
  • No formal reading, but please consider these introductions to Decision Trees and Neural Nets.
  • Further inspiration can be found here: ML links.
    Lecture(s):
  • Introduction to Machine Learning
  • A few extra things on Machine Learning (if time allows!)
    Zoom:
  • Link to lecture.
  • Recording of the lecture (video).
    Computer Exercise(s):
  • MachineLearningExample.ipynb and associated data sample: DataSet_ML.txt.
  • An interactive illustration of decision trees: DecisionTree_InteractiveExample.ipynb.

    Finally, a link to an online course on Machine Learning (by Udacity).
    Alternatively, the NBI Applied Machine Learning course runs in block 4 (Schedule C).

    Tuesday:
    Reading:
  • Barlow, chapter 7
  • An additional possible source is Fisher's Linear Discriminant: Intuitively Explained.
    Lecture(s):
  • Linear MultiVariate Analysis
    Zoom:
  • Link to lecture.
  • Recording of the lecture (video).
    Computer Exercise(s):
  • 2par_discriminant.ipynb
  • fisher_discriminant.ipynb and Fisher's Iris data.
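
    For orientation, here is a minimal sketch of Fisher's discriminant computed "by hand" on the classic Iris data (using scikit-learn's built-in copy of the dataset rather than the file linked above; the notebook may of course structure this differently):

# Minimal sketch (assumes numpy and scikit-learn): Fisher's linear discriminant
# separating two of the Iris classes, computed directly from the class means and
# within-class covariances: w ∝ (S_A + S_B)^(-1) (mu_A - mu_B).
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
A, B = X[y == 1], X[y == 2]                   # versicolor vs. virginica (the harder pair)

mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
S_A = np.cov(A, rowvar=False)
S_B = np.cov(B, rowvar=False)
w = np.linalg.solve(S_A + S_B, mu_A - mu_B)   # Fisher weights (up to normalisation)

# Project both samples onto the Fisher axis; the two classes separate much better
# in this single combined variable than in any one input variable alone.
t_A, t_B = A @ w, B @ w
print("Fisher weights:", w)
print(f"Separation: |<t_A> - <t_B>| / sqrt(var_A + var_B) = "
      f"{abs(t_A.mean() - t_B.mean()) / np.sqrt(t_A.var() + t_B.var()):.2f}")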

    Friday:
    Reading:
  • No reading - focus on ATLAS test beam data analysis.
    Lecture(s):
  • Real data analysis - ATLAS testbeam
    Zoom:
  • Link to lecture.
  • Recording of the lecture (video).
    Computer Exercise(s):
  • The exercise is on the real ATLAS testbeam data (PDFs unknown!), where the use of three independent detectors is key.
  • Analysis of ATLAS testbeam data: ATLAStestbeam.ipynb along with main data (2 GeV) and alternative data (9 GeV).
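
    To illustrate why the independence of the detectors matters, here is a small, purely synthetic sketch (toy numbers, thresholds, and a helper of my own, not the actual testbeam variables): the efficiency of a cut in one detector is estimated from a sample tagged by the other two, without assuming any PDF.

# Hypothetical toy sketch (assumes numpy): with three independent detector responses,
# the efficiency of a cut in one detector can be measured on a signal-enriched sample
# selected by the other two. The real notebook (ATLAStestbeam.ipynb) works differently
# in detail; this only illustrates the idea of independence.
import numpy as np

rng = np.random.default_rng(2)
n_sig, n_bkg = 20000, 80000

# Three independent toy "detector" responses per particle type (arbitrary numbers).
def toy_sample(n, mean):
    return np.column_stack([rng.normal(mean, 1.0, n) for _ in range(3)])

sig = toy_sample(n_sig, mean=3.0)
bkg = toy_sample(n_bkg, mean=0.0)
data = np.vstack([sig, bkg])
is_sig = np.concatenate([np.ones(n_sig, bool), np.zeros(n_bkg, bool)])

cuts = data > 2.0                          # one simple cut per detector (arbitrary threshold)
true_eff1 = cuts[is_sig, 0].mean()         # true signal efficiency of cut 1 (uses truth labels)

# Data-driven estimate: measure cut 1 on a sample tagged by cuts 2 and 3. This works
# because the three responses are independent and the tagged sample is nearly pure.
tag = cuts[:, 1] & cuts[:, 2]
est_eff1 = cuts[tag, 0].mean()
print(f"Cut-1 signal efficiency: true = {true_eff1:.3f}, estimated from data = {est_eff1:.3f}")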

    Last updated: 30th of December 2022.