Applied Statistics - Week 6
Tuesday the 2nd - Friday the 5th of January 2024
The following describes what we will go through during this
week of the course. The chapter references and computer exercises are
considered read, understood, and solved by the beginning of the
following class, where I will briefly go through the exercise
solution.
General notes, links, and comments:
A neat tree-based algorithm is XGBoost, described in
the nice XGBoost paper.
An alternative which is faster and roughly equally performant is
LightGBM.
Tuesday:
We will start the new year with an introduction to Machine Learning (ML), which covers non-linear methods for
classification and regression, typically based on algorithms such as Boosted Decision Trees (BDT)
and Neural Networks (NN).
While Fisher's Discriminant is a powerful (and transparent) tool, it is linear and is typically outperformed by
the more performant ML algorithms, for which this lecture and exercise
are meant to whet your appetite. Note that Machine Learning is not part of the curriculum.
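To make the contrast between a linear discriminant and a non-linear ML method concrete, here is a minimal sketch. It is not taken from the course exercise: the data is made up (signal inside a disc, background in a ring around it), and it assumes scikit-learn is installed. LinearDiscriminantAnalysis stands in for Fisher's Discriminant (for two classes it gives the same separating direction), and GradientBoostingClassifier stands in for a BDT.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Made-up toy data: signal lies inside a disc, background in a ring around it.
# No straight line separates the two classes well.
n = 2000
r = np.concatenate([rng.uniform(0.0, 1.0, n),    # signal radii
                    rng.uniform(1.5, 2.5, n)])   # background radii
phi = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
y = np.concatenate([np.ones(n), np.zeros(n)])    # 1 = signal, 0 = background

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

# Linear method (stand-in for Fisher's Discriminant).
fisher = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Non-linear method (a boosted decision tree ensemble).
bdt = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42).fit(X_train, y_train)

# Fraction of correctly classified test events for each method.
acc_fisher = fisher.score(X_test, y_test)
acc_bdt = bdt.score(X_test, y_test)
print(f"Linear discriminant accuracy: {acc_fisher:.3f}")
print(f"Boosted decision trees:       {acc_bdt:.3f}")
```

On a problem like this the linear method cannot do much better than guessing, while the trees can carve out the disc. On data where the classes differ mainly by a shift in their means, the two would be much closer.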
Reading:
No formal reading, but please consider these introductions to
Decision Trees and
Neural Nets.
Further inspiration can be found here: ML links.
Lecture(s):
Introduction to Machine Learning
Recording of Lecture video.
Computer Exercise(s):
MachineLearningExample.ipynb and
associated data sample: DataSet_ML.txt.
Finally, a link to an
online
course on Machine Learning (by Udacity).
Alternatively, the NBI
Applied Machine Learning
course runs in block 4 (Schedule C).
Friday:
We will spend Friday on a larger exercise, which
illustrates the idea of separating data into categories, and how to
measure and optimise the performance of this separation in real data
with all of its quirks and twists.
The data is from ATLAS
testbeam data at CERN and deals with separating particles in a beam into
electrons and pions, but could in principle be from any other area of research
and/or business.
Reading:
No reading - focus on ATLAS test beam data analysis.
Lecture(s):
Stratification - beating the sqrt(N) law!
Real data analysis - ATLAS testbeam
Recording of Lecture video.
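As a small illustration of what "beating the sqrt(N) law" can mean, the sketch below compares plain Monte Carlo integration with a stratified version. This is one possible reading of the lecture title, not the lecture's own example: the integrand exp(x) on [0,1] and the choice of one random point per equal-width stratum are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Example integrand (an assumption); its integral over [0,1] is e - 1.
    return np.exp(x)

def plain_mc(n):
    # n points drawn uniformly over the whole interval.
    x = rng.uniform(0.0, 1.0, n)
    return f(x).mean()

def stratified_mc(n):
    # Split [0,1] into n equal-width strata and draw one point in each.
    lower_edges = np.arange(n) / n
    x = lower_edges + rng.uniform(0.0, 1.0, n) / n
    return f(x).mean()

# Repeat each estimate many times and use the spread as its uncertainty.
n_repeats = 500
results = {}
for n in (10, 100, 1000):
    plain = np.array([plain_mc(n) for _ in range(n_repeats)])
    strat = np.array([stratified_mc(n) for _ in range(n_repeats)])
    results[n] = (plain.std(), strat.std())
    print(f"N={n:5d}  plain std={plain.std():.2e}  stratified std={strat.std():.2e}")
```

The spread of the plain estimate should shrink roughly as 1/sqrt(N), while the stratified one shrinks much faster for a smooth integrand like this, because each stratum only contributes the small variation of f within it.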
Computer Exercise(s):
The exercise uses the real ATLAS testbeam data (where the PDFs are unknown!), and the use of three independent detectors is key.
Analysis of ATLAS testbeam data:
ATLAStestbeam.ipynb along with
main data (2 GeV) and
alternative data (9 GeV).
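One way independent detectors can help when the PDFs are unknown is sketched below. This is an assumption about the general idea, not the exercise solution: two detectors are used to select a clean electron sample, and the third detector's response is then read off from that sample. The data is a made-up toy (Gaussian responses, 30% electrons), and the names det_a, det_b, and det_c are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up toy beam: 30% electrons, 70% pions (truth is known only in simulation).
n = 20000
is_electron = rng.random(n) < 0.3

def response(mu_e, mu_pi, sigma):
    # Hypothetical detector: Gaussian response whose mean depends on particle type.
    mu = np.where(is_electron, mu_e, mu_pi)
    return rng.normal(mu, sigma)

# Three detectors, independent of each other for a given particle type.
det_a = response(2.0, 0.0, 1.0)
det_b = response(2.0, 0.0, 1.0)
det_c = response(2.0, 0.0, 1.0)

# Tag electrons with tight cuts on A and B only; C is not used in the selection.
tag = (det_a > 2.0) & (det_b > 2.0)

# Purity of the tagged sample, compared to cutting on A alone.
purity_ab = is_electron[tag].mean()
purity_a = is_electron[det_a > 2.0].mean()
print(f"Electron purity, cut on A only:  {purity_a:.3f}")
print(f"Electron purity, cut on A and B: {purity_ab:.3f}")

# Since C is independent of A and B, the tagged sample gives an (almost)
# unbiased picture of C's response to electrons, without using truth labels.
mean_c_tagged = det_c[tag].mean()
mean_c_truth = det_c[is_electron].mean()
print(f"Mean of C in tagged sample:   {mean_c_tagged:.2f}")
print(f"Mean of C for true electrons: {mean_c_truth:.2f}")
```

The same selection can be inverted to obtain a clean pion sample, and the roles of the three detectors can be rotated so that each one is measured using the other two.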
Last updated: 29th of December 2023.