Applied Statistics - Week 6
Monday the 4th - Friday the 8th of January 2021
The following is a description of what we will go through during this
week of the course. The chapter references and computer exercises are
considered read, understood, and solved by the beginning of the
following class, where I'll briefly go through the exercise
solution.
General notes, links, and comments:
Lady tasting tea (Wikipedia).
Short note on Lady tasting tea.
A neat tree-based algorithm is XGBoost, which is described in
the XGBoost paper.
Monday:
While Fisher's Discriminant is a powerful (and transparent) tool, it
is superseded by the more performant Machine Learning (ML) algorithms,
which this lecture and exercise are meant to whet your appetite for.
Note that Machine Learning is not part of the curriculum.
Reading:
No formal reading, but please consider these introductions to
Decision Trees and
Neural Nets.
Further inspiration can be found here: ML links.
Lecture(s):
MultiVariate Analysis - Part II
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
MachineLearningExample.ipynb and
associated data sample: DataSet_ML.txt.
Illustration of DecisionTree_InteractiveExample.ipynb
through an interactive example.
As a courtesy, here is code for producing a ROC curve: MakeROCfigure.ipynb
Illustration/Animation of ROC curve (requires additional packages): MakeROCfigure_animation.ipynb
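To give a feel for what a ROC curve is before opening the notebooks, here is a minimal sketch in plain Python. It is not the code in MakeROCfigure.ipynb; the function names and the Gaussian toy scores are assumptions made for illustration only. The idea is to scan a threshold over a classifier score and record, for each threshold, the fraction of background accepted (false positive rate) against the fraction of signal accepted (true positive rate).

```python
import random


def roc_curve(signal_scores, background_scores):
    """Return a list of (FPR, TPR) points from scanning a cut 'score >= threshold'."""
    # Scan thresholds from the highest score downwards, so the curve
    # runs from (0, 0) (nothing accepted) to (1, 1) (everything accepted).
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in signal_scores) / len(signal_scores)
        fpr = sum(b >= t for b in background_scores) / len(background_scores)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): scores for two classes drawn from unit Gaussians
# centred at +1 (signal) and -1 (background).
random.seed(42)
signal = [random.gauss(1.0, 1.0) for _ in range(500)]
background = [random.gauss(-1.0, 1.0) for _ in range(500)]

points = roc_curve(signal, background)
auc = area_under_curve(points)
print(f"Area under ROC curve: {auc:.3f}")
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the area is one convenient single-number summary when comparing, for example, a Fisher discriminant to a tree-based classifier on the same data.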
Tuesday:
We will spend both Tuesday and Friday on a larger exercise, which
illustrates the idea of separating data into categories, and how to
measure and optimise the performance of this separation.
The data is from ATLAS
testbeam data at CERN and deals with separating
particles in a beam into electrons and pions, but could in principle
be from any other area of research and/or business.
The subject is associated with Bayes' Theorem, as there is not
an equal number of each type of particle in the beam.
Many of you know this theorem already, but with this exercise I'll
try to bring a general perspective on data analysis along with
it.
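As a small illustration of why the unequal particle fractions matter, the sketch below applies Bayes' Theorem to a selection cut. All numbers are invented for the example and are not taken from the ATLAS testbeam data; the function name is likewise hypothetical.

```python
def posterior(prior, eff_signal, eff_background):
    """P(signal | selected) from Bayes' Theorem.

    prior          : fraction of signal particles in the beam, P(signal)
    eff_signal     : P(selected | signal)
    eff_background : P(selected | background)
    """
    selected_signal = eff_signal * prior
    selected_background = eff_background * (1.0 - prior)
    return selected_signal / (selected_signal + selected_background)


# Hypothetical numbers: 10% of the beam is electrons, the cut keeps
# 90% of electrons and lets 5% of pions through.
prior_electron = 0.10
eff_electron = 0.90
eff_pion = 0.05

purity = posterior(prior_electron, eff_electron, eff_pion)
print(f"P(electron | selected) = {purity:.3f}")
```

With these assumed inputs, only about two thirds of the selected particles are electrons, even though the cut rejects 95% of the pions. The same cut applied to a beam with equal fractions would give a much purer sample, which is the point of bringing the prior into the analysis.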
Reading:
Barlow, chapter 7.
Possibly (review of) Bayes' Theorem
Lecture(s):
Test beam data introduction
I will give an introduction to today's exercise on
ATLAS testbeam data.
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
Analysis of ATLAS testbeam data:
ATLAStestbeam.ipynb along with
main data (2 GeV) and
alternative data (9 GeV).
Friday:
Reading:
No reading - focus on ATLAS test beam data analysis.
Lecture(s):
Notes on binning
MultiVariate Analysis - Part III
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
We will continue with the ATLAS testbeam exercise, getting some key figures.
Analysis of ATLAS testbeam data:
ATLAStestbeam.ipynb along with
main data (2 GeV) and
alternative data (9 GeV).
Finally, a link to an
online
course on Machine Learning (by Udacity).
Alternatively, our own
Applied Machine Learning
course runs in block 4 (Schedule C).
Last updated: 6th of January 2021.