Applied Statistics - Week 7
Monday the 6th - Friday the 10th of January 2020
The following is a description of what we will go through during this
week of the course. The chapter references and computer exercises are
considered read, understood, and solved by the beginning of the
following class, where I'll briefly go through the exercise
solution.
General notes, links, and comments:
   A neat tree-based algorithm is XGBoost, which is described in
    the XGBoost paper.
Monday:
While Fisher's Discriminant is a powerful (and transparent) tool, it
is outperformed by Machine Learning (ML) algorithms, and this lecture
and exercise are meant to whet your appetite for these.
Note that Machine Learning is not part of the curriculum.
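As a reminder of the baseline that ML methods are compared against, here is a minimal numpy sketch of Fisher's linear discriminant (not the course's own code). The weights are w = (Σ_A + Σ_B)^-1 (μ_A - μ_B), and projecting each event onto w gives the discriminant. The correlated two-dimensional Gaussian samples are synthetic and purely illustrative, and the function names `fisher_weights` and `separation` are hypothetical.

```python
import numpy as np

def fisher_weights(x_a, x_b):
    """Fisher weights w = (Sigma_a + Sigma_b)^-1 (mu_a - mu_b) for samples of shape (n, d)."""
    mu_a = x_a.mean(axis=0)
    mu_b = x_b.mean(axis=0)
    # Within-class covariance: sum of the two sample covariance matrices
    cov_w = np.cov(x_a, rowvar=False) + np.cov(x_b, rowvar=False)
    # Solve cov_w @ w = (mu_a - mu_b) rather than inverting cov_w explicitly
    return np.linalg.solve(cov_w, mu_a - mu_b)

def separation(f_a, f_b):
    """Separation |<f_a> - <f_b>| / sqrt(var_a + var_b) of two 1D distributions."""
    return abs(f_a.mean() - f_b.mean()) / np.sqrt(f_a.var(ddof=1) + f_b.var(ddof=1))

# Synthetic, strongly correlated 2D Gaussian samples (illustration only, not course data)
rng = np.random.default_rng(seed=1)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
x_a = rng.multivariate_normal([1.0, 0.0], cov, size=5000)
x_b = rng.multivariate_normal([0.0, 1.0], cov, size=5000)

w = fisher_weights(x_a, x_b)
f_a, f_b = x_a @ w, x_b @ w   # project each event onto the Fisher direction

print("separation, variable 0 alone:   ", separation(x_a[:, 0], x_b[:, 0]))
print("separation, Fisher discriminant:", separation(f_a, f_b))
```

Because the two variables are correlated, each one alone separates the samples poorly, while the linear Fisher projection exploits the correlation. Its limitation is exactly that it is linear.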
Reading:
   No formal reading.
Lecture(s):
  AS2019_01_06_MultiVariateAnalysis2.pdf
Computer Exercise(s):
  MachineLearningExample.py and
      associated data sample: DataSet_ML.txt.
  
   As a courtesy, here is code that computes a ROC curve:
    MakeROCfigure.ipynb, along with an interactive version,
    MakeROCfigure_animation.ipynb.
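The idea behind a ROC curve can also be sketched in a few lines of numpy. This is an independent sketch, not the code in the notebooks above. It assumes that a higher score means "more signal-like" and that an entry is selected if score >= threshold. Every observed score is scanned as a threshold, and the signal efficiency (true positive rate) is recorded against the background efficiency (false positive rate). The function names and the Gaussian test scores are hypothetical.

```python
import numpy as np

def roc_curve(scores_sig, scores_bkg):
    """ROC curve for the selection 'score >= threshold', scanned over all observed scores.

    Returns (fpr, tpr): background and signal efficiencies, from (0, 0) to (1, 1).
    """
    sig = np.sort(np.asarray(scores_sig, dtype=float))
    bkg = np.sort(np.asarray(scores_bkg, dtype=float))
    # Every distinct score is a candidate threshold, from high (tight cut) to low (loose cut)
    thresholds = np.unique(np.concatenate([sig, bkg]))[::-1]
    # searchsorted(..., side="left") counts entries < t, so 1 - count/n is the fraction >= t
    tpr = 1.0 - np.searchsorted(sig, thresholds, side="left") / len(sig)
    fpr = 1.0 - np.searchsorted(bkg, thresholds, side="left") / len(bkg)
    # Prepend the point where nothing is selected
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def area_under_curve(fpr, tpr):
    """Trapezoidal integral of tpr over fpr (fpr is non-decreasing)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Synthetic example: signal scores shifted relative to background (illustration only)
rng = np.random.default_rng(seed=2)
s = rng.normal(1.0, 1.0, size=20000)
b = rng.normal(0.0, 1.0, size=20000)
fpr, tpr = roc_curve(s, b)
print("AUC:", area_under_curve(fpr, tpr))
```

An area of 0.5 corresponds to no separation (random guessing) and 1.0 to perfect separation. In practice, a library routine such as scikit-learn's `roc_curve` does the same job.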
Tuesday:
We will spend both Tuesday and Friday on a larger exercise, which
illustrates the idea of separating data into categories, and how to
measure and optimise the performance of such a separation.
The data comes from ATLAS testbeam measurements at CERN and deals with
separating particles in a beam into electrons and pions, but could in
principle be from any other area of research and/or business.
The subject is loosely related to Bayes' Theorem, as the beam does
not contain equal numbers of each type of particle.
Many of you know this theorem already, but with this exercise I'll
try to bring a general perspective on data analysis along with
it.
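The role of Bayes' Theorem can be made concrete with a small calculation. Suppose a cut keeps a fraction eff_e of electrons and a fraction eff_pi of pions, and electrons make up a fraction f_e of the beam. Then the purity of the selected sample is P(e | selected) = eff_e * f_e / (eff_e * f_e + eff_pi * (1 - f_e)). The efficiencies and beam fractions below are hypothetical numbers chosen for illustration, not values from the ATLAS testbeam data, and `purity` is a hypothetical helper.

```python
def purity(eff_sig, eff_bkg, frac_sig):
    """P(signal | selected) from Bayes' theorem.

    eff_sig  : P(selected | signal), e.g. the electron efficiency of a cut
    eff_bkg  : P(selected | background), e.g. the pion mis-identification rate
    frac_sig : prior P(signal), i.e. the signal fraction in the beam
    """
    sel_sig = eff_sig * frac_sig
    sel_bkg = eff_bkg * (1.0 - frac_sig)
    return sel_sig / (sel_sig + sel_bkg)

# Hypothetical numbers: the same cut applied to two different beam compositions
for frac_e in (0.5, 0.1):
    print(f"electron fraction {frac_e:.0%}: purity = {purity(0.95, 0.05, frac_e):.3f}")
```

With equal amounts of each particle, the purity equals 95%. With only 10% electrons in the beam, it drops to about 68%, even though the cut itself is unchanged.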
Reading:
   Barlow, chapter 7.
   Possibly (review of) Bayes' Theorem
Lecture(s):
   Test beam data introduction
   I will give an introduction to today's exercise on
    ATLAS testbeam data.
Computer Exercise(s):
  Analysis of ATLAS testbeam data:
    ATLAStestbeam.ipynb along with
    main data (2 GeV) and
    alternative data (9 GeV).
Friday:
Reading:
   No reading; focus on the ATLAS testbeam data analysis.
Lecture(s):
   I will go through the problem set and its solution.
   Problem Set - Solutions and Comments.
Computer Exercise(s):
   We will continue with the ATLAS testbeam exercise, extracting some key figures.
   Analysis of ATLAS testbeam data:
    ATLAStestbeam.ipynb along with
    main data (2 GeV) and
    alternative data (9 GeV).
Finally, a link to an online course on Machine Learning (by Udacity),
which was recommended to me.
Alternatively, our own Big Data Analysis / Applied Machine Learning
course runs in block 4 (Schedule C).
Last updated: 4th of January 2020.