Applied Statistics - Week 7

Monday the 6th - Friday the 10th of January 2020

The following is a description of what we will go through during this week of the course. The chapter references and computer exercises are expected to be read, understood, and solved by the beginning of the following class, where I'll briefly go through the exercise solution.

General notes, links, and comments:
  • A neat tree-based algorithm is XGBoost, which is described in the XGBoost paper.

    Monday:
    While Fisher's Discriminant is a powerful (and transparent) tool, it is generally outperformed by Machine Learning (ML) algorithms, which this lecture and exercise are meant to whet your appetite for. Note that Machine Learning is not part of the curriculum.

    Reading:
  • No formal reading.
    Lecture(s):
  • AS2019_01_06_MultiVariateAnalysis2.pdf
    Computer Exercise(s):
  • MachineLearningExample.py and associated data sample: DataSet_ML.txt.
  • As a courtesy, here is code doing a ROC curve calculation: MakeROCfigure.ipynb along with an interactive version MakeROCfigure_animation.ipynb.
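
    For orientation, a ROC curve can be sketched in a few lines of plain Python. This is not the content of MakeROCfigure.ipynb, only a minimal illustration; the function names roc_points and area_under_curve are hypothetical, and it assumes that a higher classifier score means "more signal-like". Each distinct score is used as a threshold, and for every threshold the fraction of signal kept (true positive rate) is paired with the fraction of background kept (false positive rate).

```python
def roc_points(signal_scores, background_scores):
    """Return a list of (FPR, TPR) points, ordered from the tightest to the loosest cut.

    Assumption: an event is selected as signal if its score >= threshold.
    """
    # Scan thresholds from high to low, so both rates grow along the curve.
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above all scores: nothing is selected
    for t in thresholds:
        tpr = sum(s >= t for s in signal_scores) / len(signal_scores)
        fpr = sum(b >= t for b in background_scores) / len(background_scores)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Made-up scores, purely for illustration (not from DataSet_ML.txt).
    signal = [0.9, 0.8, 0.7, 0.4]
    background = [0.6, 0.3, 0.2, 0.1]
    curve = roc_points(signal, background)
    print("ROC points:", curve)
    print("Area under curve:", area_under_curve(curve))
```

    An area of 1 corresponds to perfect separation, while 0.5 is what a classifier with no separating power would give.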

    Tuesday:
    We will spend both Tuesday and Friday on a larger exercise, which illustrates the idea of separating data into categories, and how to measure and optimise the performance of such a separation. The data is ATLAS testbeam data from CERN and deals with separating particles in a beam into electrons and pions, but could in principle be from any other area of research and/or business. The subject is loosely related to Bayes' Theorem, as there is not an equal number of each type of particle in the beam. Many of you know this theorem already, but with this exercise I'll try to bring a general perspective on data analysis along with it.
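
    To see why the unequal particle fractions matter, Bayes' Theorem can be applied to a selection directly. The sketch below is only an illustration: the function name electron_purity is hypothetical, and the numbers are invented rather than taken from the testbeam data. It computes the probability that a particle passing an electron selection really is an electron, given the electron fraction in the beam (the prior), the efficiency for electrons, and the rate at which pions are wrongly selected.

```python
def electron_purity(electron_fraction, electron_efficiency, pion_fake_rate):
    """P(electron | selected) from Bayes' Theorem.

    electron_fraction   : prior P(electron), fraction of electrons in the beam
    electron_efficiency : P(selected | electron)
    pion_fake_rate      : P(selected | pion)
    Assumption: the beam contains only electrons and pions.
    """
    selected_electrons = electron_efficiency * electron_fraction
    selected_pions = pion_fake_rate * (1.0 - electron_fraction)
    total_selected = selected_electrons + selected_pions
    if total_selected == 0:
        raise ValueError("No particles pass the selection; purity is undefined.")
    return selected_electrons / total_selected


if __name__ == "__main__":
    # Invented numbers: the same selection applied to two different beam compositions.
    for fraction in (0.5, 0.1):
        purity = electron_purity(fraction, electron_efficiency=0.9, pion_fake_rate=0.05)
        print(f"electron fraction {fraction:.2f} -> purity {purity:.3f}")
```

    With the same selection, the purity drops when electrons become rarer in the beam, which is why the composition of the beam has to enter the analysis.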

    Reading:
  • Barlow, chapter 7.
  • Possibly (review of) Bayes' Theorem
    Lecture(s):
  • Test beam data introduction
  • I will give an introduction to today's exercise on ATLAS testbeam data.
    Computer Exercise(s):
  • Analysis of ATLAS testbeam data: ATLAStestbeam.ipynb along with main data (2 GeV) and alternative data (9 GeV).


    Friday:
    Reading:
  • No reading - focus on ATLAS test beam data analysis.
    Lecture(s):
  • I will go through the problem set and its solution.
  • Problem Set - Solutions and Comments.
    Computer Exercise(s):
  • We will continue with the ATLAS testbeam exercise, getting some key figures.
  • Analysis of ATLAS testbeam data: ATLAStestbeam.ipynb along with main data (2 GeV) and alternative data (9 GeV).


    Finally, a link to an online course on Machine Learning (by Udacity), which was recommended to me.
    Alternatively, our own Big Data Analysis / Applied Machine Learning course runs in block 4 (Schedule C).
    Last updated: 4th of January 2020.