Applied Statistics - Week 6

Monday the 4th - Friday the 8th of January 2021

The following is a description of what we will go through during this week of the course. The chapter references and computer exercises are considered read, understood, and solved by the beginning of the following class, where I will briefly go through the exercise solution.

General notes, links, and comments:
  • Lady tasting tea (Wikipedia).
  • Short note on Lady tasting tea (a small worked example follows this list).
  • A neat tree-based algorithm is XGBoost, which is described in the XGBoost paper.
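As a small worked example of the Lady tasting tea (assuming the standard setup of eight cups, four with milk poured first, and a perfect score), the chance of picking all four milk-first cups by pure guessing is 1/C(8,4) = 1/70 ≈ 1.4%:

```python
from math import comb

# Standard "Lady tasting tea" setup: 8 cups, 4 with milk poured first, 4 guesses.
n_cups, n_milk_first = 8, 4

# Probability of picking all four milk-first cups by pure guessing;
# this is also the p-value of Fisher's exact test for a perfect score.
p_all_correct = 1 / comb(n_cups, n_milk_first)
print(f"P(all correct by chance) = {p_all_correct:.4f}")   # 1/70, about 0.0143
```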

    Monday:
    While Fisher's Discriminant is a powerful (and transparent) tool, it is superseded by more performant Machine Learning (ML) algorithms, which this lecture and exercise are meant to whet your appetite for. Note that Machine Learning is not part of the curriculum.
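    To make this concrete, here is a minimal sketch of training a decision tree on a toy two-class sample with scikit-learn; the generated data and settings are illustrative assumptions, not the course data set or the exercise solution.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy two-class sample: two overlapping 2D Gaussian blobs (illustrative, not DataSet_ML.txt).
rng = np.random.default_rng(42)
n = 1000
signal     = rng.normal(loc=[ 1.0,  1.0], scale=1.0, size=(n, 2))
background = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(n, 2))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A shallow tree keeps the classifier transparent, much like Fisher's discriminant.
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```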

    Reading:
  • No formal reading, but please consider these introductions to Decision Trees and Neural Nets.
  • Further inspiration can be found here: ML links.
    Lecture(s):
  • MultiVariate Analysis - Part II
    Zoom: Link to lecture. Link to exercises.
  • Recording of Lecture video, Lecture audio, and Lecture chat.
    Computer Exercise(s):
  • MachineLearningExample.ipynb and associated data sample: DataSet_ML.txt.
  • Illustration of decision trees through an interactive example: DecisionTree_InteractiveExample.ipynb.
    As a courtesy, here is code for producing a ROC curve: MakeROCfigure.ipynb (see also the sketch below)
    Illustration/Animation of ROC curve (requires additional packages): MakeROCfigure_animation.ipynb
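    In case you would rather see the idea than open the notebook, the following is a minimal sketch of drawing a ROC curve with scikit-learn on invented scores; it is an assumption about the general approach, not the contents of MakeROCfigure.ipynb.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Invented classifier scores: signal and background drawn from overlapping Gaussians.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal( 1.0, 1.0, 500),    # signal-like scores
                         rng.normal(-1.0, 1.0, 500)])   # background-like scores
labels = np.concatenate([np.ones(500), np.zeros(500)])

# ROC curve: signal efficiency vs. background efficiency as the cut on the score is varied.
fpr, tpr, _ = roc_curve(labels, scores)
plt.plot(fpr, tpr, label=f"Toy classifier (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--", label="Random guessing")
plt.xlabel("False positive rate (background efficiency)")
plt.ylabel("True positive rate (signal efficiency)")
plt.legend()
plt.show()
```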


    Tuesday:
    We will spend both Tuesday and Friday on a larger exercise, which illustrates the idea of separating data into categories, and how to measure and optimise the performance of this separation. The data comes from an ATLAS testbeam at CERN and concerns separating the particles in a beam into electrons and pions, but it could in principle be from any other area of research and/or business.
    The subject is associated with Bayes' Theorem, since there is not an equal number of each type of particle in the beam. Many of you know this theorem already, but with this exercise I'll try to bring a general perspective on data analysis along with it.
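    To fix ideas, here is a small worked example of Bayes' Theorem in this setting; the numbers are invented for illustration and are not the actual testbeam composition or cut efficiencies.

```python
# Invented numbers -- not the actual testbeam composition.
p_electron = 0.10            # prior: fraction of electrons in the beam
p_pion     = 1 - p_electron  # the rest are pions

eff_electron = 0.90          # P(pass selection | electron)
eff_pion     = 0.05          # P(pass selection | pion), i.e. the fake rate

# Bayes' theorem: P(electron | pass selection) = purity of the selected sample.
p_pass = eff_electron * p_electron + eff_pion * p_pion
purity = eff_electron * p_electron / p_pass
print(f"Purity of the selected electron sample: {purity:.2f}")   # about 0.67
```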

    Reading:
  • Barlow, chapter 7.
  • Possibly (review of) Bayes' Theorem
    Lecture(s):
  • Test beam data introduction
  • I will give an introduction to today's exercise on ATLAS testbeam data.
    Zoom: Link to lecture. Link to exercises.
  • Recording of Lecture video, Lecture audio, and Lecture chat.
    Computer Exercise(s):
  • Analysis of ATLAS testbeam data: ATLAStestbeam.ipynb along with main data (2 GeV) and alternative data (9 GeV).


    Friday:
    Reading:
  • No reading - focus on ATLAS test beam data analysis.
    Lecture(s):
  • Notes on binning
  • MultiVariate Analysis - Part III
    Zoom: Link to lecture. Link to exercises.
  • Recording of Lecture video, Lecture audio, and Lecture chat.
    Computer Exercise(s):
  • We will continue with the ATLAS testbeam exercise and produce some of the key figures.
  • Analysis of ATLAS testbeam data: ATLAStestbeam.ipynb along with main data (2 GeV) and alternative data (9 GeV).


    Finally, a link to an online course on Machine Learning (by Udacity).
    Alternatively, our own Applied Machine Learning course runs in block 4 (Schedule C).
    Last updated: 6th of January 2021.