Applied Statistics - Week 4

Monday the 9th - Friday the 13th of December 2019

The following is a description of what we will go through during this week of the course. The chapter references and computer exercises are expected to be read, understood, and solved by the beginning of the following class, where I'll briefly go through the exercise solutions.

General notes, links, and comments:
  • Louis Lyons discussing discovery levels: 1310.1284v1_LouisLyons_Why5sigma.pdf
  • Comparison between different tests for normality: Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
  • Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf
  • Comment on multiple hypothesis testing p-values: p-value histogram
  • Paper of George Marsaglia on testing random numbers: Random Number Generators

    Monday:
    The main theme of this week is hypothesis testing, and we will start with an exercise that gently introduces the subject. In addition to the ChiSquare test, there are several other tests, some simple (one- and two-sample tests) and some more conceptually challenging (the Kolmogorov test and the Wald-Wolfowitz runs test).
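
To give a feeling for the runs test mentioned above, here is a minimal sketch in plain Python. It is not taken from the course notebooks: the function name runs_test and the example sequence are made up for illustration, and the p-value relies on the Gaussian approximation, which assumes a reasonably long sequence.

```python
import math

def runs_test(signs):
    """Wald-Wolfowitz runs test on a sequence of two kinds of outcomes.

    `signs` is a list of booleans (e.g. residual above/below a fit).
    Returns (number of runs, expected runs, z-score, two-sided p-value).
    The p-value uses the Gaussian approximation (an assumption that is
    only reasonable for sequences that are not too short).
    """
    signs = [bool(s) for s in signs]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Need both kinds of outcomes for a runs test.")

    # A new run starts every time two neighbouring entries differ.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    # Expected number of runs and its variance under randomness.
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n**2 * (n - 1.0))

    z = (runs - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, mean, z, p_two_sided

# Illustrative input: a strictly alternating sequence of length 20.
example = [True, False] * 10
print(runs_test(example))
```

A strictly alternating sequence has far more runs than expected from a random one, which shows up as a large positive z and a small p-value. Too few runs (long blocks of the same outcome) give a large negative z instead.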

    Reading:
  • Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
  • Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
    Lecture(s):
  • Hypothesis Testing
  • On p-value histograms
  • Coincidences
    Computer Exercise(s):
  • Hypothesis testing: HypothesisTests_original.ipynb
  • Hypothesis testing ("Empty code" version): HypothesisTests_EmptyVersion_original.ipynb
  • Producing a ROC curve: MakeROCfigure.ipynb
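
The idea behind a ROC curve can be sketched without any plotting. The example below is not the content of MakeROCfigure.ipynb; it assumes two unit-width Gaussians (background at 0, signal shifted by `separation`) and a simple selection x > cut, and the names gauss_tail, roc_points, and auc are illustrative only.

```python
import math

def gauss_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for a Gaussian X with mean mu and width sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def roc_points(separation, n_cuts=2001, lo=-10.0, hi=10.0):
    """(false positive rate, true positive rate) pairs for the cut x > c.

    Assumption: background ~ N(0, 1) and signal ~ N(separation, 1).
    The points are returned with increasing false positive rate.
    """
    points = []
    for i in range(n_cuts):
        cut = lo + (hi - lo) * i / (n_cuts - 1)
        fpr = gauss_tail(cut, 0.0)         # background passing the cut
        tpr = gauss_tail(cut, separation)  # signal passing the cut
        points.append((fpr, tpr))
    return points[::-1]

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * 0.5 * (y0 + y1)
    return area

for sep in (0.0, 1.0, 2.0, 3.0):
    print(f"separation = {sep:.1f} sigma  ->  AUC = {auc(roc_points(sep)):.4f}")
```

With no separation the curve is the diagonal and the area is one half; the area grows towards one as the two distributions move apart.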

    Tuesday:
    Today's lecture will be on Confidence Intervals, which in principle is a simple subject (and we will not go beyond the simple cases here), but one with complicated details. For once, there will be no matching exercise, as the subject is fairly general.
    I will also return to hypothesis tests, and the exercise of the day will focus on exactly that: applying different tests to your own random (?) data.

    Reading:
  • Barlow, chapter 7.2
    Lecture(s):
  • Confidence Intervals And Limits
    Computer Exercise(s):
  • Random Digits Test: RandomDigitsTest_original.ipynb, data_RandomDigits2019_A.txt, data_RandomDigits2019_B.txt, data_RandomDigits2019_C.txt, data_RandomDigits2019_D.txt, data_RandomDigits2019_E.txt, data_RandomDigits2019_F.txt, data_RandomDigits2019_G.txt, and data_RandomDigits2019_H.txt
    For a large-scale test, try one million digits of pi: pi1000000.txt
    To see whether you can test individuals' ability to produce random numbers, consider this data file (from 2017, to keep people anonymous): PersonsDigitsForTest2017.txt
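
One of the simplest tests to try on a string of digits is a ChiSquare test of the digit frequencies against a uniform distribution. The sketch below is not the solution in RandomDigitsTest_original.ipynb; the function names chi2_sf and digit_chi2_test are made up, the file format is not assumed (the input is just a string), and the p-value comes from a simple series that is only meant for moderate ChiSquare values. In practice one would probably use scipy.stats.chi2.sf instead.

```python
import math

def chi2_sf(chi2, ndof):
    """Upper tail probability of the ChiSquare distribution.

    Uses the series for the regularised lower incomplete gamma function,
    P(a, x) with a = ndof/2 and x = chi2/2. This is a simple sketch:
    it cannot resolve p-values below about 1e-15 and refuses very
    large chi2 values, where the series would overflow.
    """
    a, x = 0.5 * ndof, 0.5 * chi2
    if x <= 0.0:
        return 1.0
    if x > 600.0:
        raise ValueError("chi2 too large for this simple series.")
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    return max(0.0, 1.0 - math.exp(log_prefactor) * total)

def digit_chi2_test(digits):
    """ChiSquare test of digit frequencies against a uniform distribution.

    `digits` is a string; characters other than 0-9 are ignored.
    Returns (chi2, ndof, p-value).
    """
    counts = [0] * 10
    for ch in digits:
        if ch in "0123456789":
            counts[int(ch)] += 1
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("No digits found.")
    expected = n_total / 10.0
    chi2 = sum((obs - expected) ** 2 / expected for obs in counts)
    ndof = 9  # ten bins, one constraint from the fixed total
    return chi2, ndof, chi2_sf(chi2, ndof)

# Illustrative input only: a sequence with too many zeros.
print(digit_chi2_test("0" * 150 + "123456789" * 100))
```

Note that equal digit frequencies are necessary but far from sufficient for randomness: a sequence such as 0123456789 repeated many times passes this test perfectly, which is why tests of the ordering (such as runs tests) are also needed.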

    Friday:
    In the lecture, we will mainly focus on discussion of the TableMeasurement (in Aud. A), which covers both the philosophy of data handling and analysis, and actually also the construction of fits.
    In the exercises, we'll try a simple example of integration in many dimensions using simple simulation. First comes an estimate of pi, followed by the rational factors multiplying the powers of pi in the (hyper)volumes of balls in higher dimensions!
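
The simulation idea can be sketched in a few lines: throw uniform points into the cube [-1, 1]^d and count the fraction that lands inside the unit ball. This is not the code in PiEstimate_original.ipynb; the function names, the number of points, and the seed are arbitrary choices for illustration, and the uncertainty is the simple binomial one.

```python
import math
import random

def ball_volume_mc(dim, n_points=100_000, seed=42):
    """Monte Carlo estimate of the volume of the unit ball in `dim` dimensions.

    Returns (estimate, uncertainty), where the uncertainty is the binomial
    error on the accepted fraction scaled by the cube volume 2^dim.
    """
    rng = random.Random(seed)
    n_inside = 0
    for _ in range(n_points):
        r2 = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim))
        if r2 <= 1.0:
            n_inside += 1
    frac = n_inside / n_points
    cube = 2.0 ** dim
    return cube * frac, cube * math.sqrt(frac * (1.0 - frac) / n_points)

def ball_volume_exact(dim):
    """Exact unit-ball volume, pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (dim / 2.0) / math.gamma(dim / 2.0 + 1.0)

for dim in (2, 3, 4, 5):
    est, err = ball_volume_mc(dim)
    print(f"d = {dim}: V = {est:.4f} +- {err:.4f}   (exact: {ball_volume_exact(dim):.4f})")
```

In two dimensions the estimate is simply pi. In higher dimensions, dividing the estimate by the appropriate power of pi gives the rational factor, for example 4/3 for d = 3 (with one power of pi) and 1/2 for d = 4 (with pi squared). The accepted fraction drops quickly with dimension, so the relative uncertainty grows for a fixed number of points.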
    In addition, there is a small exercise on how to fit discontinuities in data, in this case determining the length of the NBI HEP Christmas vacation break from coffee machine data!

    Reading:
  • No reading - logic and reason suffice (along with math and Python!).
    Lecture(s):
  • Table Measurement Solution/Discussion
  • Testing random numbers
    Computer Exercise(s):
  • Estimating pi and hypersphere size from simulation: PiEstimate_original.ipynb
  • NBI Coffee Usage problem: CoffeeUsage_original.ipynb
  • NBI Coffee Usage problem ("Empty code" version): CoffeeUsage_EmptyVersion_original.ipynb
    Last updated: 6th of December 2019.