Applied Statistics - Week 4
Monday the 9th - Friday the 13th of December 2019
The following describes what we will go through during this week of
the course. The chapter references and computer exercises are expected
to be read, understood, and solved by the beginning of the following
class, where I'll briefly go through the exercise solution.
General notes, links, and comments:
   Louis Lyons discussing discovery levels:
    1310.1284v1_LouisLyons_Why5sigma.pdf
   Comparison between different tests for normality:
    Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
   Illustration of ROC curves:
    ROCcurves_GaussianSeparations.pdf
   Comment on multiple hypothesis testing p-values:
    p-value histogram
   Paper of George Marsaglia on testing random numbers:
    Random Number Generators
Monday:
The main theme of this week will be Hypothesis Testing, and we will
start with an exercise gently introducing the subject.
In addition to the ChiSquare test, there are several other tests, some
simple (one/two sample tests) and some more conceptually challenging
(the Kolmogorov-Smirnov test and the Wald-Wolfowitz runs test).
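As an illustration of one of the more conceptually challenging tests, the following is a minimal sketch of the Wald-Wolfowitz runs test in plain Python. It assumes a sequence of exactly two distinct symbols and uses the normal approximation for the number of runs, which is only reasonable when both symbols occur more than a handful of times. The function name `runs_test` and the example sequence are made up for illustration and are not taken from the course notebooks.

```python
import math

def runs_test(sequence):
    """Wald-Wolfowitz runs test (normal approximation).

    `sequence` is a string or list holding exactly two distinct symbols.
    Returns (number of runs, z-score, two-sided p-value).
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")

    n_a = sum(1 for s in sequence if s == symbols[0])
    n_b = len(sequence) - n_a
    n = n_a + n_b

    # A new run starts whenever a symbol differs from its predecessor.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

    # Expected number of runs and its variance for a random ordering.
    mean = 2.0 * n_a * n_b / n + 1.0
    var = 2.0 * n_a * n_b * (2.0 * n_a * n_b - n) / (n ** 2 * (n - 1.0))

    z = (runs - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_two_sided

# Hypothetical example: a strictly alternating sequence has too many runs.
runs, z, p = runs_test("ABABABABABABABABABAB")
print(f"runs = {runs}, z = {z:.2f}, p = {p:.2g}")
```

A numerical sequence can be turned into two symbols by, for example, marking each value as above or below the median before calling the function.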
Reading:
  Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
  Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
Lecture(s):
  Hypothesis Testing
  On p-value histograms
  Coincidences
Computer Exercise(s):
  Hypothesis testing: HypothesisTests_original.ipynb
  Hypothesis testing ("Empty code" version): HypothesisTests_EmptyVersion_original.ipynb
  Producing a ROC curve: MakeROCfigure.ipynb
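
To give a feeling for what a ROC curve contains, here is a small sketch for two unit-width Gaussian distributions separated by a given number of sigma, in the spirit of the ROC illustration linked above. It is not the content of MakeROCfigure.ipynb. The choice of background N(0,1), signal N(separation,1), and a simple cut x > t is an assumption made for illustration, as are all function names.

```python
import math

def gaussian_tail(x, mean=0.0, sigma=1.0):
    """P(X > x) for X ~ N(mean, sigma^2)."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2.0)))

def roc_points(separation, n_steps=2000):
    """(FPR, TPR) pairs for the cut x > t, scanning the threshold t.

    Background is N(0, 1) and signal is N(separation, 1) (assumed setup).
    """
    lo, hi = -8.0, separation + 8.0
    points = []
    for i in range(n_steps + 1):
        t = lo + (hi - lo) * i / n_steps
        false_positive_rate = gaussian_tail(t)
        true_positive_rate = gaussian_tail(t, mean=separation)
        points.append((false_positive_rate, true_positive_rate))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve (FPR falls as the cut rises)."""
    area = 0.0
    for (fpr0, tpr0), (fpr1, tpr1) in zip(points, points[1:]):
        area += 0.5 * (tpr0 + tpr1) * (fpr0 - fpr1)
    return area

for separation in (0.0, 1.0, 2.0):
    auc = area_under_curve(roc_points(separation))
    print(f"separation = {separation:.1f} sigma: AUC = {auc:.4f}")
```

For this particular setup the area under the curve can be checked analytically, since it equals the probability that a signal value exceeds a background value, i.e. Phi(separation / sqrt(2)).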
Tuesday:
Today's lecture will be on Confidence Intervals, which is in
principle a simple subject (and we will not go beyond the simple case
here), but one with complicated details. For once, there will be no
matching exercise (as the subject is fairly general).
I will return to hypothesis tests, and the exercise of the day will
focus on applying different tests to your own random (?) data.
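
One of the simplest tests one might apply to a sequence of digits is a ChiSquare test of the digit frequencies against a uniform expectation. The sketch below is not the course solution. It assumes the digits are given as a string of the characters 0-9, and it computes the ChiSquare probability from a series expansion of the incomplete gamma function so that only the Python standard library is needed. The function names and the generated example digits are placeholders.

```python
import math
import random
from collections import Counter

def chi2_survival(chi2, ndof):
    """P(ChiSquare >= chi2) for ndof degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function.
    """
    a, h = 0.5 * ndof, 0.5 * chi2
    if h <= 0.0:
        return 1.0
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= h / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(h) - h - math.lgamma(a + 1.0)
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - math.exp(log_prefactor) * total)

def digit_frequency_test(digits):
    """ChiSquare test of digit counts against a flat expectation.

    Returns (chi2, p-value) with 9 degrees of freedom (10 bins, total fixed).
    """
    counts = Counter(digits)
    expected = len(digits) / 10.0
    chi2 = sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))
    return chi2, chi2_survival(chi2, 9)

# Placeholder data: pseudo-random digits standing in for a real data file.
rng = random.Random(42)
digits = "".join(str(rng.randrange(10)) for _ in range(1000))
chi2, p = digit_frequency_test(digits)
print(f"chi2 = {chi2:.2f}, ndof = 9, p = {p:.3f}")
```

A frequency test alone is blind to the order of the digits, so a sequence such as 0123456789 repeated many times passes it perfectly. This is one reason to combine it with other tests, for example on neighbouring digits or on runs.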
Reading:
  Barlow, chapter 7.2
Lecture(s):
  Confidence Intervals And Limits
Computer Exercise(s):
  Random Digits Test:
    RandomDigitsTest_original.ipynb,
    data_RandomDigits2019_A.txt,
    data_RandomDigits2019_B.txt,
    data_RandomDigits2019_C.txt,
    data_RandomDigits2019_D.txt,
    data_RandomDigits2019_E.txt,
    data_RandomDigits2019_F.txt,
    data_RandomDigits2019_G.txt, and
    data_RandomDigits2019_H.txt
    For a large-scale test, try one million digits of pi: pi1000000.txt
    To see whether you can test an individual's ability to produce
    random numbers, consider this data file (from 2017, to keep people anonymous):
    PersonsDigitsForTest2017.txt
Friday:
In the lecture, we will mainly discuss the TableMeasurement (in
Aud. A), which covers the philosophy of data handling and analysis as
well as the construction of fits.
In the exercises, we'll try a simple example of integration in many
dimensions using simple simulation. First we estimate pi, and then the
rational numbers in front of the (hyper)volumes of balls in many
dimensions!
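
The idea can be sketched with a hit-or-miss simulation: throw uniform points in the cube [-1, 1]^d and count the fraction falling inside the unit ball. This is a minimal sketch and not the content of PiEstimate_original.ipynb. The function names, the seed, and the number of points are arbitrary choices, and the quoted uncertainty is the simple binomial one.

```python
import math
import random

def ball_volume_mc(dim, n_points, rng):
    """Hit-or-miss estimate of the unit ball volume in `dim` dimensions.

    Returns (volume estimate, binomial uncertainty on the estimate).
    """
    inside = 0
    for _ in range(n_points):
        r2 = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim))
        if r2 <= 1.0:
            inside += 1
    fraction = inside / n_points
    cube_volume = 2.0 ** dim
    volume = cube_volume * fraction
    error = cube_volume * math.sqrt(fraction * (1.0 - fraction) / n_points)
    return volume, error

def ball_volume_exact(dim):
    """Exact unit ball volume: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (dim / 2.0) / math.gamma(dim / 2.0 + 1.0)

rng = random.Random(2019)  # fixed seed so the sketch is reproducible
for dim in (2, 3, 4):
    volume, error = ball_volume_mc(dim, 100_000, rng)
    print(f"dim = {dim}: V = {volume:.4f} +- {error:.4f} "
          f"(exact: {ball_volume_exact(dim):.4f})")
```

In two dimensions the estimate is pi itself. In four dimensions the exact volume is pi^2 / 2, so dividing the estimate by pi^2 should give a number close to the rational factor 1/2. The fraction of points inside the ball shrinks quickly with dimension, so the relative uncertainty grows for a fixed number of points.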
In addition, there is a small exercise on how to fit discontinuities in
data, in this case determining the length of the NBI HEP Christmas
vacation break from coffee machine data!
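
As a very reduced illustration of fitting a discontinuity, the sketch below fits a single step between two constant levels by scanning all possible break positions and keeping the one with the smallest sum of squared residuals. This is a simplification and not the model used in CoffeeUsage_original.ipynb. The numbers are invented and only mimic a usage rate dropping at some point, and the function name `fit_step` is hypothetical.

```python
def fit_step(values):
    """Least-squares fit of one step between two constant levels.

    Scans every split position k (values[:k] before, values[k:] after).
    Returns (k, mean_before, mean_after, sum of squared residuals).
    """
    best = None
    for k in range(1, len(values)):
        before, after = values[:k], values[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        sse = (sum((v - mean_before) ** 2 for v in before)
               + sum((v - mean_after) ** 2 for v in after))
        if best is None or sse < best[3]:
            best = (k, mean_before, mean_after, sse)
    return best

# Invented data (not real measurements): a level near 10 dropping to near 2.
usage = [10.1, 9.9, 10.0, 10.2, 9.8, 2.1, 1.9, 2.0, 2.2, 1.8]
k, level_before, level_after, sse = fit_step(usage)
print(f"break before index {k}: {level_before:.2f} -> {level_after:.2f}")
```

The break position is a discrete parameter here, so it is found by scanning rather than by a gradient-based minimiser, which tends to struggle with discontinuous models.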
Reading:
   No reading - logic and reason suffice (along with math and Python!).
Lecture(s):
  Table Measurement Solution/Discussion
  Testing random numbers
Computer Exercise(s):
   Estimating pi and hypersphere size from simulation:
    PiEstimate_original.ipynb
   NBI Coffee Usage problem:
    CoffeeUsage_original.ipynb
   NBI Coffee Usage problem ("Empty code" version):
    CoffeeUsage_EmptyVersion_original.ipynb
Last updated: 6th of December 2019.