Applied Statistics - Week 4

Monday the 9th - Friday the 13th of December 2019

The following describes what we will go through during this week of the course. The chapter references should be read and understood, and the computer exercises solved, by the beginning of the following class, where I will briefly go through the exercise solutions.

General notes, links, and comments:

Louis Lyons discussing discovery levels: 1310.1284v1_LouisLyons_Why5sigma.pdf

Comparison between different tests for normality: Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf

Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf

Comment on multiple hypothesis testing p-values: p-value histogram

Paper of George Marsaglia on testing random numbers: Random Number Generators

Monday:
The main theme of this week will be hypothesis testing, and we will start with an exercise gently introducing the subject. In addition to the chi-square test, there are several other tests, some simple (one- and two-sample tests) and some more conceptually challenging (the Kolmogorov test and the Wald-Wolfowitz runs test).
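
To make the runs test a little more concrete, here is a minimal sketch in plain Python. It is not taken from the course notebooks: the function name `runs_test` and its return values are chosen purely for illustration, and it uses the Gaussian approximation for the number of runs, which is only reasonable when both symbols occur roughly ten times or more.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test on a sequence containing exactly two distinct symbols.

    Returns (number of runs, expected runs, z-score, two-sided p-value).
    The p-value uses the Gaussian approximation (an assumption of this sketch).
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two distinct symbols")

    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1
    n = n1 + n2

    # A new run starts every time two neighbouring entries differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

    # Mean and variance of the number of runs under the hypothesis of random ordering.
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))

    z = (runs - mu) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Gaussian tail probability
    return runs, mu, z, p_value


# A perfectly alternating sequence has far too many runs to look random.
runs, mu, z, p = runs_test("+-" * 10)
print(f"runs = {runs}, expected = {mu:.1f}, z = {z:.2f}, p = {p:.2g}")
```

For residuals of a fit, the two symbols would typically be "above the fit" and "below the fit", so that too few runs signals a systematic trend the fit does not describe.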

Reading:

Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).

Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
Lecture(s):

Hypothesis Testing

On p-value histograms

Coincidences
Computer Exercise(s):

Hypothesis testing: HypothesisTests_original.ipynb

Hypothesis testing ("Empty code" version): HypothesisTests_EmptyVersion_original.ipynb

Producing a ROC curve: MakeROCfigure.ipynb
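
As a rough illustration of what producing a ROC curve involves, the sketch below builds one for two unit-width Gaussian samples by scanning a cut value and recording the fraction of each sample that passes. It does not reproduce MakeROCfigure.ipynb; the names `roc_curve` and `area_under_curve`, the separation of 2 sigma, and the sample sizes are all assumptions made for the example.

```python
import random


def roc_curve(signal, background, n_cuts=200):
    """Scan a cut 'x >= cut' and return (false positive rates, true positive rates)."""
    lo = min(min(signal), min(background))
    hi = max(max(signal), max(background))
    fpr, tpr = [], []
    # One extra step beyond 'hi' guarantees that the curve ends at (0, 0).
    for i in range(n_cuts + 2):
        cut = lo + (hi - lo) * i / n_cuts
        tpr.append(sum(x >= cut for x in signal) / len(signal))
        fpr.append(sum(x >= cut for x in background) / len(background))
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve (points ordered by decreasing FPR)."""
    area = 0.0
    for i in range(len(fpr) - 1):
        area += (fpr[i] - fpr[i + 1]) * (tpr[i] + tpr[i + 1]) / 2.0
    return area


# Synthetic example data: two unit Gaussians separated by 2 sigma (assumed values).
rng = random.Random(42)
background = [rng.gauss(0.0, 1.0) for _ in range(2000)]
signal = [rng.gauss(2.0, 1.0) for _ in range(2000)]

fpr, tpr = roc_curve(signal, background)
print(f"Area under ROC curve: {area_under_curve(fpr, tpr):.3f}")
```

An area of 0.5 corresponds to no separation at all, while an area of 1 corresponds to perfect separation.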

Tuesday:
Today's lecture will be on Confidence Intervals, which in principle is a simple subject (and we will not go beyond the simple here), but one with complicated details. For once, there will be no matching exercise (as the subject is fairly general).
Instead, I will reiterate hypothesis tests, and the exercise of the day will focus on applying different tests to your own random (?) data.

Reading:

Barlow, chapter 7.2
Lecture(s):

Confidence Intervals And Limits
Computer Exercise(s):

Random Digits Test: RandomDigitsTest_original.ipynb,

data_RandomDigits2019_A.txt,

data_RandomDigits2019_B.txt,

data_RandomDigits2019_C.txt,

data_RandomDigits2019_D.txt,

data_RandomDigits2019_E.txt,

data_RandomDigits2019_F.txt,

data_RandomDigits2019_G.txt, and

data_RandomDigits2019_H.txt
For a large-scale test, try one million digits of pi: pi1000000.txt
To see if you can test an individual's ability to produce random numbers, consider this data file (from 2017 - just to keep people anonymous): PersonsDigitsForTest2017.txt
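
One of the simplest tests for a sequence of digits is a chi-square test of the digit frequencies against a flat distribution. The sketch below is only an illustration and is not the content of RandomDigitsTest_original.ipynb: the function names are hypothetical, the digits are passed in as a string because the layout of the data files is not described here, and the p-value is computed with a standard recurrence for the chi-square tail probability so that only the standard library is needed.

```python
import math


def chi2_sf(x, ndof):
    """Chi-square tail probability P(chi2 >= x) for integer ndof >= 1.

    Uses Q(x|k+2) = Q(x|k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1),
    starting from the closed forms for k = 1 and k = 2.
    """
    if ndof % 2 == 0:
        q, k = math.exp(-x / 2.0), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    while k < ndof:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def digit_frequency_test(digits):
    """Chi-square test of the frequencies of 0-9 against a uniform expectation."""
    counts = [0] * 10
    for ch in digits:
        if ch.isdigit():          # ignore whitespace, newlines, decimal points, etc.
            counts[int(ch)] += 1
    n_total = sum(counts)
    expected = n_total / 10.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    ndof = 9                      # 10 bins minus 1 constraint (the total is fixed)
    return chi2, ndof, chi2_sf(chi2, ndof)


# A sequence with a strong preference for the digit 7 (made-up example input).
chi2, ndof, p = digit_frequency_test("7" * 50 + "0123456789" * 20)
print(f"chi2 = {chi2:.1f}, ndof = {ndof}, p = {p:.3g}")
```

Passing the frequency test says nothing about correlations between neighbouring digits, which is why further tests on the sequence itself are worth trying.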

Friday:
In the lecture, we will mainly focus on discussion of the TableMeasurement (in Aud. A), which covers both the philosophy of data handling and analysis, and actually also the construction of fits.
In the exercises, we'll try a simple example of doing integration in many dimensions using simple simulation. First we estimate pi, and then the rational numbers multiplying the powers of pi in the (hyper)volumes of balls in many dimensions!
In addition, there is a small exercise on how to fit discontinuities in data, in this case to determine the length of the NBI HEP Christmas vacation break based on data from the coffee machine!
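
The integration-by-simulation idea can be sketched as follows: throw points uniformly in the cube [-1, 1]^d, count the fraction that falls inside the unit ball, and multiply by the cube volume 2^d. This is an illustrative sketch rather than the code in PiEstimate_original.ipynb; the function names and the number of points are assumptions. The exact volume used for comparison is V_d = pi^(d/2) / Gamma(d/2 + 1).

```python
import math
import random


def ball_volume_mc(dim, n_points, rng):
    """Hit-or-miss Monte Carlo estimate of the unit ball volume in 'dim' dimensions.

    Returns (volume estimate, binomial uncertainty on the estimate).
    """
    inside = 0
    for _ in range(n_points):
        r2 = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim))
        if r2 <= 1.0:
            inside += 1
    fraction = inside / n_points
    cube_volume = 2.0 ** dim
    volume = cube_volume * fraction
    error = cube_volume * math.sqrt(fraction * (1.0 - fraction) / n_points)
    return volume, error


def ball_volume_exact(dim):
    """Exact unit ball volume, V_d = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (dim / 2.0) / math.gamma(dim / 2.0 + 1.0)


rng = random.Random(2019)
for dim in range(2, 7):
    volume, error = ball_volume_mc(dim, 100_000, rng)
    # Dividing by pi^floor(d/2) leaves the rational coefficient (1, 4/3, 1/2, 8/15, ...).
    coefficient = volume / math.pi ** (dim // 2)
    print(f"d = {dim}: V = {volume:.4f} +- {error:.4f} "
          f"(exact {ball_volume_exact(dim):.4f}), coefficient = {coefficient:.4f}")
```

In two dimensions the estimate is simply an estimate of pi. The fraction of points inside the ball drops quickly with the dimension, so the relative uncertainty grows for a fixed number of points.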

Reading:

No reading - logic and reason suffice (along with math and Python!).
Lecture(s):

Table Measurement Solution/Discussion

Testing random numbers
Computer Exercise(s):

Estimating pi and hypersphere size from simulation: PiEstimate_original.ipynb

NBI Coffee Usage problem: CoffeeUsage_original.ipynb

NBI Coffee Usage problem ("Empty code" version): CoffeeUsage_EmptyVersion_original.ipynb

Last updated: 6th of December 2019.