Applied Statistics - Week 4
Monday the 9th - Friday the 13th of December 2019
The following is a description of what we will go through during this
week of the course. The chapter references and computer exercises are
considered read, understood, and solved by the beginning of the
following class, where I'll briefly go through the exercise
solution.
General notes, links, and comments:
Louis Lyons discussing discovery levels:
1310.1284v1_LouisLyons_Why5sigma.pdf
Comparison between different tests for normality:
Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf
Comment on multiple hypothesis testing p-values:
p-value histogram
Paper by George Marsaglia on testing random numbers:
Random Number Generators
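For the discussion of discovery levels, here is a small sketch (not taken from the Lyons note) of converting between a significance in units of sigma and the corresponding Gaussian tail probability. It uses only Python's standard library. The function names are illustrative. The one-sided convention is the one normally used when quoting 5 sigma for a discovery.

```python
import math
from statistics import NormalDist


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a unit Gaussian fluctuates up by at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def two_sided_p_value(n_sigma: float) -> float:
    """Probability of a fluctuation of at least n_sigma in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def sigma_from_one_sided_p(p: float) -> float:
    """Inverse of one_sided_p_value via the Gaussian quantile function.
    Uses -inv_cdf(p) rather than inv_cdf(1 - p) to keep precision for tiny p."""
    return -NormalDist().inv_cdf(p)


for n in (1, 2, 3, 5):
    print(f"{n} sigma: one-sided p = {one_sided_p_value(n):.3g}, "
          f"two-sided p = {two_sided_p_value(n):.3g}")
```

With this convention, 5 sigma corresponds to a one-sided p-value of about 2.9e-7.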
Monday:
The main theme of this week will be hypothesis testing, and we will
start with an exercise gently introducing the subject.
In addition to the chi-square test, there are several other tests, some
simple (one- and two-sample tests) and some more conceptually challenging
(the Kolmogorov-Smirnov test and the Wald-Wolfowitz runs test).
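Here is a minimal sketch of the Wald-Wolfowitz runs test mentioned above. It assumes the common variant where values are split at the sample median and the observed number of runs is compared with its expectation using a Gaussian approximation, which is only reasonable for large samples. The function name runs_test is hypothetical and is not code from the course notebooks.

```python
import math


def runs_test(values, threshold=None):
    """Wald-Wolfowitz runs test in its large-sample Gaussian approximation.

    Each value is labelled as above or below `threshold` (default: the sample
    median); values exactly equal to the threshold are dropped. Too few runs
    suggests clustering, too many suggests alternation.
    Returns (number of runs, z-score, two-sided p-value).
    """
    if threshold is None:
        s = sorted(values)
        n = len(s)
        threshold = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    labels = [v > threshold for v in values if v != threshold]
    n1 = sum(labels)              # number of values above the threshold
    n2 = len(labels) - n1         # number of values below the threshold
    if n1 == 0 or n2 == 0:
        raise ValueError("need values on both sides of the threshold")
    # A new run starts every time the label changes.
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p


print(runs_test([1, 0] * 50))          # perfectly alternating: far too many runs
print(runs_test([0] * 50 + [1] * 50))  # two clusters: far too few runs
```

Note that the test is sensitive to ordering. Both sequences above have identical value distributions, so a pure distribution test such as chi-square would not distinguish them.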
Reading:
Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
Lecture(s):
Hypothesis Testing
On p-value histograms
Coincidences
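On p-value histograms: when the null hypothesis is true and the test statistic is continuous, p-values are uniformly distributed between 0 and 1, so a histogram of many p-values should be flat. Genuine effects show up as an excess near zero. The sketch below checks this with a simple two-sided z-test on Gaussian samples. The choice of test, sample sizes, and function names are illustrative assumptions.

```python
import math
import random


def null_p_values(n_tests=10_000, n_per_test=20, seed=1):
    """p-values of a two-sided z-test of 'mean = 0' (known sigma = 1),
    applied to samples for which the null hypothesis is actually true."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_test)) / n_per_test
        z = sample_mean * math.sqrt(n_per_test)  # mean / (sigma / sqrt(N)) with sigma = 1
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def histogram(values, n_bins=10):
    """Counts of values in [0, 1] split into n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


counts = histogram(null_p_values())
print(counts)  # should look flat: each bin near 10_000 / 10
```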
Computer Exercise(s):
Hypothesis testing: HypothesisTests_original.ipynb
Hypothesis testing ("Empty code" version): HypothesisTests_EmptyVersion_original.ipynb
Producing a ROC curve: MakeROCfigure.ipynb
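The sketch below shows one way to build a ROC curve. It assumes, purely as an illustration and not necessarily the setup of MakeROCfigure.ipynb, that background and signal are unit Gaussians separated by d and that events are selected with a single cut, x > cut. For this case the area under the curve has the closed form Phi(d/sqrt(2)), which the trapezoidal estimate can be checked against.

```python
import math


def gaussian_tail(x, mu, sigma=1.0):
    """P(X > x) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def roc_curve(separation, n_points=2001, x_min=-10.0, x_max=10.0):
    """(false positive rate, true positive rate) pairs for background N(0, 1)
    and signal N(separation, 1), selecting events with x > cut. The cut is
    scanned from x_max down to x_min, so both rates increase along the list."""
    points = []
    for i in range(n_points):
        cut = x_max - (x_max - x_min) * i / (n_points - 1)
        points.append((gaussian_tail(cut, 0.0), gaussian_tail(cut, separation)))
    return points


def area_under_curve(points):
    """Trapezoidal integral of TPR over FPR."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


for d in (0.0, 1.0, 2.0, 3.0):
    print(f"separation {d}: AUC = {area_under_curve(roc_curve(d)):.4f}")
```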
Tuesday:
Today's lecture will be on confidence intervals, which is in
principle a simple subject (and we will not go beyond simple here),
but one with complicated details. For once, there is no matching
exercise (as the subject is fairly general).
I will also revisit hypothesis tests, and the exercise of the day will
focus on applying different tests to your own random (?) data.
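As a small, self-contained example of a confidence limit (an illustration, not material from the lecture), the sketch below computes the classical one-sided upper limit on a Poisson mean when n_obs events are observed and no background is assumed. It finds the mean mu for which observing n_obs or fewer events has probability 1 - CL. The function names and the bisection approach are assumptions. For n_obs = 0 the result reduces to mu = -ln(1 - CL), for example ln(10) at 90% CL.

```python
import math


def poisson_cdf(n_obs, mu):
    """P(N <= n_obs) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total


def poisson_upper_limit(n_obs, cl=0.90, tol=1e-10):
    """Classical one-sided upper limit on a Poisson mean: the mu for which
    observing n_obs or fewer events has probability 1 - cl. Solved by
    bisection, since P(N <= n_obs) decreases monotonically with mu."""
    lo, hi = 0.0, n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid  # mid is still too small to be excluded at this CL
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in range(4):
    print(n, round(poisson_upper_limit(n), 3))
```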
Reading:
Barlow, chapter 7.2
Lecture(s):
Confidence Intervals And Limits
Computer Exercise(s):
Random Digits Test:
RandomDigitsTest_original.ipynb,
data_RandomDigits2019_A.txt,
data_RandomDigits2019_B.txt,
data_RandomDigits2019_C.txt,
data_RandomDigits2019_D.txt,
data_RandomDigits2019_E.txt,
data_RandomDigits2019_F.txt,
data_RandomDigits2019_G.txt, and
data_RandomDigits2019_H.txt
For a large-scale test, try one million digits of pi: pi1000000.txt
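A natural first test on any of the digit files is whether all ten digits occur equally often. The sketch below is a chi-square test of that hypothesis with 9 degrees of freedom. The p-value uses the standard closed forms of the chi-square survival function for integer degrees of freedom, so it needs no external libraries. The file format in the commented usage lines is an assumption: the file is taken to contain the digits as plain characters.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """P(chi2 > x) for an integer number of degrees of freedom, using the
    standard closed forms for even and odd dof."""
    if x <= 0.0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{r=0}^{dof/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, dof // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd dof: erfc(chi/sqrt(2)) + 2 phi(chi) * sum_{r=1}^{(dof-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total


def digit_frequency_test(digits):
    """Chi-square test that each digit 0-9 occurs with probability 1/10.
    Returns (chi2, p-value) with 10 - 1 = 9 degrees of freedom."""
    counts = Counter(digits)
    expected = len(digits) / 10.0
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    return chi2, chi2_sf(chi2, 9)


# Hypothetical usage; assumes the file simply contains the digits as characters:
# with open("pi1000000.txt") as f:
#     digits = [int(c) for c in f.read() if c in "0123456789"]
# print(digit_frequency_test(digits))
```

Frequencies are only one aspect of randomness. A sequence like 0123456789 repeated passes this test perfectly, which is why tests on ordering (runs, repeats, pairs) are also needed.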
To see if you can test individuals' ability to produce
random numbers, consider this data file (from 2017, to keep people anonymous):
PersonsDigitsForTest2017.txt
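One simple property to check for person-generated digits is how often a digit is immediately repeated. For independent, uniform digits this happens with probability 1/10 per adjacent pair. The sketch below is a hypothetical example of such a test. The layout of PersonsDigitsForTest2017.txt is not described here, so reading the file is left out, and the example input is simply the first 20 digits of pi.

```python
import math


def repeat_test(digits):
    """Compare the number of immediate repeats (d[i] == d[i+1]) with the
    expectation for independent, uniformly distributed digits, where each
    adjacent pair repeats with probability 1/10.
    Returns (repeats, expected, z, two-sided p-value)."""
    m = len(digits) - 1                      # number of adjacent pairs
    repeats = sum(1 for a, b in zip(digits, digits[1:]) if a == b)
    expected = 0.1 * m
    # The pair indicators are pairwise independent for i.i.d. uniform digits,
    # so the binomial variance m * 0.1 * 0.9 applies.
    sigma = math.sqrt(m * 0.1 * 0.9)
    z = (repeats - expected) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))   # Gaussian approximation
    return repeats, expected, z, p


print(repeat_test([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]))
```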
Friday:
In the lecture, we will mainly focus on discussing the
TableMeasurement (in Aud. A), which covers both the philosophy of data
handling and analysis, and also the construction of fits.
In the exercises, we'll try a simple example of integration in
many dimensions using simulation. First comes the estimate of
pi, followed by the rational numbers in front of the (hyper)volumes of
balls in many dimensions!
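Here is a minimal hit-or-miss sketch of the idea, which may differ from how PiEstimate_original.ipynb does it. Uniform points are drawn in the cube [-1, 1]^n, and the fraction that falls inside the unit ball, times the cube volume 2^n, estimates the ball's volume. In two dimensions this is an estimate of pi itself. For comparison, the exact unit-ball volume is V_n = c_n * pi^(n//2) with a rational prefactor c_n, equal to 1/k! for n = 2k and 2^n k!/n! for n = 2k + 1. The sample size and seed are arbitrary choices.

```python
import math
import random
from fractions import Fraction


def mc_ball_volume(dim, n_samples=100_000, seed=42):
    """Hit-or-miss Monte Carlo estimate of the volume of the unit ball in `dim`
    dimensions: the fraction of uniform points in the cube [-1, 1]^dim that land
    inside the ball, times the cube volume 2^dim. Returns (estimate, error)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim)) <= 1.0:
            hits += 1
    frac = hits / n_samples
    error = 2.0 ** dim * math.sqrt(frac * (1.0 - frac) / n_samples)  # binomial error
    return 2.0 ** dim * frac, error


def rational_prefactor(dim):
    """Exact rational c_n in V_n = c_n * pi^(n // 2) for the unit ball:
    1/k! for n = 2k, and 2^n k! / n! for n = 2k + 1."""
    k = dim // 2
    if dim % 2 == 0:
        return Fraction(1, math.factorial(k))
    return Fraction(2 ** dim * math.factorial(k), math.factorial(dim))


for dim in range(2, 6):
    volume, error = mc_ball_volume(dim)
    estimate = volume / math.pi ** (dim // 2)  # simulated prefactor
    print(f"n={dim}: V = {volume:.4f} +- {error:.4f}, "
          f"V / pi^{dim // 2} = {estimate:.4f} (exact {rational_prefactor(dim)})")
```

The relative error grows with dimension, because the ball fills an ever smaller fraction of the cube, so fewer points land inside.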
In addition, there is a small exercise on how to fit discontinuities in data,
in this case to estimate the length of the NBI HEP Christmas vacation break
from data from the coffee machine!
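One simple way to fit such a discontinuity is a step model: one usage level outside the break and a lower level inside it, with both change points scanned and the combination with the smallest sum of squared residuals kept. The sketch below does this with equal weights, which is a simplification. For counts, a chi-square with Poisson uncertainties would be the more usual choice. The daily-count input and the simulated numbers are hypothetical, since the actual coffee data format is not described here.

```python
import random


def fit_break(counts):
    """Least-squares fit of a high/low/high step model to a daily series:
    level a outside the interval [i, j) and level b inside it. All (i, j) pairs
    are scanned, using prefix sums so that each candidate costs O(1).
    Returns (i, j, a, b, sse)."""
    n = len(counts)
    if n < 3:
        raise ValueError("need at least three points")
    s = [0.0] * (n + 1)   # prefix sums of x
    s2 = [0.0] * (n + 1)  # prefix sums of x^2
    for k, x in enumerate(counts):
        s[k + 1] = s[k] + x
        s2[k + 1] = s2[k] + x * x
    best = None
    for i in range(1, n - 1):          # at least one point before the break
        for j in range(i + 1, n):      # at least one point inside and after it
            n_in, n_out = j - i, n - (j - i)
            sum_in = s[j] - s[i]
            sum_out = s[n] - sum_in
            # Residual sum of squares with each level set to its segment mean.
            sse = s2[n] - sum_in ** 2 / n_in - sum_out ** 2 / n_out
            if best is None or sse < best[4]:
                best = (i, j, sum_out / n_out, sum_in / n_in, sse)
    return best


# Hypothetical daily cup counts: normal use, a 10-day break, normal use again.
rng = random.Random(7)
cups = ([rng.gauss(50, 3) for _ in range(20)] + [rng.gauss(5, 3) for _ in range(10)]
        + [rng.gauss(50, 3) for _ in range(15)])
i, j, a, b, sse = fit_break(cups)
print(f"break from day {i} to day {j} ({j - i} days), levels {a:.1f} -> {b:.1f}")
```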
Reading:
No reading: logic and reason suffice (along with math and Python!).
Lecture(s):
Table Measurement Solution/Discussion
Testing random numbers
Computer Exercise(s):
Estimating pi and hypersphere size from simulation:
PiEstimate_original.ipynb
NBI Coffee Usage problem:
CoffeeUsage_original.ipynb
NBI Coffee Usage problem ("Empty code" version):
CoffeeUsage_EmptyVersion_original.ipynb
Last updated: 6th of December 2019.