Applied Statistics - Week 4
Monday the 7th - Friday the 11th of December 2020
The following describes what we will go through during this
week of the course. The chapter references and computer exercises are
expected to be read, understood, and solved by the beginning of the
following class, where I'll briefly go through the exercise
solution.
General notes, links, and comments:
Louis Lyons discussing discovery levels:
1310.1284v1_LouisLyons_Why5sigma.pdf
Comparison between different tests for normality:
Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf
Animation of ROC curves: basic_animation.mp4
Comment on multiple hypothesis testing p-values:
p-value histogram
Paper of George Marsaglia on testing random numbers:
Random Number Generators
"Extraordinary claims require extraordinary evidence"
[P.S. Laplace (1814), Marcello Truzzi (1978), Carl Sagan (1980)]
This statement refers to the fact that the business of hypothesis testing is to assign a probability to one hypothesis compared to an alternative. Whether or not the value of this probability is "enough" to make any claims is up to circumstances and the individual person, as statistics does not provide any exact decision boundary, only guidelines (see Louis Lyons above).
“I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I'll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be.”
[Isaac Asimov, The Roving Mind]
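To make the "discovery levels" mentioned above concrete, here is a minimal sketch that converts a significance in Gaussian standard deviations into a one-sided p-value. The function name is my own choice, and the one-sided convention is an assumption (it is the usual one in particle physics, but check which convention applies in your case).

```python
import math

def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a unit Gaussian fluctuates above n_sigma."""
    # erfc gives the two-sided tail in units of sigma/sqrt(2); half of it is one tail.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

for n_sigma in (1, 2, 3, 5):
    print(f"{n_sigma} sigma -> one-sided p-value = {one_sided_p_value(n_sigma):.3g}")
```

With scipy installed, scipy.stats.norm.sf(n_sigma) should give the same numbers.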
Monday:
The main theme of this week will be hypothesis testing, and
we will start with an exercise gently introducing the subject.
In addition to the Chi-Square test, there are several other tests, some
simple (one/two sample tests) and some more conceptually challenging
(Kolmogorov-Smirnov test, Wald-Wolfowitz runs test, and Anderson-Darling test).
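As a rough illustration of two of the tests named above, the sketch below implements a two-sample test of means (in the large-sample Gaussian approximation) and the Wald-Wolfowitz runs test using only the standard library. The function names and the simulated samples are my own assumptions and are not taken from the exercise notebook.

```python
import math
import random
import statistics

def two_sample_z_test(a, b):
    """Return (z, two-sided p-value) for the difference in means of a and b.

    Uses the Gaussian approximation, so it assumes reasonably large samples.
    """
    std_err = math.sqrt(statistics.variance(a) / len(a)
                        + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / std_err
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def runs_test(values, threshold):
    """Wald-Wolfowitz runs test: return (number of runs, z, two-sided p-value).

    Values above the threshold count as one category, all others as the other.
    Assumes both categories occur and that the sample is not tiny.
    """
    signs = [v > threshold for v in values]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(s1 != s2 for s1, s2 in zip(signs, signs[1:]))
    mean_runs = 2.0 * n_pos * n_neg / n + 1.0
    var_runs = (mean_runs - 1.0) * (mean_runs - 2.0) / (n - 1.0)
    z = (runs - mean_runs) / math.sqrt(var_runs)
    return runs, z, math.erfc(abs(z) / math.sqrt(2.0))

# Simulated stand-in data: two Gaussian samples with slightly different means.
rng = random.Random(42)
sample_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
sample_b = [rng.gauss(0.2, 1.0) for _ in range(500)]

print("two-sample test (z, p):", two_sample_z_test(sample_a, sample_b))
print("runs test (runs, z, p):", runs_test(sample_a, threshold=0.0))
```

For small samples a t-test is the more appropriate choice for the comparison of means; the Gaussian approximation is used here only to keep the sketch short.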
Reading:
Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
Lecture(s):
Coincidences
Hypothesis Testing
On p-value histograms
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
Hypothesis testing: HypothesisTests_original.ipynb
Producing a ROC curve: MakeROCfigure.ipynb
Illustration/Animation of ROC curve (requires additional packages): MakeROCfigure_animation.ipynb
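Before opening the ROC notebooks, it may help to see the idea in a few lines. The sketch below assumes signal and background are both unit-width Gaussians separated by a chosen distance, scans a cut value, and records the fraction of background and signal above the cut. The function names and the chosen separation are my own; the notebooks may well do this differently.

```python
from statistics import NormalDist

def gaussian_roc(separation, n_points=2001):
    """Return (false positive rates, true positive rates) for a scanned cut.

    Background is N(0, 1) and signal is N(separation, 1); entries above the
    cut are selected as signal.
    """
    background = NormalDist(0.0, 1.0)
    signal = NormalDist(separation, 1.0)
    low, high = -8.0, separation + 8.0
    cuts = [low + (high - low) * i / (n_points - 1) for i in range(n_points)]
    fpr = [1.0 - background.cdf(cut) for cut in cuts]
    tpr = [1.0 - signal.cdf(cut) for cut in cuts]
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve (fpr decreases along the scan)."""
    return sum(0.5 * (tpr[i] + tpr[i + 1]) * (fpr[i] - fpr[i + 1])
               for i in range(len(fpr) - 1))

fpr, tpr = gaussian_roc(separation=2.0)
print(f"Area under ROC curve: {area_under_curve(fpr, tpr):.4f}")
```

A separation of zero gives the diagonal (area 0.5), and the area approaches 1 as the two distributions move apart.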
Tuesday:
In the lecture, we will mainly focus on discussion of the
TableMeasurement (in Aud. A), which covers both the philosophy of data
handling and analysis, and actually also the construction of fits.
In the main exercise we will revisit hypothesis tests, focusing
on different tests applied to your own random (?) data.
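One of the simplest tests one could apply to a sequence of digits is a Chi-Square test of the digit frequencies against a flat expectation. The sketch below is a standard-library version under my own assumptions: the function names are hypothetical, the digits are generated by Python rather than read from the course data files, and the tail probability uses the closed forms valid for integer degrees of freedom (scipy.stats.chi2.sf would be the library route).

```python
import math
import random
from collections import Counter

def chi2_sf(x, ndof):
    """Upper tail probability of a chi-square distribution, integer ndof >= 1."""
    if ndof % 2 == 0:
        # Even ndof: exp(-x/2) times a finite Poisson-like sum.
        term, total = 1.0, 1.0
        for r in range(1, ndof // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd ndof: Gaussian tail plus a finite sum of odd powers of chi.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (ndof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    gauss_pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * gauss_pdf * total

def digit_frequency_test(digits):
    """Return (chi2, ndof, p-value) for digits 0-9 against equal frequencies."""
    counts = Counter(digits)
    expected = len(digits) / 10.0
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    ndof = 9  # ten bins, one constraint from the fixed total
    return chi2, ndof, chi2_sf(chi2, ndof)

# Stand-in data: pseudo-random digits, not the course data files.
rng = random.Random(2020)
digits = [rng.randrange(10) for _ in range(1000)]
chi2, ndof, p_value = digit_frequency_test(digits)
print(f"chi2 = {chi2:.2f}, ndof = {ndof}, p = {p_value:.3f}")
```

A frequency test only checks how often each digit occurs, not the order in which digits appear, so a sequence can pass it and still be far from random. Tests of the ordering, such as the runs test, look at that complementary aspect.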
Reading:
Barlow, chapter 7.2
Lecture(s):
Testing random numbers
Table Measurement Solution/Discussion
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
Random Digits Test: RandomDigitsTest_original.ipynb,
data_RandomDigits2020_A.txt,
data_RandomDigits2020_B.txt,
data_RandomDigits2020_C.txt,
data_RandomDigits2020_D.txt,
data_RandomDigits2020_E.txt,
data_RandomDigits2020_F.txt, and
data_RandomDigits2020_G.txt
For a large-scale test, try one million digits of pi: pi1000000.txt
To see whether you can test individuals' ability to produce
random numbers, consider this data file (from 2017, to keep people anonymous):
PersonsDigitsForTest2017.txt
Friday:
Today's lecture will be on Confidence Intervals and Limits, which in
principle is a simple subject (and we will not go beyond simple here),
but one with complicated details.
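As a small example of a limit, the sketch below computes a classical upper limit on a Poisson mean given an observed count: the value of the mean for which the probability of observing that count or fewer equals one minus the confidence level. This is one common construction and not necessarily the one used in the lecture; the function names and the bisection search are my own choices.

```python
import math

def poisson_cdf(n, mu):
    """Probability of observing n or fewer events for a Poisson mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def poisson_upper_limit(n_obs, confidence_level=0.95):
    """Mean mu at which P(n <= n_obs | mu) drops to 1 - confidence_level."""
    alpha = 1.0 - confidence_level
    low = 0.0
    high = n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 10.0  # generous bracket
    for _ in range(200):  # bisection; the CDF decreases as mu grows
        mid = 0.5 * (low + high)
        if poisson_cdf(n_obs, mid) > alpha:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

for n_obs in (0, 1, 3):
    print(f"n_obs = {n_obs}: 95% CL upper limit = {poisson_upper_limit(n_obs):.2f}")
```

For zero observed events the limit can be checked by hand, since exp(-mu) = 0.05 gives mu = -ln(0.05), roughly 3 events.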
In the exercise, we will look at the "art" of finding/fitting possible
peaks on a background. This includes fitting, hypothesis testing,
and setting confidence intervals and limits.
I'll also briefly discuss Simpson's Paradox with an example (from Berkeley),
as this is a classic "paradox" in statistics and a good reminder to
analyse data thoroughly before passing judgement on cases!
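The mechanism behind Simpson's Paradox can be seen in a few lines. The counts below are invented purely for illustration and are NOT the Berkeley data; the department and group labels are also hypothetical. In each department group 2 has the higher admission rate, yet group 1 has the higher rate overall, because the two groups apply to the departments in very different proportions.

```python
# Invented (admitted, applicants) counts for illustration only; NOT the Berkeley data.
counts = {
    "Dept A": {"group 1": (80, 100), "group 2": (18, 20)},
    "Dept B": {"group 1": (4, 20), "group 2": (30, 100)},
}

def department_rates(counts):
    """Admission rate per department and group."""
    return {dept: {group: admitted / applicants
                   for group, (admitted, applicants) in groups.items()}
            for dept, groups in counts.items()}

def overall_rates(counts):
    """Admission rate per group after summing over all departments."""
    totals = {}
    for groups in counts.values():
        for group, (admitted, applicants) in groups.items():
            sum_admitted, sum_applicants = totals.get(group, (0, 0))
            totals[group] = (sum_admitted + admitted, sum_applicants + applicants)
    return {group: admitted / applicants
            for group, (admitted, applicants) in totals.items()}

for dept, rates in department_rates(counts).items():
    print(dept, {group: f"{rate:.0%}" for group, rate in rates.items()})
print("Overall", {group: f"{rate:.0%}" for group, rate in overall_rates(counts).items()})
```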
Reading:
No reading - logic and reason suffice (along with math and Python!).
Lecture(s):
Confidence Intervals And Limits
Simpson's Paradox
Zoom: Link to lecture.
              Link to exercises.
Recording of Lecture video,
Lecture audio, and
Lecture chat.
Computer Exercise(s):
Fitting peaks: FittingPeaks_original.ipynb
Simpson's paradox: Simpsons_Paradox.ipynb (simple and mostly for illustration)
Last updated: 5th of December 2020.