Applied Statistics - Week 4
Monday the 9th - Friday the 13th of December 2024
The following is a description of what we will go through during this
week of the course. The chapter references and computer exercises are
considered read, understood, and solved by the beginning of the
following class, where I'll briefly go through the exercise
solution.
General notes, links, and comments:
Louis Lyons discussing discovery levels:
1310.1284v1_LouisLyons_Why5sigma.pdf
Comparison between different tests for normality:
Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf
Animation of ROC curves: basic_animation.mp4
Paper by George Marsaglia on testing random numbers:
Random Number Generators
"Extraordinary claims require extraordinary evidence"
[P.S. Laplace (1814), Marcello Truzzi (1978), Carl Sagan (1980)]
This statement refers to the fact that the business of hypothesis testing is to assign a probability to one hypothesis compared to an alternative. Whether the value of this probability is "enough" to make any claims depends on the circumstances and the individual case, as statistics does not provide any predefined decision boundary, only guidelines (see Louis Lyons above).
“I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I'll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be.”
[Isaac Asimov, The Roving Mind]
Monday:
The main theme of this week will be
Hypothesis testing.
In addition to the ChiSquare test, there are several other tests, some
simple (one/two sample tests) and some more conceptually challenging
(Kolmogorov test, Wald-Wolfowitz runs test, and Anderson-Darling test).
We will start with an exercise gently introducing the subject.
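As a small taste of what such a test looks like in practice, here is a minimal sketch of the Wald-Wolfowitz runs test, using the Gaussian approximation for the number of runs. The function name runs_test and the alternating example sequence are my own choices for illustration and are not taken from the exercise notebook.

```python
import math

def runs_test(signs):
    """Wald-Wolfowitz runs test on a list of booleans.

    Returns (number of runs, z-score, two-sided p-value), using the
    Gaussian approximation, which is reasonable for long sequences.
    """
    n_pos = sum(1 for s in signs if s)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcomes must be present in the sequence")
    n = n_pos + n_neg
    # A new run starts at every change of value, plus the very first element.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    # Expected number of runs and its variance under the null hypothesis
    # that the order of the outcomes is random.
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value

# A perfectly alternating sequence has far too many runs to be random.
alternating = [i % 2 == 0 for i in range(40)]
runs, z, p = runs_test(alternating)
print(f"runs = {runs}, z = {z:.2f}, p = {p:.2g}")
```

For real-valued data one would typically first convert each value to "above/below the median" (or "above/below the fitted curve") before counting runs.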
Reading:
Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
Lecture(s):
Coincidences
Hypothesis Testing
Computer Exercise(s):
Hypothesis testing: HypothesisTests_original.ipynb
(empty version)
Producing a ROC curve: MakeROCfigure.ipynb (for illustration)
Illustration/Animation of ROC curve (requires additional Python packages): MakeROCfigure_animation.ipynb
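For those who want to see the idea behind a ROC curve without opening the notebooks, the following is a minimal sketch for two unit-width Gaussians, where the curve can be written down analytically. The setup (background centred at 0, signal shifted by a chosen separation, selection by a simple threshold) and the function names are assumptions for illustration and may differ from what MakeROCfigure.ipynb does.

```python
import math

def gaussian_roc(separation, n_points=281, lo=-6.0, hi=8.0):
    """ROC points (FPR, TPR) for two unit-width Gaussians.

    Background is centred at 0 and signal at `separation`; an event is
    selected as signal when its value exceeds the threshold.
    """
    points = []
    for i in range(n_points):
        t = lo + (hi - lo) * i / (n_points - 1)
        # Fraction of each Gaussian lying above the threshold t.
        fpr = 0.5 * math.erfc(t / math.sqrt(2.0))
        tpr = 0.5 * math.erfc((t - separation) / math.sqrt(2.0))
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve (points sorted by FPR)."""
    pts = sorted(points)
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

for sep in (0.0, 1.0, 2.0):
    auc = area_under_curve(gaussian_roc(sep))
    print(f"separation = {sep:.1f} sigma, area under ROC curve = {auc:.3f}")
```

With no separation the curve is the diagonal (no discrimination power), and the area grows towards 1 as the two distributions move apart.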
Tuesday:
First, we will briefly consider confidence intervals and limits.
Then the lecture will discuss how to concretely implement a hypothesis test.
The example data considered are the "random" numbers (most of) you gave
in the course questionnaire. Are these really random, and more importantly:
How would you test this?
Following this lecture and discussion, the exercise is exactly to implement
different tests on your own random (?) data, and to determine which sample is
human.
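One of the simplest tests you could start from is a ChiSquare test of the digit frequencies against a uniform expectation. The sketch below is only a starting point under stated assumptions: the function name digit_chi2 is hypothetical, the two samples are made up (they are not the course data files), and the comparison uses the standard 5% critical value for 9 degrees of freedom instead of a full p-value.

```python
import random
from collections import Counter

# 5% critical value of the ChiSquare distribution with 9 degrees of freedom
# (10 digit bins minus 1 constraint from the fixed total).
CHI2_CRIT_9DOF_5PCT = 16.92

def digit_chi2(digits):
    """ChiSquare of the observed digit counts against a uniform expectation."""
    counts = Counter(digits)
    expected = len(digits) / 10.0
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Made-up sample 1: pseudo-random digits from Python's generator.
rng = random.Random(42)
uniform_sample = [rng.randrange(10) for _ in range(900)]

# Made-up sample 2: a sequence that never contains the digit 0.
no_zero_sample = list(range(1, 10)) * 100

for name, sample in (("uniform", uniform_sample), ("no zeros", no_zero_sample)):
    chi2 = digit_chi2(sample)
    verdict = "rejected" if chi2 > CHI2_CRIT_9DOF_5PCT else "not rejected"
    print(f"{name}: chi2 = {chi2:.1f} (9 dof), uniformity {verdict} at 5%")
```

Note that flat digit frequencies are far from sufficient for randomness: a sequence such as 0123456789 repeated passes this test perfectly, which is why tests of the ordering (pairs of digits, runs, etc.) are needed as well.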
Reading:
Same as for Monday (hypothesis testing).
Test inspiration from the Diehard Tests.
Lecture(s):
Confidence Intervals And Limits
Testing random numbers
Computer Exercise(s):
Random Digits Test: RandomDigitsTest_original.ipynb,
data_RandomDigits2024_A.txt,
data_RandomDigits2024_B.txt,
data_RandomDigits2024_C.txt, and
data_RandomDigits2024_D.txt.
For a large scale test, try one million digits of pi: pi1000000.txt
To see whether you can test an individual's ability to produce random numbers,
consider this data file (from 2017 - just to keep people anonymous):
PersonsDigitsForTest2017.txt
Friday:
A central theme in probability and statistics is Bayes' Theorem, which concerns itself
with prior probabilities, i.e. incorporating existing knowledge in evaluating outcomes.
Many of you know this theorem already, but with this exercise we will try to bring a general
perspective on data analysis along with it.
In addition, we will have a look at Markov Chains and how they can be used in relation to
Bayes' Theorem. Mathias will be giving the lecture and has designed the exercises.
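To make the role of the prior concrete, here is a minimal sketch of Bayes' Theorem for a hypothesis H and its complement: P(H|data) = P(data|H) P(H) / [P(data|H) P(H) + P(data|not H) P(not H)]. The numbers (1% prior, 95% efficiency, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H | data) from Bayes' Theorem for a hypothesis H and its complement."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Hypothetical test: 1% prior, 95% efficiency, 5% false-positive rate.
after_one = posterior(0.01, 0.95, 0.05)
# A second, independent positive result uses the first posterior as its prior.
after_two = posterior(after_one, 0.95, 0.05)

print(f"Posterior after one positive result:  {after_one:.3f}")
print(f"Posterior after two positive results: {after_two:.3f}")
```

Despite the seemingly good test, a single positive result leaves the hypothesis unlikely because the prior is small, while a second independent result changes the picture considerably.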
Reading:
None - just listen to Mathias, ask questions, and wonder about these concepts.
NOTE: You should by now have read the curriculum (roughly most of Barlow chapters 1-8).
Lecture(s):
Bayes' theorem and Markov Chains
Computer Exercise(s):
EhrenfestBallExperiment_original.ipynb.
DeterminingGenotypes_original.ipynb.
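As a warm-up for the Markov Chain part, the following is a minimal sketch of the standard Ehrenfest urn model: a number of balls is distributed over two urns, and in each step one ball is picked uniformly at random and moved to the other urn. The function name, the parameter values, and the seed are my own assumptions, and the notebook may set up the experiment differently.

```python
import random

def ehrenfest(n_balls, n_steps, start_left, seed=0):
    """Simulate the Ehrenfest urn model; returns the left-urn count per step."""
    rng = random.Random(seed)
    left = start_left
    history = [left]
    for _ in range(n_steps):
        # The picked ball sits in the left urn with probability left / n_balls,
        # in which case it moves to the right urn; otherwise it moves left.
        if rng.random() < left / n_balls:
            left -= 1
        else:
            left += 1
        history.append(left)
    return history

# Start with all 20 balls in the left urn and watch the system relax.
history = ehrenfest(n_balls=20, n_steps=1000, start_left=20, seed=1)
late = history[200:]
print(f"First steps: {history[:10]}")
print(f"Mean left-urn count after relaxation: {sum(late) / len(late):.2f}")
```

The next state depends only on the current count, which is what makes this a Markov Chain, and over long times the count fluctuates around half the number of balls.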
Note: Before the end of the week (i.e. Saturday the 14th of December at 22:00) you should have handed in your project (PDF submitted on Absalon - just one per group),
along with the result values for the Project and Table Measurements.
Last updated: 5th of December 2024.