Applied Statistics - Week 4

Monday the 11th - Friday the 15th of December 2023

The following is a description of what we will go through during this week of the course. The chapter references and computer exercises are expected to be read, understood, and solved by the beginning of the following class, where I'll briefly go through the exercise solution.

General notes, links, and comments:
  • Louis Lyons discussing discovery levels: 1310.1284v1_LouisLyons_Why5sigma.pdf
  • Comparison between different tests for normality: Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smirn.pdf
  • Illustration of ROC curves: ROCcurves_GaussianSeparations.pdf
  • Animation of ROC curves: basic_animation.mp4
  • Paper of George Marsaglia on testing random numbers: Random Number Generators

    "Extraordinary claims require extraordinary evidence" [P.S. Laplace (1814), Marcello Truzzi (1978), Carl Sagan (1980)]

    This statement refers to the fact that the business of hypothesis testing is to assign a probability to one hypothesis compared to an alternative. Whether or not the value of this probability is "enough" to make any claims depends on the circumstances and the individual case, as statistics does not provide any predefined decision boundary, only guidelines (see Louis Lyons above).
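
    As a small illustration of such guidelines, the sketch below converts between a significance quoted in "number of sigmas" and the corresponding tail probability of a unit Gaussian. The function names are hypothetical (they are not taken from the course material), and the one-sided convention is assumed as the default.

```python
from scipy import stats

def sigma_to_pvalue(n_sigma, two_sided=False):
    """Tail probability of a unit Gaussian beyond n_sigma (hypothetical helper)."""
    p = stats.norm.sf(n_sigma)            # survival function = 1 - CDF
    return 2.0 * p if two_sided else p

def pvalue_to_sigma(p, two_sided=False):
    """Inverse of sigma_to_pvalue (hypothetical helper)."""
    if two_sided:
        p = p / 2.0
    return stats.norm.isf(p)              # inverse survival function

for n_sigma in (1, 2, 3, 5):
    print(f"{n_sigma} sigma -> one-sided p = {sigma_to_pvalue(n_sigma):.3g}")
```

    Whether a given probability is small enough to support a claim is still a judgment call; the conversion only puts the number on a familiar scale.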

    "I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I'll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be." [Isaac Asimov, The Roving Mind]


    Monday:
    In the lecture, we will mainly focus on a discussion of the TableMeasurement (in Aud. A), which covers both the philosophy of data handling and analysis and the construction of fits.
    In the exercises we will work on the project experiment data, doing our best to answer any final questions and consider possible experimental discrepancies.

    Reading:
  • Barlow, chapter 7.2
    Lecture(s):
  • Table Measurement Solution/Discussion
  • Simpson's Paradox
  • Recording of Lecture video.
    Computer Exercise(s):
  • Work on the project data analysis (or previous exercises).


    Tuesday:
    The main theme of this week will be hypothesis testing. In addition to the ChiSquare test, there are several other tests, some simple (one/two sample tests) and some more conceptually challenging (the Kolmogorov test, the Wald-Wolfowitz runs test, and the Anderson-Darling test). We will start with an exercise gently introducing the subject.
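
    To give a flavour of these tests, here is a minimal sketch using SciPy. It is not the content of the exercise notebook. The two samples are made-up Gaussian data, and runs_test is a hypothetical helper that implements the Wald-Wolfowitz runs test about the median using the Gaussian approximation, which assumes a reasonably long sequence containing values on both sides of the median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)                        # fixed seed, for reproducibility only
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)    # made-up data
sample_b = rng.normal(loc=0.3, scale=1.0, size=200)    # made-up data with a shifted mean

# Two-sample test of the means (Welch's t-test, unequal variances allowed).
t_stat, p_t = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Two-sample Kolmogorov-Smirnov test: compares the full distributions.
ks_stat, p_ks = stats.ks_2samp(sample_a, sample_b)

def runs_test(x):
    """Wald-Wolfowitz runs test about the median; returns (z, two-sided p)."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    signs = x[x != median] > median                    # drop values equal to the median
    n_pos = int(np.sum(signs))
    n_neg = signs.size - n_pos
    n = n_pos + n_neg
    n_runs = 1 + int(np.sum(signs[1:] != signs[:-1]))  # a new run starts at each sign change
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n**2 * (n - 1.0))
    z = (n_runs - mean) / np.sqrt(var)
    return z, 2.0 * stats.norm.sf(abs(z))

z_runs, p_runs = runs_test(sample_a)
print(f"Welch t-test:  t = {t_stat:.2f}, p = {p_t:.3g}")
print(f"KS two-sample: D = {ks_stat:.3f}, p = {p_ks:.3g}")
print(f"Runs test (A): z = {z_runs:.2f}, p = {p_runs:.3g}")
```

    The t-test is only sensitive to a difference in means, while the Kolmogorov test responds to any difference in shape, and the runs test looks at the ordering of the values rather than their distribution.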

    Reading:
  • Barlow, chapter 8 on hypothesis testing (in particular 8.1-8.3).
  • Cohen, chapter 4 on hypothesis testing (perhaps omitting 4.2-4.4).
    Lecture(s):
  • Coincidences
  • Hypothesis Testing
  • Recording of Lecture video.
    Computer Exercise(s):
  • Hypothesis testing: HypothesisTests_original.ipynb
  • Producing a ROC curve: MakeROCfigure.ipynb (for illustration)
  • Illustration/Animation of ROC curve (requires additional Python packages): MakeROCfigure_animation.ipynb
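
    As a rough idea of what a ROC curve for two Gaussian distributions looks like numerically, the sketch below is one possible construction and not the code of the notebooks listed above. It assumes unit-width Gaussians for background (mean 0) and signal (mean equal to the separation), and a selection that keeps values above a threshold. The function names are hypothetical.

```python
import numpy as np
from scipy import stats

def gaussian_roc(separation, n_points=2001):
    """False and true positive rates when selecting x > threshold (hypothetical helper)."""
    # Thresholds run from high to low, so both rates increase from about 0 to about 1.
    thresholds = np.linspace(separation + 8.0, -8.0, n_points)
    fpr = stats.norm.sf(thresholds, loc=0.0, scale=1.0)          # background passing the cut
    tpr = stats.norm.sf(thresholds, loc=separation, scale=1.0)   # signal passing the cut
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal estimate of the area under the ROC curve."""
    return float(np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(fpr)))

for separation in (0.0, 1.0, 2.0, 3.0):
    fpr, tpr = gaussian_roc(separation)
    print(f"separation = {separation:.1f} sigma -> AUC = {area_under_curve(fpr, tpr):.4f}")
```

    With no separation the curve is the diagonal (area 0.5), and the area approaches 1 as the two distributions move apart.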

    Friday:
    The lecture will discuss how to concretely implement a hypothesis test. The example data considered are the "random" numbers (most of) you gave in the course questionnaire. Are these really random, and more importantly: how would you test this?
    Following this lecture and discussion, the exercise is exactly to implement different tests for your own random (?) data, and to determine which sample is human.
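
    One of the simplest tests one might start from is a ChiSquare test of the digit frequencies against a uniform expectation. The sketch below is only an illustration under stated assumptions: the digits are generated here rather than read from the data files (whose format is not described on this page), the "human-like" probabilities are invented, and digit_frequency_test is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def digit_frequency_test(digits):
    """ChiSquare test of digit counts (0-9) against equal frequencies."""
    digits = np.asarray(digits, dtype=int)       # assumes every entry is in 0..9
    counts = np.bincount(digits, minlength=10)
    chi2, p_value = stats.chisquare(counts)      # default expectation: uniform
    return chi2, p_value

rng = np.random.default_rng(1)                   # fixed seed, for reproducibility only
pseudo_random = rng.integers(0, 10, size=1000)

# Invented bias: too few 0s and too many 7s (an assumption, not measured data).
human_like_probs = [0.04, 0.10, 0.10, 0.11, 0.10, 0.10, 0.10, 0.16, 0.10, 0.09]
human_like = rng.choice(10, size=1000, p=human_like_probs)

for name, sample in (("pseudo-random", pseudo_random), ("human-like", human_like)):
    chi2, p_value = digit_frequency_test(sample)
    print(f"{name:14s}: chi2 = {chi2:6.1f} (9 dof), p = {p_value:.3g}")
```

    A frequency test alone says nothing about the order of the digits, so tests of sequences (for example how often a digit is followed by itself, or a runs test) are natural next steps.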

    Reading:
  • Same as for Tuesday (hypothesis testing).
  • Test inspiration from the Diehard Tests.
    Lecture(s):
  • Testing random numbers
  • Confidence Intervals And Limits
  • Recording of Lecture video.
    Computer Exercise(s):
  • Random Digits Test: RandomDigitsTest_original.ipynb,
  • data_RandomDigits2023_A.txt,
  • data_RandomDigits2023_B.txt,
  • data_RandomDigits2023_C.txt,
  • data_RandomDigits2023_D.txt,
  • data_RandomDigits2023_E.txt, and
  • data_RandomDigits2023_F.txt.
    For a large scale test, try one million digits of pi: pi1000000.txt
    To see if you can test an individual's ability to produce random numbers, consider this data file (from 2017, just to keep people anonymous): PersonsDigitsForTest2017.txt
    Last updated: 9th of December 2023.