Applied Statistics - From data to results (Winter 2023-24)

"Essentially, all models are wrong, but some are useful". [George E. P. Box, British Statistician, 1919-2013]

Teaching staff:
Troels C. Petersen: Lecturer (Associate Professor, High Energy Physics). Course responsible. Mac user. Phone: 26 28 37 39. Email: petersen@nbi.dk. Office: NBB-3-I-034.
Mathias Heltberg: Assistant lecturer (Senior PostDoc, Bio Complexity). Continuity responsible. Mac user. Phone: 26 19 18 89. Email: heltberg@nbi.ku.dk.
Azzurra D'Alessandro: Teaching assistant (Ph.D. student, Astrophysics). Absalon responsible. Linux & Windows expert. Phone: 42 45 39 54. Email: azzurra.dalessandro@nbi.ku.dk.
Arnau T. Morancho: Teaching assistant (Ph.D. student, High Energy Physics). Exam responsible. Windows expert. Phone: +34 629 350 151. Email: arnau.morancho@nbi.ku.dk.
Thomas Spieksma: Teaching assistant (Ph.D. student, Gravitational Physics). ProbSet responsible. Mac expert. Phone: +31 610 844 213. Email: thomas.spieksma@nbi.ku.dk.
Gaia Fabj: Teaching assistant (Ph.D. student, Gravitational Physics). Lab coord. responsible. Mac expert. Phone: 71 85 74 82. Email: gaia.fabj@nbi.ku.dk.
Malthe S. Nordentoft: Contributor (Ph.D. student, Bio Complexity). GitHub responsible. Mac & Linux expert. Phone: 28 90 19 02. Email: malthe.nielsen@nbi.ku.dk.

"Without data, you're just another person with an opinion." [William Edwards Deming, US statistician 1900-1993]


Course information (What, when, where, prerequisites, books, curriculum, and evaluation):
Content: Graduate statistics course giving an advanced introduction to statistics and data analysis.
Level: Intended for physics (and science) students at 3rd-5th year of studies and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python, but there is no language requirement).
Note on prerequisites: Programming is an essential tool and necessary for the course!!!
When: Monday 8:15-12:00, Tuesday 13:15-17:00, and Friday 8:15-12:00 (Week Schedule Group B).
Note on morning lectures: After the first three weeks, we will start at 9:15 on Mondays and Fridays.
Where: Lectures: Aud. 2 at HCO, see also KU Room Schedule plan.
Exercises: Rooms 4-0-05/10/13 (Mon+Fri) and 4-0-24/32 (Tues) at the BioCenter.
Period: Block 2 (20th of November 2023 - 19th of January 2024, including exam), 8 weeks total.
Format: Shorter lectures followed by computer exercises, discussion, and occasionally experiments.
Text book: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis, which is a great down-to-earth introduction to statistics.
Glen Cowan: Statistical Data Analysis, which is a shorter, modern introduction to statistics and data analysis.
Programs used: Python (v3.8+) and a few packages on top in Jupyter Notebook (see Nature article).
Jupyter Notebooks have pros and cons, and both are important to know about (see e.g. "Why I don't like notebooks!").
Exercise/code repository: All code used for the exercises of the course will be at the AppliedStatisticsNBI GitHub.
Pensum/Curriculum: The course curriculum covers chapters 1-8 + 10 in Barlow with many exceptions, detailed in the link.
Key words: PDFs, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo methods, and Data Analysis.
Expected learning: What I expect you to learn is discussed here: Learning objectives
Language: English (occasional Danish utterances!). All exercises, problem sets, exams, notes, etc. are in English.
Evaluation: Problem set (20%), Project (20%), and take-home exam (60%).
Exam: Take-home (36 hours!) exam, starting Thursday the 18th of January 2024 at 8:00 and ending Friday the 19th of January at 20:00.
Credits/Grading: 7.5 ECTS with internal censor evaluation (following the Danish 7-step scale).

"Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write." [H.G. Wells]



Course outline:
Below is the preliminary course outline (subject to changes and updates throughout the course).

Week 0 (Help with access, setup, programming, concepts, and preparation):
Nov 16: 9:00-10:00: Help with access, setting up Python, programming, course details, and the table measurement! (Aud. A)

Week 1 (Introduction, General Concepts, ChiSquare Method):
Nov 20: 8:15-10:00: Introduction to course and overview of curriculum.
     Mean and Standard Deviation. Correlations. Significant digits. Central limit theorem.
Nov 21: Error propagation (which is a science!). Estimate g measurement uncertainties.
Nov 24: ChiSquare method, evaluation, and test. Formation of Project groups.
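As a small taste of Week 1, here is a minimal sketch (Python standard library only) of the sample mean, standard deviation, and error on the mean, first-order error propagation, and a chi-square check of whether several results agree. It assumes, purely for illustration, that g is measured with a simple pendulum via g = 4π²L/T² with uncorrelated L and T; all numbers are made up.

```python
import math
import statistics

def mean_and_error(xs):
    """Return the sample mean, sample standard deviation (N-1), and error on the mean."""
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)              # uses N-1 in the denominator
    return mu, sigma, sigma / math.sqrt(len(xs))

def weighted_mean_chi2(values, errors):
    """Weighted mean, its error, and the chi-square of the 'all values agree' hypothesis."""
    w = [1 / e**2 for e in errors]
    mu = sum(wi * x for wi, x in zip(w, values)) / sum(w)
    chi2 = sum(((x - mu) / e)**2 for x, e in zip(values, errors))
    return mu, 1 / math.sqrt(sum(w)), chi2, len(values) - 1   # ndof = N - 1

# Hypothetical repeated measurements of pendulum length L [m] and period T [s].
L_meas = [1.502, 1.498, 1.501, 1.499, 1.500]
T_meas = [2.457, 2.462, 2.455, 2.460, 2.458]
L, _, sL = mean_and_error(L_meas)
T, _, sT = mean_and_error(T_meas)

# Assumed pendulum formula g = 4 pi^2 L / T^2. For uncorrelated L and T, first-order
# error propagation gives (sigma_g / g)^2 = (sigma_L / L)^2 + (2 sigma_T / T)^2.
g = 4 * math.pi**2 * L / T**2
sg = g * math.sqrt((sL / L)**2 + (2 * sT / T)**2)
print(f"g = {g:.3f} +- {sg:.3f} m/s^2")

# Hypothetical independent g results (made up), combined with a weighted mean.
g_avg, g_avg_err, chi2, ndof = weighted_mean_chi2([9.81, 9.79, 9.84], [0.02, 0.03, 0.04])
print(f"combined g = {g_avg:.3f} +- {g_avg_err:.3f}, chi2 = {chi2:.2f} for {ndof} dof")
```

Note the factor 2 on the period term: because g depends on T², the relative uncertainty on T counts twice.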

Week 2 (PDFs, Likelihood, Systematic Errors):
Nov 27: Probability Density Functions (PDFs), especially Binomial, Poisson, and Gaussian.
Nov 28: Principle of maximum likelihood and fitting (which is an art!).
Dec 1: 8:15 - Group A: Project (for Thursday the 14th of December) doing experiments in First Lab.
            9:15 - Group B: Systematic Uncertainties and analysis of "Table Measurement data". Discussion of real data analysis in Aud. A (+B).
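For the Week 2 likelihood topic, a minimal brute-force sketch: scan the Poisson negative log-likelihood on a grid, take its minimum as the estimate, and read off an approximate 1σ interval where -ln L rises by 0.5. The counts are hypothetical, and in practice a numerical minimizer is preferable to a grid scan.

```python
import math

# Hypothetical Poisson counts (made-up numbers for illustration).
counts = [3, 5, 4, 6, 2, 4, 5, 3]

def poisson_nll(lam, data):
    """Negative log-likelihood -ln L(lambda) for independent Poisson counts."""
    return sum(lam - n * math.log(lam) + math.lgamma(n + 1) for n in data)

# Scan lambda from 1.0 to 8.0 in steps of 0.001 and pick the minimum of -ln L.
grid = [1.0 + 0.001 * i for i in range(7001)]
nll = [poisson_nll(lam, counts) for lam in grid]
i_min = min(range(len(grid)), key=nll.__getitem__)
lam_hat = grid[i_min]

# Approximate 1-sigma interval: all lambda where -ln L is within 0.5 of its minimum.
inside = [lam for lam, v in zip(grid, nll) if v - nll[i_min] <= 0.5]
lo, hi = min(inside), max(inside)

print(f"lambda_hat = {lam_hat:.3f}  (sample mean = {sum(counts) / len(counts):.3f})")
print(f"approx. 1-sigma interval: [{lo:.3f}, {hi:.3f}]")
```

For Poisson data the maximum likelihood estimate equals the sample mean, which makes this a handy sanity check; the interval comes out slightly asymmetric, as likelihood intervals often do.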

Week 3 (Using Simulation and More Fitting):
Dec 4: 8:15 - Group B: Project (for Thursday the 14th of December) doing experiments in First Lab.
            9:15 - Group A: Systematic Uncertainties and analysis of "Table Measurement data". Discussion of real data analysis in Aud. A (+B).
Dec 5: Producing random numbers and their use in simulations.
Dec 8: 9:15 Fitting strategies (Note: No more morning lectures starting at 8:15).
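For the Week 3 topic of random numbers in simulations, here is a sketch of the two standard ways of turning uniform random numbers into a given PDF: the transformation method (inverting the CDF) and the accept-reject method. The PDF f(x) = 3x² on [0, 1] is an arbitrary choice for illustration.

```python
import random

rng = random.Random(42)   # fixed seed so the example is reproducible

def sample_transformation(n):
    """Transformation method for f(x) = 3x^2 on [0, 1]:
    the CDF is F(x) = x^3, so x = F^-1(u) = u^(1/3) with u ~ Uniform(0, 1)."""
    return [rng.random() ** (1 / 3) for _ in range(n)]

def sample_accept_reject(n, f=lambda x: 3 * x * x, f_max=3.0):
    """Accept-reject (von Neumann) method: draw x uniformly in [0, 1] and y uniformly
    in [0, f_max], and keep x if y < f(x). Also returns the acceptance fraction."""
    accepted, tries = [], 0
    while len(accepted) < n:
        tries += 1
        x, y = rng.random(), f_max * rng.random()
        if y < f(x):
            accepted.append(x)
    return accepted, n / tries

xs_t = sample_transformation(100_000)
xs_ar, eff = sample_accept_reject(100_000)

# Both samples should reproduce the analytic mean E[x] = 3/4, and the accept-reject
# efficiency should be close to (area under f) / (box area) = 1 / 3.
print(sum(xs_t) / len(xs_t), sum(xs_ar) / len(xs_ar), eff)
```

The transformation method uses every random number but needs an invertible CDF; accept-reject works for any bounded PDF at the cost of wasted draws.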

Week 4 (Hypothesis Testing and limits):
Dec 11: Table Measurement solution discussion and Simpson's Paradox. Exercises: Work on analysis of project data.
Dec 12: Hypothesis testing. Simple, Chi-Square, Kolmogorov, and runs tests.
     Project should be submitted by Thursday the 14th of December at 22:00!
Dec 15: More hypothesis testing, limits, and confidence intervals. Testing your random (?) numbers.
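For the Week 4 hypothesis tests (and "testing your random (?) numbers"), a minimal sketch of a one-sample Kolmogorov-Smirnov test against a uniform distribution, with the asymptotic Kolmogorov p-value. It is written by hand for transparency; the toy samples are hypothetical, and for small samples exact tables (or a library routine) should be used instead.

```python
import math
import random

def ks_statistic_uniform(xs):
    """One-sample Kolmogorov-Smirnov statistic D_n against Uniform(0, 1):
    the largest distance between the empirical CDF and the true CDF F(x) = x."""
    xs = sorted(xs)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))   # ECDF above F
    d_minus = max(x - i / n for i, x in enumerate(xs))        # ECDF below F
    return max(d_plus, d_minus)

def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value P(D_n > d) from the Kolmogorov distribution,
    Q(t) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2), with t = sqrt(n) * d."""
    t = math.sqrt(n) * d
    if t < 0.2:   # Q(t) equals 1 to many digits here, and the series converges slowly
        return 1.0
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)

rng = random.Random(1)
good = [rng.random() for _ in range(5000)]         # genuinely uniform
bad = [rng.random() ** 1.2 for _ in range(5000)]   # skewed slightly towards 0

for name, xs in [("uniform", good), ("skewed", bad)]:
    d = ks_statistic_uniform(xs)
    print(f"{name}: D = {d:.4f}, p = {ks_pvalue(d, len(xs)):.3g}")
```

The skewed sample differs from uniform by only a few percent in its CDF, yet with 5000 entries the KS test rejects it decisively.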

Week 5 (Multivariate Analysis, Calibration, and Bayesian statistics):
Dec 18: Multi-Variate Analysis (MVA). The linear (Fisher) discriminant (compared to PCA).
Dec 19: Calibration and use of control channels.
Dec 22: Bayes' theorem and Bayesian statistics (Mathias).
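For the Week 5 linear (Fisher) discriminant, a minimal 2D sketch: the weights are w = (Σ_sig + Σ_bkg)⁻¹ (μ_sig - μ_bkg), and projecting onto w is compared with using x alone. The toy data are hypothetical correlated Gaussians, and the separation |μ_s - μ_b| / sqrt(σ_s² + σ_b²) is one common definition chosen for illustration.

```python
import random
import statistics

def mean_cov(points):
    """Sample mean vector and 2x2 sample covariance matrix of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (len(points) - 1)
    return (mx, my), ((statistics.variance(xs), sxy), (sxy, statistics.variance(ys)))

def fisher_weights(sig, bkg):
    """Fisher's linear discriminant direction w = (Sigma_sig + Sigma_bkg)^-1 (mu_sig - mu_bkg)."""
    (ms, cs), (mb, cb) = mean_cov(sig), mean_cov(bkg)
    a = cs[0][0] + cb[0][0]            # summed covariance matrix [[a, b], [b, d]]
    b = cs[0][1] + cb[0][1]
    d = cs[1][1] + cb[1][1]
    det = a * d - b * b
    dx, dy = ms[0] - mb[0], ms[1] - mb[1]
    # Explicit inverse of the symmetric 2x2 matrix applied to the mean difference.
    return ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)

def separation(sig, bkg, w):
    """|mean_s - mean_b| / sqrt(var_s + var_b) of the projection F = w . (x, y)."""
    fs = [w[0] * x + w[1] * y for x, y in sig]
    fb = [w[0] * x + w[1] * y for x, y in bkg]
    return abs(statistics.mean(fs) - statistics.mean(fb)) / (
        statistics.variance(fs) + statistics.variance(fb)) ** 0.5

rng = random.Random(7)

def toy_sample(n, mx, my, rho):
    """Hypothetical toy data: unit-variance Gaussians in x and y with correlation rho."""
    out = []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mx + u, my + rho * u + (1 - rho**2) ** 0.5 * v))
    return out

sig = toy_sample(5000, 1.0, 0.0, 0.8)   # shifted in x only
bkg = toy_sample(5000, 0.0, 0.0, 0.8)
w = fisher_weights(sig, bkg)
print("Fisher weights:", w)
print("separation using x only:", separation(sig, bkg, (1.0, 0.0)))
print("separation using Fisher:", separation(sig, bkg, w))
```

Although y has the same mean for both classes, its correlation with x lets the Fisher discriminant cancel part of the spread, improving the separation by roughly a factor 5/3 in this toy setup.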

     For exam training, here is Exam2016.pdf, to be discussed briefly on Tuesday the 16th of January.
     Here is the associated problem 4.1 data file.
     Here is the associated problem 5.1 data file.
     Here is the associated problem 5.2 data file.
     Here is an approximate solution manual.

Week 6 (Introduction to Machine Learning and real data classification/analysis):
Jan 1: No teaching.
Jan 2: Machine Learning (ML). Neural Networks, Decision Trees, and other ML methods. Exercise: ML and/or Problem Set!
     Problem set should be submitted by Thursday the 4th of January at 22:00!
Jan 5: Analysis of real and complex data on separating/classifying events. Analysis of testbeam data.

Week 7 (Curriculum summary, advanced fitting, and time series):
Jan 8: Advanced fitting with functions and models, including in 2D.
Jan 9: Summary and discussion of course curriculum.
Jan 12: Time series analysis (Mathias).

Week 8 (Problem Set deliberation, Fitting, Exam questions, and exam):
Jan 15: Discussion of Problem Set solution!
Jan 16: Exam questions and answers. Deliberation on previous (2016) exam. Re-visit exercises.
Jan 18: Exam given (posted on course webpage 8:00 in the morning).
Jan 19: Exam to be handed in by 20:00 on www.eksamen.ku.dk.

Week 9 (Returning exam):
Jan 26: 15:15-16:30ish+: Exam solution, grades and course feedback.
     Designing experiments (inspired by "A lady tasting tea") with beer tasting? Or just beer...
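The "lady tasting tea" design is a neat worked example of Fisher's exact test. In Fisher's classic setup, 8 cups are served, 4 with milk poured first, and she must pick out those 4. Under the null hypothesis of pure guessing, the number of correctly identified milk-first cups is hypergeometric, so the p-value is a simple counting exercise (the function name below is just for illustration):

```python
from math import comb

def tea_tasting_pvalue(n_cups=8, n_milk_first=4, n_correct=4):
    """One-sided p-value of Fisher's exact test for the tea-tasting design:
    probability of identifying at least n_correct milk-first cups by pure guessing."""
    total = comb(n_cups, n_milk_first)             # ways to choose the 'milk-first' set
    n_tea_first = n_cups - n_milk_first
    favourable = sum(comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
                     for k in range(n_correct, n_milk_first + 1))
    return favourable / total

print(tea_tasting_pvalue())              # all 4 correct: 1/70, about 0.014
print(tea_tasting_pvalue(n_correct=3))   # at least 3 correct: 17/70, about 0.24
```

Only a perfect score is significant at the 5% level here; getting 3 of 4 right happens by chance almost a quarter of the time, which is exactly the kind of design question worth thinking about before the (beer) tasting starts.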

"The art of drawing conclusions from experiments and observations consists in evaluating probabilities and in estimating whether they are sufficiently great or numerous enough to constitute proofs. This kind of calculation is more complicated and more difficult than it is commonly thought to be."
[Antoine Lavoisier, French chemist 1743-1794]




Notes and links:
In addition to the text book and other literature, some notes may be useful during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Note on analytical fit of straight line.
  • Note on Frequentist vs. Bayesian statistics and discoveries.
  • Note on rejecting data using Chauvenet's criterion.
  • Nature Physics article on discoveries.
  • Fisher's Exact Test on tea drinking lady.
  • Statistics resources.
  • Online course introducing Machine Learning.
  • Power Comparisons between tests of normality (spoiler alert: Shapiro-Wilk wins!)
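The linked note on the analytical straight-line fit is not reproduced here. As an independent sketch, this is the standard closed-form weighted least-squares solution for y = a + b·x with known uncertainties σ_i on y, including parameter uncertainties, their covariance, and the chi-square. The data points are hypothetical.

```python
import math

def fit_line(xs, ys, sigmas):
    """Analytical weighted least-squares fit of y = a + b*x with uncertainties sigma_i on y.
    Returns (a, sigma_a, b, sigma_b, cov_ab, chi2, ndof)."""
    w = [1 / s**2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(wi * (y - a - b * x)**2 for wi, x, y in zip(w, xs, ys))
    return a, math.sqrt(Sxx / delta), b, math.sqrt(S / delta), -Sx / delta, chi2, len(xs) - 2

# Hypothetical data points (made up for illustration), each with uncertainty 0.2 on y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, sa, b, sb, cov, chi2, ndof = fit_line(xs, ys, [0.2] * len(xs))
print(f"a = {a:.3f} +- {sa:.3f}, b = {b:.3f} +- {sb:.3f}, chi2/ndof = {chi2:.2f}/{ndof}")
```

The non-zero covariance -Sx/Δ is a reminder that a and b are correlated unless the x values are centred around zero.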

    Course comments/praise (very biased selection!):
    "Dear Troels. I am writing to you because I took your course, 'Applied Statistics', back in 2012. I was (and still am) very enthusiastic about that course and have used that knowledge countless times since."
    [Rune Gjermundbo, Director of Business Operations, 2022]

    Comments from previous years

    Last updated 21st of August 2023.