Applied Statistics - From data to results (Winter 2021-22)

"Far too many scientists have only a shaky grasp of the statistical techniques they are using. They employ them as an amateur chef employs a cook book, believing the recipes will work without understanding why. A more cordon bleu attitude to the maths involved might lead to fewer statistical souffle's failing to rise. [Sloppy stats shame science, The Economist (June 5th 2004)]

Troels C. Petersen: Lecturer - Associate Professor, NBI - High Energy Physics. Mac user. Tel. 35 32 54 42 / 26 28 37 39. petersen@nbi.dk
Mathias Heltberg: Assistant lecturer - PostDoc, NBI - Bio Complexity. Mac expert. Tel. 26 19 18 89. heltberg@nbi.ku.dk
Clara G. Arteaga: Teaching assistant - Ph.D., NBI - Astrophysics. Mac expert. Lab coord. responsible. clara.arteaga@nbi.ku.dk
Kate M. L. Gould: Teaching assistant - Ph.D., NBI - Astrophysics. Mac & Linux expert. Slack responsible. katriona.gould@nbi.ku.dk
Vadim Rusakov: Teaching assistant - Ph.D., NBI - Astrophysics. Windows & Mac expert. GitHub responsible. vadim.rusakov@nbi.ku.dk
Irene L. Kruse: Teaching assistant - Ph.D., NBI - Climate physics. Mac (& Windows) expert. Zoom responsible. irene.kruse@nbi.ku.dk

"Without data, you're just another person with an opinion." [William Edwards Deming, US statistician 1900-1993]


Course information:
What, when, where, prerequisites, books, curriculum and evaluation:
Content: Graduate statistics course giving an advanced introduction to statistics and data analysis.
Level: Intended for science students at 3rd-5th year of studies and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python, but there is no language requirement).
Note on prerequisites: Programming is an essential tool and necessary for the course!!!
When: Monday 8:15-12:00, Tuesday 13:15-17:00, and Friday 8:15-12:00 (Week Schedule Group B).
Where: Lectures: Lille UP1 at DIKU (Mon), Aud-A2-82.01 in Frederiksberg (Tues), and Aud-A2-70.04 (first two weeks) / Store UP1 (Fri).
Exercises: BioCenter++ (Mon), Frederiksberg (Tues), and DIKU++ (Fri), see KU Room Schedule plan.
Period: Blok 2 (22nd of November 2021 - 21st of January 2022 including exam), 8 weeks total.
Format: Shorter lectures followed by computer exercises, discussion, and occasionally experiments.
Text book: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis, which is a great down-to-earth introduction to statistics.
Glen Cowan: Statistical Data Analysis, which is a shorter, modern introduction to statistics and data analysis.
Programs used: Simple Python (v3.9+) and a few packages on top in Jupyter Notebook (see Nature article).
Jupyter Notebooks have pros and cons, both of which are important to know about, see e.g. "Why I don't like notebooks!".
Exercise/code repository: All code used for the exercises of the course will be at the AppliedStatisticsNBI GitHub.
Slack channel: The course Slack channel is: NbiAppliedStatistics2021.slack.com.
Pensum/Curriculum: The course curriculum covers chapters 1-8 + 10 with many exceptions, detailed in the link.
Key words: PDFs, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo and Data Analysis.
Expected learning: What I expect you to learn is discussed here: Learning objectives
Language: English (occasional Danish utterances!). All exercises, problem sets, exams, notes, etc. are in English.
Evaluation: Problem set (20%), Project (20%), and take-home exam (60%).
Exam: Take-home (36 hours!) exam given Thursday the 20th of January 2022 at 8:00.
The exam will start on Thursday the 20th at 8:00 and end on Friday the 21st of January at 20:00 (36 hours in total).
Credits/Censur: 7.5 ECTS with internal censor evaluation (following the Danish 7-step scale)

"Essentially, all models are wrong, but some are useful". [George E. P. Box, British Statistician, 1919-2013]


The Applied Statistics course and its clever students were mentioned by Mathias Heltberg in DR Deadline on the 17th of February 2022, in a discussion about the effect of lockdowns on the spread of covid-19. Mathias of course argued very well.


Before course start:
Further course information can be found here: Applied Statistics course information.
The "course introduction" questionnaire is to be filled out at: Applied Statistics 2021 Questionnaire.
List of things to be done by first day of course (Monday the 22nd of November): Applied Statistics check list.
For an overview of the course curriculum, please see the overview video (560 MB, 18 min.) (audio) and overview PDF.


Course outline:
Below is the preliminary course outline (subject to changes and updates throughout the course).

Week 1 (Introduction, General Concepts, ChiSquare Method):
Nov 22: 8:15-10:00: Introduction to course and overview of curriculum.
     Mean and Standard Deviation. Correlations. Significant digits. Central limit theorem. (12-13 Measuring in Aud. A!)
Nov 23: Error propagation (which is a science!). Estimate g measurement uncertainties.
Nov 26: ChiSquare method, evaluation, and test. Formation of Project groups.

Week 2 (PDFs, Likelihood, Systematic Errors):
Nov 29: Probability Density Functions (PDFs), especially Binomial, Poisson and Gaussian.
Nov 30: Principle of maximum likelihood and fitting (which is an art!).
Dec 3: 8:15 - Group A: Project (for Wednesday the 15th of December) doing experiments in First Lab.
            9:15 - Group B: Systematic Uncertainties and analysis of "Table Measurement data". Discussion of real data analysis (usual rooms).

Week 3 (Using Simulation and More Fitting):
Dec 6: 8:15 - Group B: Project (for Wednesday the 15th of December) doing experiments in First Lab.
            9:15 - Group A: Systematic Uncertainties and analysis of "Table Measurement data". Discussion of real data analysis (usual rooms).
Dec 7: Producing random numbers and their use in simulations.
Dec 10: Summary of curriculum so far. Fitting tips and strategies.

Week 4 (Hypothesis Testing and limits):
Dec 13: Hypothesis testing. Simple, Chi-Square, Kolmogorov, and runs tests.
Dec 14: Table Measurement solution discussion. Testing your random (?) numbers.
     Project should be submitted by Wednesday the 15th of December at 22:00!
Dec 17: Limits and confidence intervals and Simpson's paradox.

Week 5 (Multivariate Analysis and introduction to Machine Learning):
Dec 20: Bayes' theorem. Multi-Variate Analysis (MVA). The linear Fisher discriminant.
Dec 21: Machine Learning (ML). Neural Networks, Decision Trees and other MLs.

     For exam training, here is Exam2016.pdf, to be discussed briefly on Tuesday the 18th of January.
     Here is the associated problem 4.1 data file.
     Here is the associated problem 5.1 data file.
     Here is the associated problem 5.2 data file.
     Here is the solution manual for Exam2016.

Week 6 (Real data classification/analysis and Bayesian statistics):
Jan 3: Analysis of real and complex data on separating/classifying events. Analysis of testbeam data (part I).
     Problem set should be submitted by Monday the 3rd of January at 22:00!
Jan 4: Shorter lecture on Machine Learning. Analysis of testbeam data (part II).
Jan 7: TBD (Mathias).

Week 7 (Advanced fitting, Calibration, and Problem Set deliberation):
Jan 10: Advanced fitting with both functions and models.
Jan 11: Calibration and use of control channels.
Jan 14: Discussion of Problem Set solution. Using simulation - Monte Carlo for determining pi.

Week 8 (Fitting and exam):
Jan 17: Discussion of selected parts of course curriculum. Exercise on fitting.
Jan 18: Deliberation on previous (2016) exam. Exam questions. Catch up on exercises.
Jan 20: Exam given (posted on course webpage 8:00 in the morning).
Jan 21: 20:00 Exam to be handed in (on www.eksamen.ku.dk).

Week 9 (Returning exam):
Jan 28: 15:30-16:30ish+: Exam solution, grades and course feedback (Link to final session).
     Designing experiments (inspired by "A lady tasting tea") with beer tasting? Or just beer...


"The art of drawing conclusions from experiments and observations consists in evaluating probabilities and in estimating whether they are sufficiently great or numerous enough to constitute proofs. This kind of calculation is more complicated and more difficult than it is commonly thought to be."
[Antoine Lavoisier, French chemist 1743-1794]




Notes and links:
In addition to the text book and other literature, some notes may be useful during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Note on analytical fit of straight line.
  • Note on Frequentist vs. Bayesian statistics and discoveries.
  • Note on rejecting data using Chauvenet's criterion.
  • Nature Physics article on discoveries.
  • Fisher's Exact Test on tea drinking lady.
  • Statistics resources.
  • Online course introducing Machine Learning.
  • Power Comparisons between tests of normality (spoiler alert: Shapiro-Wilk wins!)

    Course comments/praise (very biased selection!):
    "The lectures are simply a joy to witness. If only all lecturers were like these, KU would likely be number 1 in terms of having a good time learning (and the p-value of that is 0.99)."
    [Anonymous, 2020 evaluations for Ph.D. students]

    "This course should be part of every phyiscs Bachelor curriculum as it provides essential tools for scientific work."
    [Anonymous, Last line in the evaluation of 2020 course]

    Comments from previous years

    Last updated 25th of December 2021.