Applied Statistics - From data to results (Winter 2025-26)

"Essentially, all models are wrong, but some are useful." [George E. P. Box, British statistician, 1919-2013]

Troels C. Petersen: Lecturer; Associate Professor; Particle Physics; Mac user; Course responsible; phone 26 28 37 39; e-mail: petersennbi.dk; office: NBB-3-I-034.
Mathias Heltberg: Assistant lecturer; Assistant Professor; Bio Complexity; Mac user; Continuity responsible; phone 26 19 18 89; e-mail: heltberg (nbi.ku.dk).
Nina Nathanson: Teaching assistant; Ph.D. student; Nuclear Physics; Mac & Windows expert; Lab coord. responsible; phone 71 32 24 27; e-mail: nina.nathanson (nbi.ku.dk).
Janni Nikolaides: Teaching assistant; Ph.D. student; Particle Physics; Mac & Linux expert; GitHub responsible; phone 31 50 23 59; e-mail: johann.nikolaides (nbi.ku.dk).
Gabriela Oliveira: Teaching assistant; Ph.D. student; Quantum Information; Windows expert; ProbSet responsible; phone 50 34 90 29; e-mail: maria.oliveira (nbi.ku.dk).
Preet Pati: Teaching assistant; Ph.D. student; Nuclear Physics; Mac & Windows expert; Exam responsible; phone 91 66 04 39; e-mail: bhanjanpreet (nbi.ku.dk).
Clara Pollock: Teaching assistant (code); Ph.D. student; Astro Physics; Mac & Linux expert; Exam responsible; phone 91 67 92 43; e-mail: clara.pollock (nbi.ku.dk).



"Without data, you're just another person with an opinion." [William Edwards Deming, US statistician 1900-1993]

Course information (What, when, where, prerequisites, books, curriculum, and evaluation):
Period: Blok 2 (17th of November 2025 - 16th of January 2026 including exam), 8 weeks total.
When: Monday 8:15-12:00, Tuesday 13:15-17:00, and Friday 8:15-12:00 (Week Schedule Group B).
Note on morning lectures: After the first week, we will start at 9:15 on Mondays and Fridays (except for project work).
Where: Lectures: Aud. 4 at HCØ, see also KU Room Schedule plan.
Exercises Mondays: NBB 2.3.H.142, 2.2.I.158, 2.2.H.142, and 2.1.I.156 (backup: 1.2.B.015).
Exercises Tuesdays: HCØ A111, A112, C103, and Aud 07 (backup: Aud 04)
Exercises Fridays: NBB 2.3.H.142, 1.01.F.70, and 2.0.G.064/070.
Content: Graduate statistics course giving an advanced introduction to statistics and data analysis.
Level: Intended for physics (and science) students at 3rd-5th year of studies and new Ph.D. students.
Prerequisites: Math (calculus and linear algebra) and programming experience (preferably Python, but there is no language requirement).
Note on prerequisites: Programming is an essential tool and necessary for the course!!!
Format: Shorter lectures followed by computer exercises, discussion, and occasionally experiments.
Text book: Roger Barlow: Statistics: A guide to the use of statistics.
Additional literature: Glen Cowan: Statistical Data Analysis, which is a shorter, modern introduction to statistics and data analysis.
Software used: Python (v3.11+) and a few packages on top in Jupyter Notebook (see Nature article).
Exercise/code repository: All code used for the exercises of the course will be at the AppliedStatisticsNBI GitHub.
Pensum/Curriculum: The course curriculum covers chapters 1-8 + 10 in Barlow with many exceptions, detailed in the link.
Expected learning: What I expect you to learn is discussed here: Learning objectives
Evaluation: Problem set (20%), Project (20%), and take-home exam (60%).
Exam: Take-home (36 hours!) exam, given on Thursday the 15th of January 2026 at 8:00 and due on Friday the 16th of January at 20:00.

"The art of drawing conclusions from experiments and observations consists in evaluating probabilities and in estimating whether they are sufficiently great or numerous enough to constitute proofs. This kind of calculation is more complicated and more difficult than it is commonly thought to be."
[Antoine Lavoisier, French chemist 1743-1794]




Before course start:
For an overview of the course curriculum, please see the overview video (1.1 GB, 12 min.) and overview PDF.
Further course information can be found here: Applied Statistics course information.
Please fill out the "course introduction" questionnaire at: Applied Statistics 2025 Questionnaire.
List of things to be done by first day of course (Monday the 17th of November): Applied Statistics check list.

Python specific precourse considerations:
The source of all code for this course is the NBI Applied Statistics GitHub repository.
For a quick introduction to the basic git commands, please see the simple and/or advanced git cheat sheet.
Also, install Python (version 3.11+) as described in README.md, and add a few packages on top.
User Guides for the Minuit2 minimisation package with great guides on fitting and speed optimisation: iminuit2.
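As a quick sanity check of the setup, something like the following minimal sketch could be run once Python is installed. It only tests that the interpreter is version 3.11 or newer and that a handful of packages can be found. The package list here is an assumption made for illustration (only iminuit is named on this page); the README.md in the repository is the authoritative list.

```python
import importlib.util
import sys

# Minimum Python version stated on this page.
MIN_PYTHON = (3, 11)

# ASSUMPTION: illustrative package list, not taken from the course README.md.
ASSUMED_PACKAGES = ["numpy", "scipy", "matplotlib", "iminuit"]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if (major, minor) of `version` is at least `minimum`.

    If `version` is None, the running interpreter is checked.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def missing_packages(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_ok() else "too old, 3.11+ needed"
    print(f"Python {found}: {status}")

    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check uses `importlib.util.find_spec`, which locates a module without importing it, so the script stays fast and does not fail halfway through if one package is broken.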

Problem set: (due Saturday the 3rd of January 2026 at 22:00, but please start well before)
The problem set and the associated data/code can be found on GitHub and below:
Here is the associated data_LargestPopulation.csv data file for problem 2.3 and a Python script for reading it.
Here is the associated data_SignalDetection.csv data file for problem 5.1 and a Python script for reading it.
Finally, you can find a few words of advice in the ProblemSet Advice & Check List.



Course outline:
Below is the preliminary course outline (subject to changes and updates throughout the course).

Week 0 (Help with access, setup, programming, concepts, and preparation):
Nov 14: 11:00-12:00: Help with access, setting up Python, programming, course details, and measuring table! (in Auditorium A)

Week 1 (Introduction, General Concepts, ChiSquare Method):
Nov 17: 8:15-10:00: Introduction to course, outline of subjects, and overview of curriculum (in Aud. 4 at HCØ).
     Mean and Standard Deviation. Correlations. Significant digits. Central limit theorem.
Nov 18: Error propagation (which is a science!). Estimate g measurement uncertainties.
Nov 21: ChiSquare method, evaluation, and test. Start formation of Project groups.

Week 2 (PDFs, Likelihood, Systematic Errors):
Nov 24: 9:15: Probability Density Functions (PDFs), especially Binomial, Poisson, and Gaussian (Note: Morning lectures now start at 9:15).
Nov 25: Principle of maximum likelihood and fitting (which is an art!).
Nov 28: 8:15 - Group A: Project (for Saturday the 13th of December) doing experiments in First Lab.
            9:15 - Group B: Systematic Uncertainties and analysis of "Table Measurement data" (Mathias) in Aud. A (+B).

Project Groups (version Tuesday 25th at 13:15): ProjectGroups_V3.0.pdf.

Week 3 (Using Simulation and More Fitting):
Dec 1: 8:15 - Group B: Project (for Saturday the 13th of December) doing experiments in First Lab.
            9:15 - Group A: Systematic Uncertainties and analysis of "Table Measurement data" (Mathias) in Aud. A (+B).
Dec 2: Producing random numbers and their use in simulations.
Dec 5: Fitting strategies.
     Submission of result values for table measurements.

Week 4 (Hypothesis Testing, confidence intervals/limits, and Bayesian statistics):
Dec 8: Hypothesis testing. Simple, Chi-Square, Kolmogorov, and runs tests.
Dec 9: Confidence intervals and limits. Testing your own random (?) numbers.
Dec 12: Bayes' theorem and Bayesian statistics (Mathias).
     Projects should be submitted on Absalon by Saturday the 13th of December at 22:00!
     Submission of result values for project.

Week 5 (Calibration, Multivariate Analysis, and real data classification/analysis):
Dec 15: Table Measurement solution discussion. Calibration and use of control channels.
Dec 16: Multi-Variate Analysis (MVA). The linear (Fisher) discriminant (compared to PCA).
Dec 19: Discussion of Problem Set. Analysis of real and complex data on separating/classifying events. Analysis of testbeam data.

Week 5.5 (No teaching):
Dec 22: Due to expected low attendance, this day will be dedicated to self study and own work on Problem Set.

Week 6 (Introduction to Machine Learning and real data classification/analysis):
Jan 2: Machine Learning (ML). Neural Networks, Decision Trees and other MLs (not curriculum). Exercise: ML and/or Problem Set!
     Problem set should be submitted by Saturday the 3rd of January at 22:00!

Week 7 (Advanced fitting, Problem Set deliberation, and time series):
Jan 5: Stratification - beating sqrt(N). Discussion of selected parts of course curriculum. Exercise on simulation.
Jan 6: Advanced fitting with both functions, models, and in 2D. Discussion of Analysis of testbeam data.
Jan 9: Time series analysis (Mathias).

Week 8 (Fitting and exam):
Jan 12: Discussion of Problem Set solution. Deliberation on previous exam. Discussion of fitting philosophy.
Jan 13: Simpson's Paradox. Exam questions. Catch up on exercises.
Jan 15: Exam given (posted on the course webpage at 8:00 in the morning).
Jan 16: 20:00 Exam to be handed in (on www.eksamen.ku.dk).


"Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write." [H.G. Wells]




Notes and links:
In addition to the text book and other literature, some notes may be useful during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Note on analytical fit of straight line.
  • Note on Frequentist vs. Bayesian statistics and discoveries.
  • Nature Physics article on discoveries.
  • Fisher's Exact Test on tea drinking lady.
  • Power Comparisons between tests of normality (spoiler alert: Shapiro-Wilk wins!)

    Course comments/praise (very biased selection!):
    "Dear Troels. I am writing to you because I took your course, 'Applied Statistics', back in 2012. I was (am) very enthusiastic about that course and have used that knowledge countless times since."
    [Rune Gjermundbo, Director of Business Operations, in 2023]

    "Hi Troels. I was wondering whether you could send me the link to your AppStat 2022 webpage? I have apparently misplaced it, and there is simply so much gold in there. :D"
    [Sarah Andersson, Operations Engineer at Netcompany, in 2024]

    Comments from previous years

    Last updated 6th of November 2025.