Applied Statistics - From data to results (Winter 2015-16)

"The art of drawing conclusions from experiments and observations consists in evaluating probabilities and in estimating whether they are sufficiently great or numerous enough to constitute proofs. This kind of calculation is more complicated and more difficult than it is commonly thought to be." [Antoine Lavoisier, 1743-1794]
Troels C. Petersen: Lecturer - Associate Professor, NBI - High Energy Physics, Mac user, 35 52 54 42 / 26 28 37 39, petersennbi.dk
Mathias L. Heltberg: Assistant teacher - Ph.D. student, NBI - Biocomplexity, Mac user and expert, 26 19 18 89, mathiasheltberghotmail.com
Niccolo Maffezzoli: Assistant teacher - Ph.D. student, NBI - Ice and Climate, Linux user and expert, 23 86 26 37, maffenbi.dk
Asbjørn Arvad Jørgensen: Contributor - Master student, KU - Physics, Windows user and expert, 26 48 22 29, arvad91gmail.com
Geert-Jan Besjes: Contributor - PostDoc, NBI - High Energy Physics, Mac user and expert, 42 56 99 94, geert-jan.besjescern.ch

The final take-home exam has been posted! (Hand in by Friday 22nd of January 2016 at 12:00).
Here is the associated problem 4.1 data file and a Python script for reading it.
Here is the associated problem 4.2 data file and a Python script for reading it.
Here is the associated problem 5.1 data file and a Python script for reading it.

When, where, what, prerequisites, books, curriculum and evaluation:
When: Monday 9-12, Tuesday 13-17, and Friday 9-12 (Week Schedule Group B).
Where: Lectures and Exercises: Auditorium M (Building M) at NBI.
Period: Block 2 (16th of November 2015 - 22nd of January 2016), 8 weeks total.
Format: Shorter lectures followed by computer exercises and discussion.
Content: Graduate statistics course giving an advanced introduction to statistics and data analysis.
Level: Intended for students at 3rd-5th year of studies and new Ph.D. students.
Prerequisites: Simple mathematics and some programming experience (any language, but see below).
Note on prerequisites: Programming is an essential tool and necessary for the course!!!
Text book: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis, Glen Cowan: Statistical Data Analysis.
Programs used: Simple Python and the CERN software ROOT.
Pensum/Curriculum: The course curriculum can be found here.
Key words: PDFs, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo and Data Analysis.
Language: English (occasional Danish utterances!). All exercises, problem sets, exams, notes, etc. are in English.
Evaluation: Problem set (15%), Projects (25% total), Take-home exam (60%).
Exam: Take-home (28 hour) exam given Thursday the 21st of January 2016 at 8:15.
Grading: Internal censor evaluation (following the Danish 7-step scale).
Credits: 7.5 ECTS (i.e. 1/8 of an academic year's work).

Before course start:
Further course information can be found here: Applied Statistics course information
Expected learning objectives of the course are discussed here: Learning objectives
A "course introduction" questionnaire can be found at: Applied Statistics 2015 Questionnaire
List of things to be done by first day of course (Monday the 16th of November): Applied Statistics check list


Course outline:
Below is the preliminary course outline, subject to changes throughout the course.

Week 0: (Pre-course-start-session)
Nov 12: (10:15-12:00): Setting up computers, Installation of Python and ROOT (Aud. M).
Nov 12: (13:15-15:00): Introduction, tips and tricks for Python programming (Aud. M).
  • Introduction to Python (also read more here: Dive into Python): pythonIntro.py
  • Introduction to ROOT (also read more here: ROOT examples): RootIntro.py
  • Producing nice plots: nicefig.py, which produces this figure.

    Week 1 (Introduction, general concepts)
    Nov 16: 8:15: Intro to course, photos, questionnaire and table measurements (Aud. A).
         Central limit theorem. Mean, RMS and estimators. Correlation. Significant digits.
    Nov 17: Error propagation (which is a science!) and short Python+ROOT tutorial.
    Nov 20: ChiSquare method (which plays a central role in the course!). Form Project 1 groups.
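
As a rough illustration of how the Week 1 topics fit together (this is not taken from the course material), the sketch below combines independent measurements with Gaussian uncertainties into a weighted mean and evaluates the ChiSquare of that combination. The function name `weighted_mean` and the measurement values are invented for the example.

```python
import math

def weighted_mean(values, sigmas):
    """Return (mean, uncertainty on mean, chi2, ndof) for independent measurements."""
    # Each measurement is weighted by the inverse of its variance.
    weights = [1.0 / s**2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    # Error propagation gives the uncertainty on the weighted mean.
    sigma_mean = math.sqrt(1.0 / wsum)
    # ChiSquare of the hypothesis that all measurements share one true value.
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    ndof = len(values) - 1  # one fitted parameter (the mean)
    return mean, sigma_mean, chi2, ndof

# Hypothetical table length measurements in metres, with their uncertainties.
lengths = [3.36, 3.38, 3.35, 3.40]
errors = [0.02, 0.02, 0.01, 0.04]

mean, sigma_mean, chi2, ndof = weighted_mean(lengths, errors)
print(f"L = {mean:.4f} +- {sigma_mean:.4f} m, chi2 / ndof = {chi2:.2f} / {ndof}")
```

A ChiSquare close to the number of degrees of freedom suggests that the quoted uncertainties are consistent with the scatter of the measurements.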

    Week 2 (ChiSquare, Systematic Errors)
    Nov 23: 8:15: Start project 1 (for Thursday the 3rd of December) doing experiments in First Lab.
    Nov 24: Probability Density Functions (PDF) especially Binomial, Poisson and Gaussian.
    Nov 27: Analysis of "Table Measurement data" and discussion of real data analysis.

    Week 3 (Likelihood, Fitting, Using Simulation):
    Nov 30: Producing random numbers and their use in simulations.
    Dec 1: Maximum likelihood and (more) fitting.
    Dec 4: Simulation exercises and summary (having handed in project 1).
         Handing out problem set and data (for Thursday the 17th of December, 22:00).
         Problem 4.1: data file and script, Problem 5.1: data file and script.
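
To give a flavour of the Week 3 theme of producing random numbers for simulation, here is a minimal sketch of the transformation (inverse CDF) method for an exponential distribution, using only the Python standard library. It is an assumption-laden illustration rather than a course exercise: the function name `sample_exponential`, the seed and the value of `tau` are chosen arbitrarily.

```python
import math
import random

def sample_exponential(tau, n, rng):
    """Draw n exponentially distributed numbers with mean tau (transformation method)."""
    # rng.random() is uniform in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and the logarithm is always defined.
    return [-tau * math.log(1.0 - rng.random()) for _ in range(n)]

rng = random.Random(42)  # fixed seed, so the simulation is reproducible
tau = 2.0                # hypothetical lifetime parameter
samples = sample_exponential(tau, 100_000, rng)

# Estimate the mean and its statistical uncertainty from the simulated sample.
n = len(samples)
mean = sum(samples) / n
rms = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
error_on_mean = rms / math.sqrt(n)
print(f"mean = {mean:.4f} +- {error_on_mean:.4f} (true value {tau})")
```

Comparing the estimated mean with the true value, in units of the uncertainty on the mean, is a quick sanity check that the generator behaves as intended.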

    Week 4 (Hypothesis Testing and limits):
    Dec 7: Hypothesis testing. Simple, Chi-Square, Kolmogorov and runs tests.
    Dec 8: Limits and confidence intervals.
    Dec 11: Calibration and use of control channels. Table measurement solution.

    Week 5 (Bayes Theorem and classification):
    Dec 14: Bayes theorem and separating/classifying events. Analysis of testbeam data (part I). Evaluation of project 1 results.
    Dec 15: Analysis of testbeam data (part II). Session on Problem Set.
    Dec 18: Start project 2 (for Sunday the 17th of January). (Handing in the problem set Thursday 17th by 22:00).

    Week 6 (Multivariate Analysis):
    Jan 4: Status of and work on project 2.
    Jan 5: Multi-Variate Analysis (MVA) part I. The linear Fisher discriminant.
    Jan 8: Multi-Variate Analysis (MVA) part II. Neural Networks, Decision Trees and ROOT's TMVA.

    Week 7: (Advanced analysis and project 2)
    Jan 11: Advanced fitting. Project 2 work.
    Jan 12: Time series analysis illustration (possibly!). Project 2 work.
    Jan 15: Random digits test. (Hand in project 2 by Sunday 17, 22:00).

    Week 8 (Project 2 presentations and exam):
    Jan 18: Project 2 presentations. Discussion of full analysis and Big Data (if time allows!)
    Jan 19: Project 2 presentations. Summary/repetition of course curriculum and exam 2012.
    Jan 21: Exam given (posted on course webpage 8:15 in the morning).
    Jan 22: 12:00 Exam to be handed in.

    Week 9 (Returning exam):
    Jan 29: 12-14 (Aud. M): Exam solution, grades and course feedback.
         Designing experiments (inspired by "A lady tasting tea") with beer tasting?




    Notes and links:
    In addition to the text book and other literature, some notes may be useful during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Note on analytical fit of straight line.
  • Note on Frequentist vs. Bayesian statistics and discoveries.
  • Note on rejecting data using Chauvenet's criterion.
  • Nature Physics article on discoveries.
  • Fisher's Exact Test on tea drinking lady.
  • Statistics resources.
  • Exam 2012 (for practice).
  • Discussion of solutions for Exam 2012.

    Links:
  • Blog on how to use crime rates for predicting taxi demand!

    Comments about course (biased selection!):
    "This course overqualified me for a course on scientific computing at Harvard the following Summer."
    [Dennis Christensen (2009 course)]

    "I recommended this course to everyone I know." [Pernille Yde (2009 course)]

    "I don't think that you can rightly call yourself a physicist, if you have not had a course of this type."
    [Bo Frederiksen (2010 course)]

    "My second project in the course led to an article now in review for Nature magazine!"
    [Ninna Rossen (2011 course)]

    "If you really want to understand your data, you need a course like this."
    [Julius Bier Kirkegaard (2012 course)]

    "I realized that I was very well prepared by this course, when I started working at CERN as a Summer Student."
    [Mathias Heltberg (2013 course)]

    "It is now many years ago, that I followed your course, but there is hardly a day, where I don't think about it"
    [Frederik Beyer (2011 course, in October 2014)]

    "This is without a doubt the single most useful, and possibly most influential, course I have taken during my university education. Thank you."
    [Samuel Walsh (2013 course, in December 2014)]

    "Thanks for a great course. When I think back on my 2.5 years of physics studies, Applied Statistics stands out as some of the most fun and most exciting."
    [Martin Hayhurst Appel (2014 course)]

    "Every single sleepless night spent on this course has enriched my way of thinking."
    [Arianna Marchionne (2015 course)]


    Last updated 27th of January 2016.