Applied Statistics - From data to results (Fall 2014)

"Coincidences, in general, are great stumbling blocks in the way of that class of thinkers who have been educated to know nothing of the theory of probabilities [and statistics] - that theory to which the most glorious objects of human research are indebted for the most glorious of illustration."
[Edgar Allan Poe, "The Murders in the Rue Morgue", 1841]

Troels C. Petersen
Lecturer - Associate Professor
NBI - High Energy Physics
Mac user
35 52 54 42 / 26 28 37 39
petersen@nbi.dk

Florian P. Uekermann
Assistant teacher - Ph.D. student
NBI - Center for Models of Life
Linux user and expert
50 17 98 51
florian.uekermann@nbi.dk

Asbjørn Arvad Jørgensen
Assistant teacher - Master student
KU - Physics
Windows user and expert
26 48 22 29
arvad91@gmail.com

Lars Egholm Pedersen
Contributor - Ph.D. student
NBI - High Energy Physics
Mac user and expert
61 66 41 19
egholm@nbi.dk


The final take-home exam has been posted! (Hand in by Friday 12:00).
Here is the associated problem 4.1 data file and a Python script for reading it.
Here is the associated problem 5.1 data file and a Python script for reading it.


When and where:
When: Monday 9-12, Tuesday 13-17, and Friday 9-12 (Week Schedule Group B).
Where: Lectures: Auditorium A (Building C), Exercises: Auditorium M (Building M) at NBI.
Period: Block 1 (1st of September - 31st of October 2014), 9 weeks.
Format: Shorter lectures followed by computer exercises and discussion.

Content, books and prerequisites:
Content: Graduate statistics course giving an advanced introduction to statistics and data analysis.
Level: Intended for students at 3rd-5th year of studies and new Ph.D. students.
Prerequisites: Simple mathematics and some programming experience (any language, but see below).
Note on prerequisites: Programming is an essential tool and necessary for the course!!!
Text book: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis for the Physical Sciences, Glen Cowan: Statistical Data Analysis.
Programs used: Simple Python and the CERN software ROOT.
Curriculum: The course curriculum can be found here.
Key words: PDF, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo and Data Analysis.
Language: Danish (English if requested). All exercises, problem sets, exams, notes, etc. are in English.

Evaluation:
Evaluation: Problem set (25%), Projects (25% total), Take-home exam (50%).
Exam: Take-home (28 hour) exam given Thursday the 30th of October 2014 at 8:15.
Grading: Internal examiner evaluation (following the Danish 7-step scale).
Credits: 7.5 ECTS (i.e. 1/8 of an academic year's work).

Further information can be found here: Applied Statistics course information
A semi-mandatory "course introduction" questionnaire can be found at: Applied Statistics 2014 Questionnaire
List of things to be done by first day of course (1st of September): Applied Statistics check list


Course outline:
Below is the preliminary course outline, subject to changes throughout the course.

Week 0: (Pre-course-start-session)
27 (10:15-12:00): Setting up computers, Installation of Python and ROOT (Aud. M).
29 (10:15-12:00): Introduction, tips and tricks in Python programming (Aud. M).
  • Introduction to Python (also read more here: Dive into Python): pythonIntro.py
  • Introduction to ROOT (also read more here: ROOT examples): RootIntro.py
  • Producing nice plots: nicefig.py, which produces this figure.

    Week 1 (Introduction, general concepts)
    1 8:15: Intro to course, photos, questionnaire and table measurements (Aud. A).
         Central limit theorem. Mean, RMS and estimators. Correlation.
    2: Error propagation (which is a science!) and Python+ROOT tutorial.
    5: Start project 1 (for Thursday the 18th of September) doing experiments in First Lab.

    Week 2 (Chi-Square, Systematic Errors)
    8: Chi-Square method (and possibly (re-)doing project 1 experiments in First Lab).
    9: Probability Density Functions (PDF) especially Binomial, Poisson and Gaussian.
    12: Systematic errors and Table Measurements.

    Week 3 (Likelihood, Fitting, Using Simulation):
    15: Producing random numbers and their use in simulations.
    16: Maximum likelihood and (more) fitting.
    19: Simulation exercises and summary (having handed in project 1).
         Handing out problem set and data (for Wednesday the 1st of October).
         Slides discussing problem set and Example Solution.

    Week 4 (Hypothesis Testing):
    22: Two exercises on fitting.
    23: Hypothesis testing. Simple, Chi-Square, Kolmogorov and runs tests.
    26: Limits and confidence intervals.

    Week 5 (Bayes Theorem and classification):
    29: Table measurement solution. Bayes theorem and separating/classifying events. Analysis of testbeam data (part I).
    30: Evaluation of project 1 results. Analysis of testbeam data (part II). (Handing in the problem set Wednesday by midnight).
    3: Start project 2 (for Sunday the 26th of October).

    Week 6 (Classifying events and Project 2):
    6: Calibration and use of control channels. Blind analysis.
    7: Multi-Variate Analysis (MVA) part I. The linear Fisher discriminant.
    10: Multi-Variate Analysis (MVA) part II. Neural Networks, Decision Trees and ROOT's TMVA. [Lars teaching]

    Week 7: (Autumn break/project 2)
    13:
    14:
    17:

    Week 8 (Time series and project 2 work):
    20: Project 2 work.
    21: Time series analysis [Florian teaching].
    24: Work on 2nd project.

    Week 9 (Project 2 presentations, summary and exam):
    27: Project 2 presentations. (Hand in project 2 before Sunday midnight)
    28: Summary/repetition of course curriculum.
    30: Exam given (posted on course webpage in the morning 8:15).
    31: 12:00 Exam to be handed in.

    Week 10 (Returning exam):
    7: 13-15 (Aud. M): Exam solution and grades. Designing experiments (inspired by "A lady tasting tea") with beer tasting?



    Problem sets, projects, and exam set:
    During the course there will be one problem set to be solved, two projects to be carried out, and a final take-home exam to be handed in, all of which can (in due time) be found below:
  • Project 1 (Fri. 5th - Thu. 18th September).
  • Problem Set and data for problem 4.1 (Fri. 19th September - Wed. 1st of October).
  • Project 2 (Mon. 1st September - Sun. 26th October).
  • Final take-home exam (Thu. 30th October - Fri. 31st October).




    Notes and links:
    In addition to the text book and other literature, some notes may be useful during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Note on analytical fit of straight line.
  • Note on Frequentist vs. Bayesian statistics and discoveries.
  • Note on rejecting data using Chauvenet's criterion.
  • Nature Physics article on discoveries.
  • Fisher's Exact Test on tea drinking lady.
  • The Data Analysis Brief Book.
  • Statistics resources.

    Links:
  • Blog on how to use crime rates for predicting taxi demand!

    Comments about course (biased selection!):
    "This course overqualified me for a course on scientific computing at Harvard the following Summer."
    [Dennis Christensen (2009 course)]

    "I recommended this course to everyone I know." [Pernille Yde (2009 course)]

    "I don't think that you can rightly call yourself a physicist, if you have not had a course of this type."
    [Bo Frederiksen (2010 course)]

    "My second project in the course led to an article now in review for Nature magazine!"
    [Ninna Rossen (2011 course)]

    "If you really want to understand your data, you need a course like this."
    [Julius Bier Kirkegaard (2012 course)]

    "I realized that I was very well prepared by this course, when I started working at CERN as a Summer Student."
    [Mathias Heltberg (2013 course)]

    "It is now many years ago, that I followed your course, but there is hardly a day, where I don't think about it"
    [Frederik Beyer (2011 course, in October 2014)]

    "This is without a doubt the single most useful, and possibly most influential, course I have taken during my university education. Thank you."
    [Samuel Walsh (2013 course, in December 2014)]

    "Thanks for a great course. When I think back on my 2.5 years of physics studies, Applied Statistics stands out as some of the most fun and most exciting."
    [Martin Hayhurst Appel (2014 course)]