Applied Statistics - From data to results (Fall 2011)

"Coincidences, in general, are great stumbling blocks in the way of that class of thinkers who have been educated to know nothing of the theory of probabilities [and statistics] - that theory to which the most glorious objects of human research are indebted for the most glorious of illustration."
[Edgar Allan Poe, "The Murders in the Rue Morgue", 1841]
The final take-home exam has been posted at 8:00am. (Hand in Friday 12:00)

General information:
Lecturer: Troels C. Petersen (NBI High Energy Physics (HEP)) (petersennbi.dk).
Additional teacher: Sascha Mehlhase (NBI High Energy Physics (HEP)) (mehlhasenbi.dk).
When: Monday 9-12, Tuesday 13-17, and Friday 9-12 (Week Group B).
Where: Monday Aud. M, Tuesday Aud. M, and Friday Aud. M (Building M at NBI).
Period: Block 1 (5th of September - 4th of November 2011), 8 weeks.
Evaluation: Exercises (25%), Problem sets and Projects (25%), Take-home exam (50%).
Exam: Take-home (24 hour) exam given Thursday the 3rd of November 2011 at 8:15.
Assessment: Internal evaluation (following the Danish 7-step scale).
Credits: 7.5 ECTS (i.e. 1/8 of an academic year's work).
Level: Intended for students in their 3rd to 5th year of study and new Ph.D. students.
Prerequisites: Basic mathematics and some (basic) programming experience (any language).
Programs used: Simple C++ and the CERN software ROOT.
Text book: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis.
Glen Cowan: Statistical Data Analysis.
Curriculum: The course curriculum can be found here.
Outline: Graduate statistics course giving an advanced introduction to data analysis.
Course format: Shorter lectures followed by computer exercises and discussion.
Key words: PDF, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo.
Language: Danish (English if requested). All exercises, problem sets, exams, notes, etc. are in English.

Course outline:
Week 1: (Introduction, setting up, general concepts)
5: Intro to course, photos, questionnaire, quiz, and table measurements (Aud. A). Central limit theorem. Mean, RMS and estimators.
6: Correlation. Distributions.
9: Error propagation.
Week 2:
12: Chi-square and fitting. Hand out 1st problem set (for Tuesday the 20th).
13: Random numbers and their use in MC. Hypothesis testing.
16: Systematic errors. Separating/classifying events.
Week 3:
19: Midway recap, including extended examples. Start 1st project (for Friday the 30th).
20: Bayes' theorem. Work on 1st project.
23: No class! Working on 1st project.
Week 4:
26: Sascha teaching class. Working on 1st project.
27: Likelihood and more fitting.
30: Hand out 2nd problem set (for Monday the 10th).
Week 5:
3: Multivariate analysis (MVA). Fisher and ROOT's TMVA.
4: Limits and confidence intervals. Evaluation of 1st projects.
7: Comments on fitting. Kolmogorov-Smirnov test.
Week 6:
10: Start 2nd project (for Tuesday the 25th).
11: Work on 2nd project (I'm gone 14:00-15:15).
14: Work on 2nd project.
Week 7: (autumn break/Project 2)
17:
18:
21:
Week 8:
24: Calibration and use of control channels. Blind analysis and when to reject data points.
25: Evaluation of 2nd projects (with presentations)
28: Free!
Week 9:
31: Free!
1: Free!
3: Exam given (posted on course webpage in the morning).
4: Exam to be handed in.

Course weekly pages (link to lectures and exercises):
The course will generally consist of (short) lectures, each followed by a computer problem-solving session with program examples that illustrate some of the points made in the lecture.

Here is a link to the initial questionnaire.

  • Week 1 (5-9 Sep): Central Limit Theorem, Mean, Width, Correlation(s), Distributions/PDF, and some error propagation (chapters 1-4).
  • Week 2 (12-16 Sep): Chi-square fitting, Random numbers, Hypothesis testing (chapters 6 and 8 and PDG notes).
  • Week 3 (19-23 Sep): Recap, Bayes' theorem, Project 1.
  • Week 4 (26-30 Sep): Finishing Project 1 and Likelihood fitting.
  • Week 5 (3-7 Oct): Fisher's exact test and linear discriminant, Student's t-distribution, confidence intervals.

Problem sets, projects, and exam set:
During the course there will be problem sets to be solved, projects to be carried out, and a final take-home exam to be handed in, all of which can (in time) be found below:
  • Problem Set 1 (Mon. 12th - Tues. 20th September).
  • Project 1 (Mon. 19th - Fri. 30th September).
  • Problem Set 2 (Fri. 30th September - Mon. 10th October). Data set with Tibetan skull measurements.
  • Project 2 (Mon. 10th - Tues. 25th October).
  • Final take-home exam (Thur. 3rd - Fri. 4th November).


Notes:
In addition to the text book and other literature, some notes will be used during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • The Data Analysis Brief Book.
  • Statistics resources.

Links:
  • Mozilla Design Challenge with data from Firefox usage.
  • Blog on how to use crime rates for predicting taxi demand!