Applied Statistics - From data to results (Fall 2010)

"Coincidences, in general, are great stumbling blocks in the way of that class of thinkers who have been educated to know nothing of the theory of probabilities (and statistics) - that theory to which the most glorious objects of human research are indebted for the most glorious of illustration."
[Edgar Allan Poe, "The Murders in the Rue Morgue"]
"This course overqualified me for a course on scientific computing at Harvard the following summer."
[Dennis Christensen (2009 course)]
"I recommended this course to everyone I know."
[Pernille Yde (2009 course)]

"I don't think that you can rightly call yourself a physicist, if you have not had a course of this type."
[Bo Frederiksen (2010 course)]

General information:
Teacher: Troels C. Petersen (NBI High Energy Physics (HEP)) (petersennbi.dk).
When: Monday 9-12, Tuesday 13-17, and Friday 9-12 (Week Group B).
Where: Monday Aud M + Cc3, Tuesday Aud M + Cc3, and Friday Aud M + Cc3.
Period: Block 1 (6th of September - 29th of October 2010), 8 weeks.
Evaluation: Problem solving during class (20%), Projects (20%), Take-home exam (60%).
Exam: Take-home (24 hour) exam given Thursday the 28th of October 2010 at 8:15.
Assessment: Internal evaluation (following the Danish 7-step scale).
Credits: 7.5 ECTS (i.e. 1/8 of an academic year's work).
Level: Intended for students at 3rd - 5th year of studies and new Ph.D. students.
Prerequisites: Simple mathematics and basic programming (any language).
Programs used: Simple C++ and the CERN software ROOT.
Textbook: Roger Barlow: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences.
Additional literature: Philip R. Bevington: Data Reduction and Error Analysis; Glen Cowan: Statistical Data Analysis.
Curriculum: The course curriculum can be found here.
Outline: Graduate statistics course giving an advanced introduction to data analysis.
Course format: Shorter lectures followed by computer exercises and discussion.
Key words: PDF, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo.
Language: Danish (English if requested).







Course format and outline:
The course will generally consist of short lectures, each followed by a computer problem-solving session with program examples that illustrate some of the points made in the lecture. The examples will usually be ROOT routines (see technicalities on setting up and running ROOT programs).

The course plan for reading, problem sets, projects, and the final exam is outlined below. Note that it is only an outline and may change at any time! For more information about the course format, outline, expectations, and other issues, here are a few slides.

  • Week 1 (6-10 Sep): Central Limit Theorem, Mean, Width, Correlation(s), Distributions/PDF, and some error propagation (chapters 1-4).
  • Week 2 (13-17 Sep): Error propagation (chapter 4), Monte Carlo (Cowan 3.1-3.3), and Chi2 intro. 1st problem set (Fri 17 - Tue 21).
  • Week 3 (20-24 Sep): Estimators, Maximum Likelihood, and Chi-Square fits (chapters 5-6). 1st project (Fri 24 Sep - Fri 1 Oct).
  • Week 4 (27 Sep - 1 Oct): Chi-Square evaluation and Hypothesis testing (chapters 6.4 and 8).
  • Week 5 (4-8 Oct): Hypothesis testing (continued) and event classification (chapter 8). 2nd problem set (Thu 7 - Tue 12).
  • Week 6 (11-15 Oct): Review of the curriculum and catch-up where needed (Mon+Tue). 2nd project (Tue 12 - Fri 22).
  • Week 7 (18-22 Oct): Working on 2nd project. I'm gone this week!
  • Week 8 (25-29 Oct): Multivariate analysis (not part of the curriculum), review, and questions. Final Exam (Thu 28 - Fri 29).




    Notes:
    In addition to the textbook and other literature, some notes will be used during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Quick Introduction to Linux, Emacs, C++, and ROOT.
  • The Data Analysis Brief Book.
  • Statistics resources.
  • Fisher's Exact Test on the tea-drinking lady.
  • Barnard's Exact Test compared to Fisher's.


    Problem sets, projects, and exam set:
    During the course there will be two problem sets to be solved, two projects to be carried out, and a final take-home exam to be handed in, all of which can be found below (in due time):
  • Problem Set 1 (Thu 16 - Tue 21 Sep).
  • Project 1 (Fri 24 Sep - Fri 1 Oct).
  • Problem Set 2 (Thu 7 - Tue 12 Oct). Data set with Tibetan skull measurements.
  • Final Exam (Thu 28 - Fri 29 Oct).



    Detailed course description:
    The course will give students an introduction to, and a basic knowledge of, statistics. The emphasis will be on practical data analysis, so examples and the use of computers will take the place of mathematical proofs.
    At the end of the course, students should be able to analyze any given data sample and efficiently extract the information it contains, which involves:
  • knowing the concept of PDFs and the most basic PDFs used in science.
  • propagating uncertainties, including when correlations are involved.
  • setting up and executing statistical tests.
  • performing Chi-Square and Maximum Likelihood fits.
  • using simulation to plan experiments.
    The course will cover: Basics of statistics, Distributions - Probability Density Functions, Error propagation, Correlations, Monte Carlo techniques, Statistical tests, Parameter estimation - philosophy and methods of fitting data, Chi-Square and Maximum Likelihood fits, Simulation and planning of an experiment, Confidence intervals, Data mining and techniques for separation, and the power and limits of statistics.
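
    As an illustration of two of the topics above (error propagation with correlations, and Monte Carlo techniques), the following is a minimal sketch and not course material. It is written in plain Python rather than the C++/ROOT used in the course. The function names (propagate, simulate), the example function f(x, y) = x * y, and all numerical values are assumptions chosen purely for illustration. The first function applies the standard first-order propagation formula including the correlation term; the second checks the result by sampling correlated Gaussian values.

```python
import math
import random


def propagate(dfdx, dfdy, sigma_x, sigma_y, rho):
    """First-order (linear) error propagation for f(x, y) with correlation rho."""
    variance = ((dfdx * sigma_x) ** 2 + (dfdy * sigma_y) ** 2
                + 2.0 * rho * dfdx * dfdy * sigma_x * sigma_y)
    return math.sqrt(variance)


def simulate(f, mu_x, mu_y, sigma_x, sigma_y, rho, n=200_000, seed=1):
    """Monte Carlo check: sample correlated Gaussians, return the spread of f."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # Mixing z1 into y gives x and y a linear correlation of rho
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        values.append(f(x, y))
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Hypothetical measurements: x = 10.0 +- 0.2, y = 5.0 +- 0.1, correlation 0.6
mu_x, mu_y = 10.0, 5.0
sigma_x, sigma_y, rho = 0.2, 0.1, 0.6

# For f(x, y) = x * y: df/dx = y and df/dy = x, evaluated at the central values
sigma_analytic = propagate(mu_y, mu_x, sigma_x, sigma_y, rho)
sigma_mc = simulate(lambda x, y: x * y, mu_x, mu_y, sigma_x, sigma_y, rho)

print(f"analytic: {sigma_analytic:.3f}   Monte Carlo: {sigma_mc:.3f}")
```

    With these values the analytic uncertainty is sqrt(3.2), about 1.79; ignoring the correlation term would give sqrt(2.0), about 1.41. The simulated spread should agree with the analytic value to within its statistical precision, since the relative uncertainties are small enough for the linear approximation to hold.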

    The best grade (12) will be given for mastering all the above concepts, while the lowest passing grade (2) will be given for the most basic understanding of statistics and data analysis (PDF, uncertainties, correlation, and Chi-Square fit).