Applied Statistics - From data to results (Fall 2009)

General information:
Teacher: Troels C. Petersen (NBI High Energy Physics (HEP)) (petersennbi.dk).
When: Monday 8-12, Tuesday 13-17, and Friday 8-12 (Week Group B).
Where: Monday Aud A + Cc3, Tuesday Aud M + D315, and Friday Aud M + Cc3, but...
Period: Blok 1 (31st of August - 23rd of October 2009), 8 weeks.
Evaluation: Problem solving during class (25%), Project (25%), Take-home exam (50%).
Exam: Take-home (24 hour) exam given Thursday the 29th of October 2009.
Grading (Censur): Internal evaluation (following the Danish 7-step scale).
Credits: 7.5 ECTS (i.e. 1/8 of an academic year's work).
Level: Intended for students in their 3rd - 5th year of studies (but everybody is welcome!).
Prerequisites: Simple mathematics and basic programming (any language).
Programs used: Simple C++ and the CERN software ROOT.
Text book: Statistical Data Analysis by Glen Cowan.
Literature: Data Reduction and Error Analysis, Philip R. Bevington.
Pensum/Curriculum: Chapters 1-7 and chapter 9 in G. Cowan's "Statistical Data Analysis" (about 86 pages).
Outline: Graduate statistics course giving an advanced introduction to data analysis.
Course format: Shorter lectures followed by computer exercises and discussion.
Key words: PDF, Uncertainties, Correlation, Chi-Square, Likelihood, Fitting, Monte Carlo.
Language: Preferably Danish (English if requested).


Final Exam Set (in Danish), and Final Exam Set (in English).
Questionnaire about the course.






Course format:
The course will generally consist of short lectures, each followed by a computer problem-solving session with program examples that illustrate some of the points made in the lecture. The examples are either C++ programs (using ROOT routines) or ROOT macros. Descriptions of them, along with the technicalities of setting up and running these programs, can be found on the following pages:
  • Week 1.
  • Week 2.


    Notes:
    In addition to the text book and other literature, some notes will be used during the course:
  • PDG notes on Probability.
  • PDG notes on Statistics.
  • PDG notes on Monte Carlo Techniques.
  • Questionnaire before course and Quiz.
  • Questionnaire after course.
  • Quick Introduction to Linux, Emacs, C++, and ROOT.
  • The Data Analysis Brief Book.
  • Statistics resources.
  • Fisher's Exact Test on tea drinking lady.
  • Barnard's Exact Test compared to Fisher's.


    Problem sets, projects, and exam set:
    During the course there will be problem sets to be solved, projects to be carried out, and a final take-home exam to be handed in, all of which can be found below:
  • Problem Set 1 (in Danish), and Problem Set 1 (in English).
  • Problem Set 2 (in Danish), and Problem Set 2 (in English).
               Associated data sets on Tibetan skulls and Leader lifespans.
  • Exam Set (in Danish), and Exam Set (in English).


    Detailed course description:
    The course will give students an introduction to and a basic knowledge of statistics. The emphasis will be on practical data analysis, and examples and the use of computers will take the place of mathematical proofs.
    At the end of the course, the students should be able to analyze any given data sample, and be able to efficiently extract the information contained therein, which involves:
  • knowing the concept of PDFs and the most basic PDFs used in science.
  • propagating uncertainties, also when correlations are involved.
  • setting up and executing statistical tests.
  • performing Chi-Square and Maximum Likelihood fits.
  • using simulation to plan experiments.
    The course will cover: Basics of statistics, Distributions - Probability Density Functions, Error propagation, Correlations, Monte Carlo techniques, Statistical tests, Parameter estimation - philosophy and methods of fitting data, Chi-Square and Maximum Likelihood fits, Simulation and planning of an experiment, Confidence intervals, Data mining and techniques for separation, and The power and limits of statistics.
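
    As a rough illustration of two of the goals above (propagating uncertainties when correlations are involved, and checking the result by simulation), here is a minimal sketch in plain standard C++. It is not taken from the course material and does not use ROOT. The function names, the choice of f = x * y, and all numbers are assumptions made for illustration only. The first function applies the usual first-order error propagation formula, and the second estimates the same uncertainty with a small Monte Carlo simulation of correlated Gaussian values.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// First-order (linear) error propagation for f = x * y, where x and y have
// means mx, my, uncertainties sx, sy, and linear correlation coefficient rho:
//   sigma_f^2 = (df/dx)^2 sx^2 + (df/dy)^2 sy^2 + 2 (df/dx)(df/dy) rho sx sy
// Hypothetical helper name, chosen for this illustration.
double propagatedSigmaProduct(double mx, double my,
                              double sx, double sy, double rho) {
    const double dfdx = my;  // df/dx evaluated at the mean values
    const double dfdy = mx;  // df/dy evaluated at the mean values
    const double variance = dfdx * dfdx * sx * sx
                          + dfdy * dfdy * sy * sy
                          + 2.0 * dfdx * dfdy * rho * sx * sy;
    return std::sqrt(variance);
}

// Monte Carlo cross-check: draw correlated Gaussian pairs (x, y), compute
// f = x * y for each pair, and return the standard deviation of f.
// Hypothetical helper name, chosen for this illustration.
double simulatedSigmaProduct(double mx, double my,
                             double sx, double sy, double rho,
                             int nSamples, unsigned seed) {
    assert(rho >= -1.0 && rho <= 1.0);
    assert(nSamples > 1);

    std::mt19937 generator(seed);
    std::normal_distribution<double> unitGauss(0.0, 1.0);

    double sum = 0.0;
    double sumOfSquares = 0.0;
    for (int i = 0; i < nSamples; ++i) {
        const double z1 = unitGauss(generator);
        const double z2 = unitGauss(generator);
        // Mixing z1 into y gives x and y the correlation rho.
        const double x = mx + sx * z1;
        const double y = my + sy * (rho * z1 + std::sqrt(1.0 - rho * rho) * z2);
        const double f = x * y;
        sum += f;
        sumOfSquares += f * f;
    }
    const double mean = sum / nSamples;
    return std::sqrt(sumOfSquares / nSamples - mean * mean);
}
```

    For example, with means 10 and 5, uncertainties 0.3 and 0.2, and rho = 0.5, the analytic formula gives sqrt(9.25), about 3.04, compared with 2.5 for rho = 0. The simulated value should agree within its statistical precision. A small residual difference is expected, because the first-order formula neglects higher-order terms. The two functions can be called from an ordinary main() or, presumably, adapted into a ROOT macro.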

    The best grade (12) will be given for mastering all the above concepts, while the lowest passing grade (2) will be given for the most basic understanding of statistics and data analysis (PDF, uncertainties, correlation, and Chi-Square fit).