Advanced Methods in Applied Statistics 2016
Lecturer: D. Jason Koskinen
Email: koskinen (at) nbi.ku.dk
Basic Information
- Block 3 - Timetable A of the 2016 academic calendar
- Tues 08:00 - 12:00 and Thurs 08:00 - 12:00 & 13:00 - 15:00
- Actual
- 08:00 - 08:30 student study time for both days
- 08:30 - 09:00 Q&A or discussion with teacher in Aud. M
- 09:00+ lecture on new material
- Auditorium M at the Blegdamsvej campus
- Odd-numbered classes are 4 hours long, while even-numbered classes
consist of 2 blocks of 4 hours.
- Classes will be composed of ~20-30% lecture and demonstrations
followed by exercises
- While assignments, projects, and exercises can be done in the
programming language of the student's choice, the examples and
demonstrations will be mainly in Python and/or scientific packages
thereof, e.g. SciPy, PyROOT.
- Required text or textbooks: None
Evaluation
- Oral presentation and 1-page summary (10%)
- ~8-9 minute summary presentation. Plan on ~6 slides if you are doing
a PowerPoint-type presentation.
- Can work alone or in groups of up to 3.
- A single 1-page summary including the names of all group members.
- Presentation does NOT have to be given by all group members.
- Be sure to put down which article you are using here
to avoid duplication
- Example presentation on Finite Monte Carlo article
- Other possible articles
- Frequency Difference Gating: A Multivariate Method for Identifying
Subsets That Differ Between Samples (article)
- Probability binning comparison: a metric for quantitating
multivariate distribution differences (article)
- FIREFLY MONTE CARLO: EXACT MCMC WITH SUBSETS OF DATA (article)
- This is just a small sample. Find something related, interesting,
or applicable to your area of research.
- The 1-page summary is due via email on March 9 by 17:00 (5pm) CET.
- Presentations will be selected at random and begin during class time
on March 10. At the discretion of the Lecturer and if needed, some
presentations will be postponed for a later date.
- If you have any questions or concerns, email Jason
- Graded problem sets (15%)
- Problem set 1 (5%)
- Deadline has now passed
- Grades have been posted on Absalon
- Problem set 2 (10%)
- Will be assigned sometime between March 7 and March 17
- Due: Friday April 8 at 16:00 Copenhagen time via email to Jason
- Problem Set 2 assignment
- Solution(s) to Problem Set 2
- Project (25%)
- Similar to the oral presentation, this project focuses on a method or
statistical treatment, preferably related to your field of research,
that you or your group select. Unlike the oral presentation, the project
includes not just understanding and explaining the method, but also
using it on an appropriate data set of your own choosing.
- Can be done alone or in groups of up to 3 people
- The only hand-in is a 4-6 page written report. You can submit the
code as well if you would like.
- Due: Friday April 8 at 16:00 Copenhagen time via email to Jason
- Final exam (50%)
- Must work on your own!
- Take home exam
- 28 hours between start and submission
- Start at 08:00 on MONDAY April 11
- Submit by 12:00 on Tuesday April 12
- If for some reason you are absolutely positive that there is no
way you can do a 28-hour take home exam from April 11 to 12, let
Jason know immediately.
- The exam will be very similar to problem set 2. Here is the simplest of
outlines of what the exam may look like
- Here are two extra practice problems
similar to what will be on the exam for those
- Extra Credit (+2% to final course grade based on a 1-100% scale)
- 2016 NCAA Men's Basketball Bracket submission due by 21:00 on March 17
- This is NOT a requirement or obligation for the course
- Extra Credit Outline
Course Syllabus
The outline is a rough sketch of the course material, and is 100% likely
to change throughout the course. Even so, we will absolutely cover the
following topics, which may require additional software support:
- Multivariate analysis (MVA) techniques including Boosted Decision
Trees (BDTs)
- The MultiNest Bayesian inference tool
- Basis splines
- Markov Chain Monte Carlo
- Likelihood minimization techniques
Slack Channel for communicating and sharing comments, questions, plots,
solutions, code, etc. Instructions:
- Join the slack-team AdvancedMethodsKU2016
- Sign in with your "******@alumni.ku.dk" e-mail
- Choose password and username
- Profit
Class notes will be posted here:
Class 0 - Pre_Course, attendance is not required
Class 1 - Start
Class 2 - Monte Carlo Simulation
- Note: LIGO is announcing something important with a webcast starting
at 16:30 to be shown in Aud. M (since all they can observe is
gravitational waves, it's probably gravitational waves). Also, Jason has
an unavoidable scientific 'thing' at 15:30.
- Random number generators
- Simple Monte Carlo
- Lecture 2
- Extra exercise about Merged Binning Combinatorics
Class 3 - Method of Least Squares
- Today's lecture is more analytical and mathematical than usual, but
should be used as a reference
- Lecture 3
- Some useful links
Class 4 - Likelihoods and Numerical Minimization Fitting
Class 5 - Bayesian Statistics Introduction
- Lecture 5
- Time to get 2 software packages ready for later in the course and very
likely Lecture 6
- Markov Chain Monte Carlo
- MultiNest (install packages available for at least Python, R, and
Matlab)
Class 6 - Markov Chain Monte Carlo
Class 7 - Parameter Estimation
Class 8 - Hypothesis Testing
Class 9 - Splines
Class 10 - Oral presentations (in class) & Non-parametric Tests
Class 11 - Multivariate Analysis (MVA) techniques
- Last group of oral presentations
- Boosted Decision Trees
- Lecture 11
- Exercise 1 python TMVA (see 2017 course webpage)
- Exercise 2 python TMVA (see 2017 course webpage)
- Data
- Exercise 1 (training signal, training background, testing signal,
testing background)
- Exercise 2 (16 variable file)
- The first column is the index, hence there are 17 'variables', but
the index variable is only for bookkeeping and has no impact on
whether an event is signal or background.
- Every even row is the 'signal' and every odd row is the
'background'. Thus, there are two rows for each index in the first
column: the first is the signal and the second is the background.
[The format is odd, but I got it from a colleague.]
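A minimal sketch of how the interleaved Exercise 2 layout could be split into signal and background samples. It assumes whitespace-separated numeric columns with no header line, which is not confirmed by the file description; the function name and the toy data (2 variables instead of 16) are made up for illustration.

```python
import io


def split_signal_background(lines):
    """Split interleaved rows into (signal, background) lists of feature rows.

    Assumption: whitespace-separated numbers, index in the first column,
    rows alternating signal, background, signal, background, ...
    """
    rows = [line.split() for line in lines if line.strip()]
    signal, background = [], []
    for row_number, fields in enumerate(rows):
        # Drop the first column: it is only a bookkeeping index.
        features = [float(value) for value in fields[1:]]
        if row_number % 2 == 0:   # first row of each index pair
            signal.append(features)
        else:                     # second row of each index pair
            background.append(features)
    return signal, background


# Toy stand-in for the real file: two indices, two variables each.
toy_file = io.StringIO("0 1.0 2.0\n0 3.0 4.0\n1 5.0 6.0\n1 7.0 8.0\n")
signal, background = split_signal_background(toy_file)
print(len(signal), "signal rows,", len(background), "background rows")
```

For the real file, pass an open file handle in place of `toy_file` and adjust the splitting if the delimiter turns out to be something other than whitespace.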
Class 12 - Data Processing and Signal Processing
- To prepare for the class make sure that a wavelet package is available
- Python - "pip install PyWavelets"
- Matlab - http://se.mathworks.com/products/wavelet/
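For Python users, a quick way to confirm that the wavelet package is available before class might look like the sketch below. PyWavelets is imported under the name `pywt`; the helper function name and the toy input are illustrative only.

```python
import importlib.util


def wavelet_package_available(module_name="pywt"):
    """Return True if the named module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if wavelet_package_available():
    import pywt

    # Single-level Haar transform of a toy signal as a smoke test.
    approximation, detail = pywt.dwt([1.0, 2.0, 3.0, 4.0], "haar")
    print("PyWavelets", pywt.__version__, "is installed")
    print("approximation:", approximation, "detail:", detail)
else:
    print('PyWavelets not found; try "pip install PyWavelets"')
```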
- Lecture 12 (by Dr. James Monk)
- We will be covering wavelet and Hough transforms
Class 13 - Rare Events
Class 14 - Nested Sampling in Bayesian Inference
- Lecture 14
- Note that external packages for conducting nested sampling, e.g.
MultiNest, are necessary
- Jason's pymultinest code for the exercises
Class 15 - Background subtraction and sPlots
Class 16 - Review
Extra Projects of a more difficult nature, for those who want something
more challenging.