08:30 - 09:00 Q&A or discussion with Jason in the classroom
09:00 lecture on new material (not 09:05 or 09:15)
On Thursdays, new material will very often start at 13:00
On Thursday it is very unlikely that any new material, lectures,
or review will happen after 16:00.
There are multiple locations depending on the day (timetable):
Tuesday is at øv - bib 4-0-17, Universitetsparken 1-3, DIKU
Thursday morning is at Aud 10, Universitetsparken 5, HCØ
Thursday afternoon is at Aud 06, Universitetsparken 5, HCØ
Odd-numbered classes are 4 hours long, while even-numbered classes
consist of 2 blocks of 4 hours each
Classes will be composed of roughly 20-30% lecture and demonstrations,
followed by exercises
While assignments, projects, and exercises can be done in the
programming language of the student's choice, the examples and
demonstrations will mainly be in Python and/or its scientific
packages, e.g. SciPy, PyROOT, etc.
Required text or textbooks: None
2016 Advanced Methods in Applied Statistics webpage
2017 Advanced Methods in Applied Statistics webpage
2018 Advanced Methods in Applied Statistics webpage
It is recommended, but not
required, to have at least reviewed this course's little sibling,
"Applied Statistics - From data to results", which can be found here
Oral presentation and 1-2 page summary (10%)
~10 minute summary presentation. Plan on ~7 slides if you are
doing a PowerPoint-type presentation.
Can work alone or in groups of up to 3
A 1-2 page summary including any and all group members' names
I strongly encourage people to use LaTeX for the typesetting of
the written summary. For those who do not already have a style, I
would recommend trying the formatting style for submission to
journals published by the American Physical Society, downloadable
here
Presentation does NOT have to be given by all group members
Talk or email with Jason if you have questions about the
appropriateness of your article
Be sure to put down which article you are using
here to avoid duplication
Similar to the oral presentation, this project focuses on using a
method or statistical treatment that is nominally related to your
field of research that you or your group select. Unlike the oral
presentation, the project includes not just understanding and
explaining the method, but also using it on an appropriate data
set of your own choosing.
Can be done alone or in groups of up to 3 people
The only hand-in is a 4-6 page written report. You can submit the
code as well if you would like.
The project should be formatted and written as if it were a
conference proceeding
I strongly encourage people to use LaTeX for the typesetting.
For those who do not already have a style, I would recommend
trying the formatting style for submission to journals published
by the American Physical Society, downloadable
here
Here is an
example of a fantastic write-up from 2017.
You may start working on this right now!!
Due: April 2 by 22:00 CET
Final exam (40%)
Must work on your own!
Take home exam
28 hours between start and submission
Begins at 10:00 CET on April 4, 2019
The exam must be submitted by 14:00 CET on April 5, 2019
The exam will be similar to problem set 2
A handful of more intensive questions as opposed to numerous
short questions
While the exam will contain problems from any portion of the
course material, the focus will be more on topics in the latter
portion of the course
Here
are two extra practice problems similar to what has been on the
previous exams
The course is 100% likely to change once we begin, and the future
lectures listed below serve only as an outline. Even so, we are
very likely to cover the following topics, which may require
additional software support:
Multivariate analysis (MVA) techniques including Boosted Decision
Trees (BDTs)
The MultiNest Bayesian inference tool
Basis splines
Markov Chain Monte Carlo
Likelihood minimization techniques
Class notes will be posted here:
Class 0 - Pre-Course
Take a look before the class starts (optional)
Get a preview with the course Teaching Assistant (Jean-Loup Tastet) of
some software tools to install
From the "Not Normal: the uncertainties of scientific measurements" paper:
For the ambitious, create a 'toy Monte Carlo' of the sample and pair
distributions for the nuclear physics data in Sec. 2.A. For simplicity,
assume that all the 'quantities' are Gaussian distributed.
Write functions where you can produce multiple Gaussian
distributions to sample from and generate a sample of "12380
measurements, 1437 quantities, 66677 pairs".
Produce the z-distribution (using eq. 4) plot for just your toy
Monte Carlo and see if it matches a Gaussian, exponential, Student's t
distribution, etc.
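A minimal sketch of such a toy Monte Carlo, assuming eq. 4 defines the pair z-score as the difference of two measurements divided by their quadrature-summed uncertainties. The per-quantity measurement count and the true-value/uncertainty ranges here are arbitrary choices for illustration, so the total pair count will not exactly match the paper's 66677:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary toy setup: each 'quantity' has one true value and one
# reported uncertainty; every measurement of it is drawn as a Gaussian.
n_quantities = 1437        # number of distinct quantities (from the paper)
meas_per_q = 9             # ~12380 measurements / 1437 quantities (rounded)

z_values = []
for _ in range(n_quantities):
    true_val = rng.normal(0.0, 10.0)    # arbitrary true value
    sigma = rng.uniform(0.5, 2.0)       # arbitrary reported uncertainty
    x = rng.normal(true_val, sigma, size=meas_per_q)
    s = np.full(meas_per_q, sigma)
    # All measurement pairs within this quantity; pair z-score is the
    # difference over the quadrature sum of the two uncertainties.
    i, j = np.triu_indices(meas_per_q, k=1)
    z_values.append((x[i] - x[j]) / np.sqrt(s[i] ** 2 + s[j] ** 2))

z_values = np.concatenate(z_values)
# For a perfectly Gaussian toy, z should follow a unit Gaussian;
# heavier tails in the real data are the point of the paper.
print(z_values.size, z_values.mean(), z_values.std())
```

From here, a histogram of `z_values` can be overlaid with Gaussian, exponential, and Student's t curves to compare shapes.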
Links to some of the presentations (2016,
2017,
2018)
The Boosted Decision Tree lecture will be covered on March 14 in the
afternoon due to the length of the excellent in-class student
presentations and follow-up discussions.
Class 11 - Divergence Between Distributions and Template Matching
(March 12)
Guest Lecture by Prof. Andrew "Andy" Jackson
Lecture notes on Kullback-Leibler (part 1) and Template Matching (part
2) (PDF, PowerPoint)
Kullback-Leibler divergence as a way to compare the sameness (or
tension) of two distributions, also known as a 'measure of surprise' or
'relative entropy'.
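As a quick illustration of that idea, here is a small sketch (not from the lecture) of the discrete Kullback-Leibler divergence, showing that it is zero for identical distributions and asymmetric in general:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(P || Q) in nats.

    Measures the 'surprise' when data generated by P are described
    by the model Q; zero if and only if the distributions agree.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                 # terms with p=0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.5])         # fair coin
q = np.array([0.9, 0.1])         # heavily biased coin

print(kl_divergence(p, p))       # 0.0: no surprise
print(kl_divergence(p, q))       # positive: distributions differ
print(kl_divergence(q, p))       # different value: D is not symmetric
```

Because D(P || Q) != D(Q || P), the KL divergence is a 'measure', not a true metric, which matters when choosing which distribution plays the role of the model.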
The first column is the index, hence there are 17 'variables', but
the index variable is only for bookkeeping and has no impact on
whether an event is signal or
background.
Every even row is the 'signal' and every odd row is the
'background'. Thus, there are two rows for each index in the first
column: the first is the signal and the second is the background.
[Format is odd, but I got it from a colleague].
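Given that description (counting rows from zero, so the first row of each pair is 'even'), splitting the file into signal and background is a one-line slice each. A sketch with a tiny fabricated array standing in for the real data file:

```python
import numpy as np

def split_signal_background(data):
    """Split the alternating-row format into (signal, background),
    dropping the bookkeeping index in the first column."""
    signal = data[0::2, 1:]       # rows 0, 2, 4, ...: first row of each pair
    background = data[1::2, 1:]   # rows 1, 3, 5, ...: second row of each pair
    return signal, background

# Fabricated stand-in with 2 indices and 2 real variables; the actual
# file (loadable with e.g. np.loadtxt) has 16 variables plus the index.
data = np.array([
    [0, 1.1, 2.2],   # index 0, signal
    [0, 0.9, 1.8],   # index 0, background
    [1, 3.3, 4.4],   # index 1, signal
    [1, 2.7, 3.6],   # index 1, background
])
sig, bkg = split_signal_background(data)
print(sig.shape, bkg.shape)
```

The shared index in the first column also makes it easy to verify that the two slices stay paired up after the split.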
Here are the solution data sets separated into two files (benign
and malignant) for the last
exercise of the lecture. Here is also the (Python)
code that I used to establish the efficiency for all the
submissions from all the students
Class 13 - Nested Sampling, Bayesian Inference, and MultiNest (March
19)