Applied Machine Learning 2025 - Final Project
"As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that's going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call 'data science' a science."
[Bill Schmarzo, Author of "Big Data: Understanding How Data Powers Big Business"]
Project description:
The course exam is a final machine learning project on whatever you want! Analyse data from a research group at NBI, figure out
if the commodity market can be predicted, test if you can beat your friends in games with AI, try to estimate the selling price
of houses, challenge the world on Kaggle, etc.
We encourage you to find your own data, and there are few limits to what we will accept. And it does not have
to be physics related! However, it should be sizable and have some level of complexity in it, as this is one of the evaluation
criteria. But it may be numbers, text, images, sound, ???, or a combination of any of these. Alternatively, we have a few exciting
projects with very interesting data, which you can help extract information from.
The only really strict requirements are that you use Machine Learning algorithms on some sizable data, and that you work in a group.
Groups should ideally consist of 3-5 persons. However, we also allow for smaller groups (1-2 persons), which will have the same
deadline, but who will present their projects towards the end of the exam. We highly encourage groups, as this is the ideal way of
learning, and how most (all?) project work functions in both academia and industry.
Register your group using the registration link (don't worry - nothing is final until the day before the exam!).
The aim/target of the ML algorithm developed is secondary and also entirely up to you. However, you may want to consult with us/others
regarding the feasibility of your project, though it is not a requirement that you have (fully) succeeded in the end.
You should submit the project via Absalon by 22:00 on Tuesday the 10th of June 2025.
Exam presentations take place on the following two days (11th and 12th of June), where each group gets "8 + 2*GroupSize" minutes (e.g. 16 minutes for a group of four).
We request (and assume) that everybody will be there (see below).
Existing data sets:
In case you do not have or do not want to find your own data (with all of its related pitfalls), below we provide a short list of
possible data sets that we have in store, and which can be transferred to you "immediately":
Determining Greenlandic ice sheet volume from radar measurements and satellite images (regression, easy-medium-hard, TP/Niccolo Maffezzoli)
Severe Adverse Events (SAE) prediction on Intensive Care Unit patient data from the MIMIC-III project (sparse time series, easy-medium-hard, TP/Norman)
IceCube neutrino classification and direction reconstruction with Graph Neural Networks (regression, medium-hard (requires GPU), TP/Janni)
Determining (unsupervised) interaction types in a new type of silicon detector developed at Stanford and Buenos Aires (unsupervised, easy-medium, TP/S+SA)
This list will be updated!!!!
The list may change over time and further information may be added. In the meantime, contact the person responsible for the data set regarding details.
In discussion with last year's students, the following is a compilation of experiences from last year's final projects.
Solutions and submission:
Your Final Project solutions should consist of a PDF and code:
1) A set of slides containing your work for presentation (about 10-20 slides) and an appendix describing possible details (0-100 slides).
2) A project statement on individual contributions (e.g. "All participants contributed evenly" or a list of who worked on what). This can be included in 1).
3) Code used for the project (only for reference - not for direct evaluation!), preferably in .zip format.
To simplify matters, you should name your files as follows:
FirstNamesOfGroupMembers_ProjectName.pdf,
for example:
TroelsDanielJanniNormanAayush_StudentGradingRegression.pdf.
All members of a group should present. If for some reason this is not possible/wanted, then please simply tell me and state why in the project statement.
Evaluation:
The final projects will be evaluated based on the following criteria:
Complexity of problem and depth of solution (incl. appendix)
Choice of methods and arguments behind
ML performance and own evaluation of it
Clarity of presentation and thus how much the class will learn from your presentation
Implementation, technical details, optimisation, etc. (your appendix)
Ability to evaluate ML usage (your evaluations of the other presentations)
For the two-day exam presentation frenzy, we will do our best to ensure your comfort, and at the same time ask you to evaluate
the other projects (don't worry - we will not grade based on your evaluations).
We request (and assume) that everybody will be there.
If for some reason you cannot be there for the full day, write to us with your conflicts and reasons, and we'll do our best to find a solution,
amend the program or (worst case) reschedule the exam. If your group has a preference for either of the two days, please write to us.
Last updated: 3rd of April 2025 by Troels Petersen.