Applied Machine Learning 2023 - Final Project

"As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that's going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call 'data science' a science."
[Bill Schmarzo, Author of "Big Data: Understanding How Data Powers Big Business"]

Project description:
The final project is a machine learning project on whatever you want! Analyse data from a research group at NBI, figure out whether the stock market can be predicted, test if you can beat your friends in games with AI, try to estimate the selling price of houses, challenge the world on Kaggle, etc. We highly encourage you to find your own data, and there are few limits to what we will accept. It does not have to be physics related! However, it should be sizable and have some level of complexity, as this is one of the points of evaluation. It may be numbers, text, images, sound, ???, or a combination of any of these. The only really strict requirements are that you use Machine Learning algorithms on some sizable data, and that you work in a group.
Groups should ideally consist of 3-5 persons. However, we also allow smaller groups (1-2 persons), which have the same deadline but will present their projects towards the end of the exam. We highly encourage groups, as this is the ideal way of learning, and how most project work functions in both academia and industry. Click here to register your group (don't worry - not final).

The aim/target of the ML algorithm developed is secondary and also entirely up to you. You may want to consult with us or others regarding the feasibility of your project, but it is not a requirement that you have fully succeeded by the end.

You should submit the project via Absalon by 22:00 on Tuesday the 13th of June 2023. Exam presentations are the following two days (14th and 15th of June).

Existing data sets:
In case you do not have or do not want to find your own data (with all of its related pitfalls), we provide below a short list of possible data sets that we have in store, and which can be transferred to you "immediately":
  • Identifying ancient "insolubles" from images in Peruvian ice cores (multi-class classification, medium, TP)
  • ATLAS V0 particle identification (classification, easy, TP)
  • ATLAS electron energy reconstruction with Convolutional Neural Networks (regression, medium-hard, TP)
  • IceCube neutrino direction reconstruction with Graph Neural Networks (regression, hard, TP)
  • Classification of and regression on astronomical objects (classification and regression, easy-medium, CS)
  • Generating "Synthetic Fault Surfaces", i.e. how earthquakes break fault surfaces in complex patterns (generative, ?, CS)
  • Using nature-inspired algorithms to find large-scale structure in the Early Universe (classification, ?, CS)
  • Clustering of data from Coma Cluster (no pun intended) treasure dataset (clustering, ??)
  • This list will be updated!!!!

The list may change over time and further information may be added. Until then, contact the "data set responsible" regarding details. The following is a compilation of experiences from last year's final projects, made in discussion with last year's students (a list of the projects can be found on the main course webpage).

Solutions and submission:
Your Final Project solutions should consist of two PDFs and code:
1) A set of slides presenting your work, fit for a 15-minute presentation (i.e. about 15 slides), along with an appendix describing details.
2) A project statement on individual contributions (e.g. "All participants contributed evenly").
3) Code used for the project (only for reference - not for evaluation!), preferably in .zip format.
All members of a group should present. If for some reason this is not possible/wanted, then please simply state why in the project statement. To simplify matters, you should name your files as follows: FirstNamesOfGroupMembers_ProjectName.pdf, for example: TroelsAzzurraArnauThomas_StudentGradingRegression.pdf. Name the project statement and code likewise.
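The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper name `submission_filename` and its arguments are hypothetical, and using the `.zip` extension with the same base name for the code archive is an assumption based on "preferably in .zip format".

```python
import re


def submission_filename(first_names, project_name, extension="pdf"):
    """Build 'FirstNamesOfGroupMembers_ProjectName.<extension>' (hypothetical helper)."""

    def camel(text):
        # Split on anything that is not a letter or digit, then
        # capitalise the first character of each word and join them.
        words = re.findall(r"[^\W_]+", text)
        return "".join(word[0].upper() + word[1:] for word in words)

    names = "".join(camel(name) for name in first_names)
    return f"{names}_{camel(project_name)}.{extension}"


group = ["Troels", "Azzurra", "Arnau", "Thomas"]
print(submission_filename(group, "Student grading regression"))
# Assumption: the code archive uses the same base name with a .zip extension.
print(submission_filename(group, "Student grading regression", extension="zip"))
```

The first call reproduces the example file name given above for the slides.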

Evaluation:
The final projects will be evaluated based on the following criteria:
  • Complexity of problem and depth of solution (incl. appendix)
  • Choice of methods and arguments behind
  • ML performance and own evaluation of it
  • Clarity of presentation
  • Implementation, technical details, optimisation, etc. (your appendix)
  • Ability to evaluate ML usage (your evaluations of the other presentations)
You will all be presenting your projects on Wednesday the 14th and Thursday the 15th of June 2023 in an all-day presentation frenzy starting at 9:00. We will do our best to ensure your comfort, and at the same time ask you to evaluate the other projects (don't worry - we will not grade based on your evaluations). We request (and assume) that everybody will be there. If for some reason you cannot be there the full day, write to us with your conflicts and reasons, and we'll do our best to adjust the program or reschedule your exam. If your group has a preference for either of the two days, please write to us.


Last updated: 14th of April 2023 by Troels Petersen.