Applied Machine Learning 2021 - Final Project

"As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that's going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call 'data science' a science."
[Bill Schmarzo, Author of "Big Data: Understanding How Data Powers Big Business"]
Project description:
The final project is a machine learning project on whatever you want! Figure out if the stock market can be figured out, test if you can beat your friends in games with AI, try to estimate the selling price of houses, challenge the world on Kaggle, etc. We highly encourage you to find your own data, and there are few limits to what we will accept. It does not have to be physics related! However, it should be sizable and have some level of complexity, as this is one of the points of evaluation. It may be numbers, text, images, ???, or a combination of any of these. The only really strict requirements are that you use Machine Learning algorithms on some sizable data, and that you work in a group.
Groups should ideally consist of 3-5 persons. However, given the current extraordinary circumstances, we also allow smaller groups (1-2 persons). These have the same deadline, but will present their projects towards the end of the exam. We highly encourage groups, as this is the ideal way of learning, and it is how most project work functions in both academia and industry.

The aim/target of the ML algorithm you develop is secondary and entirely up to you. You may want to consult with us/others regarding the feasibility of your project, though it is not a requirement that you have fully succeeded in the end.

An overview of groups and subjects can be found here: FinalProject_GroupsAndSubjectsOverview.pdf.

You should submit the project via Absalon by 22:00 on Tuesday the 15th of June 2021. Presentations take place on the following two days.

Existing data sets:
In case you do not have, or do not want to find, your own data (with all of its related pitfalls), below we provide a short list of possible data sets that we have in store, and which can be transferred to you "immediately":
  • Identifying ancient "insolubles" in ice cores from images (multi-class classification, medium, TP)
  • ATLAS V0 particle identification (classification, easy, TP)
  • ATLAS electron energy reconstruction with Convolutional Neural Networks (regression, medium-hard, TP)
  • Reconstruction of neutrino direction from timing information (regression, hard, TP)
  • Table of astronomical objects (classification and regression, AA)
  • Spectral analysis of measurements in SQL database (spectral analysis, AA)
  • Clustering of data from Coma Cluster (no pun intended) treasure dataset (clustering, AA)

The list may change over time and further information may be added (others have data on Human motion action (regression, medium?) and Analysis of Tweets/text (text analysis, hard)). Until then, contact the "data set responsible" regarding details. In discussion with last year's students, the following is a compilation of experiences from last year's final projects (a list of which can be found at the main course webpage).

Solutions and submission:
Your Final Project solutions should consist of two PDFs and code:
1) A set of slides presenting your work, suitable for a 15-minute presentation (i.e. about 15 slides), along with an appendix describing the details.
2) A project statement on individual contributions (e.g. "All participants contributed evenly").
3) The code used for the project (for reference only, not for evaluation!), preferably in .zip format.
All members of a group should present. If for some reason this is not possible/wanted, then please simply state why in the project statement. To simplify matters, you should name your files as follows: FirstNamesOfGroupMembers_ProjectName.pdf, for example: TroelsAdrianoZoeCarlVadimRasmus_StudentGradingRegression.pdf. Likewise for the project statement and code.

Evaluation:
The final projects will be evaluated based on the following criteria:
  • Complexity of problem and depth of solution (incl. appendix)
  • Choice of methods and arguments behind
  • ML performance and own evaluation of it
  • Clarity of presentation
  • Implementation, technical details, optimisation, etc. (your appendix)
  • Ability to evaluate ML usage (your evaluations of the other presentations)
You will all be presenting your projects on Wednesday the 16th and Thursday the 17th of June 2021 in an all-day presentation frenzy starting at 9:00. We will do our best to ensure your comfort, and at the same time ask you to evaluate the other projects (don't worry, we will not grade based on your evaluations). We require (and will assume) that everybody will be there. If for some reason you cannot be there the full day, write to us with your conflicts and reasons, and we'll do our best to amend the program or reschedule the exam.

Last updated: 6th of June 2021 by Troels Petersen.