The idea was to implement both a neural network (NN) and a boosted tree for classification and regression. Both NNs were implemented with TensorFlow, since I set it up to run on my GPU. For the boosted trees I wanted to try different algorithms, and for classification and regression I wanted to NOT use scikit-learn. All HP optimization was done with cross-validation and Optuna (thanks to Hævi for introducing us to that package); a sketch of that setup is given after item 5 below. To find the best features I wanted to use SHAP values for all supervised algorithms (also sketched after item 5), and for the clustering I wanted to go with the combined best 10 features from the two classification algorithms. Finally, for the clustering I wanted to try at least two different algorithms. For all algorithms the RobustScaler was applied to the data, which helps in particular with the NNs.

1: Classification_KasperNielsen_LightGBM.txt
Algorithm: LightGBM (LGBMClassifier)
Key HP values: learning_rate=0.008729220139305117, num_leaves=174, max_depth=78, min_data=55
HP optimization: the parameters above were optimized with Optuna over 25 trials and 5-fold cross-validation.
Parameters in model: at most 1076*174 = 187224 (1076 is the number of estimators/trees, each with at most 174 leaves)
Loss function and value on validation set: 0.144483 (binary cross-entropy/logloss on 10% of the training data)
Own evaluation: this model is doing pretty well; it gets 94.5% accuracy on the validation set for the final training.

2: Classification_KasperNielsen_Tensorflow.txt
Algorithm: TensorFlow (Sequential model of Dense layers)
Key HP values: n_layers=2, n_units_l0=87, n_units_l1=87, adam_learning_rate=0.006612175544531648
HP optimization: Optuna for 25 trials with 5-fold CV.
Parameters in model: 10006
Loss function and value on validation set: 0.1974 (binary cross-entropy on 10% of the training data)
Own evaluation: this model is also pretty good; it gets 92.8% accuracy on the validation set, slightly worse than LightGBM.

3: Regression_KasperNielsen_XGBoost.txt
Algorithm: XGBoost regressor
Key HP values: max_depth=7, gamma=277.75, eta=0.06, min_child_weight=23.7
HP optimization: Optuna for 25 trials with 5-fold CV.
Parameters in model: at most 2^7*376 = 48128 (376 trees, each with at most 2^7 = 128 leaves)
Loss function and value on validation set: 0.024 [mean((y_pred-y_true)/y_true) on 10% of the training data]
Own evaluation: this model is pretty good; an average relative deviation of 2.5% is quite good.

4: Regression_KasperNielsen_Tensorflow.txt
Algorithm: TensorFlow (Sequential model of Dense layers)
Key HP values: n_layers=1, n_units_l0=13, adam_learning_rate=0.0058963348477237415
HP optimization: Optuna for 25 trials with 5-fold CV.
Parameters in model: 222
Loss function and value on validation set: 0.039 [mean((y_pred-y_true)/y_true) on 10% of the training data]
Own evaluation: this model is also pretty good, but not as good as XGBoost; an average relative deviation of 3.9%.

5: Clustering_KasperNielsen_KMeans.txt
Algorithm: sklearn.cluster.KMeans
Key HP values: n_clusters=7
HP optimization: plot of n_clusters vs inertia and look for the "elbow" (manual inspection for n_clusters in the range [3, 15]; see the sketch after item 7).
Parameters in model: 7 cluster centres, i.e. 7*10 = 70 coordinates given the 10 selected features
Loss function and value on validation set: inertia = 3.5e6
Own evaluation: assuming that cluster 0 is electrons and the others are non-electrons, the accuracy is 86.4%, which is actually quite good for such a simple model.
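To make the HP-optimization setup concrete, here is a minimal sketch of how the Optuna + 5-fold cross-validation search for the LightGBM classifier (item 1) could look, with the RobustScaler applied inside the pipeline. The search ranges, the stand-in dataset, and the fixed n_estimators=1000 are illustrative assumptions, not the values used for the submitted model.

```python
import optuna
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Stand-in data; the real features and labels come from the course dataset.
X, y = make_classification(n_samples=2000, n_features=25, random_state=0)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 20, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 100),
        "min_child_samples": trial.suggest_int("min_child_samples", 10, 200),  # LightGBM alias: min_data
    }
    model = make_pipeline(RobustScaler(), LGBMClassifier(n_estimators=1000, **params))
    # 5-fold CV on negative logloss; Optuna maximizes the returned value.
    return cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```

Keeping the RobustScaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the scaling.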
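The TensorFlow models (items 2 and 4) are described as Sequential stacks of Dense layers; below is a sketch of how such a model could be rebuilt from the reported hyperparameters. The ReLU activation and the input dimensions are assumptions (they are not stated in the report), but they are consistent with the reported parameter counts: with 25 input features the classifier configuration (two hidden layers of 87 units, sigmoid output) has exactly 10006 parameters, and with 15 input features the regression configuration (one hidden layer of 13 units, linear output) has exactly 222.

```python
import tensorflow as tf

def build_model(n_layers, n_units, learning_rate, n_features, regression=False):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for _ in range(n_layers):
        model.add(tf.keras.layers.Dense(n_units, activation="relu"))  # activation is an assumption
    if regression:
        model.add(tf.keras.layers.Dense(1))  # linear output for regression
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    else:
        model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # binary classification output
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                      loss="binary_crossentropy", metrics=["accuracy"])
    return model

clf = build_model(n_layers=2, n_units=87, learning_rate=0.0066, n_features=25)
clf.summary()  # 10006 trainable parameters with 25 input features
reg = build_model(n_layers=1, n_units=13, learning_rate=0.0059, n_features=15, regression=True)
reg.summary()  # 222 trainable parameters with 15 input features
```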
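For the feature selection, here is a minimal sketch of the SHAP-based ranking mentioned in the introduction, assuming the workflow is: compute the mean |SHAP value| per feature for a trained classifier, take its top 10, and combine the lists from the two classifiers. The stand-in dataset, model, and feature names are placeholders.

```python
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

# Stand-in data and model; in practice this would be the trained classifier from item 1.
X_train, y_train = make_classification(n_samples=2000, n_features=25, random_state=0)
feature_names = [f"f{i}" for i in range(X_train.shape[1])]
lgbm_model = LGBMClassifier().fit(X_train, y_train)

explainer = shap.TreeExplainer(lgbm_model)       # works directly on LightGBM/XGBoost trees
shap_values = explainer.shap_values(X_train)
if isinstance(shap_values, list):                # some shap versions return one array per class
    shap_values = shap_values[1]
importance = np.abs(shap_values).mean(axis=0)    # mean |SHAP| per feature
top10 = [feature_names[i] for i in np.argsort(importance)[::-1][:10]]
print(top10)
```

The same ranking can be repeated for the TensorFlow classifier (e.g. with shap.DeepExplainer or shap.KernelExplainer), and the union of the two top-10 lists then gives the features used for the clustering.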
6: Clustering_KasperNielsen_GaussianMixture.txt
Algorithm: sklearn.mixture.GaussianMixture
Key HP values: n_components=9
HP optimization: plot of n_components vs AIC and BIC (Akaike information criterion, Bayesian information criterion) and look for a minimum (manual inspection; see the sketch below).
Parameters in model: 9 Gaussian components (each with a mean, a covariance matrix and a mixture weight)
Loss function and value on validation set: BIC = 2.4e6
Own evaluation: this clustering algorithm mixes electrons into several different clusters, so one cannot say that a single cluster is mostly electrons. Also, no minimum of either AIC or BIC was found in the scanned range.

7: Clustering_KasperNielsen_GaussianMixture2.txt
Algorithm: sklearn.mixture.GaussianMixture
Key HP values: n_components=3
HP optimization: plot of n_components vs AIC and BIC and look for a minimum (manual inspection).
Parameters in model: 3 Gaussian components
Loss function and value on validation set: BIC = 3.6e6
Own evaluation: (this is not really a separate algorithm, just a different setup of the previous one.) Same algorithm as above, but now with only 3 clusters/classes. It gets 80% accuracy on the training data if we assume cluster 0 is electrons, and 90% if we say clusters 0 and 2 are electrons.
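As a final sketch, this is roughly how the manual model selection for the clustering could be scanned and plotted: inertia vs n_clusters for KMeans (look for the elbow) and AIC/BIC vs n_components for the GaussianMixture (look for a minimum). The blob data is a stand-in for the 10 selected, RobustScaler-transformed features.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Stand-in for the 10 selected (and scaled) features of the real dataset.
X_clu, _ = make_blobs(n_samples=2000, n_features=10, centers=7, random_state=0)

ks = range(3, 16)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_clu).inertia_ for k in ks]
gmms = [GaussianMixture(n_components=k, random_state=0).fit(X_clu) for k in ks]
aics = [g.aic(X_clu) for g in gmms]
bics = [g.bic(X_clu) for g in gmms]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ks, inertias, "o-")
ax1.set_xlabel("n_clusters")
ax1.set_ylabel("KMeans inertia")
ax2.plot(ks, aics, "s-", label="AIC")
ax2.plot(ks, bics, "o-", label="BIC")
ax2.set_xlabel("n_components")
ax2.set_ylabel("GaussianMixture criterion")
ax2.legend()
plt.tight_layout()
plt.show()
```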