Overall strategy: produce a tree-based and an NN-based solution for classification and regression, and use k-means for clustering, though this plan changed along the way (agglomerative clustering was used in the end).

1: Classification_TroelsPetersen_SKLearnAlgo1.txt:
   Algorithm: SKLearn AdaBoostClassifier
   Key HP values: max_depth=4, Ntrees=300
   HP optimisation: Tried 4 different values for 3 parameters (max_depth, Ntrees, min_entries_in_branch)
   Parameters in model: 89792 (540 kB)
   Loss function and value on validation set: 2.156 (binary cross entropy, 5-fold cross validation)
   Own evaluation: Well-performing model; variable choice optimised with SHAP.

2: Regression_TroelsPetersen_XGBoost1.txt:
   Algorithm: XGBoost XGBRegressor
   Key HP values: Ntrees=450
   HP optimisation: HPs were tuned using RandomizedSearch
   Parameters in model: 345029 (1.3 MB)
   Loss function and value on validation set: 0.7924 (MAE of relative deviation (P-T)/T, 20% held out for validation)
   Own evaluation: Reasonably good model; the 15 best variables (from the built-in ranking) were clear.

3: Regression_TroelsPetersen_PyTorchNN1.txt:
   Algorithm: torch.nn
   Key HP values: Nhidden1=40, Nhidden2=50, LearningRate=variable (from 0.001 to 0.1)
   HP optimisation: HPs were tuned using Adam
   Parameters in model: 836333 (3.4 MB)
   Pre-processing: Scaled the input features using QuantileTransformer
   Loss function and value on validation set: 0.7683 (MAE of relative deviation (P-T)/T, 20% held out for validation)
   Own evaluation: Great model, improving by 2.5% on the XGBoost model.

4: Clustering_TroelsPetersen_kNN-DidNotReallyWorkWell.txt:
   Algorithm: SKLearn AgglomerativeClustering
   Key HP values: distance_threshold=0.13
   HP optimisation: Tested various values of n_clusters.
   Parameters in model: 2 (50 bytes)
   Pre-processing: Scaled the input features using RobustScaler
   Loss function and value on validation set: 3.141 (KL divergence)
   Own evaluation: Clusters were found, but I was not very happy with the model and could not determine, or even sense, what was "good".
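
Minimal code sketches for the four entries above follow. They are illustrative only: the quoted hyperparameters are taken from the summary, while the datasets, search grids, and all unquoted settings are placeholders, not the actual submitted code.

Sketch for entry 1 (AdaBoost classification), assuming synthetic data and scikit-learn >= 1.2 for the 'estimator' argument name:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholder data; the real features would come from the project dataset.
    X, y = make_classification(n_samples=2000, n_features=15, random_state=42)

    # max_depth=4 and Ntrees=300 are the values quoted in the summary.
    # 'estimator' is called 'base_estimator' in scikit-learn < 1.2.
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=4),
        n_estimators=300,
    )

    # Binary cross entropy (log loss) with 5-fold cross validation.
    scores = cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss")
    print("Mean log loss:", -scores.mean())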
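
Sketch for entry 2 (XGBoost regression with RandomizedSearch). Only Ntrees=450, the randomised search, the 20% validation split, and the relative-deviation MAE are from the summary; the search grid and data are invented for illustration:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split, RandomizedSearchCV
    from xgboost import XGBRegressor

    # Placeholder data with strictly positive targets so (P-T)/T is well defined.
    X, y = make_regression(n_samples=2000, n_features=15, noise=0.1, random_state=42)
    y = np.abs(y) + 1.0
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

    # Ntrees=450 is quoted in the summary; the parameter grid below is only an example.
    search = RandomizedSearchCV(
        XGBRegressor(n_estimators=450),
        param_distributions={"max_depth": [3, 4, 6, 8], "learning_rate": [0.01, 0.05, 0.1]},
        n_iter=6, cv=3, random_state=42,
    )
    search.fit(X_train, y_train)

    # MAE of the relative deviation (P-T)/T on the 20% validation set.
    pred = search.best_estimator_.predict(X_val)
    print("Validation MAE of (P-T)/T:", np.mean(np.abs((pred - y_val) / y_val)))

    # Built-in ranking, as used to pick the 15 best variables.
    print(np.argsort(search.best_estimator_.feature_importances_)[::-1][:15])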
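
Sketch for entry 3 (PyTorch NN) on the same kind of placeholder regression data. The two hidden-layer sizes, the Adam optimiser, the QuantileTransformer pre-processing, the learning-rate range, and the relative-deviation MAE loss follow the summary; the epoch count, scheduler, and activation are guesses:

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.datasets import make_regression
    from sklearn.preprocessing import QuantileTransformer
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=15, noise=0.1, random_state=42)
    y = np.abs(y) + 1.0  # keep targets positive so (P-T)/T is well defined
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

    # Pre-processing step from the summary, fitted on the training set only.
    qt = QuantileTransformer(output_distribution="normal", random_state=42)
    X_train, X_val = qt.fit_transform(X_train), qt.transform(X_val)

    model = nn.Sequential(
        nn.Linear(X_train.shape[1], 40), nn.ReLU(),  # Nhidden1=40
        nn.Linear(40, 50), nn.ReLU(),                # Nhidden2=50
        nn.Linear(50, 1),
    )

    # Adam with a decaying learning rate, spanning roughly the quoted 0.1 to 0.001 range.
    opt = torch.optim.Adam(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)

    Xt = torch.tensor(X_train, dtype=torch.float32)
    yt = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
    for epoch in range(100):
        opt.zero_grad()
        loss = torch.mean(torch.abs((model(Xt) - yt) / yt))  # MAE of (P-T)/T
        loss.backward()
        opt.step()
        sched.step()

    with torch.no_grad():
        Xv = torch.tensor(X_val, dtype=torch.float32)
        yv = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)
        print("Validation MAE of (P-T)/T:", torch.mean(torch.abs((model(Xv) - yv) / yv)).item())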
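
Sketch for entry 4 (agglomerative clustering). The distance_threshold=0.13 and the RobustScaler pre-processing are from the summary, while the blob data is a stand-in; note that scikit-learn requires n_clusters=None when a distance_threshold is given:

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import RobustScaler

    # Placeholder data; RobustScaler is the pre-processing step from the summary.
    X, _ = make_blobs(n_samples=500, centers=4, random_state=42)
    X = RobustScaler().fit_transform(X)

    # distance_threshold=0.13 is the value quoted above; n_clusters must then be None.
    clus = AgglomerativeClustering(n_clusters=None, distance_threshold=0.13)
    labels = clus.fit_predict(X)
    print("Number of clusters found:", clus.n_clusters_)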