1.
Classification_HelenaBritze_SKLearnGBM.txt:
GradientBoostingClassifier from sklearn. Hyperparameters (HP) tuned using GridSearch - learning_rate=0.075, max_depth=12, 140 trees, min_samples_split=600, min_samples_leaf=50. The learning rate and the number of trees were lowered after tuning to optimize further. The 15 best features were selected with SHAP. Cross-validation was used in training.
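
A minimal sketch of a cross-validated grid search over a GradientBoostingClassifier, as described above. The data is synthetic and the grid values and the roc_auc metric are illustrative assumptions, not the grid or metric actually used for this solution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): the real dataset is not part of this file.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)

# Small illustrative grid; these values are placeholders, not the grid searched.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [20, 40],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,               # 3-fold cross-validation per parameter combination
    scoring="roc_auc",  # assumed metric
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

After the search, search.best_estimator_ holds the model refitted on all of X with the best parameters.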

Classification_HelenaBritze_LightGBM.txt:
LGBMClassifier from lightgbm. HP tuned using GridSearch - max_depth=25, learning_rate=0.01, num_leaves=200, n_estimators=700. Top 15 features selected with LGBM built-in feature selection. LGBM was used to compare its performance with the sklearn GBM -> the two models seem to perform alike on the validation set.

Classification_HelenaBritze_MLP.txt:
Multi-Layer Perceptron NN classifier from sklearn. HP tuned with GridSearch. 15 best features selected by permutation importance. The performance does not seem to be as good as that of the tree-based methods.
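
A minimal sketch of ranking features by permutation importance for an MLP classifier and keeping the 15 highest-ranked ones. The data is synthetic, and the network size, scaling step and validation split are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=10, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scaling plus a small MLP; the architecture is a placeholder.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)

# Shuffle each feature on the validation set and measure the score drop.
result = permutation_importance(
    model, X_val, y_val, n_repeats=5, random_state=0
)

# Indices of the 15 features with the largest mean importance.
top15 = np.argsort(result.importances_mean)[::-1][:15]
print(top15)
```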


2.  
Regression_HelenaBritze_SKLearnGBM.txt:
GradientBoostingRegressor from sklearn. HP tuned with RandomizedSearch - n_estimators=150, min_samples_split=10, min_samples_leaf=5, max_depth=5. 
The 10 best features were selected with SHAP. It seems to be a reasonably good model.
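
A minimal sketch of a randomized hyperparameter search for a GradientBoostingRegressor. The data is synthetic, and the candidate values, n_iter and the MAE-based scoring are illustrative assumptions rather than the search space actually used.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption).
X, y = make_regression(
    n_samples=300, n_features=15, n_informative=6, noise=10.0, random_state=0
)

# Candidate values to sample from; placeholders, not the real search space.
param_distributions = {
    "n_estimators": [50, 100, 150],
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 5, 10],
    "max_depth": [3, 5],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=5,                           # number of sampled combinations
    cv=3,
    scoring="neg_mean_absolute_error",  # assumed metric; higher is better
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(-search.best_score_, 2))
```

Unlike a grid search, only n_iter combinations are evaluated, which keeps the cost fixed as the search space grows.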

Regression_HelenaBritze_KerasNN.txt:
Keras deep neural network model with three hidden layers, activation func='relu', and loss func=mean_absolute_error. The 10 best features were selected with the sklearn SelectFromModel technique using a Lasso regularized linear model. The performance could be improved - it was not as good as the tree-based model. Removing irrelevant features might improve it.
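
A minimal sketch of selecting exactly 10 features with SelectFromModel and a Lasso model. The data is synthetic, and alpha=0.1 and the scaling step are assumptions. Setting threshold=-np.inf makes max_features the only selection criterion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(
    n_samples=300, n_features=40, n_informative=10, noise=5.0, random_state=0
)

# Scale first so the Lasso coefficients are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# Keep the 10 features with the largest absolute Lasso coefficients.
selector = SelectFromModel(
    Lasso(alpha=0.1),   # alpha is a placeholder value
    threshold=-np.inf,  # disable the threshold, select on max_features only
    max_features=10,
)
selector.fit(X_scaled, y)

selected = np.flatnonzero(selector.get_support())
X_reduced = selector.transform(X_scaled)
print(selected, X_reduced.shape)
```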


3.
Clustering_HelenaBritze_KMeans-pca.txt:
KMeans clustering from sklearn. Optimal K=4 clusters determined by elbow plot; the silhouette method did not give a clear global maximum. PCA was applied to find the best 25 features.

Clustering_HelenaBritze_KMeans-shap.txt:
KMeans clustering from sklearn. Best 25 features determined by SHAP in classification solution 1. The optimal K=5 clusters were determined by elbow plot and silhouette method.
Clustering not good - only one particle was put in cluster 3.
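
A minimal sketch of collecting the two quantities behind the elbow plot and the silhouette method for a range of K, then checking the cluster sizes to spot near-empty clusters like the one noted above. The data and the range of K are synthetic assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption).
X, _ = make_blobs(n_samples=600, centers=4, n_features=5, random_state=0)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                         # plotted vs k for the elbow
    silhouettes[k] = silhouette_score(X, km.labels_)  # higher is better

# K with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

# Refit at best_k and count the points per cluster.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
cluster_sizes = np.bincount(labels, minlength=best_k)
print(best_k, cluster_sizes)
```

A cluster with only a handful of points in cluster_sizes usually indicates an outlier being isolated rather than a real group.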

Clustering_HelenaBritze_GaussianMixture.txt:
GaussianMixture model from sklearn. K=5 clusters and 25 features selected by SHAP. Compared to KMeans with SHAP features, the GaussianMixture cluster proportions are very different.
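
A minimal sketch of comparing cluster proportions between KMeans and GaussianMixture at the same K. The data is synthetic, so the proportions printed here say nothing about the real result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (assumption).
X, _ = make_blobs(n_samples=500, centers=5, n_features=5, random_state=0)
k = 5

km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
gm_labels = GaussianMixture(n_components=k, random_state=0).fit(X).predict(X)

# Fraction of points assigned to each cluster.
km_props = np.bincount(km_labels, minlength=k) / len(X)
gm_props = np.bincount(gm_labels, minlength=k) / len(X)

# Sort descending so the comparison ignores the arbitrary cluster numbering.
print(np.sort(km_props)[::-1])
print(np.sort(gm_props)[::-1])
```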