Small project solution description - Applied Machine Learning
Mia-Louise Nielsen (qdl889), miax2973@gmail.com

##################################################################
## Classification
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: k-nearest neighbours
Key HP values: metric=manhattan, n_neighbors=30, weights=distance
HP optimization: Used grid search to test different configurations of the three key HP specified above (metric, n_neighbors, and weights); a sketch of this setup is given after this section
Performance: roc_auc=0.938, cross entropy=0.324 (both are best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 0.526 s
Own evaluation: The result seems very reasonable and the training process is short. All in all, a fairly good model.

## Solution 2:
Library: LightGBM
Algorithm: Gradient boosting - classifier
Key HP values: learning_rate=0.07367303537415874, max_bin=246, max_depth=19, num_leaves=85
HP optimization: Used randomized search to test different configurations of the HP mentioned above (also sketched after this section)
Performance: roc_auc=0.982, cross entropy=0.136 (both are best of 5-fold cross validation)
Final model training time: 3.616 s
Own evaluation: The performance is better than the above result from KNN (roc_auc increased by 0.044, i.e. 4.7%); however, the training time increased by a factor of ~6.9. The absolute training time is still very reasonable, so all in all a very good model.

## Solution 3:
Library: Keras (TensorFlow)
Algorithm: Neural network: Dense1(ReLU), Dropout, Dense2(ReLU), Dropout, Dense3(softmax - 2 neurons)
Key HP values: batch_size=409, dropout_rate=0.25231103108297215, learning_rate=0.022173754635351463, neurons1=68, neurons2=107
HP optimization: Manually tested a few different architectures (number of layers, activation functions), followed by randomized search to tune batch_size, learning_rate, dropout rate, and the number of neurons in Dense1 and Dense2 (a sketch of the network follows this section)
Performance: categorical_crossentropy=2.4897, roc_auc=0.961 (best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 246 s
Own evaluation: Although not quite as good as the gradient boosting model from Solution 2 (~2% lower area under the ROC curve), it is still a very high-performing model. However, the training time is significantly longer, though it is still reasonable for this amount of data.
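For reference, a minimal sketch of how the Solution 1 grid search can be set up in scikit-learn. The data shown is a synthetic stand-in (the project data is not bundled with this report), and the candidate values in the grid are illustrative assumptions; the report above only names the three HP and the winning configuration:

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the project data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# Scaling inside the pipeline so MinMaxScaler is re-fitted on each CV
# training fold, avoiding leakage into the validation folds.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values are illustrative assumptions; the report only gives
# the winning configuration (manhattan, 30, distance).
param_grid = {
    "knn__metric": ["manhattan", "euclidean", "chebyshev"],
    "knn__n_neighbors": [5, 10, 20, 30, 50],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)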
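A corresponding sketch for the Solution 2 randomized search over the LightGBM classifier. The sampling distributions and the number of iterations are assumptions, since only the winning values are reported:

from lightgbm import LGBMClassifier
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the project data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# Sampling ranges are assumptions, chosen to cover the winning values.
param_distributions = {
    "learning_rate": uniform(0.01, 0.2),   # uniform on [0.01, 0.21]
    "max_bin": randint(50, 300),
    "max_depth": randint(3, 25),
    "num_leaves": randint(10, 150),
}

search = RandomizedSearchCV(
    LGBMClassifier(),
    param_distributions,
    n_iter=50,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)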
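A minimal sketch of the Solution 3 network with the reported HP values (rounded). The optimizer choice (Adam), the epoch count, and the stand-in data are assumptions not stated in the report:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for the MinMax-scaled project data.
rng = np.random.default_rng(0)
X = rng.random((2000, 15)).astype("float32")      # already in [0, 1]
y_onehot = keras.utils.to_categorical(rng.integers(0, 2, 2000), num_classes=2)

model = keras.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.Dense(68, activation="relu"),      # neurons1=68
    layers.Dropout(0.252),                    # dropout_rate (rounded)
    layers.Dense(107, activation="relu"),     # neurons2=107
    layers.Dropout(0.252),
    layers.Dense(2, activation="softmax"),    # two-class output
])

# Adam and the epoch count are assumptions; the report only gives the
# learning rate and batch size.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0222),
    loss="categorical_crossentropy",
    metrics=[keras.metrics.AUC(name="roc_auc")],
)
model.fit(X, y_onehot, batch_size=409, epochs=10, verbose=0)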
##################################################################
## Regression
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: BayesianRidge
Key HP values: alpha_1=1.95e-05, alpha_2=6.71e-06, lambda_1=9.85e-05, lambda_2=5.84e-05
HP optimization: Used randomized search to tune the four HP specified above
Performance: MAE=25162 (best of 5-fold cross validation)
Final model training time: 0.345 s
Own evaluation: This seems to be a great model - high performance and low training time.

## Solution 2:
Library: LightGBM
Algorithm: Gradient boosting - regressor
Key HP values: learning_rate=0.097, max_bin=161, max_depth=8, num_leaves=32
HP optimization: Used randomized search to test different configurations of the HP mentioned above.
Performance: MAE=68760 (best of 5-fold cross validation)
Final model training time: 0.950 s
Own evaluation: Although the performance is clearly worse than in Solution 1 (the MAE is roughly 2.7 times higher) and the training time is longer, this is still a decent model. The absolute training time is still very reasonable and the performance seems acceptable.

## Solution 3:
Library: Keras (TensorFlow)
Algorithm: Neural network: Dense1(ReLU), Dropout, Dense2(ReLU), Dropout, Dense3(ReLU), Dropout, Dense4(ReLU), Dropout, Dense5(ReLU), Dropout, Dense6(no activation - 1 neuron)
Key HP values:
HP optimization: Manually tested a few different architectures (number of layers, activation functions), followed by randomized search to tune batch_size, learning_rate, dropout rate, and the number of neurons individually in the first 5 Dense layers
Performance: MAE=10756 (best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 71.0 s
Own evaluation: Although the training time is longer than for the previous two models (by a factor of ~200 and ~75, respectively), the performance is improved as well (the MAE is decreased by a factor of 2.3 and 6.4, respectively). All in all, quite a good model, considering the training time is still reasonable.

##################################################################
## Clustering
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: KMeans
Key HP values: n_clusters=3, n_init=20
HP optimization: Tried different values of n_clusters and n_init
Performance: roc_auc=0.635
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 3.18 s
Own evaluation: The model was able to identify three different clusters within the data and the training time is short. However, the area under the ROC curve is only 0.635 when the model is evaluated on its ability to classify electrons/non-electrons, i.e., not a great performing model.

## Solution 2:
Library: Scikit-learn
Algorithm: Gaussian mixture models
Key HP values: n_components=5, covariance_type='diag'
HP optimization: Tested different numbers of clusters (n_components) and different settings for covariance_type
Performance: roc_auc=0.902 (one possible way to derive this score from the cluster assignments is sketched at the end of this document)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 6.43 s
Own evaluation: The training time is short and the model was able to identify 5 different clusters within the data. The area under the ROC curve is 0.902 when evaluated on the ability to classify electrons/non-electrons, which I think is surprisingly good (a 42% increase compared to KMeans (Solution 1)).

## Solution 3:
Library: Scikit-learn
Algorithm: MeanShift
Key HP values: bandwidth=0.4
HP optimization: Tested a few different values for the bandwidth
Performance: roc_auc=0.424
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 6958 s
Own evaluation: The model was able to identify 6 different clusters within the data; however, the training time was unreasonably long, which makes the model's HP harder to optimize properly. The area under the ROC curve is 0.424 when evaluated on the ability to classify electrons/non-electrons. It might be possible to improve the result by spending more time tuning the HP, but considering the long training time, the model is not a sensible choice for this data.
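Finally, a minimal sketch of the clustering Solution 2 (Gaussian mixture) together with one plausible way of turning cluster assignments into an electron score for the ROC AUC evaluation. The report does not state exactly how this mapping was done, so the electron-fraction scoring below, and the stand-in data, are assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

# Synthetic stand-in; y holds the electron/non-electron labels.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_scaled = MinMaxScaler().fit_transform(X)

gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
resp = gmm.fit(X_scaled).predict_proba(X_scaled)  # per-component responsibilities

# Assumed scoring scheme: rate each component by its electron fraction,
# then score each sample by the responsibility-weighted electron fraction.
hard = resp.argmax(axis=1)
electron_frac = np.array([
    y[hard == k].mean() if np.any(hard == k) else 0.0
    for k in range(gmm.n_components)
])
scores = resp @ electron_frac

print("roc_auc =", roc_auc_score(y, scores))

Note that rating the components and evaluating the ROC AUC on the same labelled data is optimistic; scoring a held-out split would be cleaner.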