Small project solution description - Applied Machine Learning
Mia-Louise Nielsen (qdl889), miax2973@gmail.com

##################################################################
## Classification
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: k-nearest neighbours
Key HP values: metric=manhattan, n_neighbors=30, weights=distance
HP optimization: Used grid search to test different configurations of the three key HP specified above (metric, n_neighbors, and weights); a sketch of this setup is given after this section
Performance: roc_auc=0.938, cross entropy=0.324 (both are best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 0.526 s
Own evaluation: The result seems very reasonable and the training process is short. All in all, a fairly good model.

## Solution 2:
Library: LightGBM
Algorithm: Gradient boosting - classifier
Key HP values: learning_rate=0.07367303537415874, max_bin=246, max_depth=19, num_leaves=85
HP optimization: Used randomized search to test different configurations of the HP mentioned above (also sketched after this section)
Performance: roc_auc=0.982, cross entropy=0.136 (both are best of 5-fold cross validation)
Final model training time: 3.616 s
Own evaluation: The performance is better than the above result from KNN (roc_auc increased by 0.044, i.e. 4.7%); however, the training time increased by a factor of ~6.9. The absolute training time is still very reasonable, so all in all a very good model.

## Solution 3:
Library: Keras (TensorFlow)
Algorithm: Neural network: Dense1(ReLU), Dropout, Dense2(ReLU), Dropout, Dense3(softmax - 2 neurons)
Key HP values: batch_size=409, dropout_rate=0.25231103108297215, learning_rate=0.022173754635351463, neurons1=68, neurons2=107
HP optimization: Manually tested a few different architectures (number of layers, activation functions), followed by randomized search to tune batch_size, learning_rate, dropout rate, and the number of neurons in Dense1 and Dense2 (a sketch of the network follows this section)
Performance: categorical_crossentropy=2.4897, roc_auc=0.961 (best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 246 s
Own evaluation: Although not quite as good as the gradient boosting model from Solution 2 (~2% lower area under the ROC curve), it is still a very high-performing model. However, the training time is significantly longer, though it is still reasonable for this amount of data.
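For reference, a minimal sketch of how the Solution 1 grid search can be set up in scikit-learn. The data shown is a synthetic stand-in (the project data is not bundled with this report), and the candidate values in the grid are illustrative assumptions; the report above only names the three HP and the winning configuration:

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the project data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# Scaling inside the pipeline so MinMaxScaler is re-fitted on each CV
# training fold, avoiding leakage into the validation folds.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values are illustrative assumptions; the report only gives
# the winning configuration (manhattan, 30, distance).
param_grid = {
    "knn__metric": ["manhattan", "euclidean", "chebyshev"],
    "knn__n_neighbors": [5, 10, 20, 30, 50],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)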
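A corresponding sketch for the Solution 2 randomized search over the LightGBM classifier. The sampling distributions and the number of iterations are assumptions, since only the winning values are reported:

from lightgbm import LGBMClassifier
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the project data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# Sampling ranges are assumptions, chosen to cover the winning values.
param_distributions = {
    "learning_rate": uniform(0.01, 0.2),   # uniform on [0.01, 0.21]
    "max_bin": randint(50, 300),
    "max_depth": randint(3, 25),
    "num_leaves": randint(10, 150),
}

search = RandomizedSearchCV(
    LGBMClassifier(),
    param_distributions,
    n_iter=50,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)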
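A minimal sketch of the Solution 3 network with the reported HP values (rounded). The optimizer choice (Adam), the epoch count, and the stand-in data are assumptions not stated in the report:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for the MinMax-scaled project data.
rng = np.random.default_rng(0)
X = rng.random((2000, 15)).astype("float32")      # already in [0, 1]
y_onehot = keras.utils.to_categorical(rng.integers(0, 2, 2000), num_classes=2)

model = keras.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.Dense(68, activation="relu"),      # neurons1=68
    layers.Dropout(0.252),                    # dropout_rate (rounded)
    layers.Dense(107, activation="relu"),     # neurons2=107
    layers.Dropout(0.252),
    layers.Dense(2, activation="softmax"),    # two-class output
])

# Adam and the epoch count are assumptions; the report only gives the
# learning rate and batch size.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0222),
    loss="categorical_crossentropy",
    metrics=[keras.metrics.AUC(name="roc_auc")],
)
model.fit(X, y_onehot, batch_size=409, epochs=10, verbose=0)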
##################################################################
## Regression
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: BayesianRidge
Key HP values: alpha_1=1.95e-05, alpha_2=6.71e-06, lambda_1=9.85e-05, lambda_2=5.84e-05
HP optimization: Used randomized search to tune the four HP specified above
Performance: MAE=25162 (best of 5-fold cross validation)
Final model training time: 0.345 s
Own evaluation: This seems to be a great model - high performance and low training time.

## Solution 2:
Library: LightGBM
Algorithm: Gradient boosting - regressor
Key HP values: learning_rate=0.097, max_bin=161, max_depth=8, num_leaves=32
HP optimization: Used randomized search to test different configurations of the HP mentioned above.
Performance: MAE=68760 (best of 5-fold cross validation)
Final model training time: 0.950 s
Own evaluation: Although the performance is clearly worse than in Solution 1 (the MAE is roughly 2.7 times higher) and the training time is longer, this is still a decent model. The absolute training time is still very reasonable and the performance seems acceptable.

## Solution 3:
Library: Keras (TensorFlow)
Algorithm: Neural network: Dense1(ReLU), Dropout, Dense2(ReLU), Dropout, Dense3(ReLU), Dropout, Dense4(ReLU), Dropout, Dense5(ReLU), Dropout, Dense6(no activation - 1 neuron)
Key HP values:
HP optimization: Manually tested a few different architectures (number of layers, activation functions), followed by randomized search to tune batch_size, learning_rate, dropout rate, and the number of neurons individually in the first 5 Dense layers
Performance: MAE=10756 (best of 5-fold cross validation)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 71.0 s
Own evaluation: Although the training time is longer than for the previous two models (by a factor of ~200 and ~75, respectively), the performance is improved as well (the MAE is decreased by a factor of 2.3 and 6.4, respectively). All in all, quite a good model, considering the training time is still reasonable.

##################################################################
## Clustering
##################################################################

## Solution 1:
Library: Scikit-learn
Algorithm: KMeans
Key HP values: n_clusters=3, n_init=20
HP optimization: Tried different values of n_clusters and n_init
Performance: roc_auc=0.635
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 3.18 s
Own evaluation: The model was able to identify three different clusters within the data and the training time is short. However, the area under the ROC curve is only 0.635 when the model is evaluated on its ability to classify electrons/non-electrons, i.e., not a great performing model.

## Solution 2:
Library: Scikit-learn
Algorithm: Gaussian mixture models
Key HP values: n_components=5, covariance_type='diag'
HP optimization: Tested different numbers of clusters (n_components) and different settings for covariance_type
Performance: roc_auc=0.902 (one possible way to derive this score from the cluster assignments is sketched at the end of this document)
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 6.43 s
Own evaluation: The training time is short and the model was able to identify 5 different clusters within the data. The area under the ROC curve is 0.902 when evaluated on the ability to classify electrons/non-electrons, which I think is surprisingly good (a 42% increase compared to KMeans (Solution 1)).

## Solution 3:
Library: Scikit-learn
Algorithm: MeanShift
Key HP values: bandwidth=0.4
HP optimization: Tested a few different values for the bandwidth
Performance: roc_auc=0.424
Pre-processing: Scaled the data using MinMaxScaler (values scaled to the interval [0,1])
Final model training time: 6958 s
Own evaluation: The model was able to identify 6 different clusters within the data; however, the training time was unreasonably long, which makes the model's HP harder to optimize properly. The area under the ROC curve is 0.424 when evaluated on the ability to classify electrons/non-electrons. It might be possible to improve the result by spending more time tuning the HP, but considering the long training time, the model is not a sensible choice for this data.
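Finally, a minimal sketch of the clustering Solution 2 (Gaussian mixture) together with one plausible way of turning cluster assignments into an electron score for the ROC AUC evaluation. The report does not state exactly how this mapping was done, so the electron-fraction scoring below, and the stand-in data, are assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

# Synthetic stand-in; y holds the electron/non-electron labels.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_scaled = MinMaxScaler().fit_transform(X)

gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
resp = gmm.fit(X_scaled).predict_proba(X_scaled)  # per-component responsibilities

# Assumed scoring scheme: rate each component by its electron fraction,
# then score each sample by the responsibility-weighted electron fraction.
hard = resp.argmax(axis=1)
electron_frac = np.array([
    y[hard == k].mean() if np.any(hard == k) else 0.0
    for k in range(gmm.n_components)
])
scores = resp @ electron_frac

print("roc_auc =", roc_auc_score(y, scores))

Note that rating the components and evaluating the ROC AUC on the same labelled data is optimistic; scoring a held-out split would be cleaner.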