13:22:09 From Haider Fadil Ali Hussein Al-Saadi : tree
13:22:41 From Haider Fadil Ali Hussein Al-Saadi : every forward and back prop
13:27:46 From Sofus Kjærsgaard Stray : 1/8th of that
13:28:12 From Haider Fadil Ali Hussein Al-Saadi : too long
13:28:13 From Haider Fadil Ali Hussein Al-Saadi : :D
13:28:29 From Rasmus Salmon : No
13:29:19 From Troels Christian Petersen : So the suggestion was 100 Gbit/s?
13:31:39 From Simon Hilbard : no
13:31:40 From Michael : With pytorch, yes
13:31:41 From Ann-Sofie Priergaard Zinck : No
13:31:43 From Rasmus Salmon : Not on this problem
13:32:03 From Aske R. : does a network outperform a direct physical connection between the storage and the gpus?
13:32:12 From Troels Christian Petersen : No poll - having a problem...
13:32:39 From Aske R. : for me it was a huge speed up
13:32:46 From Aske R. : when i switched to gpus
13:33:30 From Aske R. : if you don't move everything over
13:33:30 From Michael : If you do it in batches
13:33:47 From Michael : it is really slow, but necessary if the data is too large
13:35:17 From Aske R. : a recurrent would be harder right?
13:35:20 From Michael : Forward NN, would probably be 'easy' to convert
13:35:23 From Svend : When talking about the GPU size, is it the VRAM we are talking about?
13:35:56 From Svend : ok thx
13:37:12 From Aske R. : doing batches with the gpu isn't really a problem for a recurrent network where you might be reading in batches anyways right?
13:37:40 From Rasmus Salmon : Theoretically would it be possible to have a system where "jumping" around the data is faster than just streaming
13:37:43 From Rasmus Salmon : ?
13:41:13 From Michael : More stable, less power and faster
13:41:36 From Haider Fadil Ali Hussein Al-Saadi : what do you mean by stable?
13:42:04 From Aske R. : twice every day
13:42:10 From Aske R. : yup
13:43:19 From Michael : couple million cpu?
13:43:21 From Sofus Kjærsgaard Stray : I have absolutely no clue
13:43:23 From Haider Fadil Ali Hussein Al-Saadi : 3000 playstations
13:43:39 From Haider Fadil Ali Hussein Al-Saadi : oO
13:43:53 From Simon Hilbard : what about PlayStations ?
13:44:51 From Carl-Johannes Johnsen : The big GPUs have ~5000 cores
13:44:52 From Sofus Kjærsgaard Stray : What exactly does it mean when a CPU fails?
13:45:03 From Haider Fadil Ali Hussein Al-Saadi : a crash I believe was the implication
13:45:13 From Sofus Kjærsgaard Stray : ah
13:48:11 From Michael : Handle parallel storage
13:49:15 From Troels Christian Petersen : Yes…
13:49:38 From Andy Anker : yes
13:49:38 From Aske R. : yes
13:49:38 From Sofus Kjærsgaard Stray : yes
13:49:39 From kristoffer : yes
13:49:40 From Simone Vejlgaard : yes
13:49:40 From Svend : me
13:49:41 From Haider Fadil Ali Hussein Al-Saadi : ys
13:49:41 From Simon Hilbard : yes
13:49:54 From Sofus Kjærsgaard Stray : I have but not from hospitals
13:50:00 From kristoffer : No experimental stuff....
13:50:02 From Sofus Kjærsgaard Stray : there's lots of "random" data from Keggle
13:50:07 From Sofus Kjærsgaard Stray : kaggle*
13:51:07 From Andy Anker : I don't feel it is much different now....
13:51:55 From Ann-Sofie Priergaard Zinck : yes
13:52:39 From Troels Christian Petersen : YEEEEES
13:52:39 From Simon Hilbard : yes
13:52:40 From Andy Anker : yes
13:52:41 From Simone Vejlgaard : yes
13:52:41 From Emy Alerskans : yes
13:52:50 From Ann-Sofie Priergaard Zinck : no
13:52:51 From Emy Alerskans : no
13:52:51 From kristoffer : no
13:52:53 From Aske R. : not if you are cern
13:52:57 From Troels Christian Petersen : yes… but not a lot!
13:53:22 From Aske R. : sounds expensive with that amount of data
13:54:13 From Haider Fadil Ali Hussein Al-Saadi : tape?
13:54:24 From Sofus Kjærsgaard Stray : magnetic tape I think
13:56:23 From Rasmus Salmon : I don't know if this is out of the scope of this class.
How can something like folding@home work without super bad performance?
13:56:25 From Rasmus Salmon : https://en.wikipedia.org/wiki/Folding@home
14:00:41 From Rasmus Salmon : Yes
14:01:30 From Brian Vinter : https://datacenter.csc.fi/wp/about-lumi/
14:04:42 From Aske R. : there is no reweighing needed in the small project data right?
14:07:42 From Sofus Kjærsgaard Stray : Small
14:07:57 From Katja : no
14:08:04 From Michael : With 2 components it got around 0.9 acc
14:08:05 From kristoffer : haven't tried
14:08:06 From Rasmus Salmon : No
14:08:08 From Aske R. : haven't gotten to it yet
14:08:11 From Andy Anker : I hope
14:08:29 From Sofus Kjærsgaard Stray : I've tried running a very simple BGM that's just 3 components
14:08:40 From Sofus Kjærsgaard Stray : and also 2 and 4 and 5
14:09:00 From Sofus Kjærsgaard Stray : Ah so try with, say, 10 variables
14:09:24 From Rasmus Salmon : How to calculate the acc of the clustering?
14:09:27 From kristoffer : Are you required to submit 9 algorithms (3 for each problem) to get full score?
14:09:27 From Michael : I used all 25 variables - 25 most significant from classification problem
14:10:42 From Sofus Kjærsgaard Stray : Oh, I removed the "Truth" variable from clustering. Are you not supposed to do that?
14:10:59 From Sofus Kjærsgaard Stray : Good
14:11:56 From Andy Anker : Are we supposed to do regression on the entire dataset, or should we somehow remove non-electrons first?
14:12:15 From Emy Alerskans : are you supposed to remove 'Truth' and 'p_truth_E' from all three problems (classification, regression, clustering)?
14:12:29 From Sofus Kjærsgaard Stray : Remove non-electrons
14:13:08 From Andy Anker : How?
14:13:23 From Aske R. : just only take the events where Truth == 1
14:13:24 From Andy Anker : Ahh ok
14:13:37 From kristoffer : @Troels Are you required to submit 9 algorithms (3 for each problem) to get full score?
14:13:39 From Sofus Kjærsgaard Stray : X.loc[train['Truth'] == 1] works if you use pd
14:13:42 From Aske R. : can be done easily if you have it set up in pandas dataframe
14:14:20 From Andy Anker : Thanks
14:14:52 From kristoffer : cool, thanks
14:15:34 From Emy Alerskans : are you supposed to remove 'Truth' and 'p_truth_E' from all three problems (classification, regression, clustering)?
14:16:03 From Aske R. : you need them as labels though
14:16:13 From Rasmus Salmon : Don't we need them as labels?
14:16:27 From Carl-Johannes Johnsen : Yes, but they should not be in the training data
14:16:53 From Sofus Kjærsgaard Stray : You can write X = train.drop([the stuff you don't want]) and then Y = train["Truth"]
14:17:08 From Dina Rapp : So, just to be clear - p_truth_E can’t be used in the clustering then ?
14:17:24 From Troels Christian Petersen : Yes, that is correct, Dina… they should not be used.
14:18:19 From Sofus Kjærsgaard Stray : Are we allowed to submit, say, 3 solutions of the same algorithm in clustering but where we use 3 different amounts of variables
14:18:33 From Sofus Kjærsgaard Stray : so kmeans on 5, 10, and 25 variables
14:19:18 From Andy Anker : At some point, you said, you will give an example of a description file. When will that be?
14:19:44 From Andy Anker : Thanks
14:19:49 From Sofus Kjærsgaard Stray : I mean I want to show both kmeans vs BGM but I also want to show the results of using 5 vs 10 variables, but that would be 4 "results" which is too much.
14:20:21 From Sofus Kjærsgaard Stray : and I got BGM to work but kmeans is still insisting on putting them all into the first cluster
14:21:01 From Rasmus Ørsøe : How big of an increase in accuracy do you reckon one would be able to get from hyperparametrization of lightgbm on the classification part?
14:21:13 From Sofus Kjærsgaard Stray : Yes haha that's sensible.
I'll figure it out
14:21:31 From Sofus Kjærsgaard Stray : @Rasmus I barely got any different lol
14:21:33 From Rasmus Ørsøe : (I'm barely getting any)
14:21:46 From Sofus Kjærsgaard Stray : from like 90% to 91%
14:21:59 From Aske R. : what kinda numbers do you guys have for accuracy just to see what the standard is
14:22:48 From Rasmus Ørsøe : Alright. Thanks!
14:22:57 From Sofus Kjærsgaard Stray : Aske atm get a log loss score of 0.2 and a "explained variance score" of 0.9
14:23:08 From Rasmus Ørsøe : @Aske I get just under 92% without hyper parametrization
14:23:08 From Sofus Kjærsgaard Stray : (for classification and regression respectively)
14:23:16 From Aske R. : okay i also have around 90%
14:23:23 From Aske R. : for just using box configurations
14:23:35 From Sofus Kjærsgaard Stray : my neural network is garbage on classification though
14:23:43 From kristoffer : Anybody know where to find some information on how to use the MODI part of erda? I am confident using DAG, but I would like to enhance performance by parallelizing using MODI.
14:23:48 From Rasmus Ørsøe : I gained about 1% on the classification by optimizing the learning budget
14:23:51 From Aske R. : did you remember to use a quantile transform
14:23:58 From kristoffer : @Sofus Have you transformed data?
14:24:00 From Troels Christian Petersen : @Sofus: If you don’t transform the variables, that is also what I would expect.
14:24:03 From Aske R. : or otherwise normalize the data
14:24:08 From Sofus Kjærsgaard Stray : Evidently I have not
14:24:15 From Troels Christian Petersen : OK… then there you go!
14:24:32 From Sofus Kjærsgaard Stray : How would I go about doing this? I may have missed something during the lectures it seems
14:24:45 From kristoffer : quantile_transform sklearn
14:25:52 From Sofus Kjærsgaard Stray : Ah, so like we did with PCA
14:26:45 From Emy Alerskans : Does it matter which normalization/standardization you use? E.g.
quantile_transform or preprocessing.StandardScaler from sklearn?
14:27:33 From Emy Alerskans : thanks :)
14:27:54 From Andy Anker : How important is the transformation if all the variables are in about the same range?? E.g. between 0 and 100. Is it more important, if one of the variables is significantly smaller/larger than the rest??
14:28:03 From Sofus Kjærsgaard Stray : oh wow that changed everything
14:28:26 From kristoffer : Should features and labels be transformed in the same manner, or can you use different transformations?
14:28:31 From Aske R. : what's common to use for final activation function for a regression problem with a NN
14:28:56 From Sofus Kjærsgaard Stray : Yes it's great now
14:29:05 From kristoffer : *for regression
14:29:24 From Sofus Kjærsgaard Stray : I got 90% accuracy even for non-transformed values with regression
14:30:29 From Emy Alerskans : should we transform the prediction back then when we submit the results?
14:31:10 From Simone Vejlgaard : @Sofus, what do you mean, when you write accuracy for a regression? R^2?
14:31:24 From kristoffer : Anybody know where to find some information on how to use the MODI part of erda? I am confident using DAG, but I would like to enhance performance by parallelizing using MODI.
14:31:30 From Sofus Kjærsgaard Stray : Explained Variance, although R^2 gave very similar results
14:32:07 From Sofus Kjærsgaard Stray : "explained_variance_score" from sklearn.metrics
14:33:11 From Sofus Kjærsgaard Stray : Question: When I do regression and classification, the neural networks have an in-the-box feature importance system. Is this available with clustering algorithms as well or do I have to use something like SHAP values?
14:33:32 From Simone Vejlgaard : Okay, but they are also the same in a linear case, so I guess that makes sense :)
14:33:39 From Sofus Kjærsgaard Stray : yeah
14:38:09 From Haider Fadil Ali Hussein Al-Saadi : I'm not sure how to use PCA to find the best variables.
Do I use PCA with 15 components and just take the features that best explain those components?
14:39:23 From Ann-Sofie Priergaard Zinck : You could do that, but I would recommend you to have a look at Shapley values instead
14:40:34 From Troels Christian Petersen : Well, this gives some intuition, but since the PCA is linear, it might miss a lot of things. Also, “best explain” in your sentence is not that well defined, but less of a problem. The former is the main reason for using other methods. One is “permutation importance”, while Shapley values is another (slower but with more features)...
14:41:59 From Emil Martiny : I just found an example of shap used on lightgbm, copy pasted the code, trained it with all 140 (or however many features there are) and used the SHAP values to give me a ranking.
14:42:30 From Emil Martiny : out of the box lightgbm does btw give about 94.5% when given all features
14:43:01 From Troels Christian Petersen : And I’ll get to Shapley values on Wednesday… they are quite neat, and associated with a Nobel Prize (however, “only” in economics).
14:46:56 From Rasmus Salmon : What about just taking the variables with the highest variance?
14:47:01 From Rasmus Salmon : For clustering
14:47:22 From Simon Hilbard : has anyone used the DecisionTreeClassifier.feature_importances_ function to find the good features ? does it work well ? what input does it want ?
14:56:03 From Sofus Kjærsgaard Stray : I've used LightGBM's version and it works quite well
15:01:34 From Yane García : to make sure for the ranking of features, is it valid to just use "n_components" in PCA, from sklearn.decomposition?
15:05:13 From Troels Christian Petersen To zoeansari(privately) : Hi Zoe. There is a student (Marta), who is having a hard time with the NN-part of the coding. She got the trees to work, but she is not strong in Python, and a bit “scared” by sitting alone and coding (I can understand). So would it be possible for you to help her a bit with code?
15:06:47 From zoeansari To Troels Christian Petersen(privately) : Yes, sure
15:07:53 From Troels Christian Petersen To zoeansari(privately) : Great. I will tell her to write you with the specific problem, and if you can help her out, then that is great. And if not, then write me back. My gut feeling is, that she simply needs a bit help with getting it to work, and that she understands the fundamentals.
15:08:29 From Mikkel Langgaard Lauritzen : why are we supposed to train only on the electrons for the regression part again? The test data will have non electrons as well right?
15:08:47 From Sofus Kjærsgaard Stray : You're estimating the energy of the electrons
15:08:55 From Sofus Kjærsgaard Stray : not all the particles
15:09:18 From zoeansari To Troels Christian Petersen(privately) : Hopefully it will be solved soon, and I will let you know how it will go, is it fine for her to talk on slack?
15:10:02 From Troels Christian Petersen To zoeansari(privately) : I will suggest Slack…
15:10:13 From zoeansari To Troels Christian Petersen(privately) : Great thanks
15:13:29 From kristoffer : ok
15:13:48 From Sofus Kjærsgaard Stray : Is there any smart way at all to figure out how many clusters you want or is it a free throw?
15:14:01 From Troels Christian Petersen To zoeansari(privately) : Free… :-)
15:15:05 From Sofus Kjærsgaard Stray : I know we can use various scoring methods to get something out but it seems like shots in the dark to just try different amounts of clusters until we get a nice one
15:17:19 From Andy Anker : You succeeded giving us (or me) that feeling!
15:21:02 From Troels Christian Petersen : Haha… alright - noted. Be glad that the data at least has some structure, and not 99% noise! That would of course serve the purpose poorly.
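On the question of how many clusters to pick: one common heuristic (not something prescribed in the chat) is to compare the silhouette score across candidate cluster counts and keep the count that scores highest. A minimal sketch, assuming scikit-learn is available; the synthetic blobs and their centers are made up here as a stand-in for the course data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in (assumption): 3 well-separated blobs in 5 dimensions.
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0, 0.0, 0.0],
])
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=0)

# Score each candidate number of clusters; a higher silhouette means
# tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

On real data the curve is usually much flatter than on clean blobs, so this is a guide for narrowing the range rather than a definitive answer.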
15:32:31 From Haider Fadil Ali Hussein Al-Saadi : lightgbm has a way as well as they pointed out
15:32:40 From Haider Fadil Ali Hussein Al-Saadi : .feature_importance(importance_type='split', iteration=None)
15:34:55 From Haider Fadil Ali Hussein Al-Saadi : Troels, is K-nearest neighbors ok as an algorithm to use for a solution or is it too simple?
15:35:24 From Rasmus Ørsøe : sklearn also has SelectKBest() that lets you choose different statistical methods of choosing k best parameters
15:36:06 From Rasmus Ørsøe : (independently of the model)
15:38:26 From Yane García : ok so first run the 160 variables
15:39:39 From Yane García : oh right
15:39:49 From Yane García : great
15:39:52 From Yane García : thanks
15:39:52 From Haider Fadil Ali Hussein Al-Saadi : remember to check that you get decent prediction even with all the variables if you do it that way though, since you don't want to use only the features a bad model loves
15:40:20 From Haider Fadil Ali Hussein Al-Saadi : although I got 96% prediction rate while copy pasting old code with lightGBM
15:40:24 From Haider Fadil Ali Hussein Al-Saadi : so it probably doesn't matter
15:41:02 From Troels Christian Petersen : @Haider (and all): k-NN is fine. Yes, it is simple but generally good and fast… and as Brian also alluded to: “It is better to run an algorithm that is not entirely optimal, than not to run any!”
15:41:12 From Rasmus Ørsøe : @haider 96% on classification?
15:41:35 From Haider Fadil Ali Hussein Al-Saadi : yea
15:41:38 From Haider Fadil Ali Hussein Al-Saadi : about there
15:41:48 From Andy Anker : Is that when the energy is excluded?
15:41:48 From Haider Fadil Ali Hussein Al-Saadi : 94% with 15 best features
15:42:33 From Haider Fadil Ali Hussein Al-Saadi : no it's on all variables at first
15:42:46 From Rasmus Ørsøe : I can't seem to exceed 92.5%. Did you do anything else than picking variables?
15:43:04 From Andy Anker : Ahh okay.
Then it makes sense
15:43:15 From Haider Fadil Ali Hussein Al-Saadi : i copy pasted old code, so i have parameters from week 1 or 2
15:43:58 From Haider Fadil Ali Hussein Al-Saadi : how big are your splits?
15:44:04 From Haider Fadil Ali Hussein Al-Saadi : mine is 20000
15:44:45 From Rasmus Ørsøe : I keep 14% for validation, rest is training
15:46:38 From Rasmus Ørsøe : Perhaps it's the selection method I'm using that's too simple. I get that the accuracy decreases after 13 variables
15:46:40 From Aske R. : i have an issue where my NN for regression runs, but there is no change in the loss function, i pretty much copied what i had working for the classification problem, changed the final activation function and the loss function, and hoped that it would work but something about the target not being 0 or 1 makes it go haywire, did anyone else have the same problem (using pytorch)
15:48:55 From Andy Anker : What is your loss function?
15:49:08 From Aske R. : MSELoss
15:49:54 From Andy Anker : What about your final act. function?
15:50:01 From Aske R. : ReLU
15:50:10 From Andy Anker : I would guess you should remove that
15:50:19 From Aske R. : and put what instead?
15:50:20 From Andy Anker : The target has negative values also
15:51:02 From Andy Anker : The target has negative values. When you apply a relu, all the values will be converted to positive numbers. You could choose to simply not have an act. function for the last layer
15:51:10 From Aske R. : was looking at the activation functions and couldn't figure out what the simple linear one was called
15:51:40 From Aske R. : didn't know not having one was an option
15:51:42 From Aske R. : will try
15:52:09 From Andy Anker : I had the same problem
15:53:07 From Aske R. : now i get train loss nan...
15:54:35 From Sofus Kjærsgaard Stray : It feels like the neural network performs too well??
Even if my entire network is just 1 hidden layer with 4 nodes I still get 91% accuracy
15:54:58 From Haider Fadil Ali Hussein Al-Saadi : is that with all features?
15:55:07 From Sofus Kjærsgaard Stray : only top 15
15:56:05 From Sofus Kjærsgaard Stray : hmm okay something is definitely wrong here... if my entire network is just 1 sigmoid function it still gives me 90% accuracy
15:58:25 From Haider Fadil Ali Hussein Al-Saadi : That actually might still be the case, depending on what the data looks like
15:58:47 From Haider Fadil Ali Hussein Al-Saadi : for example, if there is 1 cluster where all the electrons are, and the rest is just randomly distributed far from that cluster
15:58:48 From Andy Anker : Aske: I am sry, I was wrong. There are no negative energies, when removing non-electrons. But I still don't use any act. function for the last layer. Did you remove non-electrons?
15:59:46 From Sofus Kjærsgaard Stray : Hm I suppose Haider
15:59:58 From Haider Fadil Ali Hussein Al-Saadi : but i only got 80% with 2 hidden layers with 4 each
16:00:08 From Haider Fadil Ali Hussein Al-Saadi : so it might not be the case
16:00:22 From Haider Fadil Ali Hussein Al-Saadi : ill try with your setup
16:01:09 From Haider Fadil Ali Hussein Al-Saadi : i got 78% with only 1 neuron
16:02:39 From Sofus Kjærsgaard Stray : how many epochs?
16:02:40 From Sofus Kjærsgaard Stray : I'm running with 3
16:03:15 From Haider Fadil Ali Hussein Al-Saadi : 20
16:03:27 From Sofus Kjærsgaard Stray : damn
16:03:49 From Haider Fadil Ali Hussein Al-Saadi : NNclf = MLPClassifier(max_iter=20000,n_iter_no_change=3,solver='adam',activation='logistic',hidden_layer_sizes=(1),random_state=1)
16:03:49 From Sofus Kjærsgaard Stray : model = tf.keras.models.Sequential([ tf.keras.layers.Dense(1,activation='sigmoid') ]) My model is just this
16:03:59 From Sofus Kjærsgaard Stray : tf = tensorflow
16:04:11 From Sofus Kjærsgaard Stray : it also uses adam as its optimizer
16:04:22 From Haider Fadil Ali Hussein Al-Saadi : looks like we are doing the same thing
16:04:41 From Sofus Kjærsgaard Stray : maybe tensorflow is just a good algorithm??
16:05:02 From Haider Fadil Ali Hussein Al-Saadi : are you 90% on test data?
16:09:49 From Sofus Kjærsgaard Stray : 20/80
16:10:08 From Haider Fadil Ali Hussein Al-Saadi : no, i mean are you getting 90% accuracy on the test data?
16:10:49 From Aske R. : @Andy yes i did remove non-electrons seems to be that the gradient is "exploding" and i should implement gradient clipping
16:11:50 From Andy Anker : Weird. I think something is wrong with your act. functions then.
16:11:58 From Aske R. : i just had a run where the loss function went: larger number; much larger number; inf; nan
16:22:38 From Troels Christian Petersen To Andy Anker(privately) : You are welcome to post code snippets, but whole solutions on Slack are probably best held back until later… :-) Does that sound reasonable?
16:22:45 From Haider Fadil Ali Hussein Al-Saadi : there is no element called "p_truth_E" in all variables?
16:23:13 From Carl-Johannes Johnsen : Not in the variable list no, as you should not use it for training
16:26:54 From Sofus Kjærsgaard Stray : Oh sorry @Haider, yes 90% on the test data
16:27:00 From Sofus Kjærsgaard Stray : or rather 90% on the validation data
16:27:50 From Haider Fadil Ali Hussein Al-Saadi : i honestly have no clue how you do it then. I thought maybe you were overfitting on training data. I got 79% with only 1 neuron, and had to go up to 50 to get 88%
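The data-preparation steps suggested earlier in the chat (keep only events with Truth == 1 for the regression, drop 'Truth' and 'p_truth_E' from the features but keep them as labels, and quantile-transform the inputs before a neural network) can be sketched as follows. This is a minimal sketch under assumptions: the toy DataFrame and the feature names feat_a and feat_b are invented for illustration, and only the column names 'Truth' and 'p_truth_E' come from the chat.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# Toy stand-in for the course data (assumption): two hypothetical features
# plus the two label columns named in the chat.
rng = np.random.default_rng(0)
n = 1000
train = pd.DataFrame({
    "feat_a": rng.exponential(50.0, n),        # hypothetical, skewed feature
    "feat_b": rng.normal(0.0, 1000.0, n),      # hypothetical, large-scale feature
    "Truth": rng.integers(0, 2, n),            # 1 = electron, 0 = non-electron
    "p_truth_E": rng.uniform(10.0, 200.0, n),  # true energy (regression target)
})
label_cols = ["Truth", "p_truth_E"]

# Classification: all events; labels are removed from the features.
X_clf = train.drop(columns=label_cols)
y_clf = train["Truth"]

# Regression: electrons only; same features, energy as the target.
electrons = train.loc[train["Truth"] == 1]
X_reg = electrons.drop(columns=label_cols)
y_reg = electrons["p_truth_E"]

# Fit the transform on the training features only, then reuse the fitted
# object (transform, not fit_transform) on validation and test data.
qt = QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0)
X_clf_t = qt.fit_transform(X_clf)

print(X_clf.columns.tolist(), X_clf_t.shape, len(X_reg))
```

Note that `DataFrame.drop` with a bare list removes rows by index label, so dropping columns needs `columns=` (or `axis=1`).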