13:22:09	 From Haider Fadil Ali Hussein Al-Saadi : tree
13:22:41	 From Haider Fadil Ali Hussein Al-Saadi : every  forward and back prop
13:27:46	 From Sofus Kjærsgaard Stray : 1/8th of that
13:28:12	 From Haider Fadil Ali Hussein Al-Saadi : too long
13:28:13	 From Haider Fadil Ali Hussein Al-Saadi : :D
13:28:29	 From Rasmus Salmon : No
13:29:19	 From Troels Christian Petersen : So the suggestion was 100 Gbit/s?
13:31:39	 From Simon Hilbard : no
13:31:40	 From Michael : With pytorch, yes
13:31:41	 From Ann-Sofie Priergaard Zinck : No
13:31:43	 From Rasmus Salmon : Not on this problem
13:32:03	 From Aske R. : does a network outperform a direct physical connection between the storage and the gpus?
13:32:12	 From Troels Christian Petersen : No poll - having a problem...
13:32:39	 From Aske R. : for me it was a huge speed up
13:32:46	 From Aske R. : when i switched to gpus
13:33:30	 From Aske R. : if you don't move everything over
13:33:30	 From Michael : If you do it in batches
13:33:47	 From Michael : it is really slow, but necessary if the data is too large
13:35:17	 From Aske R. : a recurrent would be harder right?
13:35:20	 From Michael : Forward NN, would probably be 'easy' to convert
13:35:23	 From Svend : When talking about the GPU size, is it the VRAM we are talking about
13:35:56	 From Svend : ok thx
13:37:12	 From Aske R. : doing batches with the gpu isn't really a problem for a recurrent network where you might be reading in batches anyways right?
13:37:40	 From Rasmus Salmon : Theoretically would it be possible to have a system where "jumping" around the data is faster than just streaming
13:37:43	 From Rasmus Salmon : ?
13:41:13	 From Michael : More stable, less power and faster
13:41:36	 From Haider Fadil Ali Hussein Al-Saadi : what do you mean by stable?
13:42:04	 From Aske R. : twice every day
13:42:10	 From Aske R. : yup
13:43:19	 From Michael : couple million cpu?
13:43:21	 From Sofus Kjærsgaard Stray : I have absolutely no clue
13:43:23	 From Haider Fadil Ali Hussein Al-Saadi : 3000 playstations
13:43:39	 From Haider Fadil Ali Hussein Al-Saadi : oO
13:43:53	 From Simon Hilbard : what about PlayStations ? 
13:44:51	 From Carl-Johannes Johnsen : The big GPUs have ~5000 cores
13:44:52	 From Sofus Kjærsgaard Stray : What exactly does it mean when a CPU fails?
13:45:03	 From Haider Fadil Ali Hussein Al-Saadi : a crash I believe was the implication
13:45:13	 From Sofus Kjærsgaard Stray : ah
13:48:11	 From Michael : Handle parallel storage
13:49:15	 From Troels Christian Petersen : Yes…
13:49:38	 From Andy Anker : yes
13:49:38	 From Aske R. : yes
13:49:38	 From Sofus Kjærsgaard Stray : yes
13:49:39	 From kristoffer : yes
13:49:40	 From Simone Vejlgaard : yes
13:49:40	 From Svend : me
13:49:41	 From Haider Fadil Ali Hussein Al-Saadi : ys
13:49:41	 From Simon Hilbard : yes
13:49:54	 From Sofus Kjærsgaard Stray : I have but not from hospitals
13:50:00	 From kristoffer : No experimental stuff....
13:50:02	 From Sofus Kjærsgaard Stray : there's lots of "random" data from Keggle 
13:50:07	 From Sofus Kjærsgaard Stray : kaggle*
13:51:07	 From Andy Anker : I don't feel it is much different now....
13:51:55	 From Ann-Sofie Priergaard Zinck : yes
13:52:39	 From Troels Christian Petersen : YEEEEES
13:52:39	 From Simon Hilbard : yes
13:52:40	 From Andy Anker : yes
13:52:41	 From Simone Vejlgaard : yes
13:52:41	 From Emy Alerskans : yes
13:52:50	 From Ann-Sofie Priergaard Zinck : no
13:52:51	 From Emy Alerskans : no
13:52:51	 From kristoffer : no
13:52:53	 From Aske R. : not if you are cern
13:52:57	 From Troels Christian Petersen : yes… but not a lot!
13:53:22	 From Aske R. : sounds expensive with that amount of data
13:54:13	 From Haider Fadil Ali Hussein Al-Saadi : tape?
13:54:24	 From Sofus Kjærsgaard Stray : magnetic tape I think
13:56:23	 From Rasmus Salmon : I don't know if this is out of the scope of this class. How can something like folding@home work without super bad performance?
13:56:25	 From Rasmus Salmon : https://en.wikipedia.org/wiki/Folding@home
14:00:41	 From Rasmus Salmon : Yes
14:01:30	 From Brian Vinter : https://datacenter.csc.fi/wp/about-lumi/
14:04:42	 From Aske R. : there is no reweighing needed in the small project data right?
14:07:42	 From Sofus Kjærsgaard Stray : Small
14:07:57	 From Katja : no
14:08:04	 From Michael : With 2 components it got around 0.9 acc
14:08:05	 From kristoffer : haven't tried
14:08:06	 From Rasmus Salmon : No
14:08:08	 From Aske R. : haven't gotten to it yet
14:08:11	 From Andy Anker : I hope
14:08:29	 From Sofus Kjærsgaard Stray : I've tried running a very simple BGM that's just 3 components
14:08:40	 From Sofus Kjærsgaard Stray : and also 2 and 4 and 5
14:09:00	 From Sofus Kjærsgaard Stray : Ah so try with, say, 10 variables
14:09:24	 From Rasmus Salmon : How to calculate the acc of the clustering?
14:09:27	 From kristoffer : Are you required to submit 9 algorithms (3 for each problem) to get full score?
14:09:27	 From Michael : I used all 25 variables - 25 most significant from classification problem
14:10:42	 From Sofus Kjærsgaard Stray : Oh, I removed the "Truth" variable from clustering. Are you not supposed to do that?
14:10:59	 From Sofus Kjærsgaard Stray : Good
14:11:56	 From Andy Anker : Are we supposed to do regression on the entire dataset, or should we somehow remove non-electrons first?
14:12:15	 From Emy Alerskans : are you supposed to remove 'Truth' and 'p_truth_E' from all three problems (classification, regression, clustering)?
14:12:29	 From Sofus Kjærsgaard Stray : Remove non-electrons
14:13:08	 From Andy Anker : How?
14:13:23	 From Aske R. : just only take the events where Truth == 1
14:13:24	 From Andy Anker : Ahh ok
14:13:37	 From kristoffer : @Troels Are you required to submit 9 algorithms (3 for each problem) to get full score?
14:13:39	 From Sofus Kjærsgaard Stray : X.loc[train['Truth'] == 1] works if you use pd
14:13:42	 From Aske R. : can be done easily if you have it set up in pandas dataframe
14:14:20	 From Andy Anker : Thanks
14:14:52	 From kristoffer : cool, thanks
14:15:34	 From Emy Alerskans : are you supposed to remove 'Truth' and 'p_truth_E' from all three problems (classification, regression, clustering)?
14:16:03	 From Aske R. : you need them as labels though
14:16:13	 From Rasmus Salmon : Dont we need them as labels?
14:16:27	 From Carl-Johannes Johnsen : Yes, but they should not be in the training data
14:16:53	 From Sofus Kjærsgaard Stray : You can write X = train.drop(columns=[the stuff you don't want]) and then Y = train["Truth"]
14:17:08	 From Dina Rapp : So, just to be clear - p_truth_E can’t be used in the clustering then ?
14:17:24	 From Troels Christian Petersen : Yes, that is correct, Dina… they should not be used.
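The selection and column-dropping steps discussed above can be sketched in pandas as follows. This is a minimal sketch on a toy frame: `Truth` and `p_truth_E` are the names used in the chat, while `feature_a`, `feature_b` and all values are hypothetical stand-ins for the real variables.

```python
import pandas as pd

# Toy stand-in for the course data; feature names and values are hypothetical.
train = pd.DataFrame({
    "Truth": [1, 0, 1, 0, 1],
    "p_truth_E": [52.1, 13.4, 47.9, 8.2, 60.3],
    "feature_a": [0.9, 0.1, 0.8, 0.2, 0.7],
    "feature_b": [1.5, 3.2, 1.1, 2.9, 1.3],
})

label_columns = ["Truth", "p_truth_E"]

# Classification: keep all events, remove the truth columns from the inputs.
X_clf = train.drop(columns=label_columns)
y_clf = train["Truth"]

# Regression: electrons only (Truth == 1), with the true energy as the target.
electrons = train.loc[train["Truth"] == 1]
X_reg = electrons.drop(columns=label_columns)
y_reg = electrons["p_truth_E"]
```

Passing `columns=` matters here: `drop` with a bare list of labels acts on the row index by default.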
14:18:19	 From Sofus Kjærsgaard Stray : Are we allowed to submit, say, 3 solutions of the same algorithm in clustering but where we use 3 different numbers of variables?
14:18:33	 From Sofus Kjærsgaard Stray : so kmeans on 5, 10, and 25 variables
14:19:18	 From Andy Anker : At some point, you said you would give an example of a description file. When will that be?
14:19:44	 From Andy Anker : Thanks
14:19:49	 From Sofus Kjærsgaard Stray : I mean I want to show both kmeans vs BGM but I also want to show the results of using 5 vs 10 variables, but that would be 4 "results" which is too much.
14:20:21	 From Sofus Kjærsgaard Stray : and I got BGM to work but kmeans is still insisting on putting them all into the first cluster
14:21:01	 From Rasmus Ørsøe : How big of an increase in accuracy do you reckon one would be able to get from hyperparameter optimization of lightgbm on the classification part?
14:21:13	 From Sofus Kjærsgaard Stray : Yes haha that's sensible. I'll figure it out
14:21:31	 From Sofus Kjærsgaard Stray : @Rasmus I barely got any different lol
14:21:33	 From Rasmus Ørsøe : (I'm barely getting any)
14:21:46	 From Sofus Kjærsgaard Stray : from like 90% to 91%
14:21:59	 From Aske R. : what kinda numbers do you guys have for accuracy just to see what the standard is
14:22:48	 From Rasmus Ørsøe : Alright. Thanks!
14:22:57	 From Sofus Kjærsgaard Stray : Aske, atm I get a log loss score of 0.2 and an "explained variance score" of 0.9
14:23:08	 From Rasmus Ørsøe : @Aske I get just under 92% without hyperparameter optimization
14:23:08	 From Sofus Kjærsgaard Stray : (for classification and regression respectively)
14:23:16	 From Aske R. : okay i also have around 90%
14:23:23	 From Aske R. : for just using out-of-the-box configurations
14:23:35	 From Sofus Kjærsgaard Stray : my neural network is garbage on classification though
14:23:43	 From kristoffer : Does anybody know where to find some information on how to use the MODI part of erda? I am confident using DAG, but I would like to enhance performance by parallelizing using MODI.
14:23:48	 From Rasmus Ørsøe : I gained about 1% on the classification by optimizing the learning budget
14:23:51	 From Aske R. : did you remember to use a quantile transform
14:23:58	 From kristoffer : @Sofus Have you transformed the data?
14:24:00	 From Troels Christian Petersen : @Sofus: If you don’t transform the variables, that is also what I would expect.
14:24:03	 From Aske R. : or otherwise normalize the data
14:24:08	 From Sofus Kjærsgaard Stray : Evidently I have not 
14:24:15	 From Troels Christian Petersen : OK… then there you go!
14:24:32	 From Sofus Kjærsgaard Stray : How would I go about doing this? I may have missed something during the lectures it seems
14:24:45	 From kristoffer : quantile_transform sklearn
14:25:52	 From Sofus Kjærsgaard Stray : Ah, so like we did with PCA
14:26:45	 From Emy Alerskans : Does it matter which normalization/standardization you use? E.g. quantile_transform or preprocessing.StandardScaler from sklearn? 
14:27:33	 From Emy Alerskans : thanks :)
14:27:54	 From Andy Anker : How important is the transformation if all the variables are in about the same range? E.g. between 0 and 100. Is it more important if one of the variables is significantly smaller/larger than the rest?
14:28:03	 From Sofus Kjærsgaard Stray : oh wow that changed everything
14:28:26	 From kristoffer : Should features and labels be transformed in the same manner, or can you use different transformations?
14:28:31	 From Aske R. : what's common to use for final activation function for a regression problem with a NN
14:28:56	 From Sofus Kjærsgaard Stray : Yes it's great now
14:29:05	 From kristoffer : *for regression
14:29:24	 From Sofus Kjærsgaard Stray : I got 90% accuracy even for non-transformed values with regression
14:30:29	 From Emy Alerskans : should we transform the prediction back then when we submit the results?
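The transformation questions above (which scaler to use, whether features and labels need the same one, and whether to transform predictions back) can be illustrated with a small sketch. It assumes synthetic exponential data and shows one common arrangement, not the course's prescribed one: a `QuantileTransformer` for the features, a separate `StandardScaler` for the regression target, and `inverse_transform` to return predictions to the original units.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Synthetic, skewed data standing in for the real variables (an assumption).
rng = np.random.default_rng(0)
X_train = rng.exponential(scale=50.0, size=(200, 3))
X_test = rng.exponential(scale=50.0, size=(50, 3))
y_train = rng.exponential(scale=40.0, size=200)

# Fit the feature transform on the training set only, then reuse it on test data.
feature_scaler = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
X_train_t = feature_scaler.fit_transform(X_train)
X_test_t = feature_scaler.transform(X_test)

# The target may use a different transform than the features.
target_scaler = StandardScaler()
y_train_t = target_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

# A model trained on (X_train_t, y_train_t) predicts in scaled units;
# here the first five scaled targets stand in for model.predict(...).
y_pred_t = y_train_t[:5]
y_pred = target_scaler.inverse_transform(y_pred_t.reshape(-1, 1)).ravel()
```

If the target is transformed for training, predictions have to be mapped back like this before they are compared with, or submitted as, energies in the original units.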
14:31:10	 From Simone Vejlgaard : @Sofus, what do you mean, when you write accuracy for a regression? R^2?
14:31:24	 From kristoffer : Does anybody know where to find some information on how to use the MODI part of erda? I am confident using DAG, but I would like to enhance performance by parallelizing using MODI.
14:31:30	 From Sofus Kjærsgaard Stray : Explained Variance, although R^2 gave very similar results
14:32:07	 From Sofus Kjærsgaard Stray : "explained_variance_score" from sklearn.metrics
14:33:11	 From Sofus Kjærsgaard Stray : Question: When I do regression and classification, the neural networks have a built-in feature importance system. Is this available with clustering algorithms as well or do I have to use something like SHAP values?
14:33:32	 From Simone Vejlgaard : Okay, but they are also the same in a linear case, so I guess that makes sense :) 
14:33:39	 From Sofus Kjærsgaard Stray : yeah
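Why explained variance and R^2 came out so similar can be seen from their definitions: explained variance is 1 - Var(y - y_pred) / Var(y), while R^2 is 1 - SS_res / SS_tot, so the two agree whenever the residuals average to zero and differ only when the predictions carry a constant bias. The sketch below implements both by hand on made-up numbers; `explained_variance_score` and `r2_score` in `sklearn.metrics` follow the same definitions.

```python
from statistics import fmean, pvariance

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); blind to a constant offset."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; penalizes a constant offset."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
unbiased = [1.1, 1.9, 3.1, 3.9]  # residuals average to zero
biased = [2.0, 3.0, 4.0, 5.0]    # every prediction is off by +1

ev_unbiased, r2_unbiased = explained_variance(y_true, unbiased), r_squared(y_true, unbiased)
ev_biased, r2_biased = explained_variance(y_true, biased), r_squared(y_true, biased)
```

For the biased predictions the explained variance stays at 1 while R^2 drops to 0.2, so a large gap between the two scores hints at a systematic offset in the regression.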
14:38:09	 From Haider Fadil Ali Hussein Al-Saadi : I'm not sure how to use PCA to find the best variables. Do I use PCA with 15 components and just take the features that best explain those components?
14:39:23	 From Ann-Sofie Priergaard Zinck : You could do that, but I would recommend you to have a look at SHAPley values instead
14:40:34	 From Troels Christian Petersen : Well, this gives some intuition, but since the PCA is linear, it might miss a lot of things. Also, “best explain” in your sentence is not that well defined, but less of a problem. The former is the main reason for using other methods. One is “permutation importance”, while SHAPley values is another (slower but with more features)...
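Permutation importance, mentioned above, can be sketched with `sklearn.inspection.permutation_importance`: each feature column is shuffled in turn on held-out data and the drop in score is recorded. The data here are synthetic (only column 0 carries signal) and the decision tree is an arbitrary stand-in for whatever model is being studied.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption): four features, only column 0 is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Shuffle each column of the validation set 10 times and average the accuracy drop.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

Because the method only needs predictions and a score, it works with any fitted model, including non-linear ones where a PCA-based ranking could miss structure.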
14:41:59	 From Emil Martiny : I just found an example of shap used on lightgbm, copy pasted the code, trained it with all 140 (or however many features there are) and used shap to give me a ranking.
14:42:30	 From Emil Martiny : out of the box lightgbm does btw give about 94.5% when given all features
14:43:01	 From Troels Christian Petersen : And I’ll get to SHAPley values on Wednesday… they are quite neat, and associated with a Nobel Prize (however, “only” in economics).
14:46:56	 From Rasmus Salmon : What about just taking the variables with the highest variance?
14:47:01	 From Rasmus Salmon : For clustering
14:47:22	 From Simon Hilbard : has anyone used the DecisionTreeClassifier.feature_importances_ attribute to find the good features? does it work well? what input does it want?
14:56:03	 From Sofus Kjærsgaard Stray : I've used LightGBM's version and it works quite well
15:01:34	 From Yane García : to make sure for the ranking of features, is it valid to just use "n_components" in PCA, from sklearn.decomposition?
15:05:13	 From Troels Christian Petersen  To  zoeansari(privately) : Hi Zoe. There is a student (Marta), who is having a hard time with the NN-part of the coding. She got the trees to work, but she is not strong in Python, and a bit “scared” by sitting alone and coding (I can understand). So would it be possible for you to help her a bit with code?
15:06:47	 From zoeansari  To  Troels Christian Petersen(privately) : Yes, sure
15:07:53	 From Troels Christian Petersen  To  zoeansari(privately) : Great. I will tell her to write you with the specific problem, and if you can help her out, then that is great. And if not, then write me back. My gut feeling is, that she simply needs a bit help with getting it to work, and that she understands the fundamentals.
15:08:29	 From Mikkel Langgaard Lauritzen : why are we supposed to train only on the electrons for the regression part again? The test data will have non electrons as well right?
15:08:47	 From Sofus Kjærsgaard Stray : You're estimating the energy of the electrons
15:08:55	 From Sofus Kjærsgaard Stray : not all the particles
15:09:18	 From zoeansari  To  Troels Christian Petersen(privately) : Hopefully it will be solved soon, and I will let you know how it will go, is it fine for her to talk on slack?
15:10:02	 From Troels Christian Petersen  To  zoeansari(privately) : I will suggest Slack…
15:10:13	 From zoeansari  To  Troels Christian Petersen(privately) : Great thanks
15:13:29	 From kristoffer : ok
15:13:48	 From Sofus Kjærsgaard Stray : Is there any smart way at all to figure out how many clusters you want or is it a free throw?
15:14:01	 From Troels Christian Petersen  To  zoeansari(privately) : Free… :-)
15:15:05	 From Sofus Kjærsgaard Stray : I know we can use various scoring methods to get something out but it seems like shots in the dark to just try different amounts of clusters until we get a nice one
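One way to make the choice of cluster count less of a shot in the dark is to scan a range of values and compare an internal score such as the silhouette coefficient. The sketch below does this for k-means on synthetic blobs (an assumption; the real data will be far less clean), using `sklearn.metrics.silhouette_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs (an assumption, not the course data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette coefficient.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # k with the highest silhouette score
```

On real data the curve is usually much flatter than on these blobs, so the score is a guide rather than a definitive answer.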
15:17:19	 From Andy Anker : You succeeded giving us (or me) that feeling!
15:21:02	 From Troels Christian Petersen : Haha… alright - noted. Be glad that the data at least has some structure, and not 99% noise! That would of course serve the purpose poorly.
15:32:31	 From Haider Fadil Ali Hussein Al-Saadi : lightgbm has a way as well as they pointed out
15:32:40	 From Haider Fadil Ali Hussein Al-Saadi : .feature_importance(importance_type='split', iteration=None)
15:34:55	 From Haider Fadil Ali Hussein Al-Saadi : Troels, is K-nearest neighbors ok as an algorithm to use for a solution or is it too simple?
15:35:24	 From Rasmus Ørsøe : sklearn also has SelectKBest() that lets you choose different statistical methods of choosing the k best parameters
15:36:06	 From Rasmus Ørsøe : (independently of the model)
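A minimal sketch of the model-independent selection mentioned above, using `SelectKBest` with the ANOVA F-test (`f_classif`) as the scoring function. The data are synthetic: columns 2 and 4 are shifted by the class label, the others are pure noise, and `k=2` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (an assumption): only columns 2 and 4 depend on the label.
rng = np.random.default_rng(1)
n_samples = 500
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, 5))
X[:, 2] += 2.0 * y
X[:, 4] += 1.0 * y

# Score every feature against the label and keep the two best.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept = np.flatnonzero(selector.get_support())  # indices of the kept columns
```

Univariate tests like this score each feature on its own, so they can overlook variables that are only useful in combination with others.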
15:38:26	 From Yane García : ok so first run the 160 variables
15:39:39	 From Yane García : oh right
15:39:49	 From Yane García : great
15:39:52	 From Yane García : thanks
15:39:52	 From Haider Fadil Ali Hussein Al-Saadi : remember to check that you get decent prediction even with all the variables if you do it that way though, since you don't want to use only the features a bad model loves
15:40:20	 From Haider Fadil Ali Hussein Al-Saadi : although I got 96% prediction rate while copy pasting old code with lightGBM 
15:40:24	 From Haider Fadil Ali Hussein Al-Saadi : so it probably doesn't matter
15:41:02	 From Troels Christian Petersen : @Haider (and all): k-NN is fine. Yes, it is simple but generally good and fast… and as Brian also alluded to: “It is better to run an algorithm that is not entirely optimal, than not to run any!”
15:41:12	 From Rasmus Ørsøe : @haider  96% on classification?
15:41:35	 From Haider Fadil Ali Hussein Al-Saadi : yea
15:41:38	 From Haider Fadil Ali Hussein Al-Saadi : about there
15:41:48	 From Andy Anker : Is that when the energy is excluded?
15:41:48	 From Haider Fadil Ali Hussein Al-Saadi : 94% with 15 best features
15:42:33	 From Haider Fadil Ali Hussein Al-Saadi : no, it's on all variables at first
15:42:46	 From Rasmus Ørsøe : I can't seem to exceed 92.5%. Did you do anything else than picking variables?
15:43:04	 From Andy Anker : Ahh okay. Then it makes sense
15:43:15	 From Haider Fadil Ali Hussein Al-Saadi : i copy pasted old code, so i have parameters from week 1 or 2
15:43:58	 From Haider Fadil Ali Hussein Al-Saadi : how big are your splits?
15:44:04	 From Haider Fadil Ali Hussein Al-Saadi : mine is 20000
15:44:45	 From Rasmus Ørsøe : I keep 14% for validation, rest is training
15:46:38	 From Rasmus Ørsøe : Perhaps it's the selection method I'm using that's too simple. I get that the accuracy decreases after 13 variables
15:46:40	 From Aske R. : i have an issue where my NN for regression runs, but there is no change in the loss function. i pretty much copied what i had working for the classification problem, changed the final activation function and the loss function, and hoped that it would work, but something about the target not being 0 or 1 makes it go haywire. did anyone else have the same problem (using pytorch)?
15:48:55	 From Andy Anker : What is your loss function?
15:49:08	 From Aske R. : MSELoss
15:49:54	 From Andy Anker : What about your final act. function?
15:50:01	 From Aske R. : ReLU
15:50:10	 From Andy Anker : I would guess you should remove that
15:50:19	 From Aske R. : and put what instead?
15:50:20	 From Andy Anker : The target has negative values also
15:51:02	 From Andy Anker : The target has negative values. When you apply a ReLU, all negative outputs are set to zero, so the network can never predict them. You could choose to simply not have an act. function for the last layer
15:51:10	 From Aske R. : was looking at the activation functions and couldn't figure out what the simple linear one was called
15:51:40	 From Aske R. : didn't know not having one was an option
15:51:42	 From Aske R. : will try
15:52:09	 From Andy Anker : I had the same problem 
15:53:07	 From Aske R. : now i get train loss nan...
15:54:35	 From Sofus Kjærsgaard Stray : It feels like the neural network performs too well?? Even if my entire network is just 1 hidden layer with 4 nodes I still get 91% accuracy
15:54:58	 From Haider Fadil Ali Hussein Al-Saadi : is that with all features?
15:55:07	 From Sofus Kjærsgaard Stray : only top 15
15:56:05	 From Sofus Kjærsgaard Stray : hmm okay something is definitely wrong here... if my entire network is just 1 sigmoid function it still gives me 90% accuracy
15:58:25	 From Haider Fadil Ali Hussein Al-Saadi : That actually might still be the case, depending on what the data looks like
15:58:47	 From Haider Fadil Ali Hussein Al-Saadi : for example, if there is 1 cluster where all the electrons are, and the rest is just randomly distributed far from that cluster
15:58:48	 From Andy Anker : Aske: I am sorry, I was wrong. There are no negative energies when removing non-electrons. But I still don't use any act. function for the last layer. Did you remove non-electrons?
15:59:46	 From Sofus Kjærsgaard Stray : Hm I suppose Haider
15:59:58	 From Haider Fadil Ali Hussein Al-Saadi : but i only got 80% with 2 hidden layers with 4 each
16:00:08	 From Haider Fadil Ali Hussein Al-Saadi : so it might not be the case
16:00:22	 From Haider Fadil Ali Hussein Al-Saadi : ill try with your setup
16:01:09	 From Haider Fadil Ali Hussein Al-Saadi : i got 78% with only 1 neuron
16:02:39	 From Sofus Kjærsgaard Stray : how many epochs?
16:02:40	 From Sofus Kjærsgaard Stray : I'm running with 3
16:03:15	 From Haider Fadil Ali Hussein Al-Saadi : 20 
16:03:27	 From Sofus Kjærsgaard Stray : damn
16:03:49	 From Haider Fadil Ali Hussein Al-Saadi : NNclf = MLPClassifier(max_iter=20000, n_iter_no_change=3, solver='adam', activation='logistic', hidden_layer_sizes=(1,), random_state=1)
16:03:49	 From Sofus Kjærsgaard Stray : My model is just this: model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
16:03:59	 From Sofus Kjærsgaard Stray : tf = tensorflow
16:04:11	 From Sofus Kjærsgaard Stray : it also uses adam as its optimizer
16:04:22	 From Haider Fadil Ali Hussein Al-Saadi : looks like we are doing the same thing
16:04:41	 From Sofus Kjærsgaard Stray : maybe tensorflow is just a good algorithm??
16:05:02	 From Haider Fadil Ali Hussein Al-Saadi : are you 90% on test data?
16:09:49	 From Sofus Kjærsgaard Stray : 20/80
16:10:08	 From Haider Fadil Ali Hussein Al-Saadi : no, i mean are you getting 90% accuracy on the test data?
16:10:49	 From Aske R. : @Andy yes i did remove non-electrons seems to be that the gradient is "exploding" and i should implement gradient clipping
16:11:50	 From Andy Anker : Weird. I think something is wrong with your act. functions then. 
16:11:58	 From Aske R. : i just had a run where the loss function went: larger number; much larger number; inf; nan
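The "larger number; much larger number; inf; nan" pattern and the effect of gradient clipping can be reproduced on a toy problem. The sketch below is plain NumPy gradient descent on a one-parameter linear fit with deliberately unscaled inputs and a too-large learning rate. It is an illustration only and not a diagnosis of the network above, whose actual cause is not known from the chat. In PyTorch the corresponding utility is `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np

def fit_slope(x, y, learning_rate, steps, max_grad=None):
    """Gradient descent on the MSE of y = w * x; optionally clip the gradient."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)
        if max_grad is not None:
            # Limit the gradient magnitude so a single step cannot blow up.
            grad = np.clip(grad, -max_grad, max_grad)
        w -= learning_rate * grad
    return w, np.mean((w * x - y) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(50.0, 150.0, size=200)  # unscaled input of order 100
y = 3.0 * x                             # true slope is 3

with np.errstate(all="ignore"):  # silence overflow warnings for the demo
    w_plain, loss_plain = fit_slope(x, y, learning_rate=0.1, steps=200)
w_clip, loss_clip = fit_slope(x, y, learning_rate=0.1, steps=200, max_grad=1.0)
```

Clipping keeps the run finite, but rescaling the inputs and target or lowering the learning rate addresses the underlying step-size problem more directly.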
16:22:38	 From Troels Christian Petersen  To  Andy Anker(privately) : You are welcome to post code snippets, but with whole solutions on Slack it is probably best to wait until later… :-)   Does that sound reasonable?
16:22:45	 From Haider Fadil Ali Hussein Al-Saadi : there is no element called "p_truth_E" in all variables?
16:23:13	 From Carl-Johannes Johnsen : Not in the variable list no, as you should not use it for training
16:26:54	 From Sofus Kjærsgaard Stray : Oh sorry @Haider, yes 90% on the test data
16:27:00	 From Sofus Kjærsgaard Stray : or rather 90% on the validation data
16:27:50	 From Haider Fadil Ali Hussein Al-Saadi : i honestly have no clue how you do it then. I thought maybe you were overfitting on training data.  I got 79% with only 1 neuron, and had to go up to 50 to get 88%
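A model consisting only of `Dense(1, activation='sigmoid')` computes sigmoid(w·x + b), which has the same functional form as logistic regression, so a linear baseline is a cheap sanity check for the accuracies compared above. By contrast, `MLPClassifier` with one hidden neuron still adds its own output unit after it, so the two setups are not quite the same model. The sketch uses synthetic data and sklearn's `LogisticRegression` (which is L2-regularized by default, so it is an approximation of the Keras model rather than a copy).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): two classes separated along the first feature.
rng = np.random.default_rng(0)
n_samples = 2000
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, 3))
X[:, 0] += 3.0 * y

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Linear baseline: the same decision function as a single sigmoid unit.
baseline = LogisticRegression().fit(X_train, y_train)
val_accuracy = baseline.score(X_val, y_val)
```

If such a baseline already reaches about 90% on the real data, a single sigmoid unit doing the same is plausible rather than a sign of a bug.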