13:19:12 From svend korsgaard : Would you say we are nearing the end of Moore's law?
13:19:52 From Troels Christian Petersen : No - it is clock speed that is slowing down… Moore’s Law is about the number of transistors in a chip. So not yet!
13:31:20 From svend korsgaard : What does ETL stand for again?
13:36:15 From svend korsgaard : If you compared it to an i5 CPU, how would it compare?
13:36:59 From svend korsgaard : because the i5 is more commonly used among us students, I think it is more relatable
13:37:24 From svend korsgaard : ok thanks
13:40:31 From svend korsgaard : is the CuML syntax exactly the same as scikit-learn?
13:41:08 From Michael Haahr : Is there an efficient way of partitioning the data with CUDF in case you have a lot of data (too much to fit on the GPU)?
13:41:11 From zoeansari : Does using these GPU-based libraries depend on the OS of the computer?
13:41:25 From Sofus Kjærsgaard Stray : cuml and cudf aren't for Windows
13:41:28 From Sofus Kjærsgaard Stray : only Linux
13:41:32 From Yane García : can we say that it is a container?
13:43:02 From Aske R. : yes it is..
13:43:18 From svend korsgaard : is the CuML syntax exactly the same as scikit-learn?
13:44:04 From Emil Martiny : So if we are on Windows we can't use this at all?
13:44:16 From Troels Christian Petersen : No, Emil… you can’t...
13:44:50 From svend korsgaard : no windows :(
13:45:08 From svend korsgaard : Please develop it for Windows!
13:46:02 From Sofus Kjærsgaard Stray : You can use WSL to use it semi-effectively on Windows
13:49:30 From Aske R. : I've heard that even a gradient boosted decision tree can somehow be run at least partly in parallel. Is that something that dask can handle out of the box, or does that take some extra work?
13:50:34 From Andy Sode Anker : ^^I was wondering if DASK can be used with LightGBM?
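Svend's question about cuML syntax is not answered in the chat. As a rough illustration: cuML is designed to follow the scikit-learn estimator interface (constructor, `fit`, `predict`) for the estimators it implements, although parameter names and defaults are not always identical. The sketch below assumes cuML's `KMeans` accepts `n_clusters`, `random_state` and `n_init` as written, and falls back to scikit-learn on machines without RAPIDS (for example Windows without WSL). The helper name `cluster` is hypothetical.

```python
import numpy as np

# Use RAPIDS cuML (Linux + NVIDIA GPU) when it is installed; otherwise fall
# back to scikit-learn's CPU KMeans, which is called the same way here.
try:
    from cuml.cluster import KMeans
    BACKEND = "cuml"
except ImportError:
    from sklearn.cluster import KMeans
    BACKEND = "sklearn"


def cluster(X, n_clusters=2, seed=42):
    """Fit KMeans and return one cluster label per row of X."""
    model = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    model.fit(X)
    # With NumPy input, cuML by default mirrors the input type and returns
    # NumPy output, so the calling code can stay the same on both backends.
    return model.predict(X)
```

Only the import differs between the two backends; everything after it is plain scikit-learn-style code.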
13:57:41 From Sofus Kjærsgaard Stray : I can't install any of it using pip
13:58:00 From Sofus Kjærsgaard Stray : if I search for it I can see it, but it can't be installed
13:59:13 From Michael Haahr : Is there native support for image (jpg, png, etc.) IO with CUDF? As far as I can tell from the docs, only txt-like/dataframe files are supported for read/write.
14:01:06 From Aske R. : Docker doesn't support Windows either
14:01:19 From Sofus Kjærsgaard Stray : I think Mads' suggestion is "use linux scrub"
14:01:32 From Simon Hilbard : why is it that it doesn't work with Windows?
14:01:34 From Carl-Johannes Johnsen : Docker can run on Windows
14:01:50 From Aske R. : uhh link
14:02:18 From Carl-Johannes Johnsen : https://docs.docker.com/docker-for-windows/
14:02:42 From Carl-Johannes Johnsen : I don't know about CUDA and graphics cards, as I have an AMD card
14:03:41 From Michael Haahr : If you install on Linux, make a new environment, or else it will take forever to solve dependencies
14:04:58 From svend korsgaard : When do you plan on releasing the RTX 3060?
14:05:04 From Haider Fadil Ali Hussein Al-Saadi : mine doesn't :D
14:06:35 From Aske R. : does using dask have any merit if you are running on only one GPU? Or does it just manage distributing tasks between different cards?
14:10:32 From Aske R. : is there any significant gain in splitting the task between your dedicated GPU and the integrated GPU?
14:12:40 From koerstz : But macOS doesn't support Nvidia GPUs.
14:12:54 From Troels Christian Petersen : Ahh… OK. Reference?
14:13:00 From Carl-Johannes Johnsen : They have AMD
14:13:23 From Carl-Johannes Johnsen : Some of the old ones might have NVidia
14:14:09 From Carl-Johannes Johnsen : The new Unreal 5 demo is running on the PS5, which is an AMD :)
14:14:13 From Haider Fadil Ali Hussein Al-Saadi : xD
14:15:11 From Aske R. : insane
14:15:34 From kristoffer : Can you get an external NVIDIA GPU to plug into your laptop through the USB port, so it's a bit more plug and play?
14:17:35 From Haider Fadil Ali Hussein Al-Saadi : cosine function?
14:36:46 From Sofus Kjærsgaard Stray : Is there a limit to the description file length? Mine is currently 350 words
14:37:58 From Sofus Kjærsgaard Stray : Yours is 15
14:38:00 From Joakim Lajer : So everything should be put in a zip file and sent on Absalon?
14:38:00 From Sofus Kjærsgaard Stray : 115*
14:39:26 From Simone Vejlgaard : It seems that Absalon has a required minimum file size - at least I received some emails saying that the files were not 'accepted', so I had to zip them before I could hand them in
14:39:51 From Carl-Johannes Johnsen : KU mail doesn't like attachments with .py files or zip folders with .py files! So if you are submitting by mail, you should rename the extension
14:40:00 From Emil Martiny : Yes, that was also a thing for me
14:40:26 From Carl-Johannes Johnsen : I don't know about Absalon!
14:40:28 From Marta : .ipynb works fine through KU mail :)
14:40:38 From Emil Martiny : I think we have handed it in, but it is just a warning that my file is so small.
14:40:42 From Carl-Johannes Johnsen : Back in the day, you could submit .py files.
14:40:59 From Emil Martiny : so it just tells me to double-check that it was the right file I uploaded
14:40:59 From Sofus Kjærsgaard Stray : It works fine if you zip all the files
14:41:36 From Marta : yeah, it was just last year that .py got banned, I think it was because of the phishing attack
14:41:41 From Emil Martiny : yes exactly, it is a warning, I just got a mail and ignored it
14:41:56 From Carl-Johannes Johnsen : @Marta yeah, KU mail doesn't care about .ipynb :) It's KU IT who thinks that by disallowing certain extensions, attachments are safe
14:42:40 From Runi : I tried to send a zip file renamed to a .txt file to Troels, but it still got blocked
14:42:55 From Joakim Lajer : Is it okay to send 3 Python scripts, one for each problem, or should they be assembled into one?
14:44:13 From Carl-Johannes Johnsen : @Runi I guess they recognized it as a zip file, looked into it and saw .py extensions?
14:44:30 From Runi : No, it's about the final project
14:44:30 From Marta : following Joakim's question, I have a separate script for every solutions, i.e. 5-6 scripts - is that OK or should I combine them?
14:44:36 From Marta : solution*
14:44:46 From Runi : Yeah, there is a .py extension within the zip
14:45:06 From Runi : I can try to rename that as well
14:48:01 From kristoffer : Alright, thanks!
14:48:41 From Aske R. : What does KeyError: 'Reggresion' mean in the solution reader?
14:48:52 From Runi To Troels Christian Petersen(privately) : I sent the file with the data again; I think it went through this time
14:49:10 From Sofus Kjærsgaard Stray : energy estimation
14:49:12 From Sofus Kjærsgaard Stray : oh
14:49:14 From Troels Christian Petersen To Yane García(privately) : Hi Yane. I don’t see you in any groups… do you have an idea of a project? And/or a group?
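Several people zipped their hand-ins, and Carl and Runi note that KU mail appears to block `.py` files even inside a zip. A minimal sketch with Python's standard `zipfile` module: it zips every file in a folder and can optionally store `.py` files as `.py.txt` inside the archive. Whether the mail filter then accepts the archive is not confirmed in the chat; the function name `zip_submission` and the `.py.txt` renaming are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def zip_submission(src_dir, zip_path, rename_py=False):
    """Zip every file directly inside src_dir (not subfolders) into zip_path.

    If rename_py is True, .py files are stored as .py.txt inside the archive,
    a hypothetical workaround for mail filters that reject .py attachments.
    zip_path should lie outside src_dir so the archive does not include itself.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.iterdir()):
            if not path.is_file():
                continue
            arcname = path.name
            if rename_py and path.suffix == ".py":
                arcname += ".txt"  # e.g. solution.py -> solution.py.txt
            zf.write(path, arcname=arcname)
    return zip_path
```

For Absalon, Rasmus reports later in the chat that zipped `.py` files are accepted, so `rename_py=False` is probably enough there.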
14:49:35 From Sofus Kjærsgaard Stray : I think you have to make sure the files are named exactly as Troels has them on the website
14:49:47 From Simone Vejlgaard : I think it is a naming error, so maybe you have a typo in the file name
14:50:00 From Sofus Kjærsgaard Stray : So "Regression_YourName_AlgorithmName.txt"
14:50:19 From Carl-Johannes Johnsen : @Aske: you are missing an s in regression?
14:51:40 From Aske R. : sure am
14:51:50 From Carl-Johannes Johnsen : If you are experiencing problems with the solution reader, try to get the new one from the website. If it is still giving you trouble, it might be an error, and you can just write about it!
14:53:04 From Haider Fadil Ali Hussein Al-Saadi : np.savetxt("Regression_Haideralsaadi_Neuralnetwork_Variablelist.txt", RNN, delimiter=",")
14:54:26 From Elias : you can also use np.arange() to create the index
14:54:34 From Sofus Kjærsgaard Stray : I used the same as Haider, works flawlessly
14:56:49 From Rasmus Ørsøe : df.to_csv(r'yourpath\yourfilename.txt', header = None) works too
14:56:51 From Katja Johansen To Troels Christian Petersen(privately) : Hi Troels, I have a quick question about the small project. For Classification and Regression we use the train data to train and optimize. But when we do clustering, it doesn't really make sense to use train, since we can't validate it. So the question is whether it makes sense to use only the test data for clustering? I hope my question makes sense? :) Kind regards, Katja
14:56:57 From Rasmus Ørsøe : If you're working in dataframes
14:57:07 From Rasmus Ørsøe : (includes the index automatically)
15:00:23 From Sofus Kjærsgaard Stray : Make sure that you have the path right
15:00:27 From Rasmus Salmon : I would like to join
15:00:28 From svend korsgaard To Troels Christian Petersen(privately) : breakout!
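Putting the tips above together (the `Regression_YourName_AlgorithmName.txt` naming, `np.savetxt`, and an `np.arange` index), a solution writer might look like the following. This is a sketch under assumptions: the exact layout the SolutionReader expects is defined on the course website, not here, so the `index,prediction` columns, the `_VariableList` suffix (modelled on Haider's file name), the one-feature-per-line variable list, and the lower-case `solutions` folder (mentioned later in the chat) are all hypothetical.

```python
from pathlib import Path

import numpy as np


def write_solution(predictions, features, task, name, algorithm, folder="solutions"):
    """Write an index,prediction file and a variable-list file.

    Hypothetical layout: Task_Name_Algorithm.txt for the predictions and
    Task_Name_Algorithm_VariableList.txt for the input features, both in a
    lower-case 'solutions' folder. Check the course website for the real format.
    """
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)

    predictions = np.asarray(predictions, dtype=float)
    index = np.arange(len(predictions))  # row index 0, 1, 2, ...
    pred_path = out / f"{task}_{name}_{algorithm}.txt"
    # One format per column: integer index, then the prediction.
    np.savetxt(pred_path, np.column_stack([index, predictions]),
               delimiter=",", fmt=["%d", "%.6f"])

    # The "variable list" is the list of input features, one per line.
    var_path = out / f"{task}_{name}_{algorithm}_VariableList.txt"
    var_path.write_text("\n".join(features) + "\n")
    return pred_path, var_path
```

With predictions in a pandas DataFrame, Rasmus's `df.to_csv(path, header=None)` gives a similar index-first file, because `to_csv` writes the index as the first column by default.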
15:00:34 From Sofus Kjærsgaard Stray : so if you have the reader in a folder, make a subfolder called "solutions" and put your .txt files there
15:01:45 From Runi : I would like to join as well
15:03:13 From Julius Terp : I would like to join as well
15:04:10 From Haider Fadil Ali Hussein Al-Saadi : are you guys saving the models using pickle?
15:04:24 From Haider Fadil Ali Hussein Al-Saadi : I just noticed they want the model as well :D
15:04:36 From Sofus Kjærsgaard Stray : "want the model"?
15:06:32 From Haider Fadil Ali Hussein Al-Saadi : they want a txt with the variable list and the model?
15:06:34 From Haider Fadil Ali Hussein Al-Saadi : or wait
15:06:44 From Haider Fadil Ali Hussein Al-Saadi : when they say variable list, do they mean the features?
15:07:05 From Sofus Kjærsgaard Stray : that's the features you're using, yes
15:07:16 From Troels Christian Petersen : Yes, we mean a list of the input features, which we in physics tend to call variables...
15:07:21 From Sofus Kjærsgaard Stray : the general description file can contain the size of your model though
15:13:09 From Jonathan Stubkjær Jegstrup : What is meant by "model"? Is that just the predicted values of the test file, or is it the machine learning model that one has constructed, or something else?
15:13:14 From Aske R. : now I am getting a ValueError: invalid literal for int() with base 10: '' on line 142
15:15:14 From Aske R. : in the solution reader
15:15:25 From Haider Fadil Ali Hussein Al-Saadi : I'm getting errors in the solution reader too
15:15:31 From Haider Fadil Ali Hussein Al-Saadi : did we have breakout rooms for this?
15:17:26 From Troels Christian Petersen : Hi Aske and Haider - I’ll try to assign you to the breakout room as well...
15:20:26 From Katja Johansen To Troels Christian Petersen(privately) : Hi Troels, I think you missed the message I sent a little while ago through the chat, so I'm sending it again :) I have a quick question about the small project. For Classification and Regression we use the train data to train and optimize. But when we do clustering, it doesn't really make sense to use train, since we can't validate it. So the question is whether it makes sense to use only the test data for clustering? I hope my question makes sense? :) Kind regards, Katja
15:23:19 From Troels Christian Petersen To Katja Johansen(privately) : OK - sorry… it's pouring in on all channels!
15:24:27 From Troels Christian Petersen To Katja Johansen(privately) : And your question makes good sense. In principle you can use both train and test for clustering, but you can also just run it on test. The usual idea is that you do NOT touch the final data until you have settled on a method, but here it is not that crucial.
15:25:18 From Katja Johansen To Troels Christian Petersen(privately) : Okay, thanks for the answer :)
15:48:19 From Runi : There are very few electrons in the final 'test' data compared to the training, right? Just want to make sure my prediction is not bugged.
15:48:56 From Troels Christian Petersen : The test data is a random sampling from the overall data pool, so there is the same fraction of electrons in the test set as in the training set.
15:49:40 From Haider Fadil Ali Hussein Al-Saadi : I discovered something kind of funny with neural networks. Because I had irrelevant features in the regression, it tended to get stuck calculating the mean and then just repeating it across all predictions, which I figure it does by deeming the high-variance variables irrelevant and then becoming static by weighting the low-variance features as relevant. That's actually something I had not considered before: removing irrelevant data does more than just improve speed.
15:49:42 From Troels Christian Petersen : At least statistically speaking… but you should not see a very small fraction of electrons in it.
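Troels's point that the test set is a random sample, so its electron fraction matches the training set "at least statistically", can be checked with a quick simulation. The sketch below uses synthetic 0/1 labels (1 standing in for electrons, with an assumed 25% rate), not the course data. With n test samples and true fraction p, the test fraction typically fluctuates by about sqrt(p(1-p)/n), so a large mismatch points to a bug rather than to the split.

```python
import numpy as np


def class_fractions(labels, test_fraction=0.2, seed=1):
    """Randomly split 0/1 labels into train/test parts and return the
    fraction of positives (e.g. electrons, label 1) in each part."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))          # random order of row indices
    n_test = int(round(test_fraction * len(labels)))
    test = labels[perm[:n_test]]
    train = labels[perm[n_test:]]
    return train.mean(), test.mean()
```

For 20,000 labels with p = 0.25 and a 20% test split, sqrt(0.25 * 0.75 / 4000) is roughly 0.007, so the two fractions should agree to within a percent or two.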
15:50:46 From Rasmus Salmon : Is there a strict naming convention for the "combined" zip file if we hand in a zip file?
15:51:39 From Troels Christian Petersen : @Haider: Yes, that is one more thing which makes NNs harder to train than trees. And it was one of the items listed (by Trevor Hastie) among the reasons why trees are easy: They don’t care about irrelevant features… but NNs do! In a bad way.
15:51:50 From Troels Christian Petersen : @Rasmus: No - just something with your name to be sure :-)
15:54:09 From Rasmus Salmon : It seems that you can hand in zipped .py files on Absalon. I did not get an error at least.
16:08:39 From Yane García To Troels Christian Petersen(privately) : Hi Troels, thanks for writing, I found your message. Well, as I haven´t decided which topic to pick, my plan now is to use the data set uploaded on the course website. I also wrote to you (I think a week or two ago) to let you know that I don’t have a group and I can join any team.
16:13:46 From Troels Christian Petersen : Hi All. The solution reader has been updated, so please use the latest (and greatest) version. Cheers, Troels and Carl
16:17:51 From Rasmus Salmon : Troels, is it a problem if the solution of one's clustering is an integer in float format?
16:18:05 From Bagne To Troels Christian Petersen(privately) : Hi Troels. I have sent you an email regarding the submission of the assignment.
16:18:13 From Bagne To Troels Christian Petersen(privately) : Could you take a look at it?
16:19:10 From Carl-Johannes Johnsen : It is one of the fixes in the SolutionReader, so now it should accept ints in float format
16:19:14 From Troels Christian Petersen : @Rasmus: No, I think that we should be able to accommodate that. Integer values for something that is indeed integer (like a category) are of course always better…
16:19:46 From Troels Christian Petersen To Bagne(privately) : I see it and will reply right away...
16:20:20 From Rasmus Salmon : @Carl the new one does not accept my floats.
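On Rasmus's question about cluster labels in float format: `np.savetxt`'s default format is `'%.18e'`, so a label stored as the float `2.0` is written as `2.000000000000000000e+00`. A small sketch (the function name is hypothetical) that checks the labels are whole numbers, casts them to `int`, and writes them with `fmt='%d'`, in line with Troels's remark that integers are preferable for categories:

```python
import io

import numpy as np


def cluster_labels_to_text(labels):
    """Format cluster labels as one integer per line.

    Raises ValueError if any label is not a whole number, so that a value
    such as 1.7 is not silently truncated to 1 by the int cast.
    """
    labels = np.asarray(labels, dtype=float)
    if not np.all(labels == np.round(labels)):
        raise ValueError("cluster labels must be whole numbers")
    buf = io.StringIO()
    # fmt="%d" writes plain integers instead of the default "%.18e".
    np.savetxt(buf, labels.astype(int), fmt="%d")
    return buf.getvalue()
```

To write straight to a file, pass the file path instead of the `StringIO` buffer: `np.savetxt(path, labels.astype(int), fmt="%d")`.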
16:21:01 From Carl-Johannes Johnsen : Great! Can you write an example of a float that isn't accepted? :)
16:23:58 From Rasmus Salmon : I see that I was running the old one, but I am now getting: FileNotFoundError: [WinError 3] The system cannot find the path specified: 'Solutions'
16:26:11 From Carl-Johannes Johnsen : @Rasmus: Have you put your solutions in a folder called solutions? Because I should not have changed that part!
16:27:22 From Marcus : The problem might be the capital 'S', if you have named your folder 'Solutions' instead of 'solutions'
16:27:42 From Julius Terp : How do I normalize the label in the regression case? When I try quantile_transform, it prints: ValueError: Expected 2D array, got 1D array instead: array=[127578.1 12434.154 47263.215 ... 94558.54 188060.39 63371.82 ]. Is there another function to use, and does it matter if I use different preprocessing functions for x and y?
16:29:46 From Rasmus Salmon : @Carl It worked after restarting my kernel. So no problem with the code :)
16:30:44 From Carl-Johannes Johnsen : @Rasmus: good to hear!
16:33:04 From Carl-Johannes Johnsen : @Julius The error is that it is expecting a 2D array, but the labels are only 1D. However, you shouldn't transform the labels. If you are doing some machine learning on 1D data in the future, I guess you would need to put it in a 2D array for scikit-learn.
16:35:37 From Julius Terp : Okay, but don’t we need to normalize the label when doing regression?
16:36:33 From Rasmus Salmon : Add fmt='%s'
16:37:16 From Rasmus Salmon : Like: np.savetxt(file_name, var, fmt='%s')
16:58:46 From svend korsgaard : Thank you for today everyone!
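Julius's ValueError arises because scikit-learn transformers expect a 2-D array of shape (n_samples, n_features), while a regression target is 1-D; `reshape(-1, 1)` turns it into a single column. Carl's advice is not to transform the labels at all, so treat this as an optional sketch: if you do transform the target, keep the fitted transformer so predictions can be mapped back to the original units before writing the solution file. `QuantileTransformer` is used because Julius tried `quantile_transform`; the helper names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer


def fit_target_transform(y):
    """Fit a QuantileTransformer on a 1-D regression target.

    The target is reshaped to one column, shape (n_samples, 1), because
    scikit-learn transformers reject 1-D input.
    """
    y = np.asarray(y, dtype=float)
    qt = QuantileTransformer(n_quantiles=min(1000, len(y)))
    y_t = qt.fit_transform(y.reshape(-1, 1)).ravel()  # back to 1-D
    return qt, y_t


def inverse_target_transform(qt, y_t):
    """Map transformed values (e.g. model predictions) back to original units."""
    return qt.inverse_transform(np.asarray(y_t).reshape(-1, 1)).ravel()
```

scikit-learn also provides `sklearn.compose.TransformedTargetRegressor`, which wraps a regressor and applies and inverts a target transform like this automatically.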