13:20:38 From Sofus Kjærsgaard Stray : Is it even sensible to have complex hyperparameters? Is there some sort of MLA that uses those?
13:21:37 From Sofus Kjærsgaard Stray : Thanks
13:25:29 From Jonathan : 42 😄
13:26:02 From Andy Anker : To answer if the dataset is balanced
13:26:04 From Sofus Kjærsgaard Stray : Good to see how important false/true positives/negatives are?
13:26:04 From Rasmus Salmon : For balancing
13:26:05 From Simone Vejlgaard : To see if your dataset is balanced
13:26:06 From joachim : We would like a balanced dataset
13:26:11 From Haider Fadil Ali Hussein Al-Saadi : Because if you only train on one kind, it trains on the wrong information
13:26:18 From Haider Fadil Ali Hussein Al-Saadi : It won't know what the 2nd kind looks like
13:26:29 From Troels Christian Petersen : Yes, I guess that many got the idea… balance!
13:28:35 From Andy Anker : Is it a problem to have an unbalanced dataset if you just report your "baseline"? If you have 99% of one class, the ML algorithm still does fine if it can predict 99.9%, I would say? Or are there other ways to deal with unbalanced data?
13:30:28 From Troels Christian Petersen : Typically, if you don't check, then you might just get 99% accuracy and think that all is well! If you can get 99.9% then that is of course better, but it might be hard to get there, as estimating "signal" (1%) comes with a large inherent risk, and only the clearest cases will make sense.
13:31:04 From Haider Fadil Ali Hussein Al-Saadi : How do you run the slides in a Jupyter notebook?
13:31:06 From Troels Christian Petersen : However, if you choose another measure (i.e. loss function) of goodness, then it might be more advantageous to estimate "signal".
13:32:08 From Troels Christian Petersen : Finally, you can reweight your sample to be balanced, either by attaching weights to each event (say 0.01 for all background events), or by simply repeating the same (signal) events again and again. However, if the sample is very unbalanced, then this might still be hard to do!
13:32:30 From Christopher Carman : What about parameters that are not discrete? Would it not depend a lot on the grid spacing, for the grid search?
13:32:33 From Troels Christian Petersen : @Haider: Question for later… but it is useful.
13:33:24 From Troels Christian Petersen : @Christopher: Yes, it will. Typically, one simply chooses a few options (like 1, 10, 100), but Christian will touch upon this later…
13:34:32 From Andy Anker : Thank you Troels! I actually have this problem in a research project and just solved it by reporting both baseline and accuracy. I will just think about your solutions and might come back with follow-up questions.
13:36:10 From Jonathan : Do you have to specify which values of the hyperparameters you try out in the grid search method?
13:37:49 From Troels Christian Petersen : @Andy: Great… and please do.
13:38:55 From Troels Christian Petersen : @Jonathan: Yes, as it wants to loop over values. However, in the RANDOM search, they don't have to be…
13:39:40 From joachim : does cv=5 imply 5 folds in the data?
13:40:01 From Emy Alerskans : How do you know which distributions to use in the random search for the different hyperparameters?
13:40:05 From kristoffer : What would be a good way to combine random and grid search, so that you go through all loss functions, for instance, but sample other parameters randomly?
13:40:15 From Jonathan : So how do you determine how your specific hyperparameter is distributed?
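[For reference on the grid vs. random search questions above, a minimal scikit-learn sketch (not the lecture code; the classifier, parameter names and ranges are illustrative assumptions): GridSearchCV needs explicit values to loop over, RandomizedSearchCV also accepts distributions, and cv=5 requests 5-fold cross validation.]

    # Illustrative sketch: explicit values for grid search vs. distributions for random search.
    import numpy as np
    from scipy.stats import randint, loguniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    # Grid search: you must list the values it should loop over.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [10, 100, 1000], "max_depth": [3, 5, 10]},
        cv=5,  # cv=5 -> 5-fold cross validation
    )

    # Random search: parameters can instead be drawn from distributions.
    rand = RandomizedSearchCV(
        RandomForestClassifier(random_state=42),
        param_distributions={
            "n_estimators": randint(10, 1000),       # uniform over integers
            "min_samples_leaf": randint(1, 100),
            "max_features": loguniform(0.1, 1.0),    # log-uniform for scale-like parameters
        },
        n_iter=25,
        cv=5,
        random_state=42,
    )
    rand.fit(X, y)
    print(rand.best_params_, rand.best_score_)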
13:40:16 From Sofus Kjærsgaard Stray : Using hyperhyperparameters
13:42:09 From Haider Fadil Ali Hussein Al-Saadi : Infinite nested loops finding hyper^n parameters :D
13:44:03 From Troels Christian Petersen : Yes, CV=5 means 5-fold cross validation...
13:45:43 From Sofus Kjærsgaard Stray : What do the axes mean?
13:45:47 From Troels Christian Petersen : @Emy: You don't know this ahead of time, and so you have to base this on previous (own?) experience. However, many HyperParameters (HPs) are categorical, so that is easy. And for the continuous ones, one simply has to start with a reasonable and LARGE range…
13:45:53 From Aske R. : Would it make sense to first run a random search and then a "finer" Bayesian optimization?
13:47:28 From Troels Christian Petersen : @Kristoffer: You don't want to combine the grid with random search, though I understand what you mean. If you're considering say 5 Loss Functions, then when testing say 25 (or 100) random configurations, they will cover these nicely.
13:49:38 From Troels Christian Petersen : @Aske: Yes, but that is somehow what happens. The first two points are random, and for the following "low" iteration values, they are more or less random, but slowly they become less and less random. However, the Bayesian approach ensures a good coverage of untested parts of Phase Space (PS), which is in fact probably better than purely random searching...
13:49:43 From Emil Schou Martiny : We should be wary when it fits at the limit of our range
13:50:09 From Emil Schou Martiny : It finds that the min samples is 100, which we set as our max, so our scan range is probably not good
13:50:13 From Haider Fadil Ali Hussein Al-Saadi : Wouldn't it be better to test loss functions not as a hyperparameter that you choose between, but as a different hyperparameter vector for each loss function? It wouldn't really make sense to, for example, test squared loss 3 times more than cross entropy.
13:50:15 From Troels Christian Petersen : Indeed… parameters hitting the limit should ALWAYS sound your alarms...
13:50:33 From kristoffer : @Troels Thanks
13:50:43 From kristoffer : Makes sense
13:53:22 From Troels Christian Petersen : @Haider: It is actually a bit more complicated, since different loss functions yield different scores, which might not be (well, almost surely are not) comparable. So you would need to do HP-optimisation separately for each loss function, and then choose between the solutions afterwards.
13:54:00 From Haider Fadil Ali Hussein Al-Saadi : Well yeah, that was kinda my point.
13:54:35 From Sofus Kjærsgaard Stray : Could running multiple Bayesian SMBO optimizations yield different results? Or would they always (or almost always) converge on the same hyperparameters?
13:54:42 From Troels Christian Petersen : OK - I didn't get that, and was perhaps also not sure why you put in a factor 3 :-)
13:55:15 From Haider Fadil Ali Hussein Al-Saadi : It was an example of what could happen if you randomly chose between loss functions
13:55:35 From Troels Christian Petersen : @Sofus: Well, if you give it a fixed seed, it should converge to the same point. If you don't, then it should still, but just like all other fits, it might not.
13:55:51 From Troels Christian Petersen : @Haider: Ah, OK…
13:56:03 From Sofus Kjærsgaard Stray : So would it be smart to run it, say, 10 times with different seeds to make sure that the HPs you have are good?
13:56:19 From Sofus Kjærsgaard Stray : Or is it not worth the computation effort?
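[On the earlier point that a best-fit hyperparameter sitting at the edge of the scanned range should sound the alarm, a minimal sketch of such a check; the parameter names, ranges and values are assumptions for illustration.]

    # Illustrative sketch: warn if the best value of a hyperparameter hit the edge of its scan range.
    def warn_if_at_edge(best_params, search_ranges):
        for name, (lo, hi) in search_ranges.items():
            if name in best_params and best_params[name] in (lo, hi):
                print(f"Warning: {name} = {best_params[name]} is at the edge of [{lo}, {hi}]; widen the scan.")

    # Example usage with assumed values (best_params would come from e.g. search.best_params_):
    warn_if_at_edge({"min_samples_leaf": 100, "max_depth": 7},
                    {"min_samples_leaf": (1, 100), "max_depth": (2, 20)})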
13:56:25 From Haider Fadil Ali Hussein Al-Saadi : Or you could run it for a long enough time
13:56:50 From Troels Christian Petersen : Normally, this would not be needed. However, for your final thesis result, you might want to run it say 3 times just to sleep well at night before your defence!
13:56:53 From Troels Christian Petersen : But otherwise not...
13:56:57 From Sofus Kjærsgaard Stray : Yeah, that's another point. Would it be smarter to run two SMBOs for 5 iterations versus one for 10 iterations?
13:57:32 From Troels Christian Petersen : I would put my money on one for 10 iterations, but I'm sure that you can find examples of the opposite!
13:57:51 From Runi : Are best and minimum reversed in that plot?
13:57:53 From Sofus Kjærsgaard Stray : Alright, thanks
13:58:06 From Troels Christian Petersen : That plot?
13:58:26 From Haider Fadil Ali Hussein Al-Saadi : I think if you have a giant parameter space, running a simulated annealing algorithm on the hyperhyperparameter deciding the "jump size" might make sense?
13:59:05 From Troels Christian Petersen : Yes, that could definitely be an idea…
13:59:55 From Andy Anker : I read a bit about genetic algorithms for HPO. Do you know if this is a good method?
14:00:58 From Sofus Kjærsgaard Stray : What do "wall clock time" and "regret" mean?
14:01:20 From Jonathan : So how do you determine how your specific hyperparameter is distributed, in the random search method?
14:02:00 From Troels Christian Petersen : Well, the random search needs a limited range, so you need to define this. And this part is not easy, but you should start with a big range...
14:02:09 From Sofus Kjærsgaard Stray : Makes sense. So it's a "how much improvement per extra computation time" graph
14:02:53 From joachim : What do hyperparameters actually represent? Do they tell us something about the actual problem, or just about our dataset? For example, in the housing prices example, if I double the number of houses I need to predict, do I need larger hyperparameters (more trees etc.)?
14:03:53 From Haider Fadil Ali Hussein Al-Saadi : I think it's the parameters of the machine learning model, not the problem. So for example, the learning rate of a neural network?
14:03:54 From Emil Schou Martiny : I think genetic algorithms are often more useful when you have more different parameters
14:05:01 From Moust : If you are training a model where it is very random whether it learns or not (like Q-learning), so that repeating the same hyperparameters does not always give the same result every time, how would you handle this?
14:05:03 From Sofus Kjærsgaard Stray : Is there, at any point, a way to have intuition for the hyperparameters? Like, is there a class of problems where a large number of leaves is useful, or another class where you want a high depth in your tree?
14:09:43 From Sofus Kjærsgaard Stray : Should we use hyperparameter optimization for the small project or is that optional/not wanted? (a bit of a meta question)
14:10:01 From joachim : How do I know if I should use very large hyperparameters or choose a different algorithm? I.e. to fit a wave with a sine curve vs fitting with a very high-degree polynomial? Can we estimate the "hardness" of the problem somehow?
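[On the question about models whose training is itself stochastic, re-asked just below: one common option is to evaluate the same hyperparameter configuration with several seeds and compare configurations on the mean and spread of the scores. A minimal sketch; the model, data and parameter values are assumptions for illustration.]

    # Illustrative sketch: average the cross-validated score of one configuration over several seeds.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)
    params = {"max_depth": 3, "learning_rate": 0.1}   # one candidate configuration

    scores = []
    for seed in range(5):                             # repeat with different random seeds
        model = GradientBoostingClassifier(random_state=seed, **params)
        scores.append(cross_val_score(model, X, y, cv=5).mean())

    print(f"mean = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")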
14:13:31 From Moust : If you are training a model where it is very random whether it learns or not (like Q-learning), so that repeating the same hyperparameters does not always give the same result every time, how would you handle this? Would you run the model with the same hyperparameters several times and take an average, or just keep random sampling with new hyperparameters?
14:14:54 From kristoffer : Nice lecture!!!
14:15:25 From Andy Anker : Thx a lot for also sharing the code!
14:33:29 From Sofus Kjærsgaard Stray : Can I use HPO with LightGBM?
14:33:33 From Sofus Kjærsgaard Stray : And if so, how?
14:36:34 From Christian Michelsen : LightGBM has a Python module which you can use with the scikit-learn API
14:36:43 From Christian Michelsen : For an example, see: https://www.kaggle.com/mlisovyi/lightgbm-hyperparameter-optimisation-lb-0-761
14:36:49 From Christian Michelsen : Cell 16.
14:44:15 From joachim : I have a question about cross validation: it seems to me that when performing CV I would pollute my training set with my test set, and then be very prone to overfitting when compared to the validation/hold-out set?
14:46:44 From Christian Michelsen : Cross validation is only applied to the training set, not the test set :)
14:48:29 From Troels Christian Petersen : Thanks for the question: I don't see why you would be very prone to overtraining. You do "like always": train on a training set (say 80%) and then test it on the validation/test set (then 20%), while you of course leave the hold-out set by itself. Then you simply ask yourself the question… would my result have been different, if I had chosen the 20% differently in the sample? And then you simply test this, by trying the other 4 combinations. In the course of doing so, you get 5 (correlated) models, but tested on 5 uncorrelated validation samples. From this, you can choose the training parameters as you see fit (e.g. combining the 5 models), and apply it to the hold-out sample, and no overtraining happens.
14:49:17 From Troels Christian Petersen : OK… that was a long "chat message", but in general (as pointed out), the hold-out sample is not touched until your algorithm is done.
14:51:06 From Troels Christian Petersen To Haider Fadil Ali Hussein Al-Saadi (privately) : Hi Haider. You seem to be into ML already. May I ask about the source of your knowledge? Also, somehow I assume that I should write in Danish, but in a graduate course the default is English… which do you prefer? Cheers, Troels
14:51:33 From Troels Christian Petersen To Haider Fadil Ali Hussein Al-Saadi (privately) : And finally… do you know Gadir?
14:52:45 From Emy Alerskans : When searching for the optimal hyperparameters, does it matter if I do two different searches for, let's say, 2 hyperparameters followed by a search for 2 other hyperparameters, as opposed to doing one search for all 4 hyperparameters?
14:52:47 From joachim : Ah! My misunderstanding was that when performing CV I would keep the model between validation folds.
14:53:50 From Emil Schou Martiny : @Troels, I can't remember if you said this last week, but how well did the NN work on the b-jet data last week? How low did you CERN guys get down? Just for comparison with the around 9% that people here got down to?
14:54:17 From Christian Michelsen : Emy: Yes, that might give different results if the hyperparameters are correlated (e.g. learning rate and number of estimators)
14:54:43 From Emy Alerskans : Thanks Christian :)
14:55:13 From Emy Alerskans : Follow-up question: Then how do we know which to search separately for?
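[On the two points above - LightGBM working through its scikit-learn wrapper, and correlated hyperparameters such as learning rate and number of estimators being best searched jointly rather than in separate rounds - here is a minimal sketch. It is illustrative only; the data, parameter names and ranges are assumptions, not the linked Kaggle notebook.]

    # Illustrative sketch: joint random search over correlated LightGBM hyperparameters.
    from lightgbm import LGBMClassifier
    from scipy.stats import loguniform, randint
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

    search = RandomizedSearchCV(
        LGBMClassifier(random_state=1),
        param_distributions={
            "learning_rate": loguniform(1e-3, 0.3),  # searched together with n_estimators, since they interact
            "n_estimators": randint(50, 1000),
            "num_leaves": randint(8, 128),
        },
        n_iter=20,
        cv=5,
        random_state=1,
    )
    search.fit(X, y)
    print(search.best_params_)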
14:55:16 From Haider Fadil Ali Hussein Al-Saadi To Troels Christian Petersen (privately) : I took the course Physics of Algorithms and did my final project on neural networks, and I've read independently a small amount. I prefer English in the Zoom chat, since there might be international students.
14:55:22 From Haider Fadil Ali Hussein Al-Saadi To Troels Christian Petersen (privately) : Also, Gadir is my older brother :)
14:56:30 From Haider Fadil Ali Hussein Al-Saadi To Troels Christian Petersen (privately) : Is it a problem that I try to answer people's questions so much?
14:57:47 From Christian Michelsen : You don't always know, that's the problem. In the optimal world you'd do one big round of HPO, but that'd kill ya due to the curse of dimensionality in the real world.
14:58:57 From Emy Alerskans : Ah, okay. So basically I'd need to try it out to see which would be preferred. Thanks :)
15:01:29 From Sofus Kjærsgaard Stray : Can I use np.arange() when inputting my search space for the random search, or is it important that I initialize with a random pick?
15:01:54 From Christian Michelsen : I have previously had good results with using a high learning rate first, doing HPO on most hyperparameters, and then lastly reducing the learning rate and fitting the number of estimators using early stopping. See e.g. section 3.5 of my thesis for a more thorough discussion: https://github.com/ChristianMichelsen/Thesis_QuarksVsGluons/blob/master/MasterThesis.pdf
15:02:53 From Christian Michelsen : Sofus: in that case it would be treated as a categorical, and you'd never try out e.g. 2.5.
15:04:17 From Troels Christian Petersen To Haider Fadil Ali Hussein Al-Saadi (privately) : It is not at all a problem - it is exactly what I was hoping for. You and Sofus seem to be the most active, and it would be great if others joined, even with plain/obvious questions. But years of teaching have taught me that simply actively participating teaches you more!
15:04:35 From Troels Christian Petersen To Haider Fadil Ali Hussein Al-Saadi (privately) : And say HI to Gadir from me… (I was his bachelor supervisor).
15:11:55 From Sofus Kjærsgaard Stray : What's a good way to get the accuracy score with LightGBM?
15:14:17 From Sofus Kjærsgaard Stray : Never mind, I just used sklearn's
15:30:26 From Runi : Can the HyperparameterOptimization code be used on a continuous problem? It fails when I use it on the QSO data for redshift. I get "ValueError: Unknown label type: 'continuous'"
15:32:18 From Troels Christian Petersen : What algorithm are you using? And what HP mode are you in? For the grid search, it insists on a set of specific values...
15:35:40 From Runi : Right now I've just replaced the pulsar data set with the QSO data set, and set y to be the redshift and not "class".
15:36:39 From Sofus Kjærsgaard Stray : I don't really understand all the extra code in the Bayesian optimization. Is there a reason we can't just call the BayesianOptimization function directly on our classifier?
15:37:02 From Sofus Kjærsgaard Stray : Is the "f =" meant to signify our loss function?
15:38:56 From Haider Fadil Ali Hussein Al-Saadi : Which webpage?
15:40:12 From Carl-Johannes Johnsen : @Haider: the course webpage: https://www.nbi.dk/~petersen/Teaching/AppliedMachineLearning2020.html - there's a subpage for the small project: https://www.nbi.dk/~petersen/Teaching/ML2020/SmallProject.html
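[On the "f =" question: in the bayes_opt package, f is the objective function being maximised - typically a cross-validated score built around your classifier (a measure of goodness, not a loss directly). A minimal sketch of how the package is generally wired up; the model, parameters and ranges are assumptions, not necessarily the lecture notebook.]

    # Illustrative sketch: Bayesian optimization with the bayes_opt package.
    from bayes_opt import BayesianOptimization
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    def objective(max_depth, min_samples_leaf):
        # bayes_opt proposes floats, so integer-valued parameters are rounded here
        clf = DecisionTreeClassifier(max_depth=int(round(max_depth)),
                                     min_samples_leaf=int(round(min_samples_leaf)),
                                     random_state=42)
        return cross_val_score(clf, X, y, cv=5).mean()   # the score to MAXIMISE

    optimizer = BayesianOptimization(
        f=objective,                                      # "f" is the function being optimised
        pbounds={"max_depth": (2, 20), "min_samples_leaf": (1, 100)},
        random_state=42,
    )
    optimizer.maximize(init_points=2, n_iter=10)
    print(optimizer.max)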
15:40:21 From Runi : The HPO code can only be used for classification, and not regression, right?
15:41:59 From Troels Christian Petersen : No, it should be generally applicable, since all it cares about is getting a score (= loss function value) for each possible configuration in a (large) configuration space…
15:43:06 From Carl-Johannes Johnsen : And you need to reconfigure your model for regression, instead of classification.
15:46:11 From Sofus Kjærsgaard Stray : I can't get the h5 files to load; when using your script I just get a 1x162500 array out with nothing else
15:47:05 From Troels Christian Petersen : I'll have to check the script - I wanted to update it anyway, but didn't get around to it yet. However, it should simply be a very "normal" hdf5 file…
15:47:24 From Runi : How would you, for example, reconfigure "Part A: Naïve Approach" to regression? I can't see a toggle in the documentation
15:48:31 From Andy Anker : For me the dataloader seems fine. If you type data['index'] you get the index features, data['eventNumber'] the eventNumber features, etc.
15:50:21 From Carl-Johannes Johnsen : a (albeit slow) way to extract from hdf5 into a normal numpy array:
    import h5py
    import numpy as np

    with h5py.File("train.h5", "r") as hf:
        train = hf["train"][:]

    labels = train["Truth"]
    labels = np.array(labels.tolist())

    names_to_remove = ["Truth", "p_truth_E"]
    data_fields = [name for name in train.dtype.names if name not in names_to_remove]
    data = train[data_fields]
    data = np.array(data.tolist())
15:52:07 From Sofus Kjærsgaard Stray : Getting an error about calling train["Truth"]
15:53:05 From Carl-Johannes Johnsen : can you do train["index"] ?
15:53:47 From Sofus Kjærsgaard Stray : nope
15:53:59 From Sofus Kjærsgaard Stray : it doesn't accept a string as an index
15:54:10 From Carl-Johannes Johnsen : Ok, can you post what you have so far on Slack? :)
15:54:24 From Sofus Kjærsgaard Stray : I literally copy/pasted your code
15:56:01 From Andy Anker : Christian, I have a question about your code when you use Bayesian opt. In your example, you want to optimize max_depth, which has to be an integer. Maybe (probably) I am wrong, but as I see it, you let the parameter be optimized as if it were a float and then you round it to the closest integer afterwards. Is that right, or am I doing something wrong?
15:56:06 From Troels Christian Petersen To Carl-Johannes Johnsen (privately) : If you have an updated version, then please send it, so I can put it on the webpage… otherwise we'll have to look at it.
15:56:59 From Carl-Johannes Johnsen To Troels Christian Petersen (privately) : Sort of - I had the notebook I sent earlier. It works fine on ERDA, which is rather strange. But I'll try to see what's wrong!
15:57:20 From Carl-Johannes Johnsen To Troels Christian Petersen (privately) : introduction.ipynb
15:58:27 From Carl-Johannes Johnsen : @Sofus: strange. Will this print anything?
    import h5py
    import numpy as np

    with h5py.File("train.h5", "r") as hf:
        train = hf["train"][:]
    print(train.dtype.names)
15:58:57 From zoeansari : @Sofus, how about building a pandas data frame from the train and test?
15:58:59 From zoeansari :
    from sklearn import preprocessing
    import pandas as pd
    import numpy as np
    import h5py

    with h5py.File("data/train.h5", "r") as hf:
        data = hf["train"][:]
    with h5py.File("data/test.h5", "r") as hf:
        test_data = hf["test"][:]

    df = pd.DataFrame(data)
    df_test = pd.DataFrame(test_data)
16:01:38 From Sofus Kjærsgaard Stray : df_test is a 160651x164 dataset, but df is just a 100x1, which seems odd
16:03:12 From Runi : It seems like I have to set the data type of 'y' for it to recognise it as a continuous value. Adding "y=y.astype('int')" may have fixed it.
16:04:29 From Sofus Kjærsgaard Stray : yeah, reading the test file is no problem but the training file is not playing nice
16:04:38 From Rasmus Salmon : My dataframes seem fine. I am using the pandas function for reading hdf files:
    import pandas as pd
    data = pd.read_hdf(path_or_buf, key="train")
16:05:12 From Sofus Kjærsgaard Stray : what is "path_or_buf"?
16:05:42 From Sofus Kjærsgaard Stray : hmm, it might help if I just redownload the training file
16:05:57 From Rasmus Salmon : Just the path to the file. Documentation is here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_hdf.html
16:06:57 From Carl-Johannes Johnsen : @Sofus, yeah, both mine and Zoe's code work on ERDA with the file from the small project subpage
16:06:58 From Sofus Kjærsgaard Stray : okay yeah, redownloading it helps
16:15:40 From Aske R. : Has anyone found an install of bayes_opt for Windows?
16:16:09 From Carl-Johannes Johnsen : @Runi, I think you need to change the estimator to DecisionTreeRegressor, and look at something like mean squared error instead of accuracy as the metric. If you just change the y value to an integer, and still use a classifier, I'd guess that you are training against a wide range of classes, which in turn will give you a bad accuracy.
16:16:15 From Sofus Kjærsgaard Stray : https://anaconda.org/conda-forge/bayesian-optimization
16:16:20 From Sofus Kjærsgaard Stray : this here: https://anaconda.org/conda-forge/bayesian-optimization
16:16:22 From Troels Christian Petersen To Carl-Johannes Johnsen (privately) : OK - that would be great. And… I have some plans that "evaluation.py" should be genuinely usable, i.e. show many different measures of performance and also make plots… simply because 55+ people will run that algorithm many times!
16:16:27 From Sofus Kjærsgaard Stray : oh sorry, linked twice
16:16:54 From Sofus Kjærsgaard Stray : https://github.com/fmfn/BayesianOptimization for the documentation and pip install link
16:17:32 From Aske R. : thanks
16:17:33 From Runi : @Carl-Johannes, yeah I figured that; currently changing the metrics, as they are different.
16:18:49 From Carl-Johannes Johnsen To Troels Christian Petersen (privately) : MLSolutionReader.py? Yes, I'm working on it right now! That is, being able to pull people's submissions out into dictionaries and compute the metrics. I'm following the one you gave from last year.
16:19:58 From Rasmus Salmon : Troels, can I break the silence for a minute?
16:20:33 From Carl-Johannes Johnsen : I think you can just go ahead :)
16:21:55 From Rasmus Salmon : Yes
16:22:18 From Rasmus Salmon : Ok, thanks
16:25:21 From joachim : Has anyone used a Keras model in combination with sklearn RandomizedSearchCV? I can make it run for sklearn GridSearchCV, but I get an error message related to cloning my estimator if I try with the randomized one.
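[A minimal sketch of wiring Keras into sklearn's RandomizedSearchCV, assuming the wrapper available at the time (tensorflow.keras.wrappers.scikit_learn.KerasClassifier; newer setups use the scikeras package instead). The search clones the estimator, so tunable arguments should go through the build function or the wrapper rather than being captured from the surrounding scope. The architecture, parameters and ranges are illustrative assumptions.]

    # Illustrative sketch: random search over a Keras model via the scikit-learn wrapper.
    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    def build_model(units=32, n_features=20):
        model = Sequential([
            Dense(units, activation="relu", input_shape=(n_features,)),
            Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model

    clf = KerasClassifier(build_fn=build_model, epochs=10, batch_size=64, verbose=0)

    search = RandomizedSearchCV(
        clf,
        param_distributions={"units": randint(8, 128), "batch_size": [32, 64, 128]},
        n_iter=5,
        cv=3,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)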
16:25:37 From Runi : I got it to work with DecisionTreeRegressor: https://i.imgur.com/Kg8aYNg.png
16:31:48 From Jonathan : How do you guys import the .h5 file? I get a "ValueError: No dataset in HDF5 file." when I try to import using pandas.read_hdf
16:32:48 From Rasmus Salmon : I got the same error when not specifying the key ("train" or "test")
16:33:19 From Jonathan : I'll try that ^^
16:33:56 From Jonathan : Worked like a charm, thx!
16:33:56 From Carl-Johannes Johnsen : @Joachim, yes, I got it working with sklearn's RandomizedSearchCV and the KerasClassifier
16:34:03 From Sofus Kjærsgaard Stray : How do I evaluate the accuracy of my prediction when the Truth is binary (0 or 1) but my prediction is continuous?
16:34:19 From Jonathan : +1
16:34:52 From Carl-Johannes Johnsen : @Sofus, then you'll have to split. A basic split would be 0.5
16:35:46 From Carl-Johannes Johnsen : pred_binary = [1 if pred > .5 else 0 for pred in predictions]
16:37:55 From Carl-Johannes Johnsen To Troels Christian Petersen (privately) : I guess they will be told about ROC and AUC? Regarding the choice of split value.
16:38:28 From Troels Christian Petersen : Hi Sofus et al. The Binary Entropy / Log-Loss does exactly what you want. The truth (y) is binary, so only one of the two terms contributes, and the prediction (y-hat) is continuous, which in the log(y-hat) or log(1-y-hat) term yields a value that is added to the loss… hopefully low, if your algorithm works well :-)
16:38:57 From Troels Christian Petersen To Carl-Johannes Johnsen (privately) : Yes, we mentioned that, and I will mention it again…
16:41:15 From Andy Anker : How does one normally report how the HPO was done? Reporting the method, the parameters and ranges searched, and the result? Or are there any smart plots to show? Christian had a nice figure with 2 parameters, but more than that is difficult to visualize...
16:41:20 From Aske R. : Anyone else getting an "UnboundLocalError: local variable 'BayesianOptimization' referenced before assignment" even though they have it imported, using Christian's code?
16:52:11 From Troels Christian Petersen : @Andy: That is a good question. Some don't even report the HPs at all, which is of course not good! Reporting them already raises the level, but telling how they were obtained (just in simple terms and with references to algorithms) is good practice.
16:53:12 From Andy Anker : Thanks! Sounds doable
16:53:15 From Sofus Kjærsgaard Stray : How exactly is cross-entropy read? What does it represent in terms of, for example, what "0.4" vs "0.7" means?
16:53:54 From Haider Fadil Ali Hussein Al-Saadi : You can't read it directly like that
16:54:07 From Haider Fadil Ali Hussein Al-Saadi : Instead it makes more sense to look at the graph of its behaviour
16:54:49 From Haider Fadil Ali Hussein Al-Saadi : Essentially, it punishes very bad results VERY harshly
16:55:33 From Sofus Kjærsgaard Stray : I know the idea behind it, but I meant how you interpret the final cross-entropy score at the end.
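[On evaluating a continuous prediction against a binary truth, as discussed above: thresholding gives an accuracy, while ROC AUC and log loss use the continuous scores directly. A minimal sketch with made-up numbers.]

    # Illustrative sketch: metrics for a binary truth and a continuous prediction.
    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score, log_loss

    y_true = np.array([0, 0, 1, 1, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9])   # continuous predictions in [0, 1]

    y_pred = (y_score > 0.5).astype(int)             # simple 0.5 threshold
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("ROC AUC: ", roc_auc_score(y_true, y_score))
    print("log loss:", log_loss(y_true, y_score))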
16:57:05 From Haider Fadil Ali Hussein Al-Saadi : https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
16:57:24 From Haider Fadil Ali Hussein Al-Saadi : Look at the first graph; see how it goes towards infinity when you guess 0 and the truth is 1
16:58:50 From Haider Fadil Ali Hussein Al-Saadi : I think he was more asking what the score means. For example, the squared error is easy to interpret; you can go easily from that to a "physical" result
16:59:10 From Sofus Kjærsgaard Stray : Yes, I think it helps. I'm just wondering if it's possible to get a clearer intuition in terms of what 0.3 means. It doesn't mean that I'm 30% accurate, but how can I get a clear idea?
17:01:01 From Sofus Kjærsgaard Stray : Yeah, I think that's the best intuition I'm gonna get haha
17:01:24 From Haider Fadil Ali Hussein Al-Saadi : It sounds wrong, but I'm not good enough at math to say anything against it :D
17:02:01 From Jonathan : Is the chat recorded?
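[On the cross-entropy intuition question above: one way to give a number like 0.3 meaning is to compare it to simple baselines. Always predicting 0.5 gives a log loss of ln(2) ≈ 0.693, and predicting the class frequency for every event gives the entropy of the label distribution; a trained model should land below these. A minimal sketch with toy labels, for illustration only.]

    # Illustrative sketch: baseline log-loss values to compare a model's cross-entropy against.
    import numpy as np
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)
    y_true = (rng.random(1000) < 0.2).astype(int)        # toy labels, ~20% positives

    p_half = np.full(len(y_true), 0.5)                   # "know nothing" baseline
    p_freq = np.full(len(y_true), y_true.mean())         # class-frequency baseline

    print("always 0.5:     ", log_loss(y_true, p_half))  # = ln(2) ≈ 0.693
    print("class frequency:", log_loss(y_true, p_freq))  # = entropy of the labels (≈ 0.50 here)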