Applied Machine Learning 2021 - Useful ML links

The field of Machine Learning (ML) is developing at a very fast pace and attracting an ever-growing number of practitioners. For this reason, textbooks on ML are few and tend to become slightly deprecated, though they of course describe the general concepts very well. Research papers have partially filled the gap, as these are more versatile and frequently updated. However, they typically deal only with a very small and specific part of the general ML "phase space".
Whereas "classic" literature (i.e. books and papers) gives only partial coverage, blogs and GitHub repositories seem to fill out the rest, and typically in a much more accessible fashion, with great illustrations and example code. For this reason, we have tried to gather some of the more useful links to such blogs and repositories below. It is simply our (slightly random) selection of webpages that we have come across and found illustrative and useful. So use these at will, and also build your own list of reference sites.

Books:
  • Deep Learning by Ian Goodfellow et al. (2016). A short and good general introduction to ML can be found in Chapter 5 of Part 1.
  • Pattern Recognition and Machine Learning by Christopher M. Bishop (2006).
  • "Interpretable Machine Learning" by Christoph Molnar (2020). A Guide for Making Black Box Models Explainable.
  • "Convolutional Neural Networks for Visual Recognition" by Andrej Karpathy (2017?). Used for teaching CNNs in Stanford's CS231n class.
  • "Deep Learning with Python" by author of Keras, Francois Chollet, now at Google AI (2017). Especially chapter 4 is a good overview of the fundamentals of Machine Learning.
  • Book on neural networks and deep learning. Online book only.

Papers:
  • XGBoost paper (2016). Highly readable paper showing the innovations of the XGBoost algorithm.
  • LightGBM paper (2017). Explaining the great speedup and showing examples of execution times.

Blogs/Links/Tutorials - supervised learning:
  • Introduction to tree based learning. Very good introduction to the basics of tree based learning.
  • Introduction to neural net based learning. Very good introduction to the basics of Neural Net (NN) based learning.
  • SciKit Learn tutorial. Gives a quick introduction to ML in general and has code examples for SciKit Learn.
  • Simple introduction code to a simple Neural Network in PyTorch.
  • XGBoost vs. LightGBM. Discussion of differences, with code examples.
  • XGBoost, LightGBM, and CatBoost. Discussion of differences and hyperparameters.
  • Introduction to NGBoost, a tree-based algorithm that makes probabilistic predictions (i.e. with uncertainties).
  • PyTorch Geometric Graph Neural Network Tutorial (2019). Reasonably good guide with code examples.
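To make the supervised-learning links above concrete, here is a minimal sketch of training a tree-based (boosted) classifier; we use scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost/LightGBM, and synthetic data purely for illustration.

```python
# Minimal tree-based classification sketch (scikit-learn's gradient
# boosting stands in for XGBoost/LightGBM; data is synthetic).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```

The XGBoost and LightGBM APIs linked above follow essentially the same fit/predict pattern, with their own hyperparameter names.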


Blogs/Links/Tutorials - unsupervised learning:
  • SciKit-Learn manual for and discussion of (unsupervised) clustering.
  • Overview of t-SNE algorithm with papers and implementations.
  • Isolation Forests (Wiki) and their implementation (Towards Data Science) for anomaly detection.
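As a quick illustration of the clustering and anomaly-detection links above, here is a small scikit-learn sketch: KMeans clustering on synthetic blobs, and an Isolation Forest anomaly score on the same data.

```python
# Sketch of unsupervised learning with scikit-learn: KMeans clustering
# plus Isolation Forest anomaly scores, on synthetic blob data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", np.bincount(labels))

iso = IsolationForest(random_state=0).fit(X)
scores = iso.score_samples(X)  # lower score = more anomalous
print("Most anomalous point:", X[np.argmin(scores)])
```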


Blogs/Links/Tutorials - feature importance, re-weighting, etc.:
  • Permutation importance described in the SciKit-Learn implementation.
  • Permutation importance vs. Random Forest Feature Importance (MDI), with code examples of usage.
  • GitHub repository for SHAP (Shapley value) calculation, which gives game-theory-based variable rankings.
  • Shapley Values explained in chapter 5.9 of C. Molnar's book.
  • Re-weighting data samples using GBReweighter.
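The permutation importance linked above (shuffle one feature at a time and measure the drop in validation score) is a few lines in scikit-learn; the sketch below uses synthetic data and a Random Forest for illustration.

```python
# Permutation importance sketch with scikit-learn: the score drop when
# a feature is shuffled measures how much the model relies on it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6,
                           n_informative=3, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
result = permutation_importance(model, X_val, y_val,
                                n_repeats=10, random_state=1)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: {imp:+.3f}")
```

Note that, unlike the MDI importances built into Random Forests, this is computed on held-out data and is not biased towards high-cardinality features.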


Code:
  • Oneliners for ML mostly to display performance.
  • Plotting boundaries for ML classification.
  • Example of using UMAP for clustering in an artistic way!


On Loss functions:
  • Cross Entropy (log loss), which is one of the most used loss functions for classification.
  • Cross Entropy illustrated.
  • Kullback–Leibler divergence or "relative entropy" used for loss functions.
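The two loss functions above are related by a simple identity, H(p, q) = H(p) + KL(p || q): cross entropy equals the entropy of the true distribution plus the KL divergence to the predicted one. A small numeric check (with made-up distributions p and q):

```python
# Numeric sketch of cross entropy and KL divergence for two discrete
# distributions p (true) and q (predicted); natural logs throughout.
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # "true" distribution
q = np.array([0.5, 0.3, 0.2])  # model's predicted distribution

entropy = -np.sum(p * np.log(p))          # H(p)
cross_entropy = -np.sum(p * np.log(q))    # H(p, q)
kl = np.sum(p * np.log(p / q))            # KL(p || q)

print(f"H(p)     = {entropy:.4f}")
print(f"H(p, q)  = {cross_entropy:.4f}")
print(f"KL(p||q) = {kl:.4f}")  # H(p, q) - H(p)
```

This is why minimising cross entropy against fixed labels is equivalent to minimising the KL divergence: the H(p) term does not depend on the model.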


Other:
  • "So Long Sucker" game description, developed by Lloyd S. Shapley (of Shapley/SHAP value fame) and others.



Last updated: 19th of April 2021 by Troels Petersen.