Applied Machine Learning 2025 - Useful ML links
The field of Machine Learning (ML) is developing at a very fast pace, driven by an expanding number of practitioners.
For this reason, textbooks on ML are typically few and slightly outdated, though they of course describe the
general concepts very well.
Research papers have partially filled the gap, as these are more versatile and frequently updated. However, they
typically only deal with a very small and specific part of the general ML "phase space".
But whereas "classic" literature (i.e. books and papers) only provides partial coverage, blogs and GitHub repositories seem
to fill in the rest, typically in a much more accessible fashion with great illustrations and example code.
For this reason, we have tried to gather some of the more useful links to such blogs and repositories below. It is
simply our (slightly random) selection of webpages that we have come across and found illustrative and useful. So
use these at will, and also build your own list of reference sites.
Books:
Applied Machine Learning by David Forsyth. Great book, at times even mildly entertaining, from one of the masters of computer vision.
Elements of Statistical Learning II by Trevor Hastie et al. Second edition of the standard book on the fundamentals.
Machine Learning (from PDG) by Kyle Cranmer et al. (2021). A concise (60-page) overview.
Deep Learning by Ian Goodfellow et al. (2016).
A short and good general introduction to ML can be found in Chapter 5 of Part 1.
Pattern Recognition and Machine Learning by Christopher M. Bishop (2006). A classic (if older) standard reference.
"Interpretable Machine Learning" by Christoph Molnar (2020).
A Guide for Making Black Box Models Explainable.
"Convolutional Neural Networks for Visual Recognition" by Andrej Karpathy (2017?).
Used for teaching CNNs in Stanford's CS231n class.
"Deep Learning with Python" by author of Keras, Francois Chollet, now at Google AI (2017).
Especially chapter 4 is a good overview of the fundamentals of Machine Learning.
Book on neural networks and deep learning. Online book only!
Papers:
XGBoost paper (2016).
Highly readable paper showing the innovations of the XGBoost algorithm.
LightGBM paper (2017).
Explaining the great speedup and showing examples of execution times.
A more general (100-page) paper on Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges (2021).
ROCKET: Time series classification using random convolutional kernels.
Blogs/Links/Tutorials - supervised learning:
Introduction to tree-based learning.
Very good introduction to the basics of tree-based learning.
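To give a taste of what such a model looks like in code, here is a minimal sketch of ours (toy data, not taken from the linked introduction):

    # A gradient-boosted tree classifier on a toy dataset (scikit-learn).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # An ensemble of shallow trees, each fitted to the residuals of the previous ones
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))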
Introduction to neural net based learning.
Very good introduction to the basics of Neural Net (NN) based learning.
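And the neural net counterpart (again just our minimal toy sketch, using SciKit-Learn's MLPClassifier rather than a full deep learning framework):

    # A small fully connected neural network on toy data.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Two hidden layers of 32 ReLU units each, trained with the Adam optimiser
    clf = MLPClassifier(hidden_layer_sizes=(32, 32), activation="relu",
                        solver="adam", max_iter=500, random_state=42)
    clf.fit(X_train, y_train)
    print("Test accuracy:", clf.score(X_test, y_test))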
SciKit Learn tutorial.
Gives a quick introduction to ML in general and has code examples for SciKit Learn.
XGBoost vs. LightGBM.
Discussion of differences, with code examples.
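Both libraries offer SciKit-Learn-style interfaces, so trying them side by side takes only a few lines (a minimal sketch, assuming the xgboost and lightgbm packages are installed):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier      # pip install xgboost
    from lightgbm import LGBMClassifier    # pip install lightgbm

    X, y = make_classification(n_samples=5000, n_features=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Same data, same (basic) hyperparameters - compare accuracy and run time
    for Model in (XGBClassifier, LGBMClassifier):
        model = Model(n_estimators=200, learning_rate=0.1)
        model.fit(X_train, y_train)
        print(Model.__name__, "accuracy:", model.score(X_test, y_test))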
XGBoost, LightGBM, and CatBoost.
Discussion of differences and hyperparameters.
Introduction to NGBoost, a tree-based algorithm
that makes probabilistic predictions (i.e. gives uncertainties).
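A minimal sketch of what that looks like (assuming the ngboost package; by default it fits a Normal distribution, so each prediction comes with a mean and a width):

    from ngboost import NGBRegressor       # pip install ngboost
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    ngb = NGBRegressor().fit(X_train, y_train)
    dist = ngb.pred_dist(X_test)           # a full predictive distribution per point
    print("Mean of first prediction: ", dist.loc[0])
    print("Sigma of first prediction:", dist.scale[0])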
PyTorch Geometric Graph Neural Network Tutorial (2019).
Reasonably good guide with code examples.
Survey of the performance of BDTs and NNs on tabular data,
essentially showing that tree-based methods simply are very good at this!
Great introduction with code examples to the Verstack package, which combines the powers of LightGBM and Optuna.
Andrej Karpathy's Recipe for Training Neural Networks (a grand master of this art).
Andrej Karpathy's description of how to reproduce GPT-2 from scratch. Very informative introduction to transformers.
TabPFN for tabular data analysis, as published in Nature in 2025.
Blogs/Links/Tutorials - unsupervised learning:
SciKit-Learn manual for, and discussion of, (unsupervised) clustering.
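To fix ideas, the simplest of these algorithms (k-means) in a minimal sketch of ours:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)   # the 4 fitted cluster centres
    print(kmeans.labels_[:10])       # cluster assignments of the first 10 points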
Two great examples (food & RNA) of PCA applied to simple data.
Overview of t-SNE algorithm with papers and implementations.
Comparison of PCA, t-SNE, UMAP, and LDA in a short visually nice writeup with code examples.
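The interfaces for these projections are essentially identical (a minimal sketch of ours; PCA and t-SNE are in SciKit-Learn, while UMAP lives in the separate umap-learn package):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)    # 64-dimensional digit images

    X_pca  = PCA(n_components=2).fit_transform(X)                    # linear
    X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)   # non-linear
    # import umap; X_umap = umap.UMAP(n_components=2).fit_transform(X)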
Isolation Forests (Wiki) and
their implementation (Towards Data Science) for anomaly detection.
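A minimal sketch of the SciKit-Learn implementation on toy data:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X_normal   = rng.normal(0, 1, size=(1000, 2))   # bulk of the data
    X_outliers = rng.uniform(-6, 6, size=(20, 2))   # a few scattered anomalies
    X = np.vstack([X_normal, X_outliers])

    iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
    labels = iso.predict(X)    # +1 for inliers, -1 for (predicted) anomalies
    print("Number flagged as anomalies:", (labels == -1).sum())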
Unsupervised embedding projector using TensorFlow, which includes different datasets and methods.
Blogs/Links/Tutorials - time series analysis:
Blog on using XGBoost for time series analysis, which includes a dataset and code.
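The core trick is to turn the series into a supervised problem using lagged values as features. Our minimal sketch on a synthetic series (not the blog's code):

    import numpy as np
    import pandas as pd
    from xgboost import XGBRegressor

    t = np.arange(500)
    rng = np.random.default_rng(0)
    series = np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=500)

    df = pd.DataFrame({"y": series})
    for lag in (1, 2, 3, 7):                  # previous values become features
        df[f"lag_{lag}"] = df["y"].shift(lag)
    df = df.dropna()

    X, y = df.drop(columns="y"), df["y"]
    split = int(0.8 * len(df))                # keep the time ordering in the split!
    model = XGBRegressor(n_estimators=300).fit(X[:split], y[:split])
    print("Test R^2:", model.score(X[split:], y[split:]))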
Do We Really Need Deep Learning Models for Time Series Forecasting?, using BDTs for time series forecasting.
Blogs/Links/Tutorials - Generative Models:
Great introduction to Diffusion Models, with interactive illustrations (but no code).
Blogs/Links/Tutorials - feature importance, re-weighting, optimisation, etc.:
Permutation importance described in the SciKit-Learn implementation.
Permutation importance vs. Random Forest Feature Importance (MDI), with code examples of usage.
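A minimal sketch of ours contrasting the two, using SciKit-Learn:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("MDI importances:        ", forest.feature_importances_)

    # Permutation importance is evaluated on held-out data, so it is less
    # biased towards high-cardinality features than MDI is.
    result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
    print("Permutation importances:", result.importances_mean)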
GitHub repository for Shapley value (SHAP) calculation, which gives game-theory-based variable rankings.
Shapley Values explained in chapter 5.9 of C. Molnar's book.
Great non-technical introduction to SHAP values.
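A minimal sketch of the shap package on a tree model (a regression case for simplicity; API names as we recall them from recent versions):

    import shap                                # pip install shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=500, n_features=6, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)      # fast, exact method for tree models
    shap_values = explainer.shap_values(X)     # one contribution per feature per event
    shap.summary_plot(shap_values, X)          # global ranking of the features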
Re-weighting data samples using GBReweighter.
Description of the Adam optimiser: a three-page overview of Adam (probably the currently best Stochastic Gradient Descent algorithm).
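For reference, the update rule is short enough to quote here (in LaTeX; g_t is the gradient, \alpha the learning rate, \beta_1 and \beta_2 the decay rates):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t      % 1st moment (mean of gradients)
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2    % 2nd moment (variance of gradients)
    \hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)    % bias correction
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)  % parameter update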
Oversampling for imbalanced classification with SMOTE, to improve on situations with little data in one class.
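A minimal sketch (assuming the imbalanced-learn package, which hosts the standard SMOTE implementation):

    from collections import Counter
    from imblearn.over_sampling import SMOTE   # pip install imbalanced-learn
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
    print("Before:", Counter(y))               # heavily imbalanced classes

    # SMOTE creates synthetic minority-class points by interpolating neighbours
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("After: ", Counter(y_res))           # balanced after oversampling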
Great overview of Stochastic Gradient Descent methods, with illustrative animations.
Hyperparameter search and model optimization with Weights and Biases' Sweeps, with great visualisation.
Code:
One-liners for ML, mostly to display performance.
Plotting boundaries for ML classification.
Example of using UMAP for clustering in an artistic way!
On Loss functions:
Cross Entropy (Wiki) (log loss), which is one of the most widely used loss functions for classification.
Cross Entropy illustrated.
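For the record, the definition (in LaTeX), along with the binary "log loss" special case for N events with true labels y_n and predicted probabilities \hat{y}_n:

    H(p, q) = -\sum_i p_i \log q_i
    L = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log (1 - \hat{y}_n) \right]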
Kullback–Leibler divergence (Wiki) or "relative entropy" used for loss functions.
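Its definition (in LaTeX), for discrete distributions P (target) and Q (model):

    D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}

Since H(p, q) = H(p) + D_KL(P || Q), minimising the cross entropy for a fixed target distribution is equivalent to minimising the KL divergence.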
Other:
"So Long Sucker" game description (Wiki), developed by Lloyd S. Shapley (behind SHAP values) and other.
Illustrative animations of ML inner workings, as a great explainer.
Illustrative webpage showing how a CNN works on the MNIST data (great!) and general images.
Film about AlphaGo (1h28min), showing the competition between man (Lee Sedol) and computer.
Last updated: 27th of March 2025 by Troels Petersen.