Gaussian Processes

Mathias Engel

15 March 2018

The Problem

We have $N$ data points $\bm{X}_N, \bm{t}_N = \{\bm{x}^{(n)}, t_n\}_{n=1}^N$, where $\bm{x}$ is a vector and $t$ is a scalar.

We want a model $y(\bm{x})$.

The probability distribution of a function $y(\bm{x})$ is a Gaussian process if, for any finite selection of points $\bm{x}^{(1)}, \bm{x}^{(2)}, \dots, \bm{x}^{(N)}$, the marginal density $P(y(\bm{x}^{(1)}), y(\bm{x}^{(2)}), \dots, y(\bm{x}^{(N)}))$ is a Gaussian.
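
Below is a minimal sketch of this definition in code: pick any finite set of input points, build their joint covariance, and draw correlated function values from the resulting multivariate Gaussian. The squared-exponential kernel and the length scale $\ell = 1$ are illustrative assumptions, not something the definition prescribes.

    import numpy as np

    def sq_exp_kernel(xa, xb, ell=1.0):
        # k(x, x') = exp(-(x - x')^2 / (2 ell^2)) for scalar inputs
        return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

    x = np.linspace(-5.0, 5.0, 100)     # any finite selection of points
    K = sq_exp_kernel(x, x)             # their joint covariance
    K += 1e-9 * np.eye(len(x))          # jitter for numerical stability

    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
    # Each row of `samples` is one function y(x) evaluated at the chosen
    # points; by definition, their joint density is Gaussian.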

Visualizing the Gaussian Process

From parametric models to Gaussian Processes¹

  • $R_{nh} \equiv \phi_h(\bm{x}^{(n)})$, given $H$ basis functions $\{\phi_h(\bm{x})\}_{h=1}^H$.

  • $\bm{y}_N$ is defined by $y_n \equiv \sum_h R_{nh} w_h$, i.e. $\bm{y} = \bm{R}\bm{w}$, given basis weights $w_h$.

  • Assume the prior on $\bm{w}$ is $P(\bm{w}) = \mathcal{N}(0, \sigma_w^2 \bm{I})$.

$\bm{y}$ is linear in $\bm{w}$ and therefore also Gaussian distributed, with zero mean: $P(\bm{y}) = \mathcal{N}(0, \bm{Q})$.

$$\bm{Q} = \langle \bm{y}\bm{y}^\top \rangle = \langle \bm{R}\bm{w}\bm{w}^\top \bm{R}^\top \rangle = \bm{R} \langle \bm{w}\bm{w}^\top \rangle \bm{R}^\top = \sigma_w^2 \bm{R}\bm{R}^\top.$$
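
A minimal numeric check of this identity, using Gaussian bumps as the $H$ basis functions (an illustrative choice): the analytic covariance $\sigma_w^2 \bm{R}\bm{R}^\top$ should match the empirical covariance of many sampled $\bm{y} = \bm{R}\bm{w}$.

    import numpy as np

    rng = np.random.default_rng(1)
    H, N, sigma_w = 20, 5, 1.0
    centers = np.linspace(-3.0, 3.0, H)   # basis-function centres
    x = np.linspace(-2.0, 2.0, N)         # the N input points

    R = np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)  # R_nh = phi_h(x^(n))
    Q = sigma_w ** 2 * R @ R.T            # analytic covariance of y

    # Empirical covariance of y = R w with w ~ N(0, sigma_w^2 I)
    w = rng.normal(0.0, sigma_w, size=(100000, H))
    y = w @ R.T
    print(np.abs(np.cov(y, rowvar=False) - Q).max())  # small; shrinks with more samples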

From parametric models to Gaussian Processes

  • Given measurement noise $\sigma_v^2$, $\bm{t}$ has the prior distribution $P(\bm{t}) = \mathcal{N}(0, \bm{C})$, with $\bm{C} = \bm{Q} + \sigma_v^2 \bm{I} = \sigma_w^2 \bm{R}\bm{R}^\top + \sigma_v^2 \bm{I}$.

  • In general, the entries of $\bm{C}$ are $C_{nn^\ast} = \sigma_w^2 \sum_h \phi_h(\bm{x}^{(n)}) \phi_h(\bm{x}^{(n^\ast)}) + \sigma_v^2 \delta_{nn^\ast}$.

  • Let $H \to \infty$. The sum over basis functions becomes an integral over their centres; solving this integral gives the kernel function of the Gaussian process (see the sketch below).
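
A sketch of that limit, under the illustrative assumption of Gaussian basis functions $\phi_c(x) = \exp(-(x - c)^2/2)$ on an ever denser uniform grid of centres $c$: the (Riemann) sum over basis functions converges to $\int \exp(-(x - c)^2/2)\,\exp(-(x' - c)^2/2)\,dc = \sqrt{\pi}\, e^{-(x - x')^2/4}$, a squared-exponential kernel.

    import numpy as np

    x, xp = 0.3, -0.8
    for H in (10, 100, 1000, 10000):
        c = np.linspace(-20.0, 20.0, H)   # basis-function centres
        dc = c[1] - c[0]                  # spacing, so the sum is a Riemann sum
        s = dc * np.sum(np.exp(-0.5 * (x - c) ** 2) * np.exp(-0.5 * (xp - c) ** 2))
        print(H, s)                       # converges as H grows

    print(np.sqrt(np.pi) * np.exp(-0.25 * (x - xp) ** 2))  # the analytic limit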

Example: Measurements of breast cancer cells

James Longden (data) & Mathias Engel (algorithm)

Gaussian process regression

  • The hyperparameters of $\bm{C}$ are optimized on the training data by maximizing the log likelihood (see the sketch after this list).
  • New points are predicted as marginal univariate Gaussian distributions.
  • Missing data and measurement errors are handled gracefully.
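
A minimal regression sketch along these lines: fit kernel hyperparameters by maximizing the log (marginal) likelihood of the training targets, then read off the Gaussian predictive mean and variance at new points. The squared-exponential kernel, the toy data, and the use of scipy's BFGS optimizer are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def kernel(xa, xb, amp, ell):
        return amp ** 2 * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

    def neg_log_likelihood(log_params, x, t):
        amp, ell, noise = np.exp(log_params)            # log-parametrized for positivity
        C = kernel(x, x, amp, ell) + noise ** 2 * np.eye(len(x))
        L = np.linalg.cholesky(C)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))   # C^{-1} t
        # 0.5 t^T C^{-1} t + 0.5 log|C| + (N/2) log(2 pi)
        return 0.5 * t @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 5.0, 30)
    t = np.sin(x) + 0.1 * rng.normal(size=30)           # toy training data

    res = minimize(neg_log_likelihood, np.zeros(3), args=(x, t))
    amp, ell, noise = np.exp(res.x)

    # Predict: the marginal at each new point is a univariate Gaussian.
    xs = np.linspace(0.0, 5.0, 100)
    C = kernel(x, x, amp, ell) + noise ** 2 * np.eye(len(x))
    k = kernel(x, xs, amp, ell)
    mean = k.T @ np.linalg.solve(C, t)
    var = np.maximum(amp ** 2 - np.sum(k * np.linalg.solve(C, k), axis=0), 0.0)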

Example: Measurements of breast cancer cells

Thank you

You can learn more at gaussianprocess.org or read

Choosing initial values for the kernel hyperparameters

Equation 40 and Figure 4 from the paper.

Extra definition

  • We can represent a function as an unknown big vector $\bm{f}$.
  • We assume that $\bm{f}$ was drawn from a big correlated Gaussian distribution: a Gaussian process.
  • Observing elements of the vector (optionally corrupted by Gaussian noise) creates a posterior distribution.
  • The posterior over functions is still a Gaussian process.
  • Because marginalization in Gaussians is trivial, we can simply ignore all points that are neither observed nor queried: missing data and lazy evaluation come for free (see the sketch below).
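
A minimal sketch of this "big vector" view: put a joint Gaussian on $\bm{f}$ evaluated on a grid, observe a few noisy entries, and condition. Only the observed and queried rows and columns of the covariance ever enter; everything else is marginalized out simply by being dropped. The kernel, grid, and noise level are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(0.0, 1.0, 50)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)  # prior covariance of f

    obs = np.array([5, 20, 40])       # indices of observed entries
    query = np.array([10, 30])        # indices we actually care about
    noise = 0.05

    f_obs = np.sin(2 * np.pi * x[obs]) + noise * rng.normal(size=len(obs))
    A = K[np.ix_(obs, obs)] + noise ** 2 * np.eye(len(obs))
    B = K[np.ix_(query, obs)]

    # Standard Gaussian conditioning: posterior of f[query] given f[obs].
    # Entries outside obs and query never appear in the computation.
    post_mean = B @ np.linalg.solve(A, f_obs)
    post_cov = K[np.ix_(query, query)] - B @ np.linalg.solve(A, B.T)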

  1. Loosely following eqs. 16-23 in MacKay (1997).