Objectives
conjugate_mll
Evaluate the marginal log-likelihood of the Gaussian process.
Compute the marginal log-likelihood of the Gaussian process. The returned value can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here enables exact estimation of the Gaussian process' latent function values.
For a training dataset \(\{x_n, y_n\}_{n=1}^N\), a set of test inputs \(\mathbf{x}^{\star}\), and corresponding latent function evaluations \(\mathbf{f}=f(\mathbf{x})\) and \(\mathbf{f}^{\star}=f(\mathbf{x}^{\star})\), the marginal log-likelihood is given by:
\[
\log p(\mathbf{y}) = \int p(\mathbf{y}\mid\mathbf{f})\,p(\mathbf{f})\,\mathrm{d}\mathbf{f} = -\frac{1}{2}\left(\mathbf{y}^{\top}\left(k(\mathbf{x}, \mathbf{x}') + \sigma^2\mathbf{I}_n\right)^{-1}\mathbf{y} + \log\left\lvert k(\mathbf{x}, \mathbf{x}') + \sigma^2\mathbf{I}_n\right\rvert + n\log 2\pi\right).
\]
Example
import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
prior = gpx.gps.Prior(mean_function=meanf, kernel=kernel)
posterior = prior * likelihood
gpx.objectives.conjugate_mll(posterior, D)
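To connect this example to the closed-form expression above, the quantity can also be evaluated directly with jax.numpy. The sketch below is illustrative only: it assumes an RBF kernel with unit variance and lengthscale, a zero mean function, and unit observation-noise variance, rather than whatever hyperparameters a fitted GPJax model would hold.

import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

x = jnp.linspace(0, 1).reshape(-1, 1)
y = jnp.sin(x)
n = x.shape[0]

# Squared-exponential Gram matrix with unit hyperparameters, plus unit noise: k(x, x') + sigma^2 I.
Ky = jnp.exp(-0.5 * (x - x.T) ** 2) + 1.0 * jnp.eye(n)

# log p(y) = -0.5 * (y^T Ky^{-1} y + log|Ky| + n log 2 pi)
c, low = cho_factor(Ky)
quad = (y.T @ cho_solve((c, low), y)).squeeze()
logdet = 2.0 * jnp.sum(jnp.log(jnp.diag(c)))
mll = -0.5 * (quad + logdet + n * jnp.log(2.0 * jnp.pi))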
Our goal is to maximise the marginal log-likelihood. Therefore, when optimising the model's parameters, we minimise the negative marginal log-likelihood. This can be realised through
nmll = lambda p, d: -gpx.objectives.conjugate_mll(p, d)
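Because the marginal log-likelihood also acts as a model-comparison criterion, the same objective can score competing priors on identical data. The sketch below reuses the dataset D from the example above and compares the RBF kernel against a Matérn-5/2 kernel; gpx.kernels.Matern52 is assumed to be available in your GPJax version, and the comparison is only meaningful once each model's hyperparameters have been optimised.

# Score two candidate kernels on the same data; the higher value is preferred.
for name, k in {"RBF": gpx.kernels.RBF(), "Matern52": gpx.kernels.Matern52()}.items():
    candidate_prior = gpx.gps.Prior(mean_function=gpx.mean_functions.Constant(), kernel=k)
    candidate_posterior = candidate_prior * gpx.likelihoods.Gaussian(num_datapoints=D.n)
    print(name, gpx.objectives.conjugate_mll(candidate_posterior, D))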
Parameters:
- posterior (ConjugatePosterior) – The posterior distribution for which we want to compute the marginal log-likelihood.
- data (Dataset) – The training dataset used to compute the marginal log-likelihood.
Returns
ScalarFloat: The marginal log-likelihood of the Gaussian process.
conjugate_loocv
Evaluate the leave-one-out log predictive probability of the Gaussian process following Section 5.4.2 of Rasmussen & Williams (2006), Gaussian Processes for Machine Learning. This metric measures the average predictive performance over all models obtained by training on all but one data point and then predicting the held-out point.
The returned metric can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here enables exact estimation of the Gaussian process' latent function values.
For a given ConjugatePosterior object, the following code snippet shows how the leave-one-out log predictive probability can be evaluated.
Example
import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
prior = gpx.gps.Prior(mean_function=meanf, kernel=kernel)
posterior = prior * likelihood

gpx.objectives.conjugate_loocv(posterior, D)
Our goal is to maximise the leave-one-out log predictive probability. Therefore, when optimising the model's parameters, we minimise the negative leave-one-out log predictive probability. This can be realised through
nloocv = lambda p, d: -gpx.objectives.conjugate_loocv(p, d)
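For intuition, the leave-one-out predictive means and variances of Section 5.4.2 (Rasmussen & Williams 2006, Eqs. 5.10-5.12) can all be obtained from a single matrix inverse. The sketch below computes them with jax.numpy, again assuming an RBF kernel with unit hyperparameters and unit observation-noise variance as illustrative values rather than anything produced by GPJax.

import jax.numpy as jnp

x = jnp.linspace(0, 1).reshape(-1, 1)
y = jnp.sin(x)
n = x.shape[0]

# Inverse of the noisy Gram matrix: (k(x, x') + sigma^2 I)^{-1}.
Ky_inv = jnp.linalg.inv(jnp.exp(-0.5 * (x - x.T) ** 2) + 1.0 * jnp.eye(n))
alpha = Ky_inv @ y

# Per-point leave-one-out predictive means and variances (Eqs. 5.10-5.11).
mu_loo = y - alpha / jnp.diag(Ky_inv).reshape(-1, 1)
var_loo = 1.0 / jnp.diag(Ky_inv).reshape(-1, 1)

# Sum of the log densities of each held-out observation (Eq. 5.12).
loocv = jnp.sum(-0.5 * jnp.log(2.0 * jnp.pi * var_loo) - (y - mu_loo) ** 2 / (2.0 * var_loo))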
Parameters:
- posterior (ConjugatePosterior) – The posterior distribution for which we want to compute the leave-one-out log predictive probability.
- data (Dataset) – The training dataset used to compute the leave-one-out log predictive probability.
Returns
ScalarFloat: The leave-one-out log predictive probability of the Gaussian process.
log_posterior_density
The log-posterior density of a non-conjugate Gaussian process. This is sometimes referred to as the marginal log-likelihood.
Evaluate the log-posterior density of a Gaussian process.
Compute the marginal log-likelihood, or log-posterior density, of the Gaussian process. The returned value can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here is general and will work for any likelihood supported by GPJax.
Unlike the marginal_log_likelihood function of the ConjugatePosterior
object,
the marginal_log_likelihood function of the NonConjugatePosterior
object does
not provide an exact marginal log-likelihood function. Instead, the
NonConjugatePosterior
object represents the posterior distribution as a
function of the model's hyperparameters and the latent function. Markov chain
Monte Carlo, variational inference, or Laplace approximations can then be used
to sample from, or optimise an approximation to, the posterior distribution.
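As a concrete illustration, the snippet below mirrors the conjugate examples above but pairs the prior with a Bernoulli likelihood, producing a NonConjugatePosterior. It assumes gpx.likelihoods.Bernoulli is available in your GPJax version and that binary labels are encoded as 0/1 floats.

import gpjax as gpx
import jax.numpy as jnp

# Toy binary classification data; labels assumed to be 0/1 floats.
xtrain = jnp.linspace(-1, 1, 50).reshape(-1, 1)
ytrain = jnp.where(xtrain > 0, 1.0, 0.0)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Bernoulli(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# Log-posterior density evaluated at the posterior's current latent values and hyperparameters.
gpx.objectives.log_posterior_density(posterior, D)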
Parameters:
- posterior (NonConjugatePosterior) – The posterior distribution for which we want to compute the marginal log-likelihood.
- data (Dataset) – The training dataset used to compute the marginal log-likelihood.
Returns
ScalarFloat: The log-posterior density of the Gaussian process.
elbo
Compute the evidence lower bound of a variational approximation.
Compute the evidence lower bound under this model. In short, this requires evaluating the expectation of the model's log-likelihood under the variational approximation. From this expectation, we subtract the KL divergence from the variational posterior to the prior. When mini-batching is used, the expected log-likelihood over the batch is rescaled by the ratio of the full dataset size to the batch size.
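In symbols, the bound is \(\mathrm{ELBO}(q) = \mathbb{E}_{q(\mathbf{f})}\left[\log p(\mathbf{y}\mid\mathbf{f})\right] - \mathrm{KL}\left(q \,\|\, p\right)\). The sketch below shows one way the objective might be evaluated for a sparse variational family; it assumes gpx.variational_families.VariationalGaussian accepts posterior and inducing_inputs arguments in your GPJax version.

import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1, 100).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# Variational family defined over a small set of inducing inputs (assumed constructor).
z = jnp.linspace(0, 1, 10).reshape(-1, 1)
q = gpx.variational_families.VariationalGaussian(posterior=posterior, inducing_inputs=z)

gpx.objectives.elbo(q, D)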
Parameters:
- variational_family (VF) – The variational approximation whose parameters the ELBO is maximised with respect to.
- data (Dataset) – The training data over which the ELBO is evaluated.
Returns
ScalarFloat: The evidence lower bound of the variational approximation.
variational_expectation
Compute the variational expectation.
Compute the expectation of our model's log-likelihood under our variational distribution. Batching can be done here to speed up computation.
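Continuing the ELBO sketch above (and reusing its q and D purely for brevity, which is an assumption about your session), a mini-batch can be passed as a smaller Dataset:

# Expected log-likelihood on a mini-batch of the first 20 points; one value per datapoint.
batch = gpx.Dataset(X=D.X[:20], y=D.y[:20])
gpx.objectives.variational_expectation(q, batch)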
Parameters:
- variational_family (VF) – The variational family that we are using to approximate the posterior.
- data (Dataset) – The batch of data for which the expectation should be computed.
Returns
Array: The expectation of the model's log-likelihood under our variational
distribution.
collapsed_elbo
Compute a single step of the collapsed evidence lower bound.
Compute the evidence lower bound under this model. In short, this requires evaluating the expectation of the model's log-likelihood under the variational approximation. From this expectation, we subtract the KL divergence from the variational posterior to the prior. When mini-batching is used, the expected log-likelihood over the batch is rescaled by the ratio of the full dataset size to the batch size.
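A sketch of usage follows, assuming gpx.variational_families.CollapsedVariationalGaussian accepts posterior and inducing_inputs arguments in your GPJax version and that the likelihood is Gaussian, since the collapsed bound relies on conjugacy.

import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1, 100).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# The collapsed bound integrates out the inducing-point distribution analytically,
# so the variational family carries only the inducing inputs (assumed constructor).
z = jnp.linspace(0, 1, 10).reshape(-1, 1)
q = gpx.variational_families.CollapsedVariationalGaussian(posterior=posterior, inducing_inputs=z)

gpx.objectives.collapsed_elbo(q, D)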
Parameters:
- variational_family (VF) – The variational approximation whose parameters the ELBO is maximised with respect to.
- data (Dataset) – The training data over which the ELBO is evaluated.
Returns
ScalarFloat: The evidence lower bound of the variational approximation.