Objectives
conjugate_mll
Evaluate the marginal log-likelihood of the Gaussian process.
Compute the marginal log-likelihood of the Gaussian process. The returned value can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here enables exact estimation of the Gaussian process' latent function values.
For a training dataset \(\{x_n, y_n\}_{n=1}^N\), a set of test inputs \(\mathbf{x}^{\star}\), and corresponding latent function evaluations \(\mathbf{f}=f(\mathbf{x})\) and \(\mathbf{f}^{\star}=f(\mathbf{x}^{\star})\), the marginal log-likelihood is given by:
\[
\log p(\mathbf{y}) = \int p(\mathbf{y}\mid\mathbf{f})\,p(\mathbf{f})\,\mathrm{d}\mathbf{f} = -\frac{1}{2}\left(\mathbf{y}^{\top}\left(k(\mathbf{x}, \mathbf{x}') + \sigma^2\mathbf{I}_n\right)^{-1}\mathbf{y} + \log\left\lvert k(\mathbf{x}, \mathbf{x}') + \sigma^2\mathbf{I}_n\right\rvert + n\log 2\pi\right).
\]
Example
import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
prior = gpx.gps.Prior(mean_function=meanf, kernel=kernel)
posterior = prior * likelihood
gpx.objectives.conjugate_mll(posterior, D)
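To connect this example to the closed-form expression above, the quantity can also be evaluated directly with jax.numpy. The sketch below is illustrative only: it assumes an RBF kernel with unit variance and lengthscale, a zero mean function, and unit observation-noise variance, rather than whatever hyperparameters a fitted GPJax model would hold.

import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

x = jnp.linspace(0, 1).reshape(-1, 1)
y = jnp.sin(x)
n = x.shape[0]

# Squared-exponential Gram matrix with unit hyperparameters, plus unit noise: k(x, x') + sigma^2 I.
Ky = jnp.exp(-0.5 * (x - x.T) ** 2) + 1.0 * jnp.eye(n)

# log p(y) = -0.5 * (y^T Ky^{-1} y + log|Ky| + n log 2 pi)
c, low = cho_factor(Ky)
quad = (y.T @ cho_solve((c, low), y)).squeeze()
logdet = 2.0 * jnp.sum(jnp.log(jnp.diag(c)))
mll = -0.5 * (quad + logdet + n * jnp.log(2.0 * jnp.pi))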
Our goal is to maximise the marginal log-likelihood. Therefore, when optimising the model's parameters, we minimise the negative marginal log-likelihood. This can be realised through
nmll = lambda p, d: -gpx.objectives.conjugate_mll(p, d)
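Because the marginal log-likelihood also acts as a model-comparison criterion, the same objective can score competing priors on identical data. The sketch below reuses the dataset D from the example above and compares the RBF kernel against a Matérn-5/2 kernel; gpx.kernels.Matern52 is assumed to be available in your GPJax version, and the comparison is only meaningful once each model's hyperparameters have been optimised.

# Score two candidate kernels on the same data; the higher value is preferred.
for name, k in {"RBF": gpx.kernels.RBF(), "Matern52": gpx.kernels.Matern52()}.items():
    candidate_prior = gpx.gps.Prior(mean_function=gpx.mean_functions.Constant(), kernel=k)
    candidate_posterior = candidate_prior * gpx.likelihoods.Gaussian(num_datapoints=D.n)
    print(name, gpx.objectives.conjugate_mll(candidate_posterior, D))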
Parameters:
- posterior (ConjugatePosterior) – The posterior distribution for which we want to compute the marginal log-likelihood.
- data (Dataset) – The training dataset used to compute the marginal log-likelihood.
Returns
ScalarFloat: The marginal log-likelihood of the Gaussian process.
conjugate_loocv
Evaluate the leave-one-out log predictive probability of the Gaussian process following Section 5.4.2 of Rasmussen & Williams (2006), Gaussian Processes for Machine Learning. This metric measures the average predictive performance over all models obtained by training on all but one data point and then predicting the held-out point.
The returned metric can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here enables exact estimation of the Gaussian process' latent function values.
For a given ConjugatePosterior object, the following code snippet shows how the leave-one-out log predictive probability can be evaluated.
Example
import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
prior = gpx.gps.Prior(mean_function=meanf, kernel=kernel)
posterior = prior * likelihood

gpx.objectives.conjugate_loocv(posterior, D)
Our goal is to maximise the leave-one-out log predictive probability. Therefore, when optimising the model's parameters, we minimise the negative leave-one-out log predictive probability. This can be realised through
nloocv = lambda p, d: -gpx.objectives.conjugate_loocv(p, d)
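For intuition, the leave-one-out predictive means and variances of Section 5.4.2 (Rasmussen & Williams 2006, Eqs. 5.10-5.12) can all be obtained from a single matrix inverse. The sketch below computes them with jax.numpy, again assuming an RBF kernel with unit hyperparameters and unit observation-noise variance as illustrative values rather than anything produced by GPJax.

import jax.numpy as jnp

x = jnp.linspace(0, 1).reshape(-1, 1)
y = jnp.sin(x)
n = x.shape[0]

# Inverse of the noisy Gram matrix: (k(x, x') + sigma^2 I)^{-1}.
Ky_inv = jnp.linalg.inv(jnp.exp(-0.5 * (x - x.T) ** 2) + 1.0 * jnp.eye(n))
alpha = Ky_inv @ y

# Per-point leave-one-out predictive means and variances (Eqs. 5.10-5.11).
mu_loo = y - alpha / jnp.diag(Ky_inv).reshape(-1, 1)
var_loo = 1.0 / jnp.diag(Ky_inv).reshape(-1, 1)

# Sum of the log densities of each held-out observation (Eq. 5.12).
loocv = jnp.sum(-0.5 * jnp.log(2.0 * jnp.pi * var_loo) - (y - mu_loo) ** 2 / (2.0 * var_loo))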
Parameters:
- posterior (ConjugatePosterior) – The posterior distribution for which we want to compute the leave-one-out log predictive probability.
- data (Dataset) – The training dataset used to compute the leave-one-out log predictive probability.
Returns
ScalarFloat: The leave-one-out log predictive probability of the Gaussian process.
log_posterior_density
The log-posterior density of a non-conjugate Gaussian process. This is sometimes referred to as the marginal log-likelihood.
Evaluate the log-posterior density of a Gaussian process.
Compute the marginal log-likelihood, or log-posterior density, of the Gaussian process. The returned value can then be used for gradient-based optimisation of the model's parameters or for model comparison. The implementation given here is general and will work for any likelihood supported by GPJax.
Unlike the marginal_log_likelihood function of the ConjugatePosterior
object,
the marginal_log_likelihood function of the NonConjugatePosterior
object does
not provide an exact marginal log-likelihood function. Instead, the
NonConjugatePosterior
object represents the posterior distribution as a
function of the model's hyperparameters and the latent function. Markov chain
Monte Carlo, variational inference, or Laplace approximations can then be used
to sample from, or optimise an approximation to, the posterior distribution.
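As a concrete illustration, the snippet below mirrors the conjugate examples above but pairs the prior with a Bernoulli likelihood, producing a NonConjugatePosterior. It assumes gpx.likelihoods.Bernoulli is available in your GPJax version and that binary labels are encoded as 0/1 floats.

import gpjax as gpx
import jax.numpy as jnp

# Toy binary classification data; labels assumed to be 0/1 floats.
xtrain = jnp.linspace(-1, 1, 50).reshape(-1, 1)
ytrain = jnp.where(xtrain > 0, 1.0, 0.0)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Bernoulli(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# Log-posterior density evaluated at the posterior's current latent values and hyperparameters.
gpx.objectives.log_posterior_density(posterior, D)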
Parameters:
- posterior (NonConjugatePosterior) – The posterior distribution for which we want to compute the marginal log-likelihood.
- data (Dataset) – The training dataset used to compute the marginal log-likelihood.
Returns
ScalarFloat: The log-posterior density of the Gaussian process.
elbo
Compute the evidence lower bound of a variational approximation.
Compute the evidence lower bound under this model. In short, this requires evaluating the expectation of the model's log-likelihood under the variational approximation. From this expectation, we subtract the KL divergence from the variational posterior to the prior. When mini-batching is used, the expected log-likelihood over the batch is rescaled by the ratio of the full dataset size to the batch size.
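In symbols, the bound is \(\mathrm{ELBO}(q) = \mathbb{E}_{q(\mathbf{f})}\left[\log p(\mathbf{y}\mid\mathbf{f})\right] - \mathrm{KL}\left(q \,\|\, p\right)\). The sketch below shows one way the objective might be evaluated for a sparse variational family; it assumes gpx.variational_families.VariationalGaussian accepts posterior and inducing_inputs arguments in your GPJax version.

import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1, 100).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# Variational family defined over a small set of inducing inputs (assumed constructor).
z = jnp.linspace(0, 1, 10).reshape(-1, 1)
q = gpx.variational_families.VariationalGaussian(posterior=posterior, inducing_inputs=z)

gpx.objectives.elbo(q, D)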
Parameters:
- variational_family (VF) – The variational approximation whose parameters the ELBO is maximised with respect to.
- data (Dataset) – The training data over which the ELBO is evaluated.
Returns
ScalarFloat: The evidence lower bound of the variational approximation.
variational_expectation
Compute the variational expectation.
Compute the expectation of our model's log-likelihood under our variational distribution. Batching can be done here to speed up computation.
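Continuing the ELBO sketch above (and reusing its q and D purely for brevity, which is an assumption about your session), a mini-batch can be passed as a smaller Dataset:

# Expected log-likelihood on a mini-batch of the first 20 points; one value per datapoint.
batch = gpx.Dataset(X=D.X[:20], y=D.y[:20])
gpx.objectives.variational_expectation(q, batch)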
Parameters:
- variational_family (VF) – The variational family that we are using to approximate the posterior.
- data (Dataset) – The batch of data for which the expectation should be computed.
Returns
Array: The expectation of the model's log-likelihood under our variational
distribution.
collapsed_elbo
Compute a single step of the collapsed evidence lower bound.
Compute the evidence lower bound under this model. In short, this requires evaluating the expectation of the model's log-likelihood under the variational approximation. From this expectation, we subtract the KL divergence from the variational posterior to the prior. When mini-batching is used, the expected log-likelihood over the batch is rescaled by the ratio of the full dataset size to the batch size.
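A sketch of usage follows, assuming gpx.variational_families.CollapsedVariationalGaussian accepts posterior and inducing_inputs arguments in your GPJax version and that the likelihood is Gaussian, since the collapsed bound relies on conjugacy.

import gpjax as gpx
import jax.numpy as jnp

xtrain = jnp.linspace(0, 1, 100).reshape(-1, 1)
ytrain = jnp.sin(xtrain)
D = gpx.Dataset(X=xtrain, y=ytrain)

meanf = gpx.mean_functions.Constant()
kernel = gpx.kernels.RBF()
likelihood = gpx.likelihoods.Gaussian(num_datapoints=D.n)
posterior = gpx.gps.Prior(mean_function=meanf, kernel=kernel) * likelihood

# The collapsed bound integrates out the inducing-point distribution analytically,
# so the variational family carries only the inducing inputs (assumed constructor).
z = jnp.linspace(0, 1, 10).reshape(-1, 1)
q = gpx.variational_families.CollapsedVariationalGaussian(posterior=posterior, inducing_inputs=z)

gpx.objectives.collapsed_elbo(q, D)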
Parameters:
- variational_family (VF) – The variational approximation whose parameters the ELBO is maximised with respect to.
- data (Dataset) – The training data over which the ELBO is evaluated.
Returns
ScalarFloat: The evidence lower bound of the variational approximation.