Standard Gaussian process models map a D-dimensional input to a single scalar
output. In many settings, however, we wish to model several correlated output
quantities simultaneously. A multi-output Gaussian process captures these
correlations so that observations of one output can inform predictions of another.
This notebook introduces the Intrinsic Coregionalization Model (ICM) implemented
in GPJax. We construct a synthetic dataset with two correlated outputs, fit an ICM
model by optimising the marginal log-likelihood, and inspect the learned
coregionalization matrix to see what the model has discovered about the output
structure.
The ICM assumes that all outputs share a single latent Gaussian process, weighted
differently for each output through a positive semi-definite coregionalization
matrix $B \in \mathbb{R}^{P \times P}$. Given an input-space kernel
$k(x, x')$, the multi-output covariance between output $p$ at
input $x$ and output $q$ at input $x'$ is

$$\operatorname{cov}(f_p(x), f_q(x')) = B_{pq}\, k(x, x').$$

Stacking all $N$ observations across $P$ outputs into a single vector of length
$NP$, the joint covariance matrix takes the Kronecker form

$$K = B \otimes K_{xx},$$

where $K_{xx}$ is the $N \times N$ Gram matrix of the base kernel.
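The Kronecker identity above can be checked numerically. The following numpy sketch (toy values, not GPJax code) builds $K = B \otimes K_{xx}$ for a small problem and verifies one entry against $B_{pq}\,k(x_i, x_j)$:

```python
import numpy as np

def rbf(x, xp, lengthscale=0.5):
    # Squared-exponential base kernel k(x, x')
    return np.exp(-0.5 * (x[:, None] - xp[None, :]) ** 2 / lengthscale**2)

# Toy setup: P = 2 outputs, N = 4 inputs, an arbitrary PSD coregionalization matrix B
N = 4
x = np.linspace(0.0, 1.0, N)
B = np.array([[1.0, 0.6], [0.6, 0.8]])
Kxx = rbf(x, x)

# Joint covariance over the stacked vector (output-major ordering)
K = np.kron(B, Kxx)  # shape (P*N, P*N)

# Entry check: cov(f_p(x_i), f_q(x_j)) = B[p, q] * k(x_i, x_j)
p, q, i, j = 0, 1, 2, 3
assert np.isclose(K[p * N + i, q * N + j], B[p, q] * Kxx[i, j])
```

The output-major stacking convention here matches the one used later for predictions: the first $N$ rows of $K$ correspond to output 1, the next $N$ to output 2.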
The coregionalization matrix is parameterised as

$$B = W W^\top + \operatorname{diag}(\kappa),$$

where $W \in \mathbb{R}^{P \times R}$ is a low-rank factor of rank $R$
and $\kappa \in \mathbb{R}_{>0}^{P}$ is a positive diagonal. The rank
parameter controls how many latent sources of correlation the model can express.
When $R = 1$ and $P = 2$, the model captures a single shared direction of
variation between the two outputs.
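As a quick sanity check on this parameterisation, the snippet below (with illustrative values for $W$ and $\kappa$, not fitted parameters) constructs $B$ for $P = 2$, $R = 1$ and confirms it is positive definite:

```python
import numpy as np

# Rank-1 factor and positive diagonal for P = 2 outputs (illustrative values)
W = np.array([[1.0], [0.7]])   # shape (P, R) with R = 1
kappa = np.array([0.1, 0.2])   # strictly positive per-output terms

B = W @ W.T + np.diag(kappa)

# B is symmetric positive definite by construction
assert np.allclose(B, B.T)
assert np.all(np.linalg.eigvalsh(B) > 0)

# The off-diagonal entry B[0, 1] = W[0, 0] * W[1, 0] = 0.7
# encodes the single shared direction of variation
```

Because the diagonal $\kappa$ is strictly positive, $B$ is invertible even when $W W^\top$ is rank-deficient.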
Synthetic dataset
We generate two correlated functions on the interval $[0, 1]$. The first output is
$f_1(x) = \sin(2\pi x)$ and the second is a mixture
$f_2(x) = 0.5\sin(2\pi x) + 0.5\cos(2\pi x)$, so the two outputs share a
sinusoidal component. Both are corrupted by Gaussian noise with different standard
deviations ($\sigma_1 = 0.1$, $\sigma_2 = 0.2$) to illustrate the per-output
noise capability of the multi-output likelihood.
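The data-generating process can be sketched in a few lines of numpy (the seed, sample size, and variable names are illustrative choices, not the notebook's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 1.0, size=n)

# Two correlated outputs sharing a sin(2*pi*x) component
f1 = np.sin(2 * np.pi * x)
f2 = 0.5 * np.sin(2 * np.pi * x) + 0.5 * np.cos(2 * np.pi * x)

# Per-output Gaussian noise with different standard deviations
y1 = f1 + 0.1 * rng.standard_normal(n)
y2 = f2 + 0.2 * rng.standard_normal(n)
```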
We construct the ICM model in three steps. First, we build a
CoregionalizationMatrix with $P = 2$ outputs and rank $R = 1$. Second, we wrap
an RBF base kernel together with the coregionalization matrix inside an
ICMKernel. Third, we pair a zero-mean GP prior with a MultiOutputGaussian
likelihood, which allows a separate noise variance for each output.
We optimise the kernel hyperparameters, the coregionalization matrix entries, and
the per-output noise standard deviations by maximising the conjugate marginal
log-likelihood using L-BFGS via fit_scipy.
Optimization terminated successfully.
Current function value: -14.234924
Iterations: 21
Function evaluations: 24
Gradient evaluations: 24
Optimised negative MLL: -14.235
Prediction
The multi-output posterior returns predictions as a single Gaussian distribution
over a flattened vector of length $MP$, where $M$ is the number of test points and
$P$ is the number of outputs. The ordering is output-major: all $M$ values for
output 1 appear first, followed by all $M$ values for output 2, and so on. We
reshape the mean and extract per-output marginal variances from the diagonal of
the joint covariance.
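The reshaping step can be sketched with numpy stand-ins for the posterior mean and covariance (the arrays below are placeholders, not model output):

```python
import numpy as np

M, P = 100, 2
# Stand-ins for the flattened posterior mean (length M*P) and joint covariance
mean_flat = np.arange(M * P, dtype=float)
cov = np.eye(M * P)

# Output-major ordering: the first M entries belong to output 1, the next M to output 2
means = mean_flat.reshape(P, M)          # means[p] is the mean curve for output p+1
variances = np.diag(cov).reshape(P, M)   # per-output marginal variances

assert means.shape == (2, 100)
assert np.allclose(means[1], mean_flat[M:])
```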
The coregionalization matrix $B$ encodes the learned correlations between
outputs. Its off-diagonal entries indicate how strongly the outputs covary: a large
positive entry between outputs $p$ and $q$ means they tend to increase together,
while a value near zero suggests they are largely independent given the shared
input kernel.
We visualise $B$ as a heatmap and print its entries.
Because $f_2$ is defined as a mixture that includes a scaled copy of $f_1$, we
expect the model to recover a positive correlation between the two outputs. The
diagonal entries reflect each output's marginal variance contribution from the
shared latent process.
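To make the off-diagonal entry easier to interpret on an absolute scale, one can normalise $B$ to a correlation matrix. The snippet below uses illustrative values for $B$, not the fitted matrix:

```python
import numpy as np

# An illustrative coregionalization matrix (not the notebook's fitted output)
B = np.array([[0.9, 0.45], [0.45, 0.5]])

# Normalise to a correlation matrix: corr_pq = B_pq / sqrt(B_pp * B_qq)
d = np.sqrt(np.diag(B))
corr = B / np.outer(d, d)

print(corr[0, 1])  # ≈ 0.671, a clear positive correlation between the outputs
```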
From ICM to LCM
The Intrinsic Coregionalization Model is powerful but limited: it assumes that
all inter-output correlations are explained by a single shared latent Gaussian
process. When the outputs are driven by multiple independent sources of variation
— for example, a slow trend and a fast oscillation — a single latent kernel cannot
capture both length-scales simultaneously. The ICM must compromise, and the
resulting fit degrades.
The Linear Model of Coregionalization (LCM) removes this limitation by
combining Q independent latent GPs, each equipped with its own input-space
kernel and its own output-space coregionalization matrix. The additional
components give the model the flexibility to assign distinct spectral
characteristics to different sources of inter-output coupling.
The Linear Model of Coregionalization
Given $Q$ latent kernels $\{k_q\}_{q=1}^{Q}$ and $Q$ coregionalization matrices
$\{B^{(q)}\}_{q=1}^{Q}$, each of size $P \times P$, the LCM defines the
multi-output covariance between output $p$ at input $x$ and output $r$ at
input $x'$ as

$$\operatorname{cov}(f_p(x), f_r(x')) = \sum_{q=1}^{Q} B^{(q)}_{pr}\, k_q(x, x').$$

Stacking all $N$ observations across $P$ outputs into a single vector of length
$NP$, the joint covariance matrix is the sum of Kronecker products

$$K = \sum_{q=1}^{Q} B^{(q)} \otimes K_{xx}^{(q)},$$

where $K_{xx}^{(q)}$ is the $N \times N$ Gram matrix
of the $q$-th latent kernel.
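A small numpy sketch (toy values, not GPJax internals) makes the sum-of-Kronecker-products structure concrete for $Q = 2$ and $P = 3$:

```python
import numpy as np

def rbf(x, lengthscale):
    # Squared-exponential Gram matrix over a 1-D input grid
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * d**2 / lengthscale**2)

x = np.linspace(0.0, 1.0, 5)
P = 3

# Two latent components: rank-1 coregionalization matrices plus a small jitter diagonal
B1 = np.outer([1.0, 0.5, 0.0], [1.0, 0.5, 0.0]) + 0.05 * np.eye(P)
B2 = np.outer([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]) + 0.05 * np.eye(P)

# Each component has its own input-space Gram matrix at its own length-scale
K1, K2 = rbf(x, 0.5), rbf(x, 0.1)

# Joint covariance over the stacked (P*N)-vector is the sum of Kronecker products
K = np.kron(B1, K1) + np.kron(B2, K2)
assert K.shape == (15, 15)
assert np.allclose(K, K.T)
```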
Relationship to ICM
Setting $Q = 1$ recovers the ICM exactly: there is one kernel, one
coregionalization matrix, and the covariance has pure Kronecker structure. GPJax
exploits this: when an LCMKernel has a single component, the compute engine
returns a Kronecker operator, preserving the efficient $O(N^3 + P^3)$
decomposition. For $Q > 1$ the sum of Kronecker products no longer admits a
closed-form Kronecker inverse, so GPJax materialises the full $NP \times NP$ dense
matrix and solves via a standard Cholesky factorisation in $O((NP)^3)$.
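The efficiency for $Q = 1$ rests on the standard identity $(B \otimes K_{xx})^{-1} = B^{-1} \otimes K_{xx}^{-1}$, which lets the two factors be inverted separately. A quick numerical check with arbitrary well-conditioned matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 6, 2

# A well-conditioned symmetric positive definite Gram-matrix stand-in
A = rng.standard_normal((N, N))
Kxx = A @ A.T + N * np.eye(N)
B = np.array([[1.0, 0.6], [0.6, 0.8]])

# Inverting the big (N*P x N*P) matrix directly ...
lhs = np.linalg.inv(np.kron(B, Kxx))
# ... agrees with inverting the small factors separately: O(N^3 + P^3) work
rhs = np.kron(np.linalg.inv(B), np.linalg.inv(Kxx))
assert np.allclose(lhs, rhs)
```

No such factorisation exists for a sum of two or more Kronecker products, which is why the $Q > 1$ case falls back to a dense solve.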
Per-component coregionalization
Each coregionalization matrix is parameterised as before:

$$B^{(q)} = W^{(q)} W^{(q)\top} + \operatorname{diag}(\kappa^{(q)}),$$

where $W^{(q)} \in \mathbb{R}^{P \times R_q}$ is a low-rank factor and
$\kappa^{(q)} \in \mathbb{R}_{>0}^{P}$ a positive diagonal. The rank
$R_q$ of each component can be chosen independently. A component with rank 1
captures one direction of inter-output correlation at the length-scale determined
by $k_q$; increasing the rank allows richer coupling patterns at that scale.
A three-output synthetic dataset
To demonstrate the advantage of multiple latent components, we construct a dataset
with three outputs driven by two distinct latent functions:
$g_1(x) = \sin(2\pi x)$ — a smooth, low-frequency oscillation,
$g_2(x) = \cos(4\pi x)$ — a faster oscillation at double the frequency.
The three observed outputs are mixtures of these latent functions:
\begin{align}
f_1(x) &= g_1(x), \\
f_2(x) &= 0.5\,g_1(x) + 0.5\,g_2(x), \\
f_3(x) &= g_2(x).
\end{align}
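A minimal numpy sketch of this generative process (the seed, sample size, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=60)

# Two latent functions at different frequencies
g1 = np.sin(2 * np.pi * x)
g2 = np.cos(4 * np.pi * x)

# Observed outputs as mixtures of the latents (mixing rows [1, 0], [0.5, 0.5], [0, 1])
f1 = g1
f2 = 0.5 * g1 + 0.5 * g2
f3 = g2
ys = np.stack([f1, f2, f3]) + 0.1 * rng.standard_normal((3, x.size))
```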
Outputs 1 and 3 are each dominated by a single latent source, while output 2 is a
balanced mixture of both. A single-component ICM kernel would struggle here because
it cannot separate the two frequency scales. An LCM with $Q = 2$ components —
each learning a different length-scale — should recover the latent structure.
We build an LCM with $Q = 2$ components. The first component uses an RBF kernel,
which is well suited to capturing the smooth, low-frequency latent function $g_1$.
The second component uses a Matérn-3/2 kernel, whose shorter default length-scale
can adapt to the faster oscillation in $g_2$. Each component has its own
CoregionalizationMatrix with $P = 3$ outputs and rank $R = 1$, so each component
captures one direction of inter-output correlation at its characteristic
length-scale.
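To see why the two base kernels suit different frequencies, compare their covariance profiles. The functions below are the textbook RBF and Matérn-3/2 formulas, not GPJax's implementations:

```python
import numpy as np

def rbf(r, lengthscale=1.0):
    # Squared-exponential: infinitely smooth, favours slowly varying functions
    return np.exp(-0.5 * (r / lengthscale) ** 2)

def matern32(r, lengthscale=1.0):
    # Matern-3/2: once-differentiable sample paths, adapts to rougher functions
    s = np.sqrt(3.0) * np.abs(r) / lengthscale
    return (1.0 + s) * np.exp(-s)

# Both equal 1 at r = 0; at equal length-scales the Matern-3/2 covariance
# decays faster with distance, allowing faster variation in the latent function
r = 0.2
assert matern32(r) < rbf(r)
```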
As with the ICM example, we maximise the conjugate marginal log-likelihood using
fit_scipy. The optimiser now has more parameters to tune: two sets of kernel
hyperparameters and two coregionalization matrices.
Optimization terminated successfully.
Current function value: -20.719374
Iterations: 170
Function evaluations: 202
Gradient evaluations: 202
Optimised negative MLL: -20.719
Prediction
The multi-output posterior returns predictions as a single Gaussian distribution
over a flattened vector of length MP, where M is the number of test points.
The ordering is output-major: all M values for output 1 appear first, then
output 2, then output 3. We reshape accordingly.
Unlike the ICM, which produces a single $B$ matrix, the LCM yields one
coregionalization matrix per component. Each $B^{(q)}$ tells us how the
$q$-th latent process couples the outputs. By inspecting these matrices we can
recover which outputs share each latent source.
We expect the component paired with the RBF kernel (smooth, low-frequency) to show
strong coupling between outputs 1 and 2, since both contain $g_1$. The Matérn-3/2
component (higher frequency) should couple outputs 2 and 3, which share $g_2$.
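This expectation can be made precise: when the outputs are exact linear mixtures of independent latents, the ideal rank-1 coregionalization matrix for latent $q$ is the outer product of the $q$-th column of the mixing matrix. A small sketch using the true mixing weights of our dataset:

```python
import numpy as np

# Mixing matrix: rows are outputs f1..f3, columns are latent functions g1, g2
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])

# Ideal coregionalization matrix for latent q: outer product of column q
B1_ideal = np.outer(A[:, 0], A[:, 0])   # couples outputs 1 and 2 (via g1)
B2_ideal = np.outer(A[:, 1], A[:, 1])   # couples outputs 2 and 3 (via g2)

assert B1_ideal[0, 1] > 0 and B1_ideal[2, 2] == 0
assert B2_ideal[1, 2] > 0 and B2_ideal[0, 0] == 0
```

Output 2 carries weight in both ideal matrices, mirroring the fitted result described below.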
The two learned coregionalization matrices reveal the latent structure of the data.
Each component has specialised: one captures the low-frequency correlations driven
by $g_1$, and the other captures the higher-frequency correlations driven by $g_2$.
Output 2, which depends on both latent functions, appears with non-negligible
weight in both matrices — exactly as expected.