Quick Start#

This is a quick tutorial demonstrating basic usage of the G-Learn package.

Import#

Import the package using:

[1]:
import glearn

Prior to starting, you can verify the G-Learn version, the number of available CPU processors, connected GPU devices, and the memory usage of the current Python process using:

[2]:
glearn.info()
glearn version  : 0.21.1
imate version   : 0.18.0
processor       : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
num threads     : 4
gpu device      : not found
num gpu devices : 0
cuda version    : not found
nvidia driver   : not found
process memory  : 124.6 (Mb)

Generate Points#

We generate a set of 50 points randomly distributed in the interval $\mathcal{D} = [0, 1]$, where 90% of the points are concentrated in the sub-interval $[a, b] = [0.4, 0.6]$ with a uniform distribution, while the remaining points are uniformly spread elsewhere.

For simplicity, you can create such a set of points using the glearn.sample_data module.

[3]:
from glearn import sample_data
x = sample_data.generate_points(num_points=50, dimension=1,
                                grid=False, a=0.4, b=0.6,
                                contrast=0.9, seed=42)
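
If you like, you can quickly check that the generated points indeed concentrate in the dense sub-interval (a minimal sketch, assuming x can be flattened to a one-dimensional NumPy array of 50 points):

import numpy

# Fraction of points falling inside the dense sub-interval [0.4, 0.6].
# With contrast=0.9, this should be close to 0.9.
x_flat = numpy.asarray(x).ravel()
fraction = numpy.mean((x_flat >= 0.4) & (x_flat <= 0.6))
print(f'fraction of points in [0.4, 0.6]: {fraction:.2f}')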

Generate Noisy Data#

On the set of points $\boldsymbol{x} = (x_1, \dots, x_d) \in \mathcal{D} \subset \mathbb{R}^d$ (where $d = 1$ as specified above), we define a stochastic function:

$$
y(\boldsymbol{x}) = \sum_{i=1}^{d} \sin(\pi x_i) + \epsilon,
$$

where $\epsilon$ is a random variable, $\epsilon(\boldsymbol{x}) \sim \mathcal{N}(0, \sigma_0^2)$, with a noise standard deviation of $\sigma_0 = 0.1$.
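
For concreteness, the stochastic function above could be evaluated directly with NumPy as in the following sketch (illustrative only, not glearn's implementation; it assumes x from the previous cell can be flattened to a one-dimensional array):

import numpy

# Evaluate y = sin(pi * x) + epsilon for d = 1, with noise standard
# deviation sigma_0. This mirrors the formula above, not glearn's code.
rng = numpy.random.default_rng(0)
x_flat = numpy.asarray(x).ravel()
sigma_0 = 0.1
y_manual = numpy.sin(numpy.pi * x_flat) + sigma_0 * rng.standard_normal(x_flat.size)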

You can generate the random data described above using the glearn.sample_data module.

[4]:
y_noisy = sample_data.generate_data(x, noise_magnitude=0.1, plot=True)
../_images/tutorials_quick_start_7_0.png

The figure above illustrates the noisy data (represented as dots) alongside the original noise-free function (depicted as a solid curve). It’s worth noting that the majority of data points are concentrated within the sub-interval [0.4,0.6], while outside this interval, data points are sparsely distributed.

Later on, we will demonstrate accurate predictions within the concentrated sub-interval and less precise predictions outside of it.

Stochastic Model for Noisy Data#

We model the random data $y$ by

$$
y(\boldsymbol{x}) = \mu(\boldsymbol{x}) + \delta(\boldsymbol{x}) + \epsilon(\boldsymbol{x}),
$$

where

  • $\mu(\boldsymbol{x})$ is a deterministic mean function.

  • $\delta(\boldsymbol{x})$ is a zero-mean stochastic function and will be determined later.

  • $\epsilon(\boldsymbol{x})$ is a zero-mean stochastic function representing the input noise and characterized by the discrete covariance:

$$
\mathbb{E}[\epsilon(\boldsymbol{x}), \epsilon(\boldsymbol{x}')] = \sigma_0^2 \mathbf{I},
$$

where $\mathbf{I}$ is the identity matrix, and the hyperparameter $\sigma_0^2$ is the variance of the noise. We assume the noise variance is not known.

Design Matrix#

We represent the deterministic mean function $\mu$ by the linear model $\mu(\boldsymbol{x}) = \boldsymbol{\phi}(\boldsymbol{x})^{\intercal} \boldsymbol{\beta}$, a linear combination of the basis functions

$$
\boldsymbol{\phi}(\boldsymbol{x}): \mathcal{D} \to \mathbb{R}^m,
$$

where $\boldsymbol{\beta} \in \mathbb{R}^m$ are the parameters. On the discrete points $\boldsymbol{x}$, the set of basis functions is discretized to the design matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ with components

$$
X_{ij} = \phi_j(\boldsymbol{x}_i).
$$

Other ways to construct the design matrix are by trigonometric functions, hyperbolic functions, user-defined custom functions, or a combination of these. Here we only use monomial basis functions up to the fifth degree as follows:

$$
\boldsymbol{\phi}(\boldsymbol{x}) = (1, x, \dots, x^5).
$$

Hence, $m = 6$.
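
To make the discretization concrete, the design matrix for this monomial basis could be assembled directly with NumPy as in the sketch below (illustrative only; glearn.LinearModel constructs its design matrix internally):

import numpy

# n training points as a flat array.
x_flat = numpy.asarray(x).ravel()

# X[i, j] = phi_j(x_i) = x_i**j for j = 0, ..., 5, giving m = 6 columns.
X = numpy.vander(x_flat, N=6, increasing=True)
print(X.shape)   # (50, 6)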

Prior for Parameter β#

We also prescribe a normal prior on the unknown parameter $\boldsymbol{\beta}$:

$$
p(\boldsymbol{\beta} | \sigma^2) \sim \mathcal{N}(\boldsymbol{b}, \sigma^2 \mathbf{B}),
$$

where $\sigma^2 \mathbf{B} \in \mathbb{R}^{m \times m}$ is the covariance of $\boldsymbol{\beta}$. The hyperparameter $\sigma^2$ is the variance of the regression and is not known.

[5]:
# Prior mean of the parameter beta.
# The size of b should be m, the number of columns of design matrix X.
import numpy
b = numpy.zeros((6, ))

# Generate a random matrix B for covariance of beta.
# The shape of matrix B should be (m, m)
numpy.random.seed(0)
B = numpy.random.rand(b.size, b.size)

# Make sure the covariance matrix B is positive-semidefinite.
B = 1e+5 * B.T @ B
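
Since B.T @ B is symmetric positive-semidefinite by construction, you can optionally verify this by checking that all of its eigenvalues are non-negative (a quick sanity check, not a required step):

import numpy

# All eigenvalues of a positive-semidefinite matrix are non-negative
# (up to floating-point round-off).
print(numpy.linalg.eigvalsh(B).min() >= -1e-8)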

Linear Model#

The linear model of the mean, $\boldsymbol{\mu} = \mathbf{X} \boldsymbol{\beta}$, can be created by the glearn.mean.LinearModel class:

[6]:
# Create mean object using glearn.
mean = glearn.LinearModel(x, polynomial_degree=5, b=b, B=B)

Kernels#

The zero-mean stochastic function $\delta(\boldsymbol{x}): \mathcal{D} \to \mathbb{R}$ is characterized by its covariance:

$$
\mathbb{E}[\delta(\boldsymbol{x}), \delta(\boldsymbol{x}')] = k(\boldsymbol{x}, \boldsymbol{x}' | \boldsymbol{\alpha}).
$$

The function $k: \mathcal{D} \times \mathcal{D} \times \mathbb{R}^d \to \mathbb{R}$ is the correlation kernel and can be created using the glearn.kernels module. Various kernels are available in this module, including:

  • Matern()

  • Exponential()

  • SquareExponential()

  • RationalQuadratic()

In this example, we will use the exponential kernel.
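
For reference, the exponential kernel is commonly written as a function of the scaled distance $r$ between two points (this is the standard textbook form; refer to the glearn.kernels documentation for the exact parameterization used by the package):

$$
k(r) = e^{-r}.
$$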

[7]:
from glearn import kernels
kernel = kernels.Exponential()
kernel.plot()
../_images/tutorials_quick_start_15_0.png

Scale Hyperparameter#

The hyperparameters $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ determine the scale of each spatial dimension. In our example, $d = 1$. The scale can either be given explicitly if known, or characterized by a prior distribution $p(\boldsymbol{\alpha})$ using the glearn.priors module. A list of available priors includes:

  • Uniform

  • Cauchy

  • StudentT

  • InverseGamma

  • Normal

  • Erlang

  • BetaPrime

In this context, we are using the Cauchy prior. We will also plot the prior and its first and second derivatives.

[8]:
from glearn import priors
scale = priors.Cauchy()
scale.plot()
../_images/tutorials_quick_start_17_0.png

Covariance#

The covariance of the model is given by

$$
\boldsymbol{\Sigma}(\sigma^2, \sigma_0^2, \boldsymbol{\alpha}) = \sigma^2 \mathbf{K}(\boldsymbol{\alpha}) + \sigma_0^2 \mathbf{I},
$$

where

$$
\mathbf{K}(\boldsymbol{\alpha}): \mathbb{R}^d \to \mathbb{R}^{n \times n},
$$

and $\mathbf{I}$ is the identity matrix.
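
Conceptually, for fixed hyperparameters and a given correlation matrix, this covariance could be assembled as in the NumPy sketch below (illustrative placeholders only; glearn builds and factorizes this matrix internally through the covariance object):

import numpy

n = 50                          # number of training points
K = numpy.eye(n)                # placeholder for the correlation matrix K(alpha)
sigma, sigma0 = 0.1, 0.1        # placeholder hyperparameters

# Covariance model: Sigma = sigma^2 * K(alpha) + sigma0^2 * I
Sigma = sigma**2 * K + sigma0**2 * numpy.eye(n)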

An object of the above covariance model can be created using the glearn.Covariance class.

[9]:
cov = glearn.Covariance(x, kernel=kernel, scale=scale)

Gaussian Process#

The Gaussian process is defined as

$$
y \sim \mathcal{GP}(\mu, \boldsymbol{\Sigma})
$$

and can be created using the glearn.GaussianProcess class with the mean and covariance objects.

[10]:
gp = glearn.GaussianProcess(mean, cov)

Training Hyperparameters#

The hyperparameters $(\sigma, \sigma_0, \boldsymbol{\alpha})$ and the parameter $\boldsymbol{\beta}$ can be trained via the gp.train() function.

The type of profiling for the likelihood function can be set by the profile_hyperparam argument, which can take one of the following values:

  • 'none': no profiling.

  • 'var': profiling on the variance hyperparameter.

  • 'var_noise': profiling on both the variance and noise hyperparameters.

The optimization method can be set by optimization_method and can be one of:

  • 'chandrupatla': requires jacobian

  • 'brentq': requires jacobian

  • 'Nelder-Mead': requires func

  • 'BFGS': requires func, jacobian

  • 'CG': requires func, jacobian

  • 'Newton-CG': requires func, jacobian, hessian

  • 'dogleg': requires func, jacobian, hessian

  • 'trust-exact': requires func, jacobian, hessian

  • 'trust-ncg': requires func, jacobian, hessian

The internal matrix algebra of this object can be set via the imate_method argument, which can take one of the following values:

  • 'eigenvalue': using the eigenvalue method.

  • 'cholesky': using the Cholesky method.

  • 'hutchinson': using the stochastic Hutchinson method.

  • 'slq': using the stochastic Lanczos quadrature method.

Here we use the Cholesky method.

[11]:
profile_hyperparam = 'var'
optimization_method = 'Newton-CG'
imate_options = {'method': 'cholesky'}
hyperparam_guess = None
result = gp.train(y_noisy, profile_hyperparam=profile_hyperparam,
                  log_hyperparam=True, hyperparam_guess=hyperparam_guess,
                  optimization_method=optimization_method, tol=1e-6,
                  max_iter=1000, use_rel_error=True, imate_options=imate_options,
                  verbose=True, plot=False)
          Error
=========================
itr   param  1   param  2
---   --------   --------
001        inf        inf
002   8.87e-02   5.89e-03
003   6.06e-03   2.44e-01
004   3.09e-03   8.21e-02
005   1.35e-04   9.87e-03
006   2.10e-06   1.45e-04
007   0.00e+00   0.00e+00

                                Training Summary
================================================================================
       posterior/param                optimization              imate solver
-----------------------------      -------------------      --------------------
posterior    +2.355328920e+01      method    Newton-CG      method      cholesky
eta          7.9209829878e+01      tol        1.00e-06      tol         1.00e-08
sigma        1.3028753864e-02      max iter       1000      interpolate    False
sigma0       1.1595578487e-01      max bracket try   6      min num samples    0
alpha        1.2929866862e-01      profile param   var      max num samples    0

                                    Process
================================================================================
         time (sec)                    evaluations               processor
-----------------------------      -------------------      --------------------
task         clock    process      task              #      device             #
================================================================================
correlation  2.81e-1  1.03e+0      correlation      23      cpu threads        4
logdet       2.79e-3  9.24e-3      likelihood        8      gpu devices        0
traceinv     8.48e-3  2.98e-2      jacobian          8      gpu multiproc      0
solver       1.15e-1  4.33e-1      hessian           7      gpu thrds/sm       0
overall      5.45e-1  1.85e+0      optimization      7      mem used (b) 1585152

Prediction#

After training the hyperparameters, the gp object is ready to predict the data on new points. First, we create a set of 1000 test points $\boldsymbol{x}^{\ast}$ equally spaced in the interval $[0, 1]$.

[12]:
# Generate test points
test_points = sample_data.generate_points(num_points=1000, dimension=1, grid=True)

For the sake of comparison, we also generate the noise-free data on the test points, $z(\boldsymbol{x}^{\ast})$, using zero noise, $\sigma_0 = 0$.

[13]:
# True data (without noise)
z_true = sample_data.generate_data(test_points, noise_magnitude=0.0)

Note that the above step is not necessary and is only used for comparison with the prediction, since we already know the exact function that generated the noisy data in the first place.

The posterior predictive distribution of the prediction $z(\boldsymbol{x}^{\ast})$ is of the form

$$
z(\boldsymbol{x}^{\ast}) \sim \mathcal{N}\left( \mu^{\ast}(\boldsymbol{x}^{\ast}), \boldsymbol{\Sigma}^{\ast\ast}(\boldsymbol{x}^{\ast}, \boldsymbol{x}^{\ast}) \right),
$$

where:

  • $\mu^{\ast}$ is the posterior predictive mean, and

  • $\boldsymbol{\Sigma}^{\ast\ast}$ is the posterior predictive covariance between the test points and themselves.

Prediction can be made using the gp.predict() function.

[14]:
z_star_mean, z_star_cov = gp.predict(test_points, cov=True, plot=True,
                                     confidence_level=0.95,
                                     true_data=z_true, verbose=True)
                               Prediction Summary
================================================================================
               process                                    config
-------------------------------------      -------------------------------------
wall time (sec)               1.95e-1      num training points                50
proc time (sec)               6.88e-1      num test points                  1000
memory used (b)              21159936      compute covariance               True

../_images/tutorials_quick_start_29_1.png

By setting the boolean argument cov=False, the predictive covariance is not computed, which enhances computational speed.

  • When cov=True, meaning that both $\mu^{\ast}$ and $\boldsymbol{\Sigma}^{\ast\ast}$ are computed, the prediction process has $\mathcal{O}((n^{\ast})^3)$ complexity.

  • In contrast, when cov=False, only $\mu^{\ast}$ is computed and the prediction process has $\mathcal{O}((n^{\ast})^2)$ complexity.

Furthermore, when cov=False, once the first prediction on a set of test points $\{ \boldsymbol{x}^{\ast}_i \}_{i=1}^{n^{\ast}}$ is made, future calls to the predict() function are of order $\mathcal{O}(n^{\ast})$, even when applied to a different set of test points. This is because the gp object stores all internal computations that are independent of the test points.
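
As a quick way to observe this caching behavior, you could time two consecutive calls yourself (a minimal sketch; actual timings depend on your machine):

import time

# First call computes and caches the test-point-independent quantities.
t0 = time.time()
gp.predict(test_points, cov=False)
t1 = time.time()

# Second call reuses the cached quantities and should be noticeably faster.
gp.predict(test_points, cov=False)
t2 = time.time()

print(f'first call:  {t1 - t0:.3f} sec')
print(f'second call: {t2 - t1:.3f} sec')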

[15]:
z_star_mean = gp.predict(test_points, cov=False, verbose=True)
                               Prediction Summary
================================================================================
               process                                    config
-------------------------------------      -------------------------------------
wall time (sec)               2.18e-2      num training points                50
proc time (sec)               5.16e-2      num test points                  1000
memory used (b)                     0      compute covariance              False