Overview#
What Does g-learn Do?#
The main purpose of g-learn is to estimate the algebraic quantity

$$\mathrm{trace}\left(f(\mathbf{A})\right), \tag{1}$$

where $\mathbf{A}$ is a square matrix, $f$ is a matrix function, and $\mathrm{trace}(\cdot)$ is the trace operator. g-learn can also compute variants of (1), such as

$$\mathrm{trace}\left(\mathbf{B} f(\mathbf{A})\right), \tag{2}$$

and

$$\mathrm{trace}\left(\mathbf{B} f(\mathbf{A}) \, \mathbf{C} f(\mathbf{A})\right), \tag{3}$$

where $\mathbf{B}$ and $\mathbf{C}$ are matrices.
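To make (1) concrete, the following plain-NumPy sketch (not g-learn's interface) evaluates $\mathrm{trace}(f(\mathbf{A}))$ for a small symmetric positive-definite matrix by applying $f$ to the eigenvalues of $\mathbf{A}$, which is only feasible for small matrices:

```python
import numpy as np

def trace_function(A, f):
    """trace(f(A)) for a symmetric matrix A, using the spectral identity
    trace(f(A)) = sum_i f(lambda_i) over the eigenvalues lambda_i of A."""
    return np.sum(f(np.linalg.eigvalsh(A)))

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 10))
A = X @ X.T + 10.0 * np.eye(10)       # small symmetric positive-definite matrix

print(trace_function(A, np.sqrt))     # trace of the matrix square root of A
print(trace_function(A, np.exp))      # trace of the matrix exponential of A
```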
Where Can g-learn Be Applied?#
Although g-learn is tailored to specific algebraic computations, it addresses a demanding and challenging computational task with a wide range of applications. Namely, the expression (1) is ubiquitous in a variety of applications [R1], and in fact, it is often the most computationally expensive term in such applications. Some common examples of $f$ and the quantities they yield are listed below.
Log-Determinant#
If $f: \lambda \mapsto \log \lambda$, then (1) is the log-determinant of $\mathbf{A}$, which frequently appears in statistics and machine learning, particularly in log-likelihood functions [R2].
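For instance, the identity $\log\det(\mathbf{A}) = \mathrm{trace}(\log \mathbf{A})$ can be verified on a small matrix with plain NumPy (an illustration of the quantity only, not g-learn's interface):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 100))
A = X @ X.T + 100.0 * np.eye(100)     # symmetric positive-definite

# Log-determinant computed directly ...
sign, logdet = np.linalg.slogdet(A)

# ... agrees with trace(f(A)) for f = log, evaluated via the eigenvalues
logdet_via_trace = np.sum(np.log(np.linalg.eigvalsh(A)))

print(sign, logdet, logdet_via_trace)  # sign = 1.0, the two values match
```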
Trace of Matrix Powers#
If $f: \lambda \mapsto \lambda^p$, $p \in \mathbb{R}$, then (1) is $\mathrm{trace}(\mathbf{A}^p)$. Interesting cases are the negative powers, such as the trace of the inverse, $\mathrm{trace}(\mathbf{A}^{-1})$, where $\mathbf{A}^{-1}$ is only implicitly known. This class of functions frequently appears in statistics and machine learning. For instance, $\mathrm{trace}(\mathbf{A}^{-1})$ and $\mathrm{trace}(\mathbf{A}^{-2})$ appear in the Jacobian and Hessian of log-likelihood functions, respectively [R3].
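As an illustration of the case $f: \lambda \mapsto \lambda^{-1}$, the sketch below computes $\mathrm{trace}(\mathbf{A}^{-1})$ without forming $\mathbf{A}^{-1}$ explicitly, by solving one linear system per unit vector (plain SciPy, unrelated to g-learn's implementation):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((n, n))
A = X @ X.T + n * np.eye(n)            # symmetric positive-definite

# trace(A^{-1}) = sum_i e_i^T A^{-1} e_i, computed by solving A x = e_i;
# the inverse itself is never formed explicitly.
factor = cho_factor(A)
identity = np.eye(n)
trace_inv = sum(cho_solve(factor, identity[:, i])[i] for i in range(n))

# Reference value from the eigenvalues: trace(A^{-1}) = sum_i 1 / lambda_i
print(trace_inv, np.sum(1.0 / np.linalg.eigvalsh(A)))
```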
Schatten $p$-norm and Schatten $p$-anti-norm#
If $f: \lambda \mapsto \lambda^p$, then $\left(\mathrm{trace}(\mathbf{A}^p)\right)^{1/p}$ is the Schatten $p$-norm of $\mathbf{A}$ (if $p > 0$), and is the Schatten $p$-anti-norm (if $p < 0$). The Schatten norm has applications in rank-constrained optimization in machine learning.
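A small numerical check of this quantity for a symmetric positive-definite matrix, in plain NumPy for illustration only (note that $p = 2$ recovers the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 30))
A = X @ X.T + 30.0 * np.eye(30)        # symmetric positive-definite
eig = np.linalg.eigvalsh(A)

def schatten(p):
    """(trace(A^p))^(1/p): a norm for p > 0, an anti-norm for p < 0."""
    return np.sum(eig ** p) ** (1.0 / p)

print(schatten(2.0), np.linalg.norm(A, 'fro'))  # p = 2 matches the Frobenius norm
print(schatten(-1.0))                           # a Schatten anti-norm
```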
Eigencount and Numerical Rank#
If $f: \lambda \mapsto \mathbf{1}_{[a, b]}(\lambda)$ is the indicator (step) function of the interval $[a, b]$, then (1) estimates the number of non-zero eigenvalues of $\mathbf{A}$ in that interval, which is an inexpensive method to estimate the rank of a large matrix. Eigencount is closely related to Principal Component Analysis (PCA) and low-rank approximations in machine learning.
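The eigencount of an interval can be mimicked on a small matrix by applying the indicator function to the spectrum (plain NumPy, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20))
A = X @ X.T                       # rank-20 positive semi-definite 200 x 200 matrix

eig = np.linalg.eigvalsh(A)
a, b = 1e-8, np.inf               # count eigenvalues above a small threshold
eigencount = np.sum((eig > a) & (eig < b))

print(eigencount, np.linalg.matrix_rank(A))  # both report the numerical rank, 20
```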
Estrada Index of Graphs#
If $f: \lambda \mapsto \exp(\lambda)$, then (1) is the Estrada index of $\mathbf{A}$, which has applications in computational biology, such as protein folding.
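For a small graph, the Estrada index $\mathrm{trace}(\exp(\mathbf{A}))$ of an adjacency matrix can be checked against SciPy's matrix exponential (again, only an illustration):

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a 4-cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

estrada_via_eig = np.sum(np.exp(np.linalg.eigvalsh(A)))   # trace(f(A)) with f = exp
estrada_via_expm = np.trace(expm(A))                       # direct matrix exponential

print(estrada_via_eig, estrada_via_expm)                   # identical values
```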
Spectral Density#
If $f: \lambda \mapsto \delta(\lambda - \mu)$, where $\delta$ is the Dirac delta function, then (1) yields the spectral density of the eigenvalues of $\mathbf{A}$ at $\mu$. Estimating the spectral density of matrices, also known as the Density of States (DOS), is a common problem in solid-state physics.
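A crude spectral-density (DOS) estimate replaces the Dirac delta with a narrow Gaussian; the sketch below does this with the full spectrum of a small matrix, whereas stochastic methods achieve the same goal without computing any eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 300))
A = (X + X.T) / np.sqrt(2 * 300)       # symmetric random matrix (Wigner-like)
eig = np.linalg.eigvalsh(A)

def spectral_density(mu, sigma=0.05):
    """Smoothed DOS: average of Gaussian bumps replacing delta(lambda - mu)."""
    return np.mean(np.exp(-(eig - mu) ** 2 / (2 * sigma ** 2))
                   / (sigma * np.sqrt(2 * np.pi)))

grid = np.linspace(-2.5, 2.5, 11)
print([round(spectral_density(mu), 3) for mu in grid])  # approximates the semicircle law
```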
Randomized Algorithms For Massive Data#
Calculating (1) and its variants is a computational challenge when:

- Matrices are very large. In practical applications, the matrix size could range from millions to billions.
- $f(\mathbf{A})$ cannot be computed directly, or is only known implicitly, such as $\mathbf{A}^{-1}$.
g-learn employs randomized (stochastic) algorithms [R5] to fulfill the demand for computing (1) on massive data. Such algorithms are fast and scalable to large matrices. g-learn implements the following randomized algorithms:
Hutchinson's Method#
Hutchinson's method is the earliest randomized method employed to estimate the trace of the inverse of an invertible matrix [R6]. g-learn implements Hutchinson's method to compute (1), (2), and (3) for $f: \lambda \mapsto \lambda^{-1}$.
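The idea behind the Hutchinson estimator can be sketched in a few lines of NumPy: with Rademacher probe vectors $\mathbf{z}$, $\mathbb{E}[\mathbf{z}^\intercal \mathbf{A}^{-1} \mathbf{z}] = \mathrm{trace}(\mathbf{A}^{-1})$, so averaging over a number of probes gives an unbiased estimate. This is a bare-bones illustration, not g-learn's optimized implementation:

```python
import numpy as np

def hutchinson_traceinv(A, num_samples=200, seed=0):
    """Monte-Carlo estimate of trace(A^{-1}) using Rademacher probe vectors.
    Each probe requires one linear solve with A; A^{-1} is never formed."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    estimate = 0.0
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)     # Rademacher probe
        estimate += z @ np.linalg.solve(A, z)   # z^T A^{-1} z
    return estimate / num_samples

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 100))
A = X @ X.T + 100.0 * np.eye(100)               # symmetric positive-definite

exact = np.sum(1.0 / np.linalg.eigvalsh(A))     # exact trace of the inverse
print(hutchinson_traceinv(A), exact)            # stochastic estimate vs exact
```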
Stochastic Lanczos Quadrature Method#
The Stochastic Lanczos Quadrature (SLQ) method [R7] combines two of the greatest algorithms of the century in applied mathematics, namely the Monte-Carlo method and the Lanczos algorithm (also, the Golub-Kahan-Lanczos algorithm) [R8] [R9], together with Gauss quadrature, to estimate (1) for an analytic function $f$ and a symmetric positive-definite matrix $\mathbf{A}$. g-learn provides an efficient and scalable implementation of the SLQ method.
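A minimal SLQ sketch in NumPy may help make the idea concrete: each random probe vector seeds a few Lanczos steps, and a Gauss quadrature rule derived from the resulting tridiagonal matrix approximates that probe's contribution to the trace. This is a didactic sketch (with full reorthogonalization and no convergence control), not g-learn's optimized C++/CUDA implementation:

```python
import numpy as np

def slq_trace(A, f, num_probes=30, lanczos_steps=30, seed=0):
    """Stochastic Lanczos Quadrature sketch: estimate trace(f(A)) for a
    symmetric positive-definite A using Gauss quadrature on the Lanczos
    tridiagonal matrices built from random starting vectors."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(num_probes):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)                  # random unit-norm starting vector
        Q = np.zeros((n, lanczos_steps))
        alpha = np.zeros(lanczos_steps)
        beta = np.zeros(lanczos_steps - 1)
        Q[:, 0] = v
        m = lanczos_steps
        for j in range(lanczos_steps):
            w = A @ Q[:, j]
            alpha[j] = Q[:, j] @ w
            # Full reorthogonalization against all previous Lanczos vectors
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
            if j < lanczos_steps - 1:
                beta[j] = np.linalg.norm(w)
                if beta[j] < 1e-10:             # Krylov subspace exhausted
                    m = j + 1
                    break
                Q[:, j + 1] = w / beta[j]
        T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
        theta, U = np.linalg.eigh(T)            # quadrature nodes (theta) ...
        tau = U[0, :]                           # ... and weights (tau^2)
        total += np.sum(tau ** 2 * f(theta))    # approximates v^T f(A) v
    return n * total / num_probes

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 200))
A = X @ X.T + 200.0 * np.eye(200)               # symmetric positive-definite

print(slq_trace(A, np.log), np.sum(np.log(np.linalg.eigvalsh(A))))  # SLQ vs exact
```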
Along with the randomized methods, g-learn also provides direct (non-stochastic) methods which are only for benchmarking purposes to test the accuracy of the randomized methods on small matrices.
Applications in Optimization#
A unique and novel feature of g-learn is the ability to interpolate the trace of arbitrary functions of the affine matrix function $t \mapsto \mathbf{A} + t \mathbf{B}$, for a large number of input hyperparameters $t$. Instead of directly computing the above function for every $t$, g-learn interpolates it from only a few sample evaluations, which leads to significant computational savings when (1) has to be evaluated for many values of $t$ [R4].
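The gist can be conveyed with a toy interpolation scheme. The sketch below is not the interpolation method of [R4] used by g-learn; it merely fits a cubic through a few direct evaluations of $\log\det(\mathbf{A} + t\mathbf{B})$ to illustrate why interpolation over $t$ can replace repeated direct computation:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 100))
A = X @ X.T + 100.0 * np.eye(100)    # symmetric positive-definite
B = np.eye(100)                       # affine perturbation: A + t * B

def logdet(t):
    """Direct (expensive) evaluation of logdet(A + t B) for one value of t."""
    return np.linalg.slogdet(A + t * B)[1]

# Evaluate the expensive function at only a few sample points ...
t_samples = np.array([0.0, 1.0, 3.0, 10.0])
coeffs = np.polyfit(t_samples, [logdet(t) for t in t_samples], deg=3)

# ... then interpolate it cheaply anywhere in the sampled range
t_query = 5.0
print(np.polyval(coeffs, t_query), logdet(t_query))  # interpolant vs direct value
```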
Petascale Computing#
The core of g-learn is a high-performance C++/CUDA library capable of running on parallel CPUs or on a GPU farm with multiple GPU devices. g-learn can perform at petascale; for instance, a cluster node with twenty GPUs of NVIDIA Hopper architecture, each delivering 60 TFLOPS, provides over a petaFLOPS of aggregate throughput.
For a gallery of the performance of g-learn on GPU and CPU on massive matrices, see Performance.
References#
[R1] Ubaru, S. and Saad, Y. (2018). Applications of Trace Estimation Techniques. In: High-Performance Computing in Science and Engineering. HPCSE 2017. Lecture Notes in Computer Science, vol 11087. Springer, Cham. doi: 10.1007/978-3-319-97136-0_2.
[R2] Ameli, S. and Shadden, S. C. (2023). A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression. Applied Mathematics and Computation 452, 128032. arXiv: 2207.08038 [math.ST].
[R3] Ameli, S. and Shadden, S. C. (2022). Noise Estimation in Gaussian Process Regression. arXiv: 2206.09976 [cs.LG].
[R4] Ameli, S. and Shadden, S. C. (2022). Interpolating Log-Determinant and Trace of the Powers of Matrix $\mathbf{A} + t \mathbf{B}$.
[R5] Mahoney, M. W. (2011). Randomized algorithms for matrices and data. arXiv: 1104.5557 [cs.DS].
[R6] Hutchinson, M. F. (1990). A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Comm. Statist. Simulation Comput. Volume 19, Number 2, pp. 433-450. Taylor & Francis. doi: 10.1080/03610919008812866.
[R7] Golub, G. H. and Meurant, G. (2010). Matrices, Moments and Quadrature with Applications. Princeton University Press. isbn: 0691143412. jstor.org/stable/j.ctt7tbvs.
[R8] Dongarra, J. and Sullivan, F. (2000). The Top 10 Algorithms. Computing in Science and Eng. 2, 1, pp. 22-23. doi: 10.1109/MCISE.2000.814652.
[R9] Higham, N. J. (2016). Nicholas J. Higham on the top 10 algorithms in applied mathematics. The Princeton Companion to Applied Mathematics. Princeton University Press.