imate Documentation#
imate, short for Implicit Matrix Trace Estimator, is a modular, high-performance C++/CUDA library, distributed as a Python package, that provides scalable randomized algorithms for the computationally expensive matrix functions that arise in machine learning.
Overview#
To learn more about imate functionality, see:
Supported Platforms#
Installation and tests have been performed successfully on the following operating systems, architectures, and Python and PyPy versions:
| Platform | Arch | Device | Python 3.9 | Python 3.10 | Python 3.11 | Python 3.12 | PyPy 3.8 ¹ | PyPy 3.9 ¹ | PyPy 3.10 ¹ |
|---|---|---|---|---|---|---|---|---|---|
| Linux | X86-64 | CPU | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Linux | X86-64 | GPU | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Linux | AARCH-64 | CPU | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Linux | AARCH-64 | GPU | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| macOS | X86-64 | CPU | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| macOS | X86-64 | GPU ² | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| macOS | ARM-64 | CPU | ✔ | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ |
| macOS | ARM-64 | GPU ² | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| Windows | X86-64 | CPU | ✔ | ✔ | ✔ | ✔ | ✖ | ✖ | ✖ |
| Windows | X86-64 | GPU | ✔ | ✔ | ✔ | ✔ | ✖ | ✖ | ✖ |
Python wheels for imate, for all supported platforms and versions listed above, are available through PyPI and Anaconda Cloud. If you need imate on other platforms, architectures, or Python or PyPy versions, raise an issue on GitHub and we will build a Python wheel for you.
Install#
Install with pip from PyPI:
pip install imate
Install with conda from Anaconda Cloud:
conda install -c s-ameli imate
For a complete installation guide, see:
Docker#
The docker image comes with a pre-installed imate, an NVIDIA graphics driver, and a compatible version of the CUDA Toolkit libraries.
Pull docker image from Docker Hub:
docker pull sameli/imate
For a complete guide, see:
GPU#
imate can run on CUDA-capable multi-GPU devices, which can be set up in several ways. Using the docker container is the easiest way to run imate on GPU devices. For a comprehensive guide, see:
The supported GPU micro-architectures and CUDA versions are as follows:
| Version \ Arch | Fermi | Kepler | Maxwell | Pascal | Volta | Turing | Ampere | Hopper |
|---|---|---|---|---|---|---|---|---|
| CUDA 9 | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| CUDA 10 | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| CUDA 11 | ✖ | ✖ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ |
| CUDA 12 | ✖ | ✖ | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ |
Tutorials#
Launch an online interactive notebook with Binder.
API Reference#
Check the list of functions, classes, and modules of imate with their usage, options, and examples.
Performance#
imate is scalable to very large matrices. Its core library for basic linear algebraic operations is faster than OpenBLAS, and its pseudo-random generator is a hundred-fold faster than the implementation in the standard C++ library.
Read about the performance of imate in practical applications:
Features#
- Matrices can be dense or sparse (CSR or CSC format), with 32-bit, 64-bit, or 128-bit data types, stored in either row-major (C style) or column-major (Fortran style) ordering.
- Matrices can be linear operators with parameters (see the imate.Matrix and imate.AffineMatrixFunction classes).
- Randomized algorithms based on the Hutchinson and stochastic Lanczos quadrature methods (see Overview).
- A novel method to interpolate matrix functions (see Interpolation of Affine Matrix Functions).
- Parallel processing, both on shared memory and on CUDA-capable multi-GPU devices.
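To give a flavor of the randomized algorithms listed above, the sketch below illustrates the Hutchinson trace estimator in plain Python. It is a didactic sketch, not imate's actual implementation: the function name, the sample count, and the diagonal test operator are all illustrative assumptions. The key idea is that the trace of a linear operator is estimated purely from matrix-vector products with random Rademacher vectors, so the matrix never needs to be formed explicitly.

```python
import random

def hutchinson_trace(matvec, n, num_samples=500, seed=0):
    """Estimate trace(A) as the sample mean of v^T (A v) over random
    Rademacher vectors v (entries +/-1), using only matvec products."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        av = matvec(v)  # one matrix-vector product per sample
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples

# Illustrative operator: a diagonal matrix with diagonal 1..5 (trace = 15).
diag = [1.0, 2.0, 3.0, 4.0, 5.0]
matvec = lambda v: [d * vi for d, vi in zip(diag, v)]
estimate = hutchinson_trace(matvec, n=5)
```

For a diagonal operator each Rademacher sample gives the trace exactly (since each entry of v squares to 1), so this toy case converges immediately; for general matrices the estimator's variance shrinks as the number of samples grows.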
Technical Notes#
The core of imate, implemented in C++ and the NVIDIA CUDA framework, is a standalone, modular library for high-performance, low-level algebraic operations on linear operators (including matrices and affine matrix functions). This library provides a unified interface for computations on both CPU and GPU, a unified interface for dense and sparse matrices, a unified container for various data types, and fully automatic memory management and on-demand data transfer between CPU and GPU devices. It can be employed independently in projects other than imate. The Doxygen-generated reference of imate's C++/CUDA Classes and Namespaces is available for developers.
The front-end interface of imate is implemented in Cython and Python (see Python API Reference for end-users).
Some notable implementation techniques used to develop imate are:
- Polymorphism and the curiously recurring template pattern (CRTP).
- OS-independent, customized dynamic loading of CUDA libraries (as opposed to dynamic linking).
- Static dispatching, which enables running imate with or without CUDA on the user's machine from the same pre-compiled imate installation.
- Completely GIL-free Cython implementation.
- manylinux wheels built upon customized docker images with CUDA support, available on DockerHub.
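The second point above, dynamic loading of CUDA libraries at runtime rather than linking against them at build time, can be sketched in Python with the standard ctypes module. This is an illustrative sketch under stated assumptions, not imate's actual loader (imate's loader lives in its C++/Cython core, and the candidate library names below are hypothetical): the program probes for a shared library by name and degrades gracefully to a CPU-only path when none is found, so one installed package works on machines with and without CUDA.

```python
import ctypes
import ctypes.util

def try_load(names):
    """Return a handle to the first loadable shared library among the
    candidate names, or None if none can be found and loaded."""
    for name in names:
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue
    return None

# Hypothetical candidate names for the CUDA runtime across platforms.
cudart = try_load(["cudart", "cudart64_12", "cudart64_110"])
# If CUDA is absent at runtime, fall back to CPU-only code paths.
cpu_only = cudart is None
```

Because the lookup happens at import or call time rather than at link time, a missing CUDA installation produces a clean fallback instead of a loader error, which is the behavior the static-dispatching point above relies on.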
How to Contribute#
We welcome contributions via GitHub pull requests. If you do not feel comfortable modifying the code, we also welcome feature requests and bug reports as GitHub issues.
Publications#
For information on how to cite imate, publications, and software packages that used imate, see:
License#
This project uses a BSD 3-clause license, in hopes that it will be accessible to most projects. If you require a different license, please raise an issue and we will consider a dual license.