imate.Matrix

class imate.Matrix(A)

Create a linear operator object from an input matrix.

The linear operator is a container for various matrix types with a unified interface. It establishes a fully automatic dynamic buffer to allocate, deallocate, and transfer data between the CPU and multiple GPU devices on demand, and it performs basic matrix-vector operations with high performance on both CPU and GPU devices.

An instance of this class can be used as the input matrix to any function in imate that accepts the slq method.

Parameters:
A : numpy.ndarray or scipy.sparse, (n, n)

The input matrix A can be dense or sparse (both CSR and CSC formats are supported). The matrix data type can be 32-bit, 64-bit, or 128-bit. The input matrix can be stored either in row-ordering (C style) or column-ordering (Fortran style).

Note

The 128-bit data type is not supported on GPU.
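As an illustration of the accepted input formats, each of the following square matrices would be a valid input A (a minimal sketch using only numpy and scipy; the variable names are for demonstration):

```python
import numpy
import scipy.sparse

n = 4

# Dense matrix, 64-bit, row-major (C style) storage
A_dense_c = numpy.random.rand(n, n).astype('float64', order='C')

# Dense matrix, 32-bit, column-major (Fortran style) storage
A_dense_f = numpy.asfortranarray(A_dense_c.astype('float32'))

# Sparse matrix in CSR format, and the same matrix in CSC format
A_csr = scipy.sparse.random(n, n, density=0.5, format='csr', dtype='float64')
A_csc = A_csr.tocsc()
```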

Notes

Where to use this class:

Instances of this class can be used just like a normal matrix in any function in imate that accepts the slq method. For instance, when calling the imate.logdet() function with the method='slq' argument, the input matrix A can be an instance of this class.

Why use this class:

This class is a replacement for numpy’s dense matrices and scipy’s sparse matrices. The purpose of creating a linear operator is three-fold:

  1. A container to handle a variety of matrices and data types with a unified interface.

  2. A dynamic buffer for matrix data on either CPU or multi-GPU devices with fully automatic allocation, deallocation, and data transfer between CPU and GPU devices.

  3. Performing basic linear algebraic operations on the matrix data with high performance on CPU or GPU devices.

Further details of each of the points are described below.

1. Unified interface for matrix types:

The linear operator encompasses the following matrix types all in one interface:

  • Dense or sparse matrices with CSR or CSC sparse formats.

  • 32-bit, 64-bit, and 128-bit data types.

  • Row-ordering (C style) and column-ordering (Fortran style) storage.

2. Unified interface for memory buffer between CPU and GPU:

This class creates a dynamic buffer to automatically allocate, deallocate, and transfer matrix data on demand between the CPU and multiple GPU devices. These operations are performed in parallel for each GPU device. Deallocation is handled by a smart garbage collector, so the user need not be concerned with cleaning up data on the device or with memory leaks. An instance of this class can be efficiently reused across multiple function calls. The class uses a lazy evaluation strategy, meaning that data allocation and transfer are not performed until a caller function requests the matrix data.
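The lazy-allocation idea can be sketched in a few lines of plain Python. This is a toy illustration only, not imate’s actual implementation; the class name and the per-device dictionary are hypothetical:

```python
class LazyDeviceBuffer:
    """Toy sketch of lazy allocation: a device copy is created only on the
    first request and cached for reuse. Not imate's real implementation."""

    def __init__(self, host_data):
        self._host_data = host_data
        self._device_data = {}        # per-device cache, empty until requested

    def get(self, device_id=0):
        # Allocate and transfer only on the first access to this device
        if device_id not in self._device_data:
            # Stand-in for a host-to-device copy
            self._device_data[device_id] = list(self._host_data)
        return self._device_data[device_id]
```

A second call to `get()` with the same device id returns the cached copy without any new transfer, which is the behavior described above for reusing an instance across multiple function calls.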

3. Unified interface for basic algebraic operations:

This class handles the following internal operations:

  1. Matrix vector product \(\boldsymbol{y} = \mathbf{A} \boldsymbol{x}\).

  2. Transposed matrix vector product \(\boldsymbol{y} = \mathbf{A}^{\intercal} \boldsymbol{x}\).

  3. Additive matrix vector product \(\boldsymbol{y} = \boldsymbol{y} + \mathbf{A} \boldsymbol{x}\).

  4. Additive transposed matrix vector product \(\boldsymbol{y} = \boldsymbol{y} + \mathbf{A}^{\intercal} \boldsymbol{x}\).

All the above operations are handled internally when an instance of this class is called by other functions.

Each of the above operations has various internal implementations depending on whether the matrix format is dense, sparse CSR, or sparse CSC, whether the memory storage is C style or Fortran style, whether the data type is 32-bit, 64-bit, or 128-bit, and whether the operation is performed on CPU or multi-GPU devices. This class unifies all these implementations in one interface.
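For reference, the four operations above can be reproduced with plain numpy and scipy calls (a small sketch; the linear operator performs the equivalent operations internally on CPU or GPU):

```python
import numpy
import scipy.sparse

n = 5
A = scipy.sparse.random(n, n, density=0.5, format='csr', dtype='float64')
x = numpy.random.rand(n)
y = numpy.random.rand(n)

y1 = A @ x          # 1. matrix-vector product
y2 = A.T @ x        # 2. transposed matrix-vector product
y3 = y + A @ x      # 3. additive matrix-vector product
y4 = y + A.T @ x    # 4. additive transposed matrix-vector product
```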

Examples

Create Matrix object:

Create a very large sparse matrix with 64-bit data type:

>>> # Create a sample sparse matrix with the size of ten million
>>> from imate import toeplitz
>>> n = 10000000
>>> A = toeplitz(2, 1, n, gram=True, format='csr', dtype='float64')
>>> A.dtype
dtype('float64')

>>> type(A)
<class 'scipy.sparse.csr.csr_matrix'>

Create a linear operator from the matrix A:

>>> # Import matrix operator
>>> from imate import Matrix

>>> # Create a matrix operator object from matrix A
>>> Aop = Matrix(A)

Operation on CPU:

Pass the above linear operator as the input to any of the matrix functions in imate that accept the slq method, such as imate.logdet(), as follows:

>>> # Compute log-determinant of Aop
>>> from imate import logdet
>>> logdet(Aop, method='slq')
13861581.003237441

Note that the above function could also be called directly on the scipy object A rather than on Aop:

>>> # Compute log-determinant of A
>>> logdet(A, method='slq')
13861581.003237441

However, with the above approach, the logdet() function still creates an imate.Matrix object internally. The advantage of using Aop instead of A becomes clearer when using a GPU.

Operation on GPU:

Reuse the same object Aop created in the previous example, but this time call the imate.logdet() function with the gpu=True argument. The data is automatically transferred from the CPU to each of the GPU devices (it then remains available on both CPU and GPU).

>>> # Compute log-determinant of Aop on GPU
>>> logdet(Aop, method='slq', gpu=True)
13861581.003237441

Why use the imate.Matrix object:

An advantage of using Aop (instead of the matrix A directly) is that calling the above logdet function (or any other function) again on this matrix does not re-allocate the matrix data on the GPU device. The next example highlights this point. Call a second function, for instance traceinv, on the same object Aop as follows.

>>> # Compute trace of inverse of Aop on GPU
>>> from imate import traceinv
>>> traceinv(Aop, method='slq', gpu=True)
13861581.003237441

In the above, no data needs to be transferred from the CPU host to the GPU device again. However, if A were used instead of Aop, the data would be transferred from CPU to GPU a second time. The Aop object holds the data on the GPU for later use as long as the object remains in the scope of the Python environment. Once the variable Aop goes out of scope, the matrix data on all GPU devices is cleaned up automatically.

Attributes:
A : numpy.ndarray or scipy.sparse

Input matrix A as a Python object.

cpu_Aop : object

Matrix object on CPU.

gpu_Aop : object

Matrix object on GPU.

gpu : bool, default=False

If True, the matrix object is created for GPU devices.

num_gpu_devices : int, default=0

Number of GPU devices to use. If 0, the maximum number of available GPU devices is used.

initialized_on_cpu : bool, default=False

Indicates whether the matrix data is allocated on CPU.

initialized_on_gpu : bool, default=False

Indicates whether the matrix data is allocated on GPU.

data_type_name : str, default=None

The data type of the matrix, which can be float32, float64, or float128.

num_parameters : int, default=0

Number of parameters of the linear operator. For the Matrix class, this is always 0.

Methods

initialize(gpu, num_gpu_devices)

Initializes the object.

get_num_rows()

Returns the number of rows of the matrix.

get_num_columns()

Returns the number of columns of the matrix.

is_sparse()

Determines whether the matrix is dense or sparse.

get_nnz()

Returns the number of non-zero elements of the matrix.

get_density()

Returns the density of non-zero elements of the matrix.

get_data_type_name()

Returns the data type name, which can be float32, float64, or float128.

get_num_parameters()

Returns the number of parameters of the linear operator.

get_linear_operator([gpu, num_gpu_devices])

Sets the linear operator object on CPU or GPU and returns it.
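As a rough illustration of what the query methods above report, the same quantities can be read directly from a scipy sparse matrix. The mapping in the comments is based on the method names listed above:

```python
import scipy.sparse

# A sample sparse matrix (the sizes here are for illustration only)
A = scipy.sparse.random(1000, 1000, density=0.01, format='csr',
                        dtype='float64')

num_rows, num_cols = A.shape           # get_num_rows(), get_num_columns()
nnz = A.nnz                            # get_nnz()
density = nnz / (num_rows * num_cols)  # get_density()
data_type_name = str(A.dtype)          # get_data_type_name(), e.g. 'float64'
sparse = scipy.sparse.issparse(A)      # is_sparse()
```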