matmul#

detkit.matmul(A, B, C, shape=None, trans_a=False, trans_b=False, alpha=1, beta=0, overwrite=False)#

Matrix-matrix multiplication for sub-matrices.

Parameters:

Anumpy.ndarray

A rectangular matrix with either float64 or float32 data type stored with column-major ordering.

Anumpy.ndarray

A rectangular matrix with either float64 or float32 data type stored with column-major ordering.

Bnumpy.ndarray

A rectangular matrix with either float64 or float32 data type stored with row-major or column-major ordering.

shapetuple, default=None

A tuple of size three, determining the shape of an upper-left sub-matrices of A, B, and C to be referenced. Namely, if shape is given as the tuple (m, n, k), then

m is the number of rows of \(\mathrm{op}(\mathbf{A})\).
n is the number of columns of \(\mathrm{op}(\mathbf{A})\), which is also the number of rows of \(\mathrm{op}(\mathbf{B})\).
k is the number of columns of \(\mathrm{op}(\mathbf{C})\).

As such the following sub-matrices are used:

A[:m, :n] (or A[:n, :m] when trans_a=True)
B[:n, :k] (or B[:k, :n] when trans_b=True)
C[:m, :k]

If shape is None, the full shape of A[:, :], B[:, :], and C[:, :] are considered provided that their shapes are compatible for the matrix product.

trans_abool, default=False

If False, the operator \(\operatorname{op}(\mathbf{A}) = \mathbf{A}\) is used. If True, the operator \(\operatorname{op}(\mathbf{A}) = \mathbf{A}^{\intercal}\) is used.

trans_abbool, default=False

If False, the operator \(\operatorname{op}(\mathbf{B}) = \mathbf{B}\) is used. If True, the operator \(\operatorname{op}(\mathbf{B}) = \mathbf{B}^{\intercal}\) is used.

alphafloat, default=1

The parameter \(\alpha\).

betafloat, default=0

The parameter \(\beta\).

overwritebool, default=False

If True, the output C is overwritten to the input C.

Note

When overwrite is set to True, the input matrix C should have column-major (Fortran) ordering.

Returns:

Cnumpy.ndarray: A 2D matrix of the same shape as the input matrix C (and not the shape of the sub-matrix). The upper-left sub-matrix of C contains the matrix-matrix multiplication corresponding to the sub-matrices determined by the shape argument. If overwrite is set to True, the output matrix C is becomes a view for the matrix C.

See also

detkit.solve_triangular

Notes

Matrix-matrix multiplication of sub-matrices:

Let \(\mathbf{A}_{[:m,:n]}\) denote the sub-matrix of the size \(m \times n\) to be the upper-left corner of matrix \(\mathbf{A}\). Given matrices \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) and parameters \(\alpha\) and \(\beta\), this function computes

\[\mathbf{X}_{[:m, :k]} = \alpha \operatorname{op}(\mathbf{A})_{[:m, :n]} \operatorname{op}(\mathbf{B})_{[:n, :k]} + \beta \mathbf{C}_{[:m, :k]},\]

if overwrite is False, or

\[\mathbf{C}_{[:m, :k]} \leftarrow \alpha \operatorname{op}(\mathbf{A})_{[:m, :n]} \operatorname{op}(\mathbf{B})_{[:n, :k]} + \beta \mathbf{C}_{[:m, :k]},\]

if overwrite is True.

Interpreting the output matrix:

The shape of the output variable X is the same as the shape of C, even if a smaller sub-matrix is considered. Regardless, only the corresponding upper-left sub-matrix of X has meaningful data. Namely, if shape=(m, n, k), the slice X[:m, :k] should be considered as the result. As such, the relation

X[:m, :k] = alpha * A[:m, :n] @ [:n, :k] + beta * C[:m, :k]

or, for instance,

X[:m, :k] = alpha * A[:n, :m].T @ [:k, :n].T + beta * C[:m, :k]

(when trans_a and trans_b are True) should hold.

Comparison with numpy.matmul:

To perfron matrix-matrix multiplication for a sub-matrix of the input matrices using numpy.matmul function (or any other similar matrix multiplication functions), you should pass a slice of the matrix to the function. This approach is not memory-efficient since the sliced array allocates new memory.

In contrast, using detkit.matmul together with the shape argument, no memory slice is created during the inner computation, rather, the data from the original input matrix is accessed efficiently.

Implementation:

This function is a wrapper around LAPACK’s sgemm (for 32-bit precision) and dgemm (for 64-bit precision).

This function is internally used for detkit.memdet() for efficient computation of matrix determinant under memory constraint.

References

LAPACK

Examples

>>> from detkit import matmul, Memory
>>> import numpy

>>> # Create random matrices
>>> p = 1000
>>> A = numpy.random.randn(p, p)
>>> B = numpy.random.randn(p, p)
>>> C = numpy.random.randn(p, p)

>>> # Make sure arrays have column-ordering
>>> A = numpy.asfortranarray(A)
>>> B = numpy.asfortranarray(B)
>>> C = numpy.asfortranarray(C)

>>> # 32-bit precision data
>>> A = A.astype(numpy.float32)
>>> B = B.astype(numpy.float32)
>>> C = C.astype(numpy.float32)

>>> # Get a copy of C (for the purpose of comparison) since we will
>>> # overwrite C
>>> C_copy = numpy.copy(C)

>>> # Choose size of sub-matrices that are smaller or equal to p
>>> m, n, k = 900, 800, 700

>>> # Track memory allocation to check if matmul operation is not
>>> # creating any new memory.
>>> mem = Memory()
>>> mem.set()

>>> # Perform matrix-matrix multiplication for sub-matrices
>>> alpha, beta = numpy.float32(1.0), numpy.float32(2.0)
>>> X = matmul(A, B, C, shape=(m, n, k), alpha=alpha, beta=beta,
...            overwrite=True)

>>> # Check peak memory allocation (compared to memory of a sum-matrix)
>>> slice_nbytes = m * n * A.dtype.itemsize
>>> print(mem.peak() / slice_nbytes)
0.001

>>> # Check if alpha * A @ B + beta * C_copy = X holds.
>>> atol = numpy.finfo(A.dtype).resolution
>>> print(numpy.allclose(alpha * A[:m, :n] @ B[:n, :k] +
...                beta * C_copy[:m, :k], X[:m, :k], atol=10*tol))
True

>>> # When overwrite is set to True, check if X is indeed a view of C
>>> numpy.may_share_memory(X, C)
True

In the above example, the object mem of class detkit.Memory tracks memory allocation. The peak of allocated memory during the matrix multiplication is three orders of magnitude smaller than the size of one of the matrices slices, confirming that no new array slice was created during the operation.