matmul#
- detkit.matmul(A, B, C, shape=None, trans_a=False, trans_b=False, alpha=1, beta=0, overwrite=False)#
Matrix-matrix multiplication for sub-matrices.
- Parameters:
- Anumpy.ndarray
A rectangular matrix with either
float64
orfloat32
data type stored with column-major ordering.- Anumpy.ndarray
A rectangular matrix with either
float64
orfloat32
data type stored with column-major ordering.- Bnumpy.ndarray
A rectangular matrix with either
float64
orfloat32
data type stored with row-major or column-major ordering.- shapetuple, default=None
A tuple of size three, determining the shape of an upper-left sub-matrices of A, B, and C to be referenced. Namely, if
shape
is given as the tuple(m, n, k)
, thenm
is the number of rows of \(\mathrm{op}(\mathbf{A})\).n
is the number of columns of \(\mathrm{op}(\mathbf{A})\), which is also the number of rows of \(\mathrm{op}(\mathbf{B})\).k
is the number of columns of \(\mathrm{op}(\mathbf{C})\).
As such the following sub-matrices are used:
A[:m, :n]
(orA[:n, :m]
whentrans_a=True
)B[:n, :k]
(orB[:k, :n]
whentrans_b=True
)C[:m, :k]
If
shape
is None, the full shape ofA[:, :]
,B[:, :]
, andC[:, :]
are considered provided that their shapes are compatible for the matrix product.- trans_abool, default=False
If False, the operator \(\operatorname{op}(\mathbf{A}) = \mathbf{A}\) is used. If True, the operator \(\operatorname{op}(\mathbf{A}) = \mathbf{A}^{\intercal}\) is used.
- trans_abbool, default=False
If False, the operator \(\operatorname{op}(\mathbf{B}) = \mathbf{B}\) is used. If True, the operator \(\operatorname{op}(\mathbf{B}) = \mathbf{B}^{\intercal}\) is used.
- alphafloat, default=1
The parameter \(\alpha\).
- betafloat, default=0
The parameter \(\beta\).
- overwritebool, default=False
If True, the output C is overwritten to the input C.
Note
When
overwrite
is set to True, the input matrix C should have column-major (Fortran) ordering.
- Returns:
- Cnumpy.ndarray
A 2D matrix of the same shape as the input matrix C (and not the shape of the sub-matrix). The upper-left sub-matrix of C contains the matrix-matrix multiplication corresponding to the sub-matrices determined by the
shape
argument. Ifoverwrite
is set to True, the output matrix C is becomes a view for the matrix C.
See also
Notes
Matrix-matrix multiplication of sub-matrices:
Let \(\mathbf{A}_{[:m,:n]}\) denote the sub-matrix of the size \(m \times n\) to be the upper-left corner of matrix \(\mathbf{A}\). Given matrices \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) and parameters \(\alpha\) and \(\beta\), this function computes
\[\mathbf{X}_{[:m, :k]} = \alpha \operatorname{op}(\mathbf{A})_{[:m, :n]} \operatorname{op}(\mathbf{B})_{[:n, :k]} + \beta \mathbf{C}_{[:m, :k]},\]if
overwrite
is False, or\[\mathbf{C}_{[:m, :k]} \leftarrow \alpha \operatorname{op}(\mathbf{A})_{[:m, :n]} \operatorname{op}(\mathbf{B})_{[:n, :k]} + \beta \mathbf{C}_{[:m, :k]},\]if
overwrite
is True.Interpreting the output matrix:
The shape of the output variable X is the same as the shape of C, even if a smaller sub-matrix is considered. Regardless, only the corresponding upper-left sub-matrix of X has meaningful data. Namely, if
shape=(m, n, k)
, the sliceX[:m, :k]
should be considered as the result. As such, the relationX[:m, :k] = alpha * A[:m, :n] @ [:n, :k] + beta * C[:m, :k]
or, for instance,
X[:m, :k] = alpha * A[:n, :m].T @ [:k, :n].T + beta * C[:m, :k]
(when
trans_a
andtrans_b
are True) should hold.Comparison with numpy.matmul:
To perfron matrix-matrix multiplication for a sub-matrix of the input matrices using
numpy.matmul
function (or any other similar matrix multiplication functions), you should pass a slice of the matrix to the function. This approach is not memory-efficient since the sliced array allocates new memory.In contrast, using
detkit.matmul
together with theshape
argument, no memory slice is created during the inner computation, rather, the data from the original input matrix is accessed efficiently.Implementation:
This function is a wrapper around LAPACK’s
sgemm
(for 32-bit precision) anddgemm
(for 64-bit precision).This function is internally used for
detkit.memdet()
for efficient computation of matrix determinant under memory constraint.References
Examples
>>> from detkit import matmul, Memory >>> import numpy >>> # Create random matrices >>> p = 1000 >>> A = numpy.random.randn(p, p) >>> B = numpy.random.randn(p, p) >>> C = numpy.random.randn(p, p) >>> # Make sure arrays have column-ordering >>> A = numpy.asfortranarray(A) >>> B = numpy.asfortranarray(B) >>> C = numpy.asfortranarray(C) >>> # 32-bit precision data >>> A = A.astype(numpy.float32) >>> B = B.astype(numpy.float32) >>> C = C.astype(numpy.float32) >>> # Get a copy of C (for the purpose of comparison) since we will >>> # overwrite C >>> C_copy = numpy.copy(C) >>> # Choose size of sub-matrices that are smaller or equal to p >>> m, n, k = 900, 800, 700 >>> # Track memory allocation to check if matmul operation is not >>> # creating any new memory. >>> mem = Memory() >>> mem.set() >>> # Perform matrix-matrix multiplication for sub-matrices >>> alpha, beta = numpy.float32(1.0), numpy.float32(2.0) >>> X = matmul(A, B, C, shape=(m, n, k), alpha=alpha, beta=beta, ... overwrite=True) >>> # Check peak memory allocation (compared to memory of a sum-matrix) >>> slice_nbytes = m * n * A.dtype.itemsize >>> print(mem.peak() / slice_nbytes) 0.001 >>> # Check if alpha * A @ B + beta * C_copy = X holds. >>> atol = numpy.finfo(A.dtype).resolution >>> print(numpy.allclose(alpha * A[:m, :n] @ B[:n, :k] + ... beta * C_copy[:m, :k], X[:m, :k], atol=10*tol)) True >>> # When overwrite is set to True, check if X is indeed a view of C >>> print(X.base == C.base) True
In the above example, the object
mem
of classdetkit.Memory
tracks memory allocation. The peak of allocated memory during the matrix multiplication is three orders of magnitude smaller than the size of one of the matrices slices, confirming that no new array slice was created during the operation.