imate C++/CUDA Reference
cMatrixOperations< DataType > Class Template Reference

A static class for matrix-vector operations, which are similar to the level-2 operations of the BLAS library. This class acts as a templated namespace, where all member methods are public and static. More...

#include <c_matrix_operations.h>

Static Public Member Functions

static void dense_matvec (const DataType *A, const DataType *b, const LongIndexType num_rows, const LongIndexType num_columns, const FlagType A_is_row_major, DataType *c)
 Computes the matrix vector multiplication \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is a dense matrix. More...
 
static void dense_matvec_plus (const DataType *A, const DataType *b, const DataType alpha, const LongIndexType num_rows, const LongIndexType num_columns, const FlagType A_is_row_major, DataType *c)
 Computes the operation \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is a dense matrix. More...
 
static void dense_transposed_matvec (const DataType *A, const DataType *b, const LongIndexType num_rows, const LongIndexType num_columns, const FlagType A_is_row_major, DataType *c)
 Computes matrix vector multiplication \(\boldsymbol{c} = \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is dense, and \( \mathbf{A}^{\intercal} \) is the transpose of the matrix \( \mathbf{A} \). More...
 
static void dense_transposed_matvec_plus (const DataType *A, const DataType *b, const DataType alpha, const LongIndexType num_rows, const LongIndexType num_columns, const FlagType A_is_row_major, DataType *c)
 Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is dense, and \( \mathbf{A}^{\intercal} \) is the transpose of the matrix \( \mathbf{A} \). More...
 
static void csr_matvec (const DataType *A_data, const LongIndexType *A_column_indices, const LongIndexType *A_index_pointer, const DataType *b, const LongIndexType num_rows, DataType *c)
 Computes \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csr_matvec_plus (const DataType *A_data, const LongIndexType *A_column_indices, const LongIndexType *A_index_pointer, const DataType *b, const DataType alpha, const LongIndexType num_rows, DataType *c)
 Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csr_transposed_matvec (const DataType *A_data, const LongIndexType *A_column_indices, const LongIndexType *A_index_pointer, const DataType *b, const LongIndexType num_rows, const LongIndexType num_columns, DataType *c)
 Computes \(\boldsymbol{c} =\mathbf{A}^{\intercal} \boldsymbol{b}\) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csr_transposed_matvec_plus (const DataType *A_data, const LongIndexType *A_column_indices, const LongIndexType *A_index_pointer, const DataType *b, const DataType alpha, const LongIndexType num_rows, DataType *c)
 Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b}\) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csc_matvec (const DataType *A_data, const LongIndexType *A_row_indices, const LongIndexType *A_index_pointer, const DataType *b, const LongIndexType num_rows, const LongIndexType num_columns, DataType *c)
 Computes \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csc_matvec_plus (const DataType *A_data, const LongIndexType *A_row_indices, const LongIndexType *A_index_pointer, const DataType *b, const DataType alpha, const LongIndexType num_columns, DataType *c)
 Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csc_transposed_matvec (const DataType *A_data, const LongIndexType *A_row_indices, const LongIndexType *A_index_pointer, const DataType *b, const LongIndexType num_columns, DataType *c)
 Computes \(\boldsymbol{c} =\mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void csc_transposed_matvec_plus (const DataType *A_data, const LongIndexType *A_row_indices, const LongIndexType *A_index_pointer, const DataType *b, const DataType alpha, const LongIndexType num_columns, DataType *c)
 Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector. More...
 
static void create_band_matrix (const DataType *diagonals, const DataType *supdiagonals, const IndexType non_zero_size, const FlagType tridiagonal, DataType **matrix)
 Creates bi-diagonal or symmetric tri-diagonal matrix from the diagonal array (diagonals) and off-diagonal array (supdiagonals). More...
 

Detailed Description

template<typename DataType>
class cMatrixOperations< DataType >

A static class for matrix-vector operations, which are similar to the level-2 operations of the BLAS library. This class acts as a templated namespace, where all member methods are public and static.

This class implements matrix-vector multiplication for three types of matrices:

  • Dense matrix (both row major and column major)
  • Compressed sparse row matrix (CSR)
  • Compressed sparse column matrix (CSC)

For each of the above matrix types, four kinds of matrix-vector multiplication are implemented; a brief usage sketch follows the list below.

  1. dot : performs \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \).
  2. dot_plus : performs \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \).
  3. transpose_dot : performs \( \boldsymbol{c} = \mathbf{A}^{\intercal} \boldsymbol{b} \).
  4. transpose_dot_plus : performs \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b} \).
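
A minimal usage sketch of the dense variants, assuming DataType is instantiated as double; the matrix, vector, and scalar values below are hypothetical:

#include "c_matrix_operations.h"

int main()
{
    // A hypothetical 2x3 dense matrix stored in row-major (C) order.
    double A[6] = {1.0, 2.0, 3.0,
                   4.0, 5.0, 6.0};
    double b[3] = {1.0, 0.0, -1.0};
    double c[2];

    // c = A * b with A_is_row_major = 1; c becomes {-2.0, -2.0}.
    cMatrixOperations<double>::dense_matvec(A, b, 2, 3, 1, c);

    // c = c + 2 * A * b; c becomes {-6.0, -6.0}.
    cMatrixOperations<double>::dense_matvec_plus(A, b, 2.0, 2, 3, 1, c);

    return 0;
}

The sparse variants take the matrix as the usual CSC or CSR triplet of arrays (data, indices, index pointer); small worked examples are given under csc_matvec() and csr_matvec() below.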
See also
cVectorOperations

Definition at line 56 of file c_matrix_operations.h.

Member Function Documentation

◆ create_band_matrix()

template<typename DataType >
void cMatrixOperations< DataType >::create_band_matrix ( const DataType *  diagonals,
const DataType *  supdiagonals,
const IndexType  non_zero_size,
const FlagType  tridiagonal,
DataType **  matrix 
)
static

Creates bi-diagonal or symmetric tri-diagonal matrix from the diagonal array (diagonals) and off-diagonal array (supdiagonals).

The output is written in place in matrix. Only the leading non_zero_size-by-non_zero_size block, that is matrix[:non_zero_size, :non_zero_size], is filled; the rest is assumed to be zero.

Depending on tridiagonal, the matrix is upper bi-diagonal or symmetric tri-diagonal.

Parameters
[in] diagonals: An array of length n. The elements of diagonals form the diagonal of matrix.
[in] supdiagonals: An array of length n. Elements supdiagonals[0:-1] form the upper off-diagonal of matrix, making matrix an upper bi-diagonal matrix. In addition, if tridiagonal is set to 1, the lower off-diagonal is also filled with the same values, making matrix a symmetric tri-diagonal matrix.
[in] non_zero_size: Only the matrix[:non_zero_size, :non_zero_size] block of matrix will be written. At most, non_zero_size can be n, which is the length of the diagonals array and the size of the square matrix. If non_zero_size is less than n, it is because either diagonals or supdiagonals has zero elements beyond the non_zero_size-th element (possibly due to early termination of the Lanczos iteration method).
[in] tridiagonal: Boolean. If set to 0, matrix becomes upper bi-diagonal. If set to 1, matrix becomes symmetric tri-diagonal.
[out] matrix: A 2D matrix (written in-place) of shape (n, n). This is the output of this function. This matrix is assumed to be initialized to zero before calling this function.

Definition at line 1021 of file c_matrix_operations.cpp.

{
    for (IndexType j=0; j < non_zero_size; ++j)
    {
        // Diagonals
        matrix[j][j] = diagonals[j];

        // Off diagonals
        if (j < non_zero_size-1)
        {
            // Sup-diagonal
            matrix[j][j+1] = supdiagonals[j];

            // Sub-diagonal, making symmetric tri-diagonal matrix
            if (tridiagonal)
            {
                matrix[j+1][j] = supdiagonals[j];
            }
        }
    }
}
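
A minimal usage sketch, assuming DataType is instantiated as double; the coefficient values are hypothetical, and the caller owns the allocation of the n-by-n output:

    // Build a 3x3 symmetric tri-diagonal matrix from diagonal and
    // off-diagonal coefficients (e.g. from a Lanczos tri-diagonalization).
    const IndexType n = 3;
    double diagonals[3]    = {2.0, 2.0, 2.0};
    double supdiagonals[3] = {-1.0, -1.0, 0.0};  // last element is unused

    // Allocate and zero-initialize the n-by-n output matrix.
    double** T = new double*[n];
    for (IndexType i = 0; i < n; ++i)
    {
        T[i] = new double[n]();
    }

    // tridiagonal = 1 fills both the sup-diagonal and the sub-diagonal.
    cMatrixOperations<double>::create_band_matrix(diagonals, supdiagonals, n, 1, T);

Note that create_band_matrix assumes matrix is already zero-initialized; only the leading non_zero_size-by-non_zero_size block is written.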

◆ csc_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::csc_matvec ( const DataType *  A_data,
const LongIndexType *  A_row_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

Parameters
[in] A_data: CSC format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_row_indices: CSC format row indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSC format index pointer. The length of this array is one plus the number of columns of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of columns of A.
[in] num_rows: Number of rows of the matrix A.
[in] num_columns: Number of columns of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[out] c: Output column vector whose length equals the number of rows of A. This array is written in-place.

Definition at line 735 of file c_matrix_operations.cpp.

{
    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;

    // Initialize output to zero
    for (row=0; row < num_rows; ++row)
    {
        c[row] = 0.0;
    }

    for (column=0; column < num_columns; ++column)
    {
        for (index_pointer=A_index_pointer[column];
             index_pointer < A_index_pointer[column+1];
             ++index_pointer)
        {
            row = A_row_indices[index_pointer];
            c[row] += A_data[index_pointer] * b[column];
        }
    }
}
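
For illustration, a hypothetical 2-by-3 matrix and its CSC arrays, with DataType instantiated as double:

    // A = [ 1  0  2 ]
    //     [ 0  3  0 ]   stored in CSC form (column by column).
    double        A_data[3]          = {1.0, 3.0, 2.0};
    LongIndexType A_row_indices[3]   = {0, 1, 0};
    LongIndexType A_index_pointer[4] = {0, 1, 2, 3};

    double b[3] = {1.0, 1.0, 1.0};   // length = number of columns
    double c[2];                     // length = number of rows

    // c = A * b; c becomes {3.0, 3.0}.
    cMatrixOperations<double>::csc_matvec(
        A_data, A_row_indices, A_index_pointer, b, 2, 3, c);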

Referenced by cCSCMatrix< DataType >::dot().

◆ csc_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::csc_matvec_plus ( const DataType *  A_data,
const LongIndexType *  A_row_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_columns,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

Parameters
[in] A_data: CSC format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_row_indices: CSC format row indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSC format index pointer. The length of this array is one plus the number of columns of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of columns of A.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_columns: Number of columns of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[in,out] c: Output column vector whose length equals the number of rows of A. This array is written in-place.

Definition at line 801 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;

    for (column=0; column < num_columns; ++column)
    {
        for (index_pointer=A_index_pointer[column];
             index_pointer < A_index_pointer[column+1];
             ++index_pointer)
        {
            row = A_row_indices[index_pointer];
            c[row] += alpha * A_data[index_pointer] * b[column];
        }
    }
}

Referenced by cCSCMatrix< DataType >::dot_plus().

◆ csc_transposed_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::csc_transposed_matvec ( const DataType *  A_data,
const LongIndexType *  A_row_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const LongIndexType  num_columns,
DataType *  c 
)
static

Computes \(\boldsymbol{c} =\mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A_data: CSC format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_row_indices: CSC format row indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSC format index pointer. The length of this array is one plus the number of columns of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of rows of A.
[in] num_columns: Number of columns of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[out] c: Output column vector whose length equals the number of columns of A. This array is written in-place.

Definition at line 872 of file c_matrix_operations.cpp.

{
    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;
    long double sum;

    for (column=0; column < num_columns; ++column)
    {
        sum = 0.0;
        for (index_pointer=A_index_pointer[column];
             index_pointer < A_index_pointer[column+1];
             ++index_pointer)
        {
            row = A_row_indices[index_pointer];
            sum += A_data[index_pointer] * b[row];
        }
        c[column] = static_cast<DataType>(sum);
    }
}

Referenced by cCSCMatrix< DataType >::transpose_dot().

◆ csc_transposed_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::csc_transposed_matvec_plus ( const DataType *  A_data,
const LongIndexType *  A_row_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_columns,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse column (CSC) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A_data: CSC format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_row_indices: CSC format row indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSC format index pointer. The length of this array is one plus the number of columns of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of rows of A.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_columns: Number of columns of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[in,out] c: Output column vector whose length equals the number of columns of A. This array is written in-place.

Definition at line 943 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;
    long double sum;

    for (column=0; column < num_columns; ++column)
    {
        sum = 0.0;
        for (index_pointer=A_index_pointer[column];
             index_pointer < A_index_pointer[column+1];
             ++index_pointer)
        {
            row = A_row_indices[index_pointer];
            sum += A_data[index_pointer] * b[row];
        }
        c[column] += static_cast<DataType>(alpha * sum);
    }
}

Referenced by cCSCMatrix< DataType >::transpose_dot_plus().

◆ csr_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::csr_matvec ( const DataType *  A_data,
const LongIndexType *  A_column_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const LongIndexType  num_rows,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A_data: CSR format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_column_indices: CSR format column indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSR format index pointer. The length of this array is one plus the number of rows of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of columns of A.
[in] num_rows: Number of rows of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[out] c: Output column vector whose length equals the number of rows of A. This array is written in-place.

Definition at line 469 of file c_matrix_operations.cpp.

{
    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;
    long double sum;

    for (row=0; row < num_rows; ++row)
    {
        sum = 0.0;
        for (index_pointer=A_index_pointer[row];
             index_pointer < A_index_pointer[row+1];
             ++index_pointer)
        {
            column = A_column_indices[index_pointer];
            sum += A_data[index_pointer] * b[column];
        }
        c[row] = static_cast<DataType>(sum);
    }
}
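
For illustration, the same hypothetical 2-by-3 matrix used in the csc_matvec() example above, here stored in CSR form and with DataType instantiated as double:

    // A = [ 1  0  2 ]
    //     [ 0  3  0 ]   stored in CSR form (row by row).
    double        A_data[3]           = {1.0, 2.0, 3.0};
    LongIndexType A_column_indices[3] = {0, 2, 1};
    LongIndexType A_index_pointer[3]  = {0, 2, 3};

    double b[3] = {1.0, 1.0, 1.0};   // length = number of columns
    double c[2];                     // length = number of rows

    // c = A * b; c becomes {3.0, 3.0}.
    cMatrixOperations<double>::csr_matvec(
        A_data, A_column_indices, A_index_pointer, b, 2, c);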

Referenced by cCSRMatrix< DataType >::dot().

◆ csr_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::csr_matvec_plus ( const DataType *  A_data,
const LongIndexType *  A_column_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_rows,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A_data: CSR format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_column_indices: CSR format column indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSR format index pointer. The length of this array is one plus the number of rows of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of columns of A.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_rows: Number of rows of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[in,out] c: Output column vector whose length equals the number of rows of A. This array is written in-place.

Definition at line 540 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;
    long double sum;

    for (row=0; row < num_rows; ++row)
    {
        sum = 0.0;
        for (index_pointer=A_index_pointer[row];
             index_pointer < A_index_pointer[row+1];
             ++index_pointer)
        {
            column = A_column_indices[index_pointer];
            sum += A_data[index_pointer] * b[column];
        }
        c[row] += alpha * static_cast<DataType>(sum);
    }
}

Referenced by cCSRMatrix< DataType >::dot_plus().

◆ csr_transposed_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::csr_transposed_matvec ( const DataType *  A_data,
const LongIndexType *  A_column_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
DataType *  c 
)
static

Computes \(\boldsymbol{c} =\mathbf{A}^{\intercal} \boldsymbol{b}\) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

Parameters
[in] A_data: CSR format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_column_indices: CSR format column indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSR format index pointer. The length of this array is one plus the number of rows of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of rows of A.
[in] num_rows: Number of rows of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[in] num_columns: Number of columns of the matrix A.
[out] c: Output column vector whose length equals the number of columns of A. This array is written in-place.

Definition at line 606 of file c_matrix_operations.cpp.

{
    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;

    // Initialize output to zero
    for (column=0; column < num_columns; ++column)
    {
        c[column] = 0.0;
    }

    for (row=0; row < num_rows; ++row)
    {
        for (index_pointer=A_index_pointer[row];
             index_pointer < A_index_pointer[row+1];
             ++index_pointer)
        {
            column = A_column_indices[index_pointer];
            c[column] += A_data[index_pointer] * b[row];
        }
    }
}

Referenced by cCSRMatrix< DataType >::transpose_dot().

◆ csr_transposed_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::csr_transposed_matvec_plus ( const DataType *  A_data,
const LongIndexType *  A_column_indices,
const LongIndexType *  A_index_pointer,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_rows,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b}\) where \( \mathbf{A} \) is compressed sparse row (CSR) matrix and \( \boldsymbol{b} \) is a dense vector. The output \( \boldsymbol{c} \) is a dense vector.

Parameters
[in] A_data: CSR format data array of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_column_indices: CSR format column indices of the sparse matrix. The length of this array is the nnz of the matrix.
[in] A_index_pointer: CSR format index pointer. The length of this array is one plus the number of rows of the matrix. The first element of this array is 0, and the last element is the nnz of the matrix.
[in] b: Column vector whose length equals the number of rows of A.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_rows: Number of rows of the matrix A. This is essentially the size of the A_index_pointer array minus one.
[in,out] c: Output column vector whose length equals the number of columns of A. This array is written in-place.

Definition at line 672 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType index_pointer;
    LongIndexType row;
    LongIndexType column;

    for (row=0; row < num_rows; ++row)
    {
        for (index_pointer=A_index_pointer[row];
             index_pointer < A_index_pointer[row+1];
             ++index_pointer)
        {
            column = A_column_indices[index_pointer];
            c[column] += alpha * A_data[index_pointer] * b[row];
        }
    }
}

Referenced by cCSRMatrix< DataType >::transpose_dot_plus().

◆ dense_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::dense_matvec ( const DataType *  A,
const DataType *  b,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
const FlagType  A_is_row_major,
DataType *  c 
)
static

Computes the matrix vector multiplication \( \boldsymbol{c} = \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is a dense matrix.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A: 1D array that represents a 2D dense array with either C (row) major ordering or Fortran (column) major ordering. The major ordering should be defined by the A_is_row_major flag.
[in] b: Column vector of length num_columns.
[in] num_rows: Number of rows of A.
[in] num_columns: Number of columns of A.
[in] A_is_row_major: Boolean, can be 0 or 1 as follows:
  • If A is row major (C ordering where the last index is contiguous), this value should be 1.
  • If A is column major (Fortran ordering where the first index is contiguous), this value should be 0.
[out] c: The output column vector of length num_rows (written in-place).

Definition at line 61 of file c_matrix_operations.cpp.

{
    #if (USE_CBLAS == 1)

    // Using OpenBlas
    CBLAS_LAYOUT layout;
    if (A_is_row_major)
    {
        layout = CblasRowMajor;
    }
    else
    {
        layout = CblasColMajor;
    }

    CBLAS_TRANSPOSE transpose = CblasNoTrans;
    int lda = num_rows;
    int incb = 1;
    int incc = 1;
    DataType alpha = 1.0;
    DataType beta = 0.0;

    cblas_interface::xgemv(layout, transpose, num_rows, num_columns, alpha, A,
                           lda, b, incb, beta, c, incc);

    #else

    // Not using OpenBlas
    LongIndexType j;
    long double sum;
    LongIndexType chunk = 5;
    LongIndexType num_columns_chunked = num_columns - (num_columns % chunk);

    // Determine major order of A
    if (A_is_row_major)
    {
        // For row major (C ordering) matrix A
        for (LongIndexType i=0; i < num_rows; ++i)
        {
            sum = 0.0;
            for (j=0; j < num_columns_chunked; j += chunk)
            {
                sum += A[i*num_columns + j] * b[j] +
                       A[i*num_columns + j+1] * b[j+1] +
                       A[i*num_columns + j+2] * b[j+2] +
                       A[i*num_columns + j+3] * b[j+3] +
                       A[i*num_columns + j+4] * b[j+4];
            }

            for (j=num_columns_chunked; j < num_columns; ++j)
            {
                sum += A[i*num_columns + j] * b[j];
            }

            c[i] = static_cast<DataType>(sum);
        }
    }
    else
    {
        // For column major (Fortran ordering) matrix A
        for (LongIndexType i=0; i < num_rows; ++i)
        {
            sum = 0.0;
            for (j=0; j < num_columns; ++j)
            {
                sum += A[i + num_rows*j] * b[j];
            }
            c[i] = static_cast<DataType>(sum);
        }
    }

    #endif
}

Referenced by cDenseMatrix< DataType >::dot().

◆ dense_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::dense_matvec_plus ( const DataType *  A,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
const FlagType  A_is_row_major,
DataType *  c 
)
static

Computes the operation \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A} \boldsymbol{b} \) where \( \mathbf{A} \) is a dense matrix.

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A: 1D array that represents a 2D dense array with either C (row) major ordering or Fortran (column) major ordering. The major ordering should be defined by the A_is_row_major flag.
[in] b: Column vector of length num_columns.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_rows: Number of rows of A.
[in] num_columns: Number of columns of A.
[in] A_is_row_major: Boolean, can be 0 or 1 as follows:
  • If A is row major (C ordering where the last index is contiguous), this value should be 1.
  • If A is column major (Fortran ordering where the first index is contiguous), this value should be 0.
[in,out] c: The output column vector of length num_rows (written in-place).

Definition at line 181 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType j;
    long double sum;
    LongIndexType chunk = 5;
    LongIndexType num_columns_chunked = num_columns - (num_columns % chunk);

    // Determine major order of A
    if (A_is_row_major)
    {
        // For row major (C ordering) matrix A
        for (LongIndexType i=0; i < num_rows; ++i)
        {
            sum = 0.0;
            for (j=0; j < num_columns_chunked; j += chunk)
            {
                sum += A[i*num_columns + j] * b[j] +
                       A[i*num_columns + j+1] * b[j+1] +
                       A[i*num_columns + j+2] * b[j+2] +
                       A[i*num_columns + j+3] * b[j+3] +
                       A[i*num_columns + j+4] * b[j+4];
            }

            for (j=num_columns_chunked; j < num_columns; ++j)
            {
                sum += A[i*num_columns + j] * b[j];
            }

            c[i] += alpha * static_cast<DataType>(sum);
        }
    }
    else
    {
        // For column major (Fortran ordering) matrix A
        for (LongIndexType i=0; i < num_rows; ++i)
        {
            sum = 0.0;
            for (j=0; j < num_columns; ++j)
            {
                sum += A[i + num_rows*j] * b[j];
            }
            c[i] += alpha * static_cast<DataType>(sum);
        }
    }
}

Referenced by cDenseMatrix< DataType >::dot_plus().

◆ dense_transposed_matvec()

template<typename DataType >
void cMatrixOperations< DataType >::dense_transposed_matvec ( const DataType *  A,
const DataType *  b,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
const FlagType  A_is_row_major,
DataType *  c 
)
static

Computes matrix vector multiplication \(\boldsymbol{c} = \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is dense, and \( \mathbf{A}^{\intercal} \) is the transpose of the matrix \( \mathbf{A} \).

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A: 1D array that represents a 2D dense array with either C (row) major ordering or Fortran (column) major ordering. The major ordering should be defined by the A_is_row_major flag.
[in] b: Column vector of length num_rows.
[in] num_rows: Number of rows of A.
[in] num_columns: Number of columns of A.
[in] A_is_row_major: Boolean, can be 0 or 1 as follows:
  • If A is row major (C ordering where the last index is contiguous), this value should be 1.
  • If A is column major (Fortran ordering where the first index is contiguous), this value should be 0.
[out] c: The output column vector of length num_columns (written in-place).

Definition at line 278 of file c_matrix_operations.cpp.

{
    LongIndexType i;
    long double sum;
    LongIndexType chunk = 5;
    LongIndexType num_rows_chunked = num_rows - (num_rows % chunk);

    // Determine major order of A
    if (A_is_row_major)
    {
        // For row major (C ordering) matrix A
        for (LongIndexType j=0; j < num_columns; ++j)
        {
            sum = 0.0;
            for (i=0; i < num_rows; ++i)
            {
                sum += A[i*num_columns + j] * b[i];
            }
            c[j] = static_cast<DataType>(sum);
        }
    }
    else
    {
        // For column major (Fortran ordering) matrix A
        for (LongIndexType j=0; j < num_columns; ++j)
        {
            sum = 0.0;
            for (i=0; i < num_rows_chunked; i += chunk)
            {
                sum += A[i + num_rows*j] * b[i] +
                       A[i+1 + num_rows*j] * b[i+1] +
                       A[i+2 + num_rows*j] * b[i+2] +
                       A[i+3 + num_rows*j] * b[i+3] +
                       A[i+4 + num_rows*j] * b[i+4];
            }

            for (i=num_rows_chunked; i < num_rows; ++i)
            {
                sum += A[i + num_rows*j] * b[i];
            }

            c[j] = static_cast<DataType>(sum);
        }
    }
}
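
A minimal sketch of the column-major path, using hypothetical values and DataType instantiated as double:

    // A 2x3 matrix stored in column-major (Fortran) order:
    //     A = [ 1  2  3 ]
    //         [ 4  5  6 ]
    double A[6] = {1.0, 4.0,    // column 0
                   2.0, 5.0,    // column 1
                   3.0, 6.0};   // column 2
    double b[2] = {1.0, 1.0};   // length = number of rows
    double c[3];                // length = number of columns

    // c = A^T * b with A_is_row_major = 0; c becomes {5.0, 7.0, 9.0}.
    cMatrixOperations<double>::dense_transposed_matvec(A, b, 2, 3, 0, c);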

Referenced by cDenseMatrix< DataType >::transpose_dot().

◆ dense_transposed_matvec_plus()

template<typename DataType >
void cMatrixOperations< DataType >::dense_transposed_matvec_plus ( const DataType *  A,
const DataType *  b,
const DataType  alpha,
const LongIndexType  num_rows,
const LongIndexType  num_columns,
const FlagType  A_is_row_major,
DataType *  c 
)
static

Computes \( \boldsymbol{c} = \boldsymbol{c} + \alpha \mathbf{A}^{\intercal} \boldsymbol{b} \) where \( \mathbf{A} \) is dense, and \( \mathbf{A}^{\intercal} \) is the transpose of the matrix \( \mathbf{A} \).

The reduction variable (here, sum) is of type long double. This is because when DataType is float, the summation loses precision, especially when the vector size is large. Using long double also appears to be slightly faster than using double. Note that using a wider type for the reduction variable is only beneficial when the code is compiled with the -O2 or -O3 optimization flags.

Parameters
[in] A: 1D array that represents a 2D dense array with either C (row) major ordering or Fortran (column) major ordering. The major ordering should be defined by the A_is_row_major flag.
[in] b: Column vector of length num_rows.
[in] alpha: A scalar that scales the matrix-vector multiplication.
[in] num_rows: Number of rows of A.
[in] num_columns: Number of columns of A.
[in] A_is_row_major: Boolean, can be 0 or 1 as follows:
  • If A is row major (C ordering where the last index is contiguous), this value should be 1.
  • If A is column major (Fortran ordering where the first index is contiguous), this value should be 0.
[in,out] c: The output column vector of length num_columns (written in-place).

Definition at line 371 of file c_matrix_operations.cpp.

{
    if (alpha == 0.0)
    {
        return;
    }

    LongIndexType i;
    long double sum;
    LongIndexType chunk = 5;
    LongIndexType num_rows_chunked = num_rows - (num_rows % chunk);

    // Determine major order of A
    if (A_is_row_major)
    {
        // For row major (C ordering) matrix A
        for (LongIndexType j=0; j < num_columns; ++j)
        {
            sum = 0.0;
            for (i=0; i < num_rows; ++i)
            {
                sum += A[i*num_columns + j] * b[i];
            }
            c[j] += alpha * static_cast<DataType>(sum);
        }
    }
    else
    {
        // For column major (Fortran ordering) matrix A
        for (LongIndexType j=0; j < num_columns; ++j)
        {
            sum = 0.0;
            for (i=0; i < num_rows_chunked; i += chunk)
            {
                sum += A[i + num_rows*j] * b[i] +
                       A[i+1 + num_rows*j] * b[i+1] +
                       A[i+2 + num_rows*j] * b[i+2] +
                       A[i+3 + num_rows*j] * b[i+3] +
                       A[i+4 + num_rows*j] * b[i+4];
            }

            for (i=num_rows_chunked; i < num_rows; ++i)
            {
                sum += A[i + num_rows*j] * b[i];
            }

            c[j] += alpha * static_cast<DataType>(sum);
        }
    }
}

Referenced by cDenseMatrix< DataType >::transpose_dot_plus().

The documentation for this class was generated from the following files:
  • c_matrix_operations.h
  • c_matrix_operations.cpp