imate
C++/CUDA Reference
cuCSCMatrix< DataType > Class Template Reference

Container for CSC matrices.

#include <cu_csc_matrix.h>


Public Member Functions

 cuCSCMatrix ()
 Default constructor.
 
 cuCSCMatrix (const DataType *A_data_, const LongIndexType *A_indices_, const LongIndexType *A_index_pointer_, const LongIndexType num_rows_, const LongIndexType num_columns_, const FlagType A_is_symmetric_, const int num_gpu_devices_)
 Constructor.
 
virtual ~cuCSCMatrix ()
 Destructor.
 
virtual FlagType is_identity_matrix () const
 Checks whether the matrix is identity.
 
LongIndexType get_nnz () const
 Returns the number of non-zero elements of the sparse matrix.
 
virtual void dot (const DataType *device_vector, DataType *device_product)
 Matrix vector product.
 
virtual void dot_plus (const DataType *device_vector, const DataType alpha, DataType *device_product)
 Matrix vector product written in place.
 
virtual void transpose_dot (const DataType *device_vector, DataType *device_product)
 Transposed-matrix vector product.
 
virtual void transpose_dot_plus (const DataType *device_vector, const DataType alpha, DataType *device_product)
 Transposed-matrix vector product written in place.
 
- Public Member Functions inherited from cuMatrix< DataType >
 cuMatrix ()
 Default constructor.
 
 cuMatrix (const FlagType A_is_symmetric_)
 Constructor.
 
virtual ~cuMatrix ()
 Destructor.
 
DataType get_eigenvalue (const DataType *known_parameters, const DataType known_eigenvalue, const DataType *inquiry_parameters) const
 This function implements the pure virtual function of the base class. It has no use in this class and is implemented only so that the class can be instantiated.
 
virtual void set_symmetry (const FlagType symmetric)
 Specifies whether the matrix is symmetric or non-symmetric.
 
- Public Member Functions inherited from cuLinearOperator< DataType >
 cuLinearOperator ()
 Default constructor.
 
 cuLinearOperator (const int num_gpu_devices_)
 Constructor with setting the number of GPU devices.
 
virtual ~cuLinearOperator ()
 Destructor.
 
cublasHandle_t get_cublas_handle () const
 This function returns a reference to the cublasHandle_t object. The object will be created, if it is not created already.
 
void set_parameters (DataType *parameters_)
 Sets the scalar parameter this->parameters. Parameter is initialized to NULL. However, before calling dot or transpose_dot functions, the parameters must be set.
 
- Public Member Functions inherited from cLinearOperatorBase
 cLinearOperatorBase ()
 Default constructor.
 
 cLinearOperatorBase (const LongIndexType num_rows_, const LongIndexType num_columns_)
 Constructor with setting num_rows and num_columns.
 
virtual ~cLinearOperatorBase ()
 Destructor.
 
LongIndexType get_num_rows () const
 Returns the number of rows of the matrix.
 
LongIndexType get_num_columns () const
 Returns the number of columns of the matrix.
 
IndexType get_num_parameters () const
 Returns the number of parameters of the linear operator.
 
FlagType is_eigenvalue_relation_known () const
 Returns a flag that determines whether a relation between the parameters of the operator and its eigenvalue(s) is known.
 

Protected Member Functions

virtual void copy_host_to_device ()
 Copies the member data from the host memory to the device memory.
 
void allocate_buffer (const int device_id, cusparseOperation_t cusparse_operation, const DataType alpha, const DataType beta, cusparseDnVecDescr_t &cusparse_input_vector, cusparseDnVecDescr_t &cusparse_output_vector, cusparseSpMVAlg_t algorithm)
 Allocates an external buffer for matrix-vector multiplication using cusparseSpMV function.
 
- Protected Member Functions inherited from cuLinearOperator< DataType >
int query_gpu_devices () const
 Before any numerical computation, this method checks whether any GPU device is available on the machine, and notifies the user if none was found.
 
void initialize_cublas_handle ()
 Creates a cublasHandle_t object, if not created already.
 
void initialize_cusparse_handle ()
 Creates a cusparseHandle_t object, if not created already.
 

Protected Attributes

const DataType * A_data
 
const LongIndexType * A_indices
 
const LongIndexType * A_index_pointer
 
DataType ** device_A_data
 
LongIndexType ** device_A_indices
 
LongIndexType ** device_A_index_pointer
 
void ** device_buffer
 
size_t * device_buffer_num_bytes
 
cusparseSpMatDescr_t * cusparse_matrix_A
 
- Protected Attributes inherited from cuMatrix< DataType >
FlagType A_is_symmetric
 
- Protected Attributes inherited from cuLinearOperator< DataType >
int num_gpu_devices
 
bool copied_host_to_device
 
cublasHandle_t * cublas_handle
 
cusparseHandle_t * cusparse_handle
 
DataType * parameters
 
- Protected Attributes inherited from cLinearOperatorBase
const LongIndexType num_rows
 
const LongIndexType num_columns
 
FlagType eigenvalue_relation_known
 
IndexType num_parameters
 

Detailed Description

template<typename DataType>
class cuCSCMatrix< DataType >

Container for CSC matrices.

The cuCSCMatrix holds a two-dimensional compressed sparse column matrix, and can perform the matrix-vector product and the transposed matrix-vector product.

See also
cuMatrix, cuDenseMatrix, cuCSRMatrix, cuCSCAffineMatrixFunction, cCSCMatrix

Definition at line 44 of file cu_csc_matrix.h.

Constructor & Destructor Documentation

◆ cuCSCMatrix() [1/2]

template<typename DataType >
cuCSCMatrix< DataType >::cuCSCMatrix ( )

Default constructor.

Definition at line 42 of file cu_csc_matrix.cu.

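42 cuCSCMatrix<DataType>::cuCSCMatrix():
43 A_data(NULL),
44 A_indices(NULL),
45 A_index_pointer(NULL),
46 device_A_data(NULL),
47 device_A_indices(NULL),
48 device_A_index_pointer(NULL),
49 device_buffer(NULL),
50 device_buffer_num_bytes(NULL),
51 cusparse_matrix_A(NULL)
52{
53}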

◆ cuCSCMatrix() [2/2]

template<typename DataType >
cuCSCMatrix< DataType >::cuCSCMatrix ( const DataType *  A_data_,
const LongIndexType *  A_indices_,
const LongIndexType *  A_index_pointer_,
const LongIndexType  num_rows_,
const LongIndexType  num_columns_,
const FlagType  A_is_symmetric_,
const int  num_gpu_devices_ 
)

Constructor.

Parameters
[in] A_data_: 1D array of the data content of the sparse matrix. The size of the array is the nnz of the matrix.
[in] A_indices_: 1D array indicating the row of each element in A_data_. The size of this array is the nnz of the matrix.
[in] A_index_pointer_: 1D array pointing to the start of each column in A_indices_. The size of this array is num_columns+1. The first element of this array is 0 and the last element of this array is the nnz of the matrix.
[in] num_rows_: Number of rows of A.
[in] num_columns_: Number of columns of A.
[in] A_is_symmetric_: Boolean. If A is symmetric, set this value to 1, otherwise 0.
[in] num_gpu_devices_: Number of GPU devices to be utilized for parallel processing.
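
As an illustration of the array layout these parameters describe, the following host-only sketch builds the three CSC arrays for a small 3 x 3 matrix. The struct CscArrays and the function make_example_csc are hypothetical helpers and not part of imate; LongIndexType is assumed here to be int.

```cpp
#include <cassert>
#include <vector>

typedef int LongIndexType;  // assumption for this sketch

// Hypothetical holder for the three CSC arrays (not part of imate).
struct CscArrays
{
    std::vector<double> data;                  // non-zero values, column by column
    std::vector<LongIndexType> indices;        // row index of each stored value
    std::vector<LongIndexType> index_pointer;  // start of each column in data
};

// Builds the CSC arrays of the 3x3 matrix
//     [ 1 0 2 ]
//     [ 0 3 0 ]
//     [ 4 0 5 ]
CscArrays make_example_csc()
{
    CscArrays A;
    A.data = {1.0, 4.0, 3.0, 2.0, 5.0};
    A.indices = {0, 2, 1, 0, 2};
    A.index_pointer = {0, 2, 3, 5};

    // One entry per column plus one; the last entry equals the nnz.
    assert(A.index_pointer.back() ==
           static_cast<LongIndexType>(A.data.size()));
    return A;
}
```

With such arrays, A.data.data(), A.indices.data() and A.index_pointer.data() would be the pointers passed as A_data_, A_indices_ and A_index_pointer_, with num_rows_ and num_columns_ both equal to 3.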

Definition at line 84 of file cu_csc_matrix.cu.

91 :
92
93 // Base class constructor
94 cLinearOperatorBase(num_rows_, num_columns_),
95 cuLinearOperator<DataType>(num_gpu_devices_),
96 cuMatrix<DataType>(A_is_symmetric_),
97
98 // Initializer list
99 A_data(A_data_),
100 A_indices(A_indices_),
101 A_index_pointer(A_index_pointer_),
102 device_A_data(NULL),
103 device_A_indices(NULL),
105 device_buffer(NULL),
107{
109 this->copy_host_to_device();
110
111 // Initialize device buffer
112 this->device_buffer = new void*[this->num_gpu_devices];
113 this->device_buffer_num_bytes = new size_t[this->num_gpu_devices];
114 for (int device_id=0; device_id < this->num_gpu_devices; ++device_id)
115 {
116 this->device_buffer[device_id] = NULL;
117 this->device_buffer_num_bytes[device_id] = 0;
118 }
119}

References cuCSCMatrix< DataType >::copy_host_to_device(), cuCSCMatrix< DataType >::device_buffer, cuCSCMatrix< DataType >::device_buffer_num_bytes, cuLinearOperator< DataType >::initialize_cusparse_handle(), and cuLinearOperator< DataType >::num_gpu_devices.

◆ ~cuCSCMatrix()

template<typename DataType >
cuCSCMatrix< DataType >::~cuCSCMatrix ( )
virtual

Destructor.

Definition at line 130 of file cu_csc_matrix.cu.

131{
132 // Member objects exist if the second constructor was called.
133 if (this->copied_host_to_device)
134 {
135 // Deallocate arrays of data on gpu
136 for (int device_id=0; device_id < this->num_gpu_devices; ++device_id)
137 {
138 // Switch to a device
140
141 // Deallocate
142 CudaAPI<DataType>::del(this->device_A_data[device_id]);
144 this->device_A_indices[device_id]);
146 this->device_A_index_pointer[device_id]);
149 this->cusparse_matrix_A[device_id]);
150 }
151 }
152
153 // Deallocate arrays of pointers on cpu
154 if (this->device_A_data != NULL)
155 {
156 delete[] this->device_A_data;
157 this->device_A_data = NULL;
158 }
159
160 if (this->device_A_indices != NULL)
161 {
162 delete[] this->device_A_indices;
163 this->device_A_indices = NULL;
164 }
165
166 if (this->device_A_index_pointer != NULL)
167 {
168 delete[] this->device_A_index_pointer;
169 this->device_A_index_pointer = NULL;
170 }
171
172 if (this->device_buffer != NULL)
173 {
174 delete[] this->device_buffer;
175 this->device_buffer = NULL;
176 }
177
178 if (this->device_buffer_num_bytes != NULL)
179 {
180 delete[] this->device_buffer_num_bytes;
181 this->device_buffer_num_bytes = NULL;
182 }
183
184 if (this->cusparse_matrix_A != NULL)
185 {
186 delete[] this->cusparse_matrix_A;
187 this->cusparse_matrix_A = NULL;
188 }
189}

References CudaAPI< ArrayType >::del(), cusparse_api::destroy_cusparse_matrix(), and CudaAPI< ArrayType >::set_device().

Member Function Documentation

◆ allocate_buffer()

template<typename DataType >
void cuCSCMatrix< DataType >::allocate_buffer ( const int  device_id,
cusparseOperation_t  cusparse_operation,
const DataType  alpha,
const DataType  beta,
cusparseDnVecDescr_t &  cusparse_input_vector,
cusparseDnVecDescr_t &  cusparse_output_vector,
cusparseSpMVAlg_t  algorithm 
)
protected

Allocates an external buffer for matrix-vector multiplication using cusparseSpMV function.

If the current buffer size is not the same as the required buffer size, memory is allocated (or reallocated). The allocation is always performed in the first call of this function, since the buffer size is initialized to zero in the constructor. In subsequent calls, the buffer is not reallocated as long as the required size is unchanged.

Parameters
[in] device_id: The ID of the GPU device, from 0 to num_gpu_devices-1.
[in] cusparse_operation: The cuSPARSE operation, which can be CUSPARSE_OPERATION_NON_TRANSPOSE or CUSPARSE_OPERATION_TRANSPOSE.
[in] alpha: Scalar. The parameter \( \alpha \) in matrix-vector multiplication.
[in] beta: Scalar. The parameter \( \beta \) in matrix-vector multiplication.
[in] cusparse_input_vector: Input vector in the matrix-vector multiplication.
[in] cusparse_output_vector: Output vector in the matrix-vector multiplication.
[in] algorithm: cuSPARSE algorithm for the sparse matrix-vector product. Possible values can be CUSPARSE_SPMV_ALG_DEFAULT, CUSPARSE_SPMV_CSR_ALG1, CUSPARSE_SPMV_CSR_ALG2, etc.
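
The reallocate-only-on-size-change policy described above can be sketched on the host as follows. This is not the imate implementation: Workspace, ensure_buffer and release_buffer are hypothetical names, and std::malloc and std::free stand in for the device allocation calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical host-side stand-in for one device's workspace (not imate code).
struct Workspace
{
    void* buffer = nullptr;
    std::size_t num_bytes = 0;
    int num_allocations = 0;  // counts (re)allocations, for illustration only
};

// Reallocates only when the required size differs from the current size.
void ensure_buffer(Workspace& ws, std::size_t required_bytes)
{
    if (ws.num_bytes != required_bytes)
    {
        std::free(ws.buffer);  // freeing a null pointer is a no-op
        ws.buffer = (required_bytes > 0) ? std::malloc(required_bytes)
                                         : nullptr;
        ws.num_bytes = required_bytes;
        ++ws.num_allocations;
    }
}

// Frees the buffer and resets the bookkeeping.
void release_buffer(Workspace& ws)
{
    std::free(ws.buffer);
    ws.buffer = nullptr;
    ws.num_bytes = 0;
}
```

Because the stored size starts at zero, the first call with a non-zero size always allocates; repeated products with the same operation and vector sizes then reuse the buffer.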

Definition at line 335 of file cu_csc_matrix.cu.

343{
344 // Find the buffer size needed for matrix-vector multiplication
345 size_t required_buffer_size;
346 cusparse_api::cusparse_matrix_buffer_size(
347 this->cusparse_handle[device_id], cusparse_operation, alpha,
348 this->cusparse_matrix_A[device_id], cusparse_input_vector, beta,
349 cusparse_output_vector, algorithm, &required_buffer_size);
350
351 if (this->device_buffer_num_bytes[device_id] != required_buffer_size)
352 {
353 // Update the buffer size
354 this->device_buffer_num_bytes[device_id] = required_buffer_size;
355
356 // Delete buffer if it was allocated previously
357 CudaAPI<DataType>::del(this->device_buffer[device_id]);
358
359 // Allocate (or reallocate) buffer on device.
360 CudaAPI<DataType>::alloc_bytes(
361 this->device_buffer[device_id],
362 this->device_buffer_num_bytes[device_id]);
363 }
364}

References CudaAPI< ArrayType >::alloc_bytes(), cusparse_api::cusparse_matrix_buffer_size(), and CudaAPI< ArrayType >::del().

◆ copy_host_to_device()

template<typename DataType >
void cuCSCMatrix< DataType >::copy_host_to_device ( )
protectedvirtual

Copies the member data from the host memory to the device memory.

Note
CUDA below version 12 does not have a cuSPARSE API for CSC matrices. As such, for CUDA < 12, we treat a CSC matrix as a CSR matrix, but using transposed operations. In addition, we swap the number of columns and rows from the input matrix to the cuSPARSE matrix.
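
The reason this works is that the CSC arrays of a matrix A are, read verbatim, the CSR arrays of its transpose. The host-only sketch below illustrates the idea; csr_matvec and csc_dot_via_csr are hypothetical names, and a plain loop stands in for the cuSPARSE call.

```cpp
#include <cassert>
#include <vector>

// y = op(B) x for a CSR matrix B with n_rows rows and n_cols columns.
// If transpose is true, op(B) = B^T, otherwise op(B) = B.
std::vector<double> csr_matvec(
        const std::vector<double>& data,
        const std::vector<int>& indices,
        const std::vector<int>& index_pointer,
        int n_rows, int n_cols, bool transpose,
        const std::vector<double>& x)
{
    assert(static_cast<int>(index_pointer.size()) == n_rows + 1);
    assert(static_cast<int>(x.size()) == (transpose ? n_rows : n_cols));

    std::vector<double> y(transpose ? n_cols : n_rows, 0.0);
    for (int row = 0; row < n_rows; ++row)
    {
        for (int p = index_pointer[row]; p < index_pointer[row + 1]; ++p)
        {
            const int col = indices[p];
            if (transpose)
                y[col] += data[p] * x[row];
            else
                y[row] += data[p] * x[col];
        }
    }
    return y;
}

// A x for a CSC matrix A (num_rows x num_columns): reinterpret the arrays as
// the CSR storage of A^T (num_columns x num_rows) and apply the transpose.
std::vector<double> csc_dot_via_csr(
        const std::vector<double>& A_data,
        const std::vector<int>& A_indices,
        const std::vector<int>& A_index_pointer,
        int num_rows, int num_columns,
        const std::vector<double>& x)
{
    return csr_matvec(A_data, A_indices, A_index_pointer,
                      num_columns, num_rows, true, x);
}
```

Swapping the two dimensions and flipping the operation cancel each other, which is why the transposed product uses the non-transpose operation under the same treatment.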

Implements cuMatrix< DataType >.

Definition at line 205 of file cu_csc_matrix.cu.

206{
207 if (!this->copied_host_to_device)
208 {
209 // Set the number of threads
210 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
212 #endif
213
214 // Array sizes
215 LongIndexType A_data_size = this->get_nnz();
216 LongIndexType A_indices_size = A_data_size;
217 LongIndexType A_index_pointer_size = this->num_rows + 1;
218 LongIndexType A_nnz = this->get_nnz();
219
220 // CuSparse API in CUDA below 12 does not support CSC matrix
221 #ifndef CUDA_VERSION
222 #error CUDA_VERSION Undefined!
223 #elif CUDA_VERSION < 12000
224 // Swapping the number of rows and columns to treat the input CSC
225 // matrix as a CSR matrix.
226 LongIndexType csc_num_rows = this->num_columns;
227 LongIndexType csc_num_columns = this->num_rows;
228 #endif
229
230 // Create array of pointers for data on each gpu device
231 this->device_A_data = new DataType*[this->num_gpu_devices];
233 this->device_A_index_pointer = \
234 new LongIndexType*[this->num_gpu_devices];
235 this->cusparse_matrix_A = \
236 new cusparseSpMatDescr_t[this->num_gpu_devices];
237
238 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
239 #pragma omp parallel
240 #endif
241 {
242 // Switch to a device with the same device id as the cpu thread id
243 unsigned int thread_id;
244 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
245 thread_id = omp_get_thread_num();
246 #else
247 thread_id = 0;
248 #endif
249
251
252 // A_data
254 A_data_size);
256 this->A_data, A_data_size, this->device_A_data[thread_id]);
257
258 // A_indices
260 this->device_A_indices[thread_id], A_indices_size);
262 this->A_indices, A_indices_size,
263 this->device_A_indices[thread_id]);
264
265 // A_index_pointer
267 this->device_A_index_pointer[thread_id],
268 A_index_pointer_size);
270 this->A_index_pointer, A_index_pointer_size,
271 this->device_A_index_pointer[thread_id]);
272
273 // Create cusparse matrix
274 #ifndef CUDA_VERSION
275 #error CUDA_VERSION Undefined!
276 #elif CUDA_VERSION < 12000
277 // Treat CSC as CSR matrix with swapped columns and rows
279 this->cusparse_matrix_A[thread_id], csc_num_rows,
280 csc_num_columns, A_nnz, this->device_A_data[thread_id],
281 this->device_A_indices[thread_id],
282 this->device_A_index_pointer[thread_id]);
283 #else
284 // Use CSC api in CUDA >= 12
286 this->cusparse_matrix_A[thread_id], this->num_rows,
287 this->num_columns, A_nnz,
288 this->device_A_data[thread_id],
289 this->device_A_indices[thread_id],
290 this->device_A_index_pointer[thread_id]);
291 #endif
292 }
293
294 // Flag to prevent reinitialization
295 this->copied_host_to_device = true;
296 }
297}

References CudaAPI< ArrayType >::alloc(), CudaAPI< ArrayType >::copy_to_device(), cusparse_api::create_cusparse_csc_matrix(), cusparse_api::create_cusparse_csr_matrix(), omp_get_thread_num(), omp_set_num_threads(), and CudaAPI< ArrayType >::set_device().

Referenced by cuCSCMatrix< DataType >::cuCSCMatrix().

◆ dot()

template<typename DataType >
void cuCSCMatrix< DataType >::dot ( const DataType *  device_vector,
DataType *  device_product 
)
virtual

Matrix vector product.

Performs the matrix vector product \( \boldsymbol{y} = \mathbf{A} \boldsymbol{x} \).

Parameters
[in] device_vector: A one-dimensional input vector \( \boldsymbol{x} \) whose size is the number of columns of the matrix \( \mathbf{A} \). This array should be on the GPU device.
[out] device_product: A one-dimensional output vector \( \boldsymbol{y} \) whose size is the number of rows of \( \mathbf{A} \). This vector will be overwritten. This array should be on the GPU device.
See also
cuCSCMatrix::dot_plus, cuCSCMatrix::transpose_dot, cuCSCMatrix::transpose_dot_plus
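
For checking device results on small inputs, a host reference for the same product can be written directly on the CSC arrays. This is only a sketch; csc_dot_reference is a hypothetical name and the code does not touch the GPU.

```cpp
#include <cassert>
#include <vector>

// Host reference for y = A x, with A (num_rows x num_columns) stored as CSC.
std::vector<double> csc_dot_reference(
        const std::vector<double>& A_data,
        const std::vector<int>& A_indices,
        const std::vector<int>& A_index_pointer,
        int num_rows, int num_columns,
        const std::vector<double>& x)
{
    assert(static_cast<int>(A_index_pointer.size()) == num_columns + 1);
    assert(static_cast<int>(x.size()) == num_columns);

    // The output starts from zero, so any previous content is discarded.
    std::vector<double> y(num_rows, 0.0);

    // Each column scatters its stored entries, scaled by x[column], into y.
    for (int column = 0; column < num_columns; ++column)
    {
        for (int p = A_index_pointer[column];
             p < A_index_pointer[column + 1]; ++p)
        {
            y[A_indices[p]] += A_data[p] * x[column];
        }
    }
    return y;
}
```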

Implements cuLinearOperator< DataType >.

Definition at line 478 of file cu_csc_matrix.cu.

481{
482 assert(this->copied_host_to_device);
483
484 // Create cusparse vector for the input vector
485 cusparseDnVecDescr_t cusparse_input_vector;
486 cusparse_api::create_cusparse_vector(
487 cusparse_input_vector, this->num_columns,
488 const_cast<DataType*>(device_vector));
489
490 // Create cusparse vector for the output vector
491 cusparseDnVecDescr_t cusparse_output_vector;
492 cusparse_api::create_cusparse_vector(
493 cusparse_output_vector, this->num_rows, device_product);
494
495 // Matrix vector settings
496 DataType alpha = cu_arithmetics::cast<float, DataType>(1.0f);
497 DataType beta = cu_arithmetics::cast<float, DataType>(0.0f);
498
499 #ifndef CUDA_VERSION
500 #error CUDA_VERSION Undefined!
501 #elif CUDA_VERSION < 12000
502 // Using transpose operation since we treat CSC matrix as CSR
503 cusparseOperation_t cusparse_operation = CUSPARSE_OPERATION_TRANSPOSE;
504 #else
505 cusparseOperation_t cusparse_operation = \
506 CUSPARSE_OPERATION_NON_TRANSPOSE;
507 #endif
508
509 cusparseSpMVAlg_t algorithm = CUSPARSE_SPMV_ALG_DEFAULT;
510
511 // Get device id
512 int device_id = CudaAPI<DataType>::get_device();
513
514 // Allocate device buffer (or reallocation if needed)
515 this->allocate_buffer(device_id, cusparse_operation, alpha, beta,
516 cusparse_input_vector, cusparse_output_vector,
517 algorithm);
518
519 // Matrix vector multiplication
520 cusparse_api::cusparse_matvec(
521 this->cusparse_handle[device_id], cusparse_operation, alpha,
522 this->cusparse_matrix_A[device_id], cusparse_input_vector, beta,
523 cusparse_output_vector, algorithm, this->device_buffer[device_id]);
524
525 // Destroy cusparse vectors
526 cusparse_api::destroy_cusparse_vector(cusparse_input_vector);
527 cusparse_api::destroy_cusparse_vector(cusparse_output_vector);
528}

References cu_arithmetics::abs(), cusparse_api::create_cusparse_vector(), cusparse_api::cusparse_matvec(), CUSPARSE_SPMV_ALG_DEFAULT, cusparse_api::destroy_cusparse_vector(), and CudaAPI< ArrayType >::get_device().

◆ dot_plus()

template<typename DataType >
void cuCSCMatrix< DataType >::dot_plus ( const DataType *  device_vector,
const DataType  alpha,
DataType *  device_product 
)
virtual

Matrix vector product written in place.

Performs the matrix vector product \( \boldsymbol{y} = \boldsymbol{y} + \alpha \mathbf{A} \boldsymbol{x} \).

Parameters
[in] device_vector: A one-dimensional input vector \( \boldsymbol{x} \) whose size is the number of columns of the matrix \( \mathbf{A} \). This array should be on the GPU device.
[in] alpha: A scalar.
[in,out] device_product: A one-dimensional vector \( \boldsymbol{y} \) whose size is the number of rows of \( \mathbf{A} \). Its content on input is used and updated in place. This array should be on the GPU device.
See also
cuCSCMatrix::dot, cuCSCMatrix::transpose_dot, cuCSCMatrix::transpose_dot_plus
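
The listings for dot and dot_plus differ only in the scalars passed to the sparse product, which computes an update of the form y = alpha * A x + beta * y. A host sketch of that general update is given below; csc_spmv_reference is a hypothetical name, and treating beta equal to zero as a plain overwrite is a choice made for this sketch.

```cpp
#include <cassert>
#include <vector>

// Host sketch of y = alpha * A x + beta * y for a CSC matrix A.
void csc_spmv_reference(
        const std::vector<double>& A_data,
        const std::vector<int>& A_indices,
        const std::vector<int>& A_index_pointer,
        int num_columns,
        double alpha,
        const std::vector<double>& x,
        double beta,
        std::vector<double>& y)
{
    assert(static_cast<int>(A_index_pointer.size()) == num_columns + 1);
    assert(static_cast<int>(x.size()) == num_columns);

    // Scale the existing output; beta == 0 overwrites it.
    for (double& y_i : y)
    {
        y_i = (beta == 0.0) ? 0.0 : beta * y_i;
    }

    // Accumulate alpha * A x column by column.
    for (int column = 0; column < num_columns; ++column)
    {
        for (int p = A_index_pointer[column];
             p < A_index_pointer[column + 1]; ++p)
        {
            y[A_indices[p]] += alpha * A_data[p] * x[column];
        }
    }
}
```

In this form, dot corresponds to alpha = 1 and beta = 0, while dot_plus keeps the caller's alpha and sets beta = 1.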

Implements cuMatrix< DataType >.

Definition at line 556 of file cu_csc_matrix.cu.

560{
561 assert(this->copied_host_to_device);
562
563 // Create cusparse vector for the input vector
564 cusparseDnVecDescr_t cusparse_input_vector;
565 cusparse_api::create_cusparse_vector(
566 cusparse_input_vector, this->num_columns,
567 const_cast<DataType*>(device_vector));
568
569 // Create cusparse vector for the output vector
570 cusparseDnVecDescr_t cusparse_output_vector;
571 cusparse_api::create_cusparse_vector(
572 cusparse_output_vector, this->num_rows, device_product);
573
574 // Matrix vector settings
575 DataType beta = cu_arithmetics::cast<float, DataType>(1.0f);
576
577 #ifndef CUDA_VERSION
578 #error CUDA_VERSION Undefined!
579 #elif CUDA_VERSION < 12000
580 // Using transpose operation since we treat CSC matrix as CSR
581 cusparseOperation_t cusparse_operation = CUSPARSE_OPERATION_TRANSPOSE;
582 #else
583 cusparseOperation_t cusparse_operation = \
584 CUSPARSE_OPERATION_NON_TRANSPOSE;
585 #endif
586
587 cusparseSpMVAlg_t algorithm = CUSPARSE_SPMV_ALG_DEFAULT;
588
589 // Get device id
590 int device_id = CudaAPI<DataType>::get_device();
591
592 // Allocate device buffer (or reallocation if needed)
593 this->allocate_buffer(device_id, cusparse_operation, alpha, beta,
594 cusparse_input_vector, cusparse_output_vector,
595 algorithm);
596
597 // Matrix vector multiplication
598 cusparse_api::cusparse_matvec(
599 this->cusparse_handle[device_id], cusparse_operation, alpha,
600 this->cusparse_matrix_A[device_id], cusparse_input_vector, beta,
601 cusparse_output_vector, algorithm, this->device_buffer[device_id]);
602
603 // Destroy cusparse vectors
604 cusparse_api::destroy_cusparse_vector(cusparse_input_vector);
605 cusparse_api::destroy_cusparse_vector(cusparse_output_vector);
606}

References cu_arithmetics::abs(), cusparse_api::create_cusparse_vector(), cusparse_api::cusparse_matvec(), CUSPARSE_SPMV_ALG_DEFAULT, cusparse_api::destroy_cusparse_vector(), and CudaAPI< ArrayType >::get_device().

◆ get_nnz()

template<typename DataType >
LongIndexType cuCSCMatrix< DataType >::get_nnz ( ) const

Returns the number of non-zero elements of the sparse matrix.

The nnz of a CSC matrix can be obtained from the last element of A_index_pointer. The size of array A_index_pointer is one plus the number of columns of the matrix.

Returns
The nnz of the matrix.
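
A minimal host sketch of this lookup, using a hypothetical free function csc_nnz rather than the member function:

```cpp
#include <cassert>
#include <vector>

// The nnz of a CSC matrix is the last entry of its index pointer array,
// which has num_columns + 1 entries.
int csc_nnz(const std::vector<int>& A_index_pointer, int num_columns)
{
    assert(static_cast<int>(A_index_pointer.size()) == num_columns + 1);
    return A_index_pointer[num_columns];
}
```

For the index pointer array {0, 2, 3, 5} of a matrix with three columns, this returns 5.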

Definition at line 449 of file cu_csc_matrix.cu.

450{
451 return this->A_index_pointer[this->num_columns];
452}

◆ is_identity_matrix()

template<typename DataType >
FlagType cuCSCMatrix< DataType >::is_identity_matrix ( ) const
virtual

Checks whether the matrix is identity.

The identity check is primarily performed in the cAffineMatrixFunction class.

Returns
Returns 1 if the input matrix is identity, and 0 otherwise.
See also
cAffineMatrixFunction
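
A simplified serial sketch of such a check over the CSC arrays is shown below. It is not the library code: the function name is hypothetical, the symmetric-storage flag and OpenMP are left out, and exact comparison replaces the tolerance-based is_equal. Like the listing that follows, it inspects stored entries only.

```cpp
#include <cassert>
#include <vector>

// Returns true if every stored entry equals 1 on the diagonal and 0 elsewhere.
// Entries that are not stored are not inspected.
bool csc_stored_entries_match_identity(
        const std::vector<double>& A_data,
        const std::vector<int>& A_indices,
        const std::vector<int>& A_index_pointer,
        int num_columns)
{
    assert(static_cast<int>(A_index_pointer.size()) == num_columns + 1);

    for (int column = 0; column < num_columns; ++column)
    {
        for (int p = A_index_pointer[column];
             p < A_index_pointer[column + 1]; ++p)
        {
            const double expected = (A_indices[p] == column) ? 1.0 : 0.0;
            if (A_data[p] != expected)
            {
                return false;  // first mismatch decides the result
            }
        }
    }
    return true;
}
```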

Implements cuMatrix< DataType >.

Definition at line 381 of file cu_csc_matrix.cu.

382{
383 FlagType matrix_is_identity = 1;
384 LongIndexType index_pointer;
385 LongIndexType row;
386 DataType matrix_element;
387 const DataType diagonal = 1.0;
388 const DataType off_diagonal = 0.0;
389
390 // Check matrix element-wise
391 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
392 #pragma omp parallel for \
393 schedule(static) \
394 if (!omp_in_parallel()) \
395 default(none) \
396 shared(matrix_is_identity, diagonal, off_diagonal) \
397 private(index_pointer, row, matrix_element)
398 #endif
399 for (LongIndexType column=0; column < this->num_columns; ++column)
400 {
401 if (matrix_is_identity)
402 {
403 for (index_pointer=this->A_index_pointer[column];
404 index_pointer < this->A_index_pointer[column+1];
405 ++index_pointer)
406 {
407 row = this->A_indices[index_pointer];
408
409 if (!((this->A_is_symmetric) && (column >= row)))
410 {
411 matrix_element = this->A_data[index_pointer];
412
413 if (((row == column) && \
414 (!cu_arithmetics::is_equal(matrix_element,
415 diagonal))) || \
416 ((row != column) && \
417 (!cu_arithmetics::is_equal(matrix_element,
418 off_diagonal))))
419 {
420 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
421 #pragma omp atomic write
422 #endif
423 matrix_is_identity = 0;
424
425 break;
426 }
427 }
428 }
429 }
430 }
431
432 return matrix_is_identity;
433}

References cu_arithmetics::is_equal().

◆ transpose_dot()

template<typename DataType >
void cuCSCMatrix< DataType >::transpose_dot ( const DataType *  device_vector,
DataType *  device_product 
)
virtual

Transposed-matrix vector product.

Performs the matrix vector product \( \boldsymbol{y} = \mathbf{A}^{\intercal} \boldsymbol{x} \).

Parameters
[in] device_vector: A one-dimensional input vector \( \boldsymbol{x} \) whose size is the number of columns of the matrix \( \mathbf{A} \). This array should be on the GPU device.
[out] device_product: A one-dimensional output vector \( \boldsymbol{y} \) whose size is the number of rows of \( \mathbf{A} \). This vector will be overwritten. This array should be on the GPU device.
See also
cuCSCMatrix::dot_plus, cuCSCMatrix::dot, cuCSCMatrix::transpose_dot_plus
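
On the host, the transposed product is especially simple in CSC storage, because each output entry is the dot product of one stored column with the input. The sketch below uses a hypothetical name and is written for a general num_rows x num_columns matrix, so it takes an input of length num_rows and returns an output of length num_columns; for square matrices the two sizes coincide.

```cpp
#include <cassert>
#include <vector>

// Host reference for y = A^T x, with A (num_rows x num_columns) stored as CSC.
std::vector<double> csc_transpose_dot_reference(
        const std::vector<double>& A_data,
        const std::vector<int>& A_indices,
        const std::vector<int>& A_index_pointer,
        int num_rows, int num_columns,
        const std::vector<double>& x)
{
    assert(static_cast<int>(A_index_pointer.size()) == num_columns + 1);
    assert(static_cast<int>(x.size()) == num_rows);

    std::vector<double> y(num_columns, 0.0);
    for (int column = 0; column < num_columns; ++column)
    {
        // Gather: dot product of the stored column with x.
        double sum = 0.0;
        for (int p = A_index_pointer[column];
             p < A_index_pointer[column + 1]; ++p)
        {
            sum += A_data[p] * x[A_indices[p]];
        }
        y[column] = sum;
    }
    return y;
}
```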

Implements cuLinearOperator< DataType >.

Definition at line 632 of file cu_csc_matrix.cu.

635{
636 assert(this->copied_host_to_device);
637
638 // Create cusparse vector for the input vector
639 cusparseDnVecDescr_t cusparse_input_vector;
640 cusparse_api::create_cusparse_vector(
641 cusparse_input_vector, this->num_columns,
642 const_cast<DataType*>(device_vector));
643
644 // Create cusparse vector for the output vector
645 cusparseDnVecDescr_t cusparse_output_vector;
646 cusparse_api::create_cusparse_vector(
647 cusparse_output_vector, this->num_rows, device_product);
648
649 // Matrix vector settings
650 DataType alpha = cu_arithmetics::cast<float, DataType>(1.0f);
651 DataType beta = cu_arithmetics::cast<float, DataType>(0.0f);
652
653 #ifndef CUDA_VERSION
654 #error CUDA_VERSION Undefined!
655 #elif CUDA_VERSION < 12000
656 // Using non-transpose operation since we treat CSC matrix as CSR
657 cusparseOperation_t cusparse_operation = \
658 CUSPARSE_OPERATION_NON_TRANSPOSE;
659 #else
660 cusparseOperation_t cusparse_operation = CUSPARSE_OPERATION_TRANSPOSE;
661 #endif
662
663 cusparseSpMVAlg_t algorithm = CUSPARSE_SPMV_ALG_DEFAULT;
664
665 // Get device id
666 int device_id = CudaAPI<DataType>::get_device();
667
668 // Allocate device buffer (or reallocation if needed)
669 this->allocate_buffer(device_id, cusparse_operation, alpha, beta,
670 cusparse_input_vector, cusparse_output_vector,
671 algorithm);
672
673 // Matrix vector multiplication
674 cusparse_api::cusparse_matvec(
675 this->cusparse_handle[device_id], cusparse_operation, alpha,
676 this->cusparse_matrix_A[device_id], cusparse_input_vector, beta,
677 cusparse_output_vector, algorithm, this->device_buffer[device_id]);
678
679 // Destroy cusparse vectors
680 cusparse_api::destroy_cusparse_vector(cusparse_input_vector);
681 cusparse_api::destroy_cusparse_vector(cusparse_output_vector);
682}

References cu_arithmetics::abs(), cusparse_api::create_cusparse_vector(), cusparse_api::cusparse_matvec(), CUSPARSE_SPMV_ALG_DEFAULT, cusparse_api::destroy_cusparse_vector(), and CudaAPI< ArrayType >::get_device().

◆ transpose_dot_plus()

template<typename DataType >
void cuCSCMatrix< DataType >::transpose_dot_plus ( const DataType *  device_vector,
const DataType  alpha,
DataType *  device_product 
)
virtual

Transposed-matrix vector product written in place.

Performs the matrix vector product \( \boldsymbol{y} = \boldsymbol{y} + \alpha \mathbf{A}^{\intercal} \boldsymbol{x} \).

Parameters
[in] device_vector: A one-dimensional input vector \( \boldsymbol{x} \) whose size is the number of columns of the matrix \( \mathbf{A} \). This array should be on the GPU device.
[in] alpha: A scalar.
[in,out] device_product: A one-dimensional vector \( \boldsymbol{y} \) whose size is the number of rows of \( \mathbf{A} \). Its content on input is used and updated in place. This array should be on the GPU device.
See also
cuCSCMatrix::dot_plus, cuCSCMatrix::transpose_dot, cuCSCMatrix::dot

Implements cuMatrix< DataType >.

Definition at line 711 of file cu_csc_matrix.cu.

715{
716 assert(this->copied_host_to_device);
717
718 // Create cusparse vector for the input vector
719 cusparseDnVecDescr_t cusparse_input_vector;
720 cusparse_api::create_cusparse_vector(
721 cusparse_input_vector, this->num_columns,
722 const_cast<DataType*>(device_vector));
723
724 // Create cusparse vector for the output vector
725 cusparseDnVecDescr_t cusparse_output_vector;
726 cusparse_api::create_cusparse_vector(
727 cusparse_output_vector, this->num_rows, device_product);
728
729 // Matrix vector settings
730 DataType beta = cu_arithmetics::cast<float, DataType>(1.0f);
731
732 #ifndef CUDA_VERSION
733 #error CUDA_VERSION Undefined!
734 #elif CUDA_VERSION < 12000
735 // Using non-transpose operation since we treat CSC matrix as CSR
736 cusparseOperation_t cusparse_operation = \
737 CUSPARSE_OPERATION_NON_TRANSPOSE;
738 #else
739 cusparseOperation_t cusparse_operation = CUSPARSE_OPERATION_TRANSPOSE;
740 #endif
741
742 cusparseSpMVAlg_t algorithm = CUSPARSE_SPMV_ALG_DEFAULT;
743
744 // Get device id
745 int device_id = CudaAPI<DataType>::get_device();
746
747 // Allocate device buffer (or reallocation if needed)
748 this->allocate_buffer(device_id, cusparse_operation, alpha, beta,
749 cusparse_input_vector, cusparse_output_vector,
750 algorithm);
751
752 // Matrix vector multiplication
753 cusparse_api::cusparse_matvec(
754 this->cusparse_handle[device_id], cusparse_operation, alpha,
755 this->cusparse_matrix_A[device_id], cusparse_input_vector, beta,
756 cusparse_output_vector, algorithm, this->device_buffer[device_id]);
757
758 // Destroy cusparse vectors
759 cusparse_api::destroy_cusparse_vector(cusparse_input_vector);
760 cusparse_api::destroy_cusparse_vector(cusparse_output_vector);
761}

References cu_arithmetics::abs(), cusparse_api::create_cusparse_vector(), cusparse_api::cusparse_matvec(), CUSPARSE_SPMV_ALG_DEFAULT, cusparse_api::destroy_cusparse_vector(), and CudaAPI< ArrayType >::get_device().

Member Data Documentation

◆ A_data

template<typename DataType >
const DataType* cuCSCMatrix< DataType >::A_data
protected

Definition at line 99 of file cu_csc_matrix.h.

◆ A_index_pointer

template<typename DataType >
const LongIndexType* cuCSCMatrix< DataType >::A_index_pointer
protected

Definition at line 101 of file cu_csc_matrix.h.

◆ A_indices

template<typename DataType >
const LongIndexType* cuCSCMatrix< DataType >::A_indices
protected

Definition at line 100 of file cu_csc_matrix.h.

◆ cusparse_matrix_A

template<typename DataType >
cusparseSpMatDescr_t* cuCSCMatrix< DataType >::cusparse_matrix_A
protected

Definition at line 107 of file cu_csc_matrix.h.

◆ device_A_data

template<typename DataType >
DataType** cuCSCMatrix< DataType >::device_A_data
protected

Definition at line 102 of file cu_csc_matrix.h.

◆ device_A_index_pointer

template<typename DataType >
LongIndexType** cuCSCMatrix< DataType >::device_A_index_pointer
protected

Definition at line 104 of file cu_csc_matrix.h.

◆ device_A_indices

template<typename DataType >
LongIndexType** cuCSCMatrix< DataType >::device_A_indices
protected

Definition at line 103 of file cu_csc_matrix.h.

◆ device_buffer

template<typename DataType >
void** cuCSCMatrix< DataType >::device_buffer
protected

Definition at line 105 of file cu_csc_matrix.h.

Referenced by cuCSCMatrix< DataType >::cuCSCMatrix().

◆ device_buffer_num_bytes

template<typename DataType >
size_t* cuCSCMatrix< DataType >::device_buffer_num_bytes
protected

Definition at line 106 of file cu_csc_matrix.h.

Referenced by cuCSCMatrix< DataType >::cuCSCMatrix().


The documentation for this class was generated from the following files: