imate
C++/CUDA Reference
Loading...
Searching...
No Matches
cVectorOperations< DataType > Class Template Reference

A static class for vector operations, similar to level-1 operations of the BLAS library. This class acts as a templated namespace, where all member methods are public and static. More...

#include <c_vector_operations.h>

Static Public Member Functions

static void copy_vector (const DataType *RESTRICT input_vector, const LongIndexType vector_size, DataType *RESTRICT output_vector)
 Copies a vector to a new vector. Result is written in-place.
 
static void copy_scaled_vector (const DataType *RESTRICT input_vector, const LongIndexType vector_size, const DataType scale, DataType *RESTRICT output_vector)
 Scales a vector and stores to a new vector.
 
static void subtract_scaled_vector (const DataType *RESTRICT input_vector, const LongIndexType vector_size, const DataType scale, DataType *RESTRICT output_vector)
 Subtracts the scaled input vector from the output vector.
 
static DataType inner_product (const DataType *RESTRICT vector1, const DataType *RESTRICT vector2, const LongIndexType vector_size)
 Computes Euclidean inner product of two vectors.
 
static DataType euclidean_norm (const DataType *RESTRICT vector, const LongIndexType vector_size)
 Computes the Euclidean norm of a 1D array.
 
static DataType normalize_vector_in_place (DataType *RESTRICT vector, const LongIndexType vector_size)
 Normalizes a vector based on Euclidean 2-norm. The result is written in-place.
 
static DataType normalize_vector_and_copy (const DataType *RESTRICT vector, const LongIndexType vector_size, DataType *RESTRICT output_vector)
 Normalizes a vector based on Euclidean 2-norm. The result is written into another vector.
 

Detailed Description

template<typename DataType>
class cVectorOperations< DataType >

A static class for vector operations, similar to level-1 operations of the BLAS library. This class acts as a templated namespace, where all member methods are public and static.

See also
MatrixOperations

Definition at line 46 of file c_vector_operations.h.

Member Function Documentation

◆ copy_scaled_vector()

template<typename DataType >
void cVectorOperations< DataType >::copy_scaled_vector ( const DataType *RESTRICT  input_vector,
const LongIndexType  vector_size,
const DataType  scale,
DataType *RESTRICT  output_vector 
)
static

Scales a vector and stores to a new vector.

Parameters
[in]input_vectorA 1D array
[in]vector_sizeLength of vector array
[in]scaleScale coefficient to the input vector. If this is equal to one, the function effectively becomes the same as copy_vector.
[out]output_vectorOutput vector (written in place).

Definition at line 93 of file c_vector_operations.cpp.

98{
99 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
100
101 // Using BLAS
102 int incx = 1;
103 int incy = 1;
104
105 cblas_api::xcopy(vector_size, input_vector, incx, output_vector, incy);
106
107 cblas_api::xscal(vector_size, scale, output_vector, incy);
108
109 #else
110
111 // Not using BLAS
112 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
113 #pragma omp parallel for \
114 schedule(static) \
115 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
116 default(none) \
117 shared(input_vector, output_vector, vector_size, scale)
118 #endif
119 for (LongIndexType i=0; i < vector_size; ++i)
120 {
121 output_vector[i] = scale * input_vector[i];
122 }
123
124 #endif
125}
int LongIndexType
Definition types.h:60

Referenced by c_lanczos_tridiagonalization(), and cVectorOperations< DataType >::normalize_vector_and_copy().

Here is the caller graph for this function:

◆ copy_vector()

template<typename DataType >
void cVectorOperations< DataType >::copy_vector ( const DataType *RESTRICT  input_vector,
const LongIndexType  vector_size,
DataType *RESTRICT  output_vector 
)
static

Copies a vector to a new vector. Result is written in-place.

Parameters
[in]input_vectorA 1D array
[in]vector_sizeLength of vector array
[out]output_vectorOutput vector (written in place).

Definition at line 45 of file c_vector_operations.cpp.

49{
50 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
51
52 // Using BLAS
53 int incx = 1;
54 int incy = 1;
55
56 cblas_api::xcopy(vector_size, input_vector, incx, output_vector, incy);
57
58 #else
59
60 // Not using BLAS
61 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
62 #pragma omp parallel for \
63 schedule(static) \
64 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
65 default(none) \
66 shared(input_vector, output_vector, vector_size)
67 #endif
68 for (LongIndexType i=0; i < vector_size; ++i)
69 {
70 output_vector[i] = input_vector[i];
71 }
72
73 #endif
74}

Referenced by c_lanczos_tridiagonalization().

Here is the caller graph for this function:

◆ euclidean_norm()

template<typename DataType >
DataType cVectorOperations< DataType >::euclidean_norm ( const DataType *RESTRICT  vector,
const LongIndexType  vector_size 
)
static

Computes the Euclidean norm of a 1D array.

The reduction variable (here, inner_prod ) is of the type long double. This is becase when DataType is float, the summation loses the precision, especially when the vector size is large. It seems that using long double is slightly faster than using double. The advantage of using a type with larger bits for the reduction variable is only sensible if the compiler is optimized with -O2 or -O3 flags.

Using a larger bit type for the reduction variable is very important for this function. If DataType is float, without such consideration, the result of estimation of trace can be completely wrong, just becase of the wrong norm results. For large array sizes, even libraries such as openblas does not compute the dot product accurately.

The chunk computation of the dot product (as seen in the code with chunk=5) improves the preformance with gaining twice speedup. This result is not much dependet on chunk. For example, chunk=10 also yields a similar result.

Parameters
[in]vectorA pointer to 1D array
[in]vector_sizeLength of the array
Returns
Euclidean norm

Definition at line 332 of file c_vector_operations.cpp.

335{
336 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
337
338 // Using BLAS
339 int incx = 1;
340
341 DataType norm = cblas_api::xnrm2(vector_size, vector, incx);
342
343 return norm;
344
345 #else
346
347 // Compute norm squared
348 long double norm2 = 0.0;
349 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
350 LongIndexType chunk = 5;
351 LongIndexType vector_size_chunked = vector_size - (vector_size % chunk);
352 #endif
353
354 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
355 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
356 #pragma omp parallel for \
357 schedule(static) \
358 if ((!omp_in_parallel()) && \
359 (vector_size_chunked >= LARGE_ARRAY_SIZE)) \
360 default(none) \
361 shared(vector, vector_size, vector_size_chunked, chunk) \
362 reduction(+: norm2)
363 #endif
364 for (LongIndexType i=0; i < vector_size_chunked; i += chunk)
365 {
366 norm2 += vector[i] * vector[i] +
367 vector[i+1] * vector[i+1] +
368 vector[i+2] * vector[i+2] +
369 vector[i+3] * vector[i+3] +
370 vector[i+4] * vector[i+4];
371 }
372 #endif
373
374 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
375 for (LongIndexType i=vector_size_chunked; i < vector_size; ++i)
376 #else
377 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
378 #pragma omp parallel for \
379 schedule(static) \
380 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
381 default(none) \
382 shared(vector, vector_size) \
383 reduction(+: norm2)
384 #endif
385 for (LongIndexType i=0; i < vector_size; ++i)
386 #endif
387 {
388 norm2 += vector[i] * vector[i];
389 }
390
391 // Norm
392 DataType norm = std::sqrt(static_cast<DataType>(norm2));
393
394 return norm;
395
396 #endif
397}
#define USE_OPENMP
Definition definitions.h:84
#define LARGE_ARRAY_SIZE
Definition definitions.h:78

References LARGE_ARRAY_SIZE, and USE_OPENMP.

Referenced by c_lanczos_tridiagonalization(), cOrthogonalization< DataType >::gram_schmidt_process(), cVectorOperations< DataType >::normalize_vector_and_copy(), cVectorOperations< DataType >::normalize_vector_in_place(), and cOrthogonalization< DataType >::orthogonalize_vectors().

Here is the caller graph for this function:

◆ inner_product()

template<typename DataType >
DataType cVectorOperations< DataType >::inner_product ( const DataType *RESTRICT  vector1,
const DataType *RESTRICT  vector2,
const LongIndexType  vector_size 
)
static

Computes Euclidean inner product of two vectors.

The reduction variable (here, inner_prod ) is of the type long double. This is becase when DataType is float, the summation loses the precision, especially when the vector size is large. It seems that using long double is slightly faster than using double. The advantage of using a type with larger bits for the reduction variable is only sensible if the compiler is optimized with -O2 or -O3 flags.

Using a larger bit type for the reduction variable is very important for this function. If DataType is float, without such consideration, the result of estimation of trace can be completely wrong, just becase of the wrong inner product results. For large array sizes, even libraries such as openblas does not compute the dot product accurately.

The chunk computation of the dot product (as seen in the code with chunk=5) improves the preformance with gaining twice speedup. This result is not much dependet on chunk. For example, chunk=10 also yields a similar result.

Parameters
[in]vector11D array
[in]vector21D array
[in]vector_sizeLength of array
Returns
Inner product of two vectors.

Definition at line 230 of file c_vector_operations.cpp.

234{
235 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
236
237 // Using BLAS
238 int incx = 1;
239 int incy = 1;
240
241 DataType inner_prod = cblas_api::xdot(vector_size, vector1, incx, vector2,
242 incy);
243
244 return inner_prod;
245
246 #else
247
248 // Not using BLAS
249 long double inner_prod = 0.0;
250 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
251 LongIndexType chunk = 5;
252 LongIndexType vector_size_chunked = vector_size - (vector_size % chunk);
253 #endif
254
255 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
256 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
257 #pragma omp parallel for \
258 schedule(static) \
259 if ((!omp_in_parallel()) && \
260 (vector_size_chunked >= LARGE_ARRAY_SIZE)) \
261 default(none) \
262 shared(vector1, vector2, vector_size, vector_size_chunked, chunk) \
263 reduction(+: inner_prod)
264 #endif
265 for (LongIndexType i=0; i < vector_size_chunked; i += chunk)
266 {
267 inner_prod += vector1[i] * vector2[i] +
268 vector1[i+1] * vector2[i+1] +
269 vector1[i+2] * vector2[i+2] +
270 vector1[i+3] * vector2[i+3] +
271 vector1[i+4] * vector2[i+4];
272 }
273 #endif
274
275 #if defined(USE_LOOP_UNROLLING) && (USE_LOOP_UNROLLING == 1)
276 for (LongIndexType i=vector_size_chunked; i < vector_size; ++i)
277 #else
278 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
279 #pragma omp parallel for \
280 schedule(static) \
281 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
282 default(none) \
283 shared(vector1, vector2, vector_size) \
284 reduction(+: inner_prod)
285 #endif
286 for (LongIndexType i=0; i < vector_size; ++i)
287 #endif
288 {
289 inner_prod += vector1[i] * vector2[i];
290 }
291
292 return static_cast<DataType>(inner_prod);
293
294 #endif
295}

References LARGE_ARRAY_SIZE, and USE_OPENMP.

Referenced by c_lanczos_tridiagonalization(), cOrthogonalization< DataType >::gram_schmidt_process(), and cOrthogonalization< DataType >::orthogonalize_vectors().

Here is the caller graph for this function:

◆ normalize_vector_and_copy()

template<typename DataType >
DataType cVectorOperations< DataType >::normalize_vector_and_copy ( const DataType *RESTRICT  vector,
const LongIndexType  vector_size,
DataType *RESTRICT  output_vector 
)
static

Normalizes a vector based on Euclidean 2-norm. The result is written into another vector.

Parameters
[in]vectorInput vector.
[in]vector_sizeLength of the input vector
[out]output_vectorOutput vector, which is the normalization of the input vector.
Returns
2-norm of the input vector

Definition at line 472 of file c_vector_operations.cpp.

476{
477 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
478
479 // Norm of vector
481 vector, vector_size);
482
483 // Normalize to output
484 DataType scale = 1.0 / norm;
485 cVectorOperations<DataType>::copy_scaled_vector(vector, vector_size, scale,
486 output_vector);
487
488 return norm;
489
490 #else
491
492 // Norm of vector
493 DataType norm = cVectorOperations<DataType>::euclidean_norm(vector,
494 vector_size);
495
496 // Normalize to output
497 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
498 #pragma omp parallel for \
499 schedule(static) \
500 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
501 default(none) \
502 shared(vector, output_vector, vector_size, norm)
503 #endif
504 for (LongIndexType i=0; i < vector_size; ++i)
505 {
506 output_vector[i] = vector[i] / norm;
507 }
508
509 return norm;
510
511 #endif
512}
static void copy_scaled_vector(const DataType *RESTRICT input_vector, const LongIndexType vector_size, const DataType scale, DataType *RESTRICT output_vector)
Scales a vector and stores to a new vector.
static DataType euclidean_norm(const DataType *RESTRICT vector, const LongIndexType vector_size)
Computes the Euclidean norm of a 1D array.

References cVectorOperations< DataType >::copy_scaled_vector(), and cVectorOperations< DataType >::euclidean_norm().

Referenced by c_golub_kahn_bidiagonalization().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ normalize_vector_in_place()

template<typename DataType >
DataType cVectorOperations< DataType >::normalize_vector_in_place ( DataType *RESTRICT  vector,
const LongIndexType  vector_size 
)
static

Normalizes a vector based on Euclidean 2-norm. The result is written in-place.

Parameters
[in,out]vectorInput vector to be normalized in-place.
[in]vector_sizeLength of the input vector
Returns
2-Norm of the input vector (before normalization)

Definition at line 414 of file c_vector_operations.cpp.

417{
418 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
419
420 // Norm of vector
422 vector, vector_size);
423
424 // Normalize in place
425 DataType scale = 1.0 / norm;
426 int incx = 1;
427 cblas_api::xscal(vector_size, scale, vector, incx);
428
429 return norm;
430
431 #else
432
433 // Norm of vector
434 DataType norm = cVectorOperations<DataType>::euclidean_norm(vector,
435 vector_size);
436
437 // Normalize in place
438 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
439 #pragma omp parallel for \
440 schedule(static) \
441 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
442 default(none) \
443 shared(vector, vector_size, norm)
444 #endif
445 for (LongIndexType i=0; i < vector_size; ++i)
446 {
447 vector[i] /= norm;
448 }
449
450 return norm;
451
452 #endif
453}

References cVectorOperations< DataType >::euclidean_norm().

Referenced by c_golub_kahn_bidiagonalization().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ subtract_scaled_vector()

template<typename DataType >
void cVectorOperations< DataType >::subtract_scaled_vector ( const DataType *RESTRICT  input_vector,
const LongIndexType  vector_size,
const DataType  scale,
DataType *RESTRICT  output_vector 
)
static

Subtracts the scaled input vector from the output vector.

Performs the following operation:

\[ \boldsymbol{b} = \boldsymbol{b} - c \boldsymbol{a}, \]

where

  • \( \boldsymbol{a} \) is the input vector,
  • \( c \) is a scalar scale to the input vector, and
  • \( \boldsymbol{b} \) is the output vector that is written in-place.
Parameters
[in]input_vectorA 1D array
[in]vector_sizeLength of vector array
[in]scaleScale coefficient to the input vector.
[in,out]output_vectorOutput vector (written in place).

Definition at line 153 of file c_vector_operations.cpp.

158{
159
160 #if defined(USE_ANY_CBLAS) && (USE_ANY_CBLAS == 1)
161
162 // Using BLAS
163 int incx = 1;
164 int incy = 1;
165
166 DataType neg_scale = -scale;
167 cblas_api::xaxpy(vector_size, neg_scale, input_vector, incx, output_vector,
168 incy);
169
170 #else
171
172 // Not using BLAS
173 DataType zero = 0.0;
174 if (c_arithmetics::is_equal(scale, zero))
175 {
176 return;
177 }
178
179 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
180 #pragma omp parallel for \
181 schedule(static) \
182 if ((!omp_in_parallel()) && (vector_size >= LARGE_ARRAY_SIZE)) \
183 default(none) \
184 shared(input_vector, output_vector, vector_size, scale)
185 #endif
186 for (LongIndexType i=0; i < vector_size; ++i)
187 {
188 output_vector[i] -= scale * input_vector[i];
189 }
190
191 #endif
192}
bool is_equal(DataType x, DataType y)
Check if two floating point numbers are equal within a tolerance.
Definition _c_is_equal.h:49

References c_arithmetics::is_equal().

Referenced by cAffineMatrixFunction< DataType >::_add_scaled_vector(), c_golub_kahn_bidiagonalization(), c_lanczos_tridiagonalization(), cOrthogonalization< DataType >::gram_schmidt_process(), and cOrthogonalization< DataType >::orthogonalize_vectors().

Here is the call graph for this function:
Here is the caller graph for this function:

The documentation for this class was generated from the following files: