imate
C++/CUDA Reference
cuOrthogonalization< DataType > Class Template Reference

A static class for orthogonalization of vector bases. This class acts as a templated namespace, where all member methods are public and static. More...

#include <cu_orthogonalization.h>

Static Public Member Functions

static void gram_schmidt_process (cublasHandle_t cublas_handle, const DataType *V, const LongIndexType vector_size, const IndexType num_vectors, const IndexType last_vector, const FlagType num_ortho, DataType *r)
 Modified Gram-Schmidt orthogonalization process to orthogonalize the vector v against a subset of the column vectors in the array V. More...
 
static void orthogonalize_vectors (cublasHandle_t cublas_handle, DataType *vectors, const LongIndexType vector_size, const IndexType num_vectors)
 Orthogonalizes set of vectors mutually using modified Gram-Schmidt process. More...
 

Detailed Description

template<typename DataType>
class cuOrthogonalization< DataType >

A static class for orthogonalization of vector bases. This class acts as a templated namespace, where all member methods are public and static.

See also
RandomVectors

Definition at line 36 of file cu_orthogonalization.h.

Member Function Documentation

◆ gram_schmidt_process()

template<typename DataType >
void cuOrthogonalization< DataType >::gram_schmidt_process ( cublasHandle_t  cublas_handle,
const DataType *  V,
const LongIndexType  vector_size,
const IndexType  num_vectors,
const IndexType  last_vector,
const FlagType  num_ortho,
DataType *  v 
)
static

Modified Gram-Schmidt orthogonalization process to orthogonalize the vector v against a subset of the column vectors in the array V.

V is 1D array of the length vector_size*num_vectors to represent a 2D array of a set of num_vectors column vectors, each of the length vector_size. The length of v is also vector_size.

v is orthogonalized against the last num_ortho columns of V starting from the column vector of the index last_vector. If the backward indexing from last_vector becomes a negative index, the index wraps around from the last column vector index, i.e., num_vectors-1 .

  • If num_ortho is zero, or if num_vectors is zero, no orthogonalization is performed.
  • If num_ortho is negative (usually set to -1), then v is orthogonalized against all column vectors of V.
  • If num_ortho is larger than num_vectors, then v is orthogonalized against all column vectors of V.
  • If num_ortho is smaller than num_vectors, then v is orthogonalized against the last num_ortho column vectors of V, starting from the column vector with the index last_vector toward its previous vectors. If the iteration runs into negativen column indices, the column indexing wraps around from the end of the columns from num_vectors-1.

The result of the newer v is written in-place in v.

If vector v is identical to one of the vectors in V, the orthogonalization against the identical vector is skipped.

If one of the column vectors of V is zero (have zero norm), that vector is ignored.

Note
It is assumed that the caller function fills the column vectors of V periodically in a wrapped around order from column index 0,1,... to num_vectors-1, and newer vectors are replaced on the wrapped index starting from index 0,1,... again. Thus, V only stores the last num_vectors column vectors. The index of the last filled vector is indicated by last_vector.
Warning
The vector v can be indeed one of the columns of V itself. However, in this case, vector v must NOT be orthogonalized against itself, rather, it should only be orthogonalized to the other vectors in V. For instance, if num_vectors=10, and v is the 3rd vector of V, and if num_ortho is 6, then we may set last_vector=2. Then v is orthogonalized againts the six columns 2,1,0,9,8,7, where the last three of them are wrapped around the end of the columns.
See also
cu_golub_kahn_bidiagonalizaton, cu_lanczos_bidiagonalization
Parameters
[in]cublas_handleThe cuBLAS object handle.
[in]V1D coalesced array of vectors representing a 2D array. The length of this 1D array is vector_size*num_vectors, which indicates a 2D array with the shape (vector_size,num_vectors).
[in]vector_sizeThe length of each vector. If we assume V indicates a 2D vector, this is the number of rows of V.
[in]num_vectorsThe number of column vectors. If we assume V indicates a 2D vector, this the number of columns of V.
[in]last_vectorThe column vectors of the array V are rewritten by the caller function in wrapped-around order. That is, once all the columns (from the zeroth to the num_vector-1 vector) are filled, the next vector is rewritten in the place of the zeroth vector, and the indices of newer vectors wrap around the columns of V. Thus, V only retains the last num_vectors vectors. The column index of the last written vector is given by last_vector. This index is a number between 0 and num_vectors-1. The index of the last i-th vector is winding back from the last vector by last_vector-i+1 mod num_vectors.
[in]num_orthoThe number of vectors to be orthogonalized starting from the last vector. 0 indicates no orthogonalization will be performed and the function just returns. A negative value means all vectors will be orthogonalized. A poisitive value will orthogonalize the given number of vectors. This value cannot be larger than the number of vectors.
[in,out]vThe vector that will be orthogonalized against the columns of V. The length of v is vector_size. This vector is modified in-place.

Definition at line 128 of file cu_orthogonalization.cu.

136 {
137  // Determine how many previous vectors to orthogonalize against
138  IndexType num_steps;
139  if ((num_ortho == 0) || (num_vectors < 2))
140  {
141  // No orthogonalization is performed
142  return;
143  }
144  else if ((num_ortho < 0) ||
145  (num_ortho > static_cast<FlagType>(num_vectors)))
146  {
147  // Orthogonalize against all vectors
148  num_steps = num_vectors;
149  }
150  else
151  {
152  // Orthogonalize against only the last num_ortho vectors
153  num_steps = num_ortho;
154  }
155 
156  // Vectors can be orthogonalized at most to the full basis of the vector
157  // space. Thus, num_steps cannot be larger than the dimension of vector
158  // space, which is vector_size.
159  if (num_steps > static_cast<IndexType>(vector_size))
160  {
161  num_steps = vector_size;
162  }
163 
164  IndexType i;
165  DataType inner_prod;
166  DataType norm;
167  DataType norm_v;
168  DataType epsilon = std::numeric_limits<DataType>::epsilon();
169  DataType distance2;
170 
171  // Iterate over vectors
172  for (IndexType step=0; step < num_steps; ++step)
173  {
174  // i is the index of a column vector in V to orthogonalize v against it
175  if ((last_vector % num_vectors) >= step)
176  {
177  i = (last_vector % num_vectors) - step;
178  }
179  else
180  {
181  // Wrap around negative indices from the end of column index
182  i = (last_vector % num_vectors) - step + num_vectors;
183  }
184 
185  // Norm of j-th vector
187  cublas_handle, &V[vector_size*i], vector_size);
188 
189  // Check norm
190  if (norm < epsilon * sqrt(vector_size))
191  {
192  std::cerr << "WARNING: norm of the given vector is too small. " \
193  << "Cannot orthogonalize against zero vector. " \
194  << "Skipping." << std::endl;
195  continue;
196  }
197 
198  // Projection
200  cublas_handle, &V[vector_size*i], v, vector_size);
201 
202  // scale for subtraction
203  DataType scale = inner_prod / (norm * norm);
204 
205  // If scale is is 1, it is possible that vector v and j-th vector are
206  // identical (or close).
207  if (std::abs(scale - 1.0) <= 2.0 * epsilon)
208  {
209  // Norm of the vector v
211  cublas_handle, v, vector_size);
212 
213  // Compute distance between the j-th vector and vector v
214  distance2 = norm_v*norm_v - 2.0*inner_prod + norm*norm;
215 
216  // If distance is zero, do not reorthogonalize i-th against
217  // the j-th vector.
218  if (distance2 < 2.0 * epsilon * vector_size)
219  {
220  continue;
221  }
222  }
223 
224  // Subtraction
226  cublas_handle, &V[vector_size*i], vector_size, scale, v);
227  }
228 }
static void subtract_scaled_vector(cublasHandle_t cublas_handle, const DataType *input_vector, const LongIndexType vector_size, const DataType scale, DataType *output_vector)
Subtracts the scaled input vector from the output vector.
static DataType inner_product(cublasHandle_t cublas_handle, const DataType *vector1, const DataType *vector2, const LongIndexType vector_size)
Computes Euclidean inner product of two vectors.
static DataType euclidean_norm(cublasHandle_t cublas_handle, const DataType *vector, const LongIndexType vector_size)
Computes the Euclidean 2-norm of a 1D array.
int FlagType
Definition: types.h:68
int IndexType
Definition: types.h:65

References cuVectorOperations< DataType >::euclidean_norm(), cuVectorOperations< DataType >::inner_product(), and cuVectorOperations< DataType >::subtract_scaled_vector().

Referenced by cu_golub_kahn_bidiagonalization(), and cu_lanczos_tridiagonalization().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ orthogonalize_vectors()

template<typename DataType >
void cuOrthogonalization< DataType >::orthogonalize_vectors ( cublasHandle_t  cublas_handle,
DataType *  vectors,
const LongIndexType  vector_size,
const IndexType  num_vectors 
)
static

Orthogonalizes set of vectors mutually using modified Gram-Schmidt process.

Note
Let m be the number of vectors (num_vectors), and let n be the size of each vector (vector_size). In general, n is much larger (large matrix size), and m is small, in order of a couple of hundred. But for small matrices (where n could be smaller then m), then each vector can be orthogonalized at most to n other vectors. This is because the dimension of the vector space is n. Thus, if there are extra vectors, each vector is orthogonalized to window of the previous n vector.

If one of the column vectors is identical to one of other column vectors in V, one of the vectors is regenerated by random array and the orthogonalization is repeated.

Note
If two vectors are identical (or the norm of their difference is very small), they cannot be orthogonalized against each other. In this case, one of the vectors is re-generated by new random numbers.
Warning
if num_vectors is larger than vector_size, the orthogonalization fails since not all vectors are independent, and at least one vector becomes zero.
Parameters
[in]cublas_handleThe cuBLAS object handle.
[in,out]vectors2D array of size vector_size*num_vectors. This array will be modified in-place and will be output of this function. Note that this is Fortran ordering, meaning that the first index is contiguous. Hence, to call the j-th element of the i-th vector, use &vectors[i*vector_size + j].
[in]num_vectorsNumber of columns of vectors array.
[in]vector_sizeNumber of rows of vectors array.

Definition at line 275 of file cu_orthogonalization.cu.

280 {
281  // Do nothing if there is only one vector
282  if (num_vectors < 2)
283  {
284  return;
285  }
286 
287  IndexType i = 0;
288  IndexType j;
289  IndexType start_j;
290  DataType inner_prod;
291  DataType norm_j;
292  DataType norm_i;
293  DataType epsilon = std::numeric_limits<DataType>::epsilon();
294  IndexType success = 1;
295  IndexType max_num_trials = 20;
296  IndexType num_trials = 0;
297  IndexType num_threads = 1;
298  RandomNumberGenerator random_number_generator(num_threads);
299  DataType* buffer = NULL;
300 
301  while (i < num_vectors)
302  {
303  if ((success == 0) && (num_trials >= max_num_trials))
304  {
305  std::cerr << "ERROR: Cannot orthogonalize vectors after " \
306  << num_trials << " trials. Aborting." \
307  << std::endl;
308  abort();
309  }
310 
311  // Reset on new trial (if it was set to 0 before to start a new trial)
312  success = 1;
313 
314  // j iterates on previous vectors in a window of at most vector_size
315  if (static_cast<LongIndexType>(i) > vector_size)
316  {
317  // When vector_size is smaller than i, it is fine to cast to signed
318  start_j = i - static_cast<IndexType>(vector_size);
319  }
320  else
321  {
322  start_j = 0;
323  }
324 
325  // Reorthogonalize against previous vectors
326  for (j=start_j; j < i; ++j)
327  {
328  // Norm of the j-th vector
330  cublas_handle, &vectors[j*vector_size], vector_size);
331 
332  // Check norm
333  if (norm_j < epsilon * sqrt(vector_size))
334  {
335  std::cerr << "WARNING: norm of the given vector is too " \
336  << " small. Cannot reorthogonalize against zero" \
337  << "vector. Skipping."
338  << std::endl;
339  continue;
340  }
341 
342  // Projecting i-th vector to j-th vector
344  cublas_handle, &vectors[i*vector_size],
345  &vectors[j*vector_size], vector_size);
346 
347  // Scale of subtraction
348  DataType scale = inner_prod / (norm_j * norm_j);
349 
350  // Subtraction
352  cublas_handle, &vectors[vector_size*j], vector_size, scale,
353  &vectors[vector_size*i]);
354 
355  // Norm of the i-th vector
357  cublas_handle, &vectors[i*vector_size], vector_size);
358 
359  // If the norm is too small, regenerate the i-th vector randomly
360  if (norm_i < epsilon * sqrt(vector_size))
361  {
362  // Allocate buffer
363  if (buffer == NULL)
364  {
365  buffer = new DataType[vector_size];
366  }
367 
368  // Regenerate new random vector on buffer
370  random_number_generator, buffer,
371  vector_size, num_threads);
372 
373  // Copy buffer to the i-th vector on device
375  buffer, vector_size, &vectors[i*vector_size]);
376 
377  // Repeat the reorthogonalization for i-th vector against
378  // all previous vectors again.
379  success = 0;
380  ++num_trials;
381  break;
382  }
383  }
384 
385  if (success == 1)
386  {
387  ++i;
388 
389  // Reset if num_trials was incremented before.
390  num_trials = 0;
391  }
392  }
393 
394  // Deallocate buffer
395  if (buffer != NULL)
396  {
397  delete[] buffer;
398  buffer = NULL;
399  }
400 }
static void copy_to_device(const ArrayType *host_array, const LongIndexType array_size, ArrayType *device_array)
Copies memory on host to device memory.
static void generate_random_array(RandomNumberGenerator &random_number_generator, DataType *array, const LongIndexType array_size, const IndexType num_threads)
Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.
Generates 64-bit integers on multiple parallel threads.
int LongIndexType
Definition: types.h:60

References CudaInterface< ArrayType >::copy_to_device(), cuVectorOperations< DataType >::euclidean_norm(), RandomArrayGenerator< DataType >::generate_random_array(), cuVectorOperations< DataType >::inner_product(), and cuVectorOperations< DataType >::subtract_scaled_vector().

Here is the call graph for this function:

The documentation for this class was generated from the following files: