imate
C++/CUDA Reference
cu_golub_kahn_bidiagonalization.h File Reference

Functions

template<typename DataType >
IndexType cu_golub_kahn_bidiagonalization (cuLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
 Bi-diagonalizes the positive-definite matrix A using the Golub-Kahn-Lanczos method.
 

Function Documentation

◆ cu_golub_kahn_bidiagonalization()

template<typename DataType >
IndexType cu_golub_kahn_bidiagonalization(
        cuLinearOperator<DataType>* A,
        const DataType* v,
        const LongIndexType n,
        const IndexType m,
        const DataType lanczos_tol,
        const FlagType orthogonalize,
        DataType* alpha,
        DataType* beta)

Bi-diagonalizes the positive-definite matrix A using the Golub-Kahn-Lanczos method.

This method bi-diagonalizes the matrix A to B using the start vector v. m is the Lanczos degree, which will be the size of the square matrix B.

The outputs of this function are alpha (of length m) and beta (of length m). The elements alpha[:] form the diagonal, and the elements beta[:-1] form the super-diagonal, of the bi-diagonal (m, m) matrix B.
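For illustration, a minimal host-side sketch (not part of imate; assemble_bidiagonal is a hypothetical helper) of how a dense B could be assembled from the two output arrays, assuming beta holds the super-diagonal:

#include <vector>

// Sketch: assemble the dense (m, m) upper bi-diagonal matrix B from the
// alpha and beta arrays produced by cu_golub_kahn_bidiagonalization.
// Row-major storage is assumed here.
template <typename DataType>
std::vector<DataType> assemble_bidiagonal(
        const DataType* alpha, const DataType* beta, int m)
{
    std::vector<DataType> B(m * m, DataType(0));
    for (int i = 0; i < m; ++i)
    {
        B[i * m + i] = alpha[i];           // diagonal element
        if (i < m - 1)
        {
            B[i * m + (i + 1)] = beta[i];  // super-diagonal element
        }
    }
    return B;
}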

Lanczos tri-diagonalization vs. Golub-Kahn bi-diagonalization

  • The Lanczos tri-diagonalization is twice as fast (in runtime), since it uses only one matrix-vector multiplication per iteration, whereas the Golub-Kahn bi-diagonalization uses two (see the recursion sketched below).
  • The Lanczos tri-diagonalization can only be applied to symmetric matrices, whereas the Golub-Kahn bi-diagonalization can be applied to any matrix.
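
For context, iteration j of the Golub-Kahn recursion (the standard formulation, as in the references below; indexing here is one-based for exposition, while the implementation below is zero-based) performs:

\hat{u}_j = A v_j - \beta_{j-1} u_{j-1}, \qquad \alpha_j = \|\hat{u}_j\|_2, \qquad u_j = \hat{u}_j / \alpha_j,

\hat{v}_{j+1} = A^{\mathsf{T}} u_j - \alpha_j v_j, \qquad \beta_j = \|\hat{v}_{j+1}\|_2, \qquad v_{j+1} = \hat{v}_{j+1} / \beta_j.

The two matrix-vector products per iteration (with A and with A^T) are what make this method twice as expensive as the Lanczos tri-diagonalization.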

References

  • NetLib Algorithm 6.27, netlib.org/utk/people/JackDongarra/etemplates/node198.html
  • Golub, G. H., and Van Loan, C. F., Matrix Computations, p. 495
  • Demmel, J., Templates for the Solution of Algebraic Eigenvalue Problems, p. 143
Warning
When the matrix A is very close to the identity matrix, the Golub-Kahn bi-diagonalization method cannot find beta, since beta becomes zero. If A is not exactly the identity, decreasing lanczos_tol to a very small number may help. However, if A is almost exactly the identity matrix, decreasing lanczos_tol will not help, and this function cannot be used.
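To see why (a one-step worked example): if A is exactly the identity and v_1 is the normalized start vector, then

\hat{u}_1 = A v_1 = v_1 \;\Rightarrow\; \alpha_1 = 1, \qquad \hat{v}_2 = A^{\mathsf{T}} u_1 - \alpha_1 v_1 = v_1 - v_1 = 0 \;\Rightarrow\; \beta_1 = 0,

so the recursion breaks down at the very first iteration, regardless of lanczos_tol.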
See also
lanczos_tridiagonalization
Parameters
[in]	A	A linear operator that represents a matrix of size (n, n) and can perform matrix-vector multiplication with its dot() method and transposed matrix-vector multiplication with its transpose_dot() method. This matrix should be positive-definite.
[in]	v	Start vector for the bi-diagonalization. Column vector of size n. It could be generated randomly; often it is generated from the Rademacher distribution with entries +1 and -1 (a minimal generation sketch follows this list).
[in]	n	Size of the square matrix A, which is also the size of the vector v.
[in]	m	Lanczos degree, which is the number of Lanczos iterations.
[in]	lanczos_tol	The tolerance of the residual error of the Lanczos iteration.
[in]	orthogonalize	Indicates whether to orthogonalize the orthogonal eigenvectors during the Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[out]	alpha	A 1D array of size m; the elements alpha[:] constitute the diagonal elements of the bi-diagonal matrix B. This array is an output and is written in place.
[out]	beta	A 1D array of size m; the first m-1 elements (beta[:-1]) constitute the super-diagonal elements of the bi-diagonal matrix B. This array is an output and is written in place.
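
As mentioned for v above, the start vector is often drawn from the Rademacher distribution. A minimal host-side sketch (rademacher_vector is a hypothetical helper, not part of the imate API):

#include <random>
#include <vector>

// Sketch: fill a host vector with i.i.d. Rademacher entries (+1 or -1),
// as suggested for the start vector v.
template <typename DataType>
std::vector<DataType> rademacher_vector(std::size_t n, unsigned seed = 0)
{
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<DataType> v(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        v[i] = coin(rng) ? DataType(1) : DataType(-1);
    }
    return v;
}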
Returns
Counter for the Lanczos iterations. Normally, the size of the output matrix should be (m, m), where m is the Lanczos degree. However, if the algorithm terminates early, the size of alpha and beta, and hence of the output bi-diagonal matrix, is smaller. This counter keeps track of the non-zero size of alpha and beta.
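
A hedged usage sketch of how the returned counter is typically used to truncate the outputs (MyOperator is a stand-in for any concrete cuLinearOperator<float> subclass, not an imate class name; rademacher_vector is the hypothetical helper sketched above):

#include <vector>

// Hypothetical usage sketch, not verbatim imate example code.
const LongIndexType n = 1000;       // matrix size
const IndexType m = 100;            // requested Lanczos degree
std::vector<float> v = rademacher_vector<float>(n);  // start vector
std::vector<float> alpha(m);
std::vector<float> beta(m);
MyOperator A;                       // represents an (n, n) matrix

IndexType lanczos_size = cu_golub_kahn_bidiagonalization(
        &A, v.data(), n, m,
        1e-8f,                      // lanczos_tol
        -1,                         // full reorthogonalization
        alpha.data(), beta.data());

// On early termination, only the first lanczos_size entries are valid.
alpha.resize(lanczos_size);
beta.resize(lanczos_size);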

Definition at line 113 of file cu_golub_kahn_bidiagonalization.cu.

{
    // Get cublas handle
    cublasHandle_t cublas_handle = A->get_cublas_handle();

    // buffer_size is the number of last orthogonal vectors to keep in the
    // buffers U and V
    IndexType buffer_size;
    if (orthogonalize == 0)
    {
        // At least two vectors must be stored in the buffer for the
        // Lanczos recursion
        buffer_size = 2;
    }
    else if ((orthogonalize < 0) ||
             (orthogonalize > static_cast<FlagType>(m) - 1))
    {
        // Using full reorthogonalization, keep all of the m vectors in
        // the buffer
        buffer_size = m;
    }
    else
    {
        // Orthogonalize with less than m vectors (0 < orthogonalize < m-1),
        // plus one slot for the latest (the j-th) vector
        buffer_size = orthogonalize + 1;
    }

    // Allocate 2D arrays (as 1D arrays, coalesced row-wise) to store the
    // last buffer_size orthogonalized vectors of length n. New vectors are
    // stored by cycling through the buffer to replace the old ones.
    DataType* device_U = CudaInterface<DataType>::alloc(n * buffer_size);
    DataType* device_V = CudaInterface<DataType>::alloc(n * buffer_size);

    // Normalize vector v and copy to v_old
    CudaInterface<DataType>::copy_to_device(v, n, &device_V[0]);
    cuVectorOperations<DataType>::normalize_vector_in_place(
            cublas_handle, &device_V[0], n);

    // Declare iterators
    IndexType j;
    IndexType lanczos_size = 0;
    IndexType num_ortho;

    // Golub-Kahn iteration
    for (j=0; j < m; ++j)
    {
        // Counter for the non-zero size of alpha and beta
        ++lanczos_size;

        // u_new = A.dot(v_old)
        A->dot(&device_V[(j % buffer_size)*n],
               &device_U[(j % buffer_size)*n]);

        // Performing: u_new[i] = u_new[i] - beta[j-1] * u_old[i]
        if (j > 0)
        {
            cuVectorOperations<DataType>::subtract_scaled_vector(
                    cublas_handle,
                    &device_U[((j-1) % buffer_size)*n], n, beta[j-1],
                    &device_U[(j % buffer_size)*n]);
        }

        // Orthogonalize u_new against the previous vectors
        if (orthogonalize != 0)
        {
            // Find how many column vectors are filled so far in the buffer
            if (j < buffer_size)
            {
                num_ortho = j;
            }
            else
            {
                num_ortho = buffer_size - 1;
            }

            // Gram-Schmidt process
            if (j > 0)
            {
                cuOrthogonalization<DataType>::gram_schmidt_process(
                        cublas_handle, &device_U[0], n, buffer_size,
                        (j-1)%buffer_size, num_ortho,
                        &device_U[(j % buffer_size)*n]);
            }
        }

        // Normalize u_new and set its norm to alpha[j]
        alpha[j] = cuVectorOperations<DataType>::normalize_vector_in_place(
                cublas_handle, &device_U[(j % buffer_size)*n], n);

        // Performing: v_new = A.T.dot(u_new)
        A->transpose_dot(&device_U[(j % buffer_size)*n],
                         &device_V[((j+1) % buffer_size)*n]);

        // Performing: v_new[i] = v_new[i] - alpha[j] * v_old[i]
        cuVectorOperations<DataType>::subtract_scaled_vector(
                cublas_handle, &device_V[(j % buffer_size)*n], n, alpha[j],
                &device_V[((j+1) % buffer_size)*n]);

        // Orthogonalize v_new against the previous vectors
        if (orthogonalize != 0)
        {
            cuOrthogonalization<DataType>::gram_schmidt_process(
                    cublas_handle, &device_V[0], n, buffer_size,
                    j%buffer_size, num_ortho,
                    &device_V[((j+1) % buffer_size)*n]);
        }

        // Update beta[j] as the norm of v_new
        beta[j] = cuVectorOperations<DataType>::normalize_vector_in_place(
                cublas_handle, &device_V[((j+1) % buffer_size)*n], n);

        // Exit criterion when the vector r is zero. If each component of a
        // zero vector has the tolerance epsilon (called lanczos_tol here),
        // the tolerance of the norm of r is epsilon times sqrt(n).
        if (beta[j] < lanczos_tol * sqrt(n))
        {
            break;
        }
    }

    // Free dynamic memory
    CudaInterface<DataType>::del(device_U);
    CudaInterface<DataType>::del(device_V);

    return lanczos_size;
}
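
To illustrate the cyclic buffer indexing used above (a standalone sketch, not imate code): with buffer_size slots, iteration j writes into slot j % buffer_size, overwriting the oldest stored vector, so only the last buffer_size vectors are ever kept on the device:

#include <cstdio>

// Standalone demonstration of the circular-buffer slot assignment used
// by the function above.
int main()
{
    const int buffer_size = 3;  // e.g. orthogonalize = 2 gives 3 slots
    for (int j = 0; j < 7; ++j)
    {
        // Slot sequence for buffer_size = 3: 0, 1, 2, 0, 1, 2, 0, ...
        std::printf("iteration %d -> slot %d\n", j, j % buffer_size);
    }
    return 0;
}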

References CudaInterface< ArrayType >::alloc(), CudaInterface< ArrayType >::copy_to_device(), CudaInterface< ArrayType >::del(), cLinearOperator< DataType >::dot(), cuLinearOperator< DataType >::get_cublas_handle(), cuOrthogonalization< DataType >::gram_schmidt_process(), cuVectorOperations< DataType >::normalize_vector_in_place(), cuVectorOperations< DataType >::subtract_scaled_vector(), and cLinearOperator< DataType >::transpose_dot().

Referenced by cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature().
