imate
C++/CUDA Reference
Loading...
Searching...
No Matches
cu_golub_kahn_bidiagonalization.cu File Reference

Go to the source code of this file.

Functions

template<typename DataType >
IndexType cu_golub_kahn_bidiagonalization (cuLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
 Bi-diagonalizes the positive-definite matrix A using Golub-Kahn-Lanczos method.
 
template IndexType cu_golub_kahn_bidiagonalization< float > (cuLinearOperator< float > *A, const float *v, const LongIndexType n, const IndexType m, const float lanczos_tol, const FlagType orthogonalize, float *alpha, float *beta)
 
template IndexType cu_golub_kahn_bidiagonalization< double > (cuLinearOperator< double > *A, const double *v, const LongIndexType n, const IndexType m, const double lanczos_tol, const FlagType orthogonalize, double *alpha, double *beta)
 

Function Documentation

◆ cu_golub_kahn_bidiagonalization()

template<typename DataType >
IndexType cu_golub_kahn_bidiagonalization ( cuLinearOperator< DataType > *  A,
const DataType *  v,
const LongIndexType  n,
const IndexType  m,
const DataType  lanczos_tol,
const FlagType  orthogonalize,
DataType *  alpha,
DataType *  beta 
)

Bi-diagonalizes the positive-definite matrix A using Golub-Kahn-Lanczos method.

This method bi-diagonalizes matrix A to B using the start vector w. m is the Lanczos degree, which will be the size of square matrix B.

The output of this function are alpha (of length m) and beta (of length m+1) which are diagonal (alpha[:]) and off-diagonal (beta[1:]) elements of the bi-diagonal (m,m) symmetric and positive-definite matrix B.

Lanczos tridiagonalization vs Golub-Kahn Bidiagonalization

  • The Lanczos tri-diagonalization is twice faster (in runtime), as it has only one matrix-vector multiplication. Whereas the Golub-Kahn bi-diagonalization has two matrix-vector multiplications.
  • The Lanczos tri-diagonalization can only be applied to symmetric matrices. Whereas the Golub-Kahn bi-diagonalization can be applied to any matrix.

Reference

  • NetLib Algorithm 6.27, netlib.org/utk/people/JackDongarra/etemplates/node198.html
  • Matrix Computations, Golub, p. 495
  • Demmel, J., Templates for Solution of Algebraic Eigenvalue Problem, p. 143
Warning
When the matrix A is very close to the identity matrix, the Golub-Kahn bi-diagonalization method can not find beta, as beta becomes zero. If A is not exactly identity, you may decrease the Tolerance to a very small number. However, if A is almost identity matrix, decreasing lanczos_tol will not help, and this function cannot be used.
See also
lanczos_tridiagonalizaton
Parameters
[in]AA linear operator that represents a matrix of size (n,n) and can perform matrix-vector operation with dot() method and transposed matrix-vector operation with transpose_dot() method. This matrix should be positive-definite.
[in]vStart vector for the Lanczos tri-diagonalization. Column vector of size n. It could be generated randomly. Often it is generated by the Rademacher distribution with entries +1 and -1.
[in]nSize of the square matrix A, which is also the size of the vector v.
[in]mLanczos degree, which is the number of Lanczos iterations.
[in]lanczos_tolThe tolerance of the residual error of the Lanczos iteration.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[out]alphaThis is a 1D array of size m and alpha[:] constitute the diagonal elements of the bi-diagonal matrix B. This is the output and written in place.
[out]betaThis is a 1D array of size m, and the elements beta[:] constitute the sup-diagonals of the bi-diagonal matrix B. This array is the output and written in place.
Returns
Counter for the Lanczos iterations. Normally, the size of the output matrix should be (m,m), which is the Lanczos degree. However, if the algorithm terminates early, the size of alpha and beta, and hence the output tri-diagonal matrix, is smaller. This counter keeps track of the non-zero size of alpha and beta.

Definition at line 133 of file cu_golub_kahn_bidiagonalization.cu.

142{
143 // Get cublas handle
144 cublasHandle_t cublas_handle = A->get_cublas_handle();
145
146 // buffer_size is number of last orthogonal vectors to keep in buffers U, V
147 IndexType buffer_size;
148 if (orthogonalize == 0)
149 {
150 // At least two vectors must be stored in buffer for Lanczos recursion
151 buffer_size = 2;
152 }
153 else if ((orthogonalize < 0) ||
154 (orthogonalize > static_cast<FlagType>(m) - 1))
155 {
156 // Using full re-orthogonalization, keep all of the m vectors in buffer
157 buffer_size = m;
158 }
159 else
160 {
161 // Orthogonalize with less than m vectors (0 < orthogonalize < m-1)
162 // plus one vector for the latest (the j-th) vector
163 buffer_size = orthogonalize + 1;
164 }
165
166 // Allocate 2D array (as 1D array, and coalesced row-wise) to store
167 // the last buffer_size of orthogonalized vectors of length n. New vectors
168 // are stored by cycling through the buffer to replace with old ones.
169 DataType* device_U = CudaAPI<DataType>::alloc(n * buffer_size);
170 DataType* device_V = CudaAPI<DataType>::alloc(n * buffer_size);
171
172 // Normalize vector v and copy to v_old
173 CudaAPI<DataType>::copy_to_device(v, n, &device_V[0]);
175 cublas_handle, &device_V[0], n);
176
177 // Declare iterators
178 IndexType j;
179 IndexType lanczos_size = 0;
180 IndexType num_ortho;
181
182 // Golub-Kahn iteration
183 for (j=0; j < m; ++j)
184 {
185 // Counter for the non-zero size of alpha and beta
186 ++lanczos_size;
187
188 // u_new = A.dot(v_old)
189 A->dot(&device_V[(j % buffer_size)*n], &device_U[(j % buffer_size)*n]);
190
191 // Performing: u_new[i] = u_new[i] - beta[j] * u_old[i]
192 if (j > 0)
193 {
195 cublas_handle,
196 &device_U[((j-1) % buffer_size)*n], n, beta[j-1],
197 &device_U[(j % buffer_size)*n]);
198 }
199
200 // orthogonalize u_new against previous vectors
201 if (orthogonalize != 0)
202 {
203 // Find how many column vectors are filled so far in the buffer V
204 if (j < buffer_size)
205 {
206 num_ortho = j;
207 }
208 else
209 {
210 num_ortho = buffer_size - 1;
211 }
212
213 // Gram-Schmidt process
214 if (j > 0)
215 {
217 cublas_handle, &device_U[0], n, buffer_size,
218 (j-1)%buffer_size, num_ortho,
219 &device_U[(j % buffer_size)*n]);
220 }
221 }
222
223 // Normalize u_new and set its norm to alpha[j]
225 cublas_handle, &device_U[(j % buffer_size)*n], n);
226
227 // Performing: v_new = A.T.dot(u_new) - alpha[j] * v_old
228 A->transpose_dot(&device_U[(j % buffer_size)*n],
229 &device_V[((j+1) % buffer_size)*n]);
230
231 // Performing: v_new[i] = v_new[i] - alpha[j] * v_old[i]
233 cublas_handle, &device_V[(j % buffer_size)*n], n, alpha[j],
234 &device_V[((j+1) % buffer_size)*n]);
235
236 // orthogonalize v_new against previous vectors
237 if (orthogonalize != 0)
238 {
240 cublas_handle, &device_V[0], n, buffer_size, j%buffer_size,
241 num_ortho, &device_V[((j+1) % buffer_size)*n]);
242 }
243
244 // Update beta as the norm of v_new
246 cublas_handle, &device_V[((j+1) % buffer_size)*n], n);
247
248 // Exit criterion when the vector r is zero. If each component of a
249 // zero vector has the tolerance epsilon, (which is called lanczos_tol
250 // here), the tolerance of norm of r is epsilon times sqrt of n.
251 if (beta[j] < cu_arithmetics::mul(
252 lanczos_tol,
254 static_cast<double>(std::sqrt(n)))
255 )
256 )
257 {
258 break;
259 }
260 }
261
262 // Free dynamic memory
263 CudaAPI<DataType>::del(device_U);
264 CudaAPI<DataType>::del(device_V);
265
266 return lanczos_size;
267}
static ArrayType * alloc(const size_t array_size)
Allocates memory on gpu device. This function creates a pointer and returns it.
Definition cuda_api.cu:39
static void del(void *device_array)
Deletes memory on gpu device if its pointer is not NULL, then sets the pointer to NULL.
Definition cuda_api.cu:169
static void copy_to_device(const ArrayType *host_array, const size_t array_size, ArrayType *device_array)
Copies memory on host to device memory.
Definition cuda_api.cu:145
virtual void dot(const DataType *vector, DataType *product)=0
cublasHandle_t get_cublas_handle() const
This function returns a reference to the cublasHandle_t object. The object will be created,...
virtual void transpose_dot(const DataType *vector, DataType *product)=0
static void gram_schmidt_process(cublasHandle_t cublas_handle, const DataType *V, const LongIndexType vector_size, const IndexType num_vectors, const IndexType last_vector, const FlagType num_ortho, DataType *r)
Modified Gram-Schmidt orthogonalization process to orthogonalize the vector v against a subset of the...
static DataType normalize_vector_in_place(cublasHandle_t cublas_handle, DataType *RESTRICT vector, const LongIndexType vector_size)
Normalizes a vector based on Euclidean 2-norm. The result is written in-place.
static void subtract_scaled_vector(cublasHandle_t cublas_handle, const DataType *RESTRICT input_vector, const LongIndexType vector_size, const DataType scale, DataType *RESTRICT output_vector)
Subtracts the scaled input vector from the output vector.
__host__ __device__ DataType mul(const DataType x, const DataType y)
Multiply two floating point numbers in round-to-nearest-even mode.
__host__ __device__ DataType abs(const DataType x)
Absolute value of a floating point number.
int FlagType
Definition types.h:68
int IndexType
Definition types.h:65

References cu_arithmetics::abs(), CudaAPI< ArrayType >::alloc(), CudaAPI< ArrayType >::copy_to_device(), CudaAPI< ArrayType >::del(), cuLinearOperator< DataType >::dot(), cuLinearOperator< DataType >::get_cublas_handle(), cuOrthogonalization< DataType >::gram_schmidt_process(), cu_arithmetics::mul(), cuVectorOperations< DataType >::normalize_vector_in_place(), cuVectorOperations< DataType >::subtract_scaled_vector(), and cuLinearOperator< DataType >::transpose_dot().

Referenced by cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ cu_golub_kahn_bidiagonalization< double >()

template IndexType cu_golub_kahn_bidiagonalization< double > ( cuLinearOperator< double > *  A,
const double *  v,
const LongIndexType  n,
const IndexType  m,
const double  lanczos_tol,
const FlagType  orthogonalize,
double *  alpha,
double *  beta 
)

◆ cu_golub_kahn_bidiagonalization< float >()

template IndexType cu_golub_kahn_bidiagonalization< float > ( cuLinearOperator< float > *  A,
const float *  v,
const LongIndexType  n,
const IndexType  m,
const float  lanczos_tol,
const FlagType  orthogonalize,
float *  alpha,
float *  beta 
)