imate
C++/CUDA Reference
cu_lanczos_tridiagonalization.cu File Reference
Include dependency graph for cu_lanczos_tridiagonalization.cu:

Go to the source code of this file.

Functions

template<typename DataType >
IndexType cu_lanczos_tridiagonalization (cuLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
 Tri-diagonalizes matrix A to T using the start vector v. is the Lanczos degree, which will be the size of square matrix T. More...
 
template IndexType cu_lanczos_tridiagonalization< float > (cuLinearOperator< float > *A, const float *v, const LongIndexType n, const IndexType m, const float lanczos_tol, const FlagType orthogonalize, float *alpha, float *beta)
 
template IndexType cu_lanczos_tridiagonalization< double > (cuLinearOperator< double > *A, const double *v, const LongIndexType n, const IndexType m, const double lanczos_tol, const FlagType orthogonalize, double *alpha, double *beta)
 

Function Documentation

◆ cu_lanczos_tridiagonalization()

template<typename DataType >
IndexType cu_lanczos_tridiagonalization ( cuLinearOperator< DataType > *  A,
const DataType *  v,
const LongIndexType  n,
const IndexType  m,
const DataType  lanczos_tol,
const FlagType  orthogonalize,
DataType *  alpha,
DataType *  beta 
)

Tri-diagonalizes matrix A to T using the start vector v. is the Lanczos degree, which will be the size of square matrix T.

The output of this function is not an explicit matrix T, rather are the two arrays alpha of length m and beta of length m+1. The array alpha[:] represents the diagonal elements and beta[1:] represents the off-diagonal elements of the tri-diagonal (m,m) symmetric and positive-definite matrix T.

Lanczos tridiagonalization vs Golub-Kahn bidiagonalization

  • The Lanczos tri-diagonalization is twice faster (in runtime), as it has only one matrix-vector multiplication. Whereas the Golub-Kahn bi-diagonalization has two matrix-vector multiplications.
  • The Lanczos tri-diagonalization can only be applied to symmetric matrices. Whereas the Golub-Kahn bi-diagonalization can be applied to any matrix.

Algorithm

The algorithm and notations are obtained from [DEMMEL], p. 57, Algorithm 4.6 (see also [SAAD] p. 137, Algorithm 6.5). However there are four ways to implement the iteration. [PAIGE]_ has shown that the iteration that is implemented below is the most stable against loosing orthogonality of the eigenvectors. For details, see [CULLUM]_ p. 46, and p.48, particularly the algorithm denoted by A(2,7). The differences of these implementations are the order in which \( \alpha_j \) and \( \beta_j \) are defined and the order in which vectors are subtracted from r .

References

  • [DEMMEL] Demmel, J., Templates for solution of Algebraic Eigenvalue Problems, p. 57.
  • [SAAD] Saad, Numerical Methods for Large Eigenvalue Problems, p. 137.
  • [PAIGE] Paige (1980) Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem.
  • [CULLUM] Cullum; Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. 1. pp.46-48.
Parameters
[in]AA linear operator that represents a matrix of size (n,n) and can perform matrix-vector operation with dot() method. This matrix should be positive-definite.
[in]vStart vector for the Lanczos tri-diagonalization. Column vector of size c n. It could be generated randomly. Often it is generated by the Rademacher distribution with entries c +1 and -1.
[in]nSize of the square matrix A, which is also the size of the vector v.
[in]mLanczos degree, which is the number of Lanczos iterations.
[in]lanczos_tolThe tolerance of the residual error of the Lanczos iteration.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[out]alphaThis is a 1D array of size m. The array alpha[:] constitute the diagonal elements of the tri-diagonal matrix T. This is the output and written in place.
[out]betaThis is a 1D array of size m. The array beta[:] constitute the off-diagonals of the tri-diagonal matrix T. This array is the output and written in place.
Returns
Counter for the Lanczos iterations. Normally, the size of the output matrix should be (m,m), which is the Lanczos degree. However, if the algorithm terminates early, the size of alpha and beta, and hence the output tri-diagonal matrix, is smaller. This counter keeps track of the non-zero size of alpha and beta.

Definition at line 119 of file cu_lanczos_tridiagonalization.cu.

128 {
129  // Get cublas handle
130  cublasHandle_t cublas_handle = A->get_cublas_handle();
131 
132  // buffer_size is number of last orthogonal vectors to keep in the buffer V
133  IndexType buffer_size;
134  if (orthogonalize == 0 || orthogonalize == 1)
135  {
136  // At least two vectors must be stored in buffer for Lanczos recursion
137  buffer_size = 2;
138  }
139  else if ((orthogonalize < 0) ||
140  (orthogonalize > static_cast<FlagType>(m)))
141  {
142  // Using full reorthogonalization, keep all of the m vectors in buffer
143  buffer_size = m;
144  }
145  else
146  {
147  // Orthogonalize with less than m vectors (0 < orthogonalize < m)
148  buffer_size = orthogonalize;
149  }
150 
151  // Allocate 2D array (as 1D array, and coalesced row-wise) to store
152  // the last buffer_size of orthogonalized vectors of length n. New vectors
153  // are stored by cycling through the buffer to replace with old ones.
154  DataType* device_V = CudaInterface<DataType>::alloc(n * buffer_size);
155 
156  // Allocate vector r
157  DataType* device_r = CudaInterface<DataType>::alloc(n);
158 
159  // Copy v into r
161 
162  // Initial beta
163  DataType initial_beta = cuVectorOperations<DataType>::euclidean_norm(
164  cublas_handle, device_r, n);
165 
166  // Declare iterators
167  IndexType j;
168  IndexType lanczos_size = 0;
169  IndexType num_ortho;
170 
171  // In the following, beta[j] means beta[j-1] in the Demmel text
172  for (j=0; j < m; ++j)
173  {
174  // Update the size of Lanczos tridiagonal matrix
175  ++lanczos_size;
176 
177  // Normalize r and copy to the j-th column of V
178  if (j == 0)
179  {
181  cublas_handle, device_r, n, 1.0/initial_beta,
182  &device_V[(j % buffer_size)*n]);
183  }
184  else
185  {
187  cublas_handle, device_r, n, 1.0/beta[j-1],
188  &device_V[(j % buffer_size)*n]);
189  }
190 
191  // Multiply A to the j-th column of V, write into r
192  A->dot(&device_V[(j % buffer_size)*n], device_r);
193 
194  // alpha[j] is V[:, j] dot r
196  cublas_handle, &device_V[(j % buffer_size)*n], device_r, n);
197 
198  // Subtract V[:,j] * alpha[j] from r
200  cublas_handle, &device_V[(j % buffer_size)*n], n, alpha[j],
201  device_r);
202 
203  // Subtract V[:,j-1] * beta[j] from r
204  if (j > 0)
205  {
207  cublas_handle, &device_V[((j-1) % buffer_size)*n], n,
208  beta[j-1], device_r);
209  }
210 
211  // Gram-Schmidt process (full re-orthogonalization)
212  if (orthogonalize != 0)
213  {
214  // Find how many column vectors are filled so far in the buffer V
215  if (j < buffer_size)
216  {
217  num_ortho = j+1;
218  }
219  else
220  {
221  num_ortho = buffer_size;
222  }
223 
224  // Gram-Schmidt process
226  cublas_handle, &device_V[0], n, buffer_size, j%buffer_size,
227  num_ortho, device_r);
228  }
229 
230  // beta is norm of r
232  cublas_handle, device_r, n);
233 
234  // Exit criterion when the vector r is zero. If each component of a
235  // zero vector has the tolerance epsilon, (which is called lanczos_tol
236  // here), the tolerance of norm of r is epsilon times sqrt of n.
237  if (beta[j] < lanczos_tol * sqrt(n))
238  {
239  break;
240  }
241  }
242 
243  // Free dynamic memory
246 
247  return lanczos_size;
248 }
static void del(void *device_array)
Deletes memory on gpu device if its pointer is not NULL, then sets the pointer to NULL.
static ArrayType * alloc(const LongIndexType array_size)
Allocates memory on gpu device. This function creates a pointer and returns it.
static void copy_to_device(const ArrayType *host_array, const LongIndexType array_size, ArrayType *device_array)
Copies memory on host to device memory.
virtual void dot(const DataType *vector, DataType *product)=0
cublasHandle_t get_cublas_handle() const
This function returns a reference to the cublasHandle_t object. The object will be created,...
static void gram_schmidt_process(cublasHandle_t cublas_handle, const DataType *V, const LongIndexType vector_size, const IndexType num_vectors, const IndexType last_vector, const FlagType num_ortho, DataType *r)
Modified Gram-Schmidt orthogonalization process to orthogonalize the vector v against a subset of the...
static void copy_scaled_vector(cublasHandle_t cublas_handle, const DataType *input_vector, const LongIndexType vector_size, const DataType scale, DataType *output_vector)
Scales a vector and stores to a new vector.
static void subtract_scaled_vector(cublasHandle_t cublas_handle, const DataType *input_vector, const LongIndexType vector_size, const DataType scale, DataType *output_vector)
Subtracts the scaled input vector from the output vector.
static DataType inner_product(cublasHandle_t cublas_handle, const DataType *vector1, const DataType *vector2, const LongIndexType vector_size)
Computes Euclidean inner product of two vectors.
static DataType euclidean_norm(cublasHandle_t cublas_handle, const DataType *vector, const LongIndexType vector_size)
Computes the Euclidean 2-norm of a 1D array.
int FlagType
Definition: types.h:68
int IndexType
Definition: types.h:65

References CudaInterface< ArrayType >::alloc(), cuVectorOperations< DataType >::copy_scaled_vector(), CudaInterface< ArrayType >::copy_to_device(), CudaInterface< ArrayType >::del(), cLinearOperator< DataType >::dot(), cuVectorOperations< DataType >::euclidean_norm(), cuLinearOperator< DataType >::get_cublas_handle(), cuOrthogonalization< DataType >::gram_schmidt_process(), cuVectorOperations< DataType >::inner_product(), and cuVectorOperations< DataType >::subtract_scaled_vector().

Referenced by cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ cu_lanczos_tridiagonalization< double >()

template IndexType cu_lanczos_tridiagonalization< double > ( cuLinearOperator< double > *  A,
const double *  v,
const LongIndexType  n,
const IndexType  m,
const double  lanczos_tol,
const FlagType  orthogonalize,
double *  alpha,
double *  beta 
)

◆ cu_lanczos_tridiagonalization< float >()

template IndexType cu_lanczos_tridiagonalization< float > ( cuLinearOperator< float > *  A,
const float *  v,
const LongIndexType  n,
const IndexType  m,
const float  lanczos_tol,
const FlagType  orthogonalize,
float *  alpha,
float *  beta 
)