imate
C++/CUDA Reference
cTraceEstimator< DataType > Class Template Reference

A static class to compute the trace of implicit matrix functions using stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods is public and static. The internal private member functions are also static. More...

#include <c_trace_estimator.h>

Static Public Member Functions

static FlagType c_trace_estimator (cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const int64_t seed, const IndexType lanczos_degree, const DataType lanczos_tol, const IndexType min_num_samples, const IndexType max_num_samples, const DataType error_atol, const DataType error_rtol, const DataType confidence_level, const DataType outlier_significance_level, const IndexType num_threads, DataType *trace, DataType *error, DataType **samples, IndexType *processed_samples_indices, IndexType *num_samples_used, IndexType *num_outliers, FlagType *converged, float &alg_wall_time)
 Stochastic Lanczos quadrature method to estimate trace of a function of a linear operator. Both function and the linear operator can be defined with parameters. More...
 

Static Private Member Functions

static void _c_stochastic_lanczos_quadrature (cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const IndexType lanczos_degree, const DataType lanczos_tol, RandomNumberGenerator &random_number_generator, DataType *random_vector, FlagType *converged, DataType *trace_estimate)
 For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quadrature method. More...
 

Detailed Description

template<typename DataType>
class cTraceEstimator< DataType >

A static class to compute the trace of implicit matrix functions using stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods is public and static. The internal private member functions are also static.

See also
Diagonalization

Definition at line 39 of file c_trace_estimator.h.

Member Function Documentation

◆ _c_stochastic_lanczos_quadrature()

template<typename DataType >
void cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature ( cLinearOperator< DataType > *  A,
DataType *  parameters,
const IndexType  num_inquiries,
const Function matrix_function,
const FlagType  gram,
const DataType  exponent,
const FlagType  orthogonalize,
const IndexType  lanczos_degree,
const DataType  lanczos_tol,
RandomNumberGenerator random_number_generator,
DataType *  random_vector,
FlagType converged,
DataType *  trace_estimate 
)
staticprivate

For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quadrature method.

Note
In special case when an eigenvalue relation is known, this function sets the converged inquiries to "not" converged in order to continue updating those inquiries. This is because in this special case, computing for other inquiries is free.
Parameters
[in]AAn instance of a class derived from LinearOperator class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]parametersThe parameters of the linear operator A. The size of this array is num_parameters*num_inquiries where num_parameters is the number of parameters that define the linear operator A, and num_inquiries is the number of different set of parameters to compute trace on different parametrized operators. The j-th set of parameters are stored in parameters[j*num_parameters:(j+1)*num_parameters]. That is, this array is contiguous over each batch of parameters.
[in]num_inquiriesThe number of batches of parameters. This function computes num_inquiries values of trace corresponding to different batch of parameters of the linear operator A. Hence, the number of output trace is num_inquiries. Hence, it is the number of columns of the output array samples.
[in]matrix_functionAn instance of Function class which has the function function. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]gramFlag indicating whether the linear operator A is Gramian. If the linear operator is:
  • Gramian, then, Lanczos tridiagonalization method is employed. This method requires only matrix-vector multiplication.
  • not Gramian, then, Golub-Kahn bidiagonalization method is employed. This method requires both matrix and transposed-matrix vector multiplications.
[in]exponentThe exponent parameter p in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to 1.0.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]lanczos_degreeThe number of Lanczos recursive iterations. The operator A is reduced to a square tridiagonal (or bidiagonal) matrix of the size lanczos_degree. The eigenvalues (or singular values) of this reduced matrix is computed and used in the stochastic Lanczos quadrature method. The larger Lanczos degre leads to a better estimation. The computational cost is quadratically increases with the lanczos degree.
[in]lanczos_tolThe tolerance to stop the Lanczos recursive iterations before the end of iterations reached. If the tolerance is not met, the iterations (total of lanczos_degree iterations) continue till end.
[in]random_number_generatorGenerates random numbers that fills random_vector. In each parallel thread, an independent sequence of random numbers are generated. This object should be initialized by num_threads.
[in]random_vectorA 1D vector of the size of matrix A. The Lanczos iterations start off from this random vector. Each given random vector is used per a Monte-Carlo computation of the SLQ method. In the Lanczos iterations, other vectors are generated orthogonal to this initial random vector. This array is filled inside this function.
[out]converged1D array of the size of the number of columns of samples. Each element indicates which column of samples has converged to the tolerance criteria. Normally, if the num_samples used is less than max_num_samples, it indicates that the convergence has reached.
[out]trace_estimate1D array of the size of the number of columns of samples array. This array constitures each row of samples array. Each element of trace_estimates is the estimated trace for each parameter inquiry.

Definition at line 411 of file c_trace_estimator.cpp.

425 {
426  // Matrix size
427  IndexType matrix_size = A->get_num_rows();
428 
429  // Fill random vectors with Rademacher distribution (+1, -1), normalized
430  // but not orthogonalized. Setting num_threads to zero indicates to not
431  // create any new threads in RandomNumbrGenerator since the current
432  // function is inside a parallel thread.
433  IndexType num_threads = 0;
435  random_number_generator, random_vector, matrix_size, num_threads);
436 
437  // Allocate diagonals (alpha) and supdiagonals (beta) of Lanczos matrix
438  DataType* alpha = new DataType[lanczos_degree];
439  DataType* beta = new DataType[lanczos_degree];
440 
441  // Define 2D arrays needed to decomposition. All these arrays are
442  // defined as 1D array with Fortran ordering
443  DataType* eigenvectors = NULL;
444  DataType* left_singularvectors = NULL;
445  DataType* right_singularvectors_transposed = NULL;
446 
447  // Actual number of inquiries
448  IndexType required_num_inquiries = num_inquiries;
450  {
451  // When a relation between eigenvalues and the parameters of the linear
452  // operator is known, to compute eigenvalues of for each inquiry, only
453  // computing one inquiry is enough. This is because an eigenvalue for
454  // one parameter setting is enough to compute eigenvalue of another set
455  // of parameters.
456  required_num_inquiries = 1;
457  }
458 
459  // Allocate and initialize theta
460  IndexType i;
461  IndexType j;
462  DataType** theta = new DataType*[num_inquiries];
463  for (j=0; j < num_inquiries; ++j)
464  {
465  theta[j] = new DataType[lanczos_degree];
466 
467  // Initialize components to zero
468  for (i=0; i < lanczos_degree; ++i)
469  {
470  theta[j][i] = 0.0;
471  }
472  }
473 
474  // Allocate and initialize tau
475  DataType** tau = new DataType*[num_inquiries];
476  for (j=0; j < num_inquiries; ++j)
477  {
478  tau[j] = new DataType[lanczos_degree];
479 
480  // Initialize components to zero
481  for (i=0; i < lanczos_degree; ++i)
482  {
483  tau[j][i] = 0.0;
484  }
485  }
486 
487  // Allocate lanczos size for each inquiry. This variable keeps the non-zero
488  // size of the tri-diagonal (or bi-diagonal) matrix. Ideally, this matrix
489  // is of the size lanczos_degree. But, due to the early termination, this
490  // size might be smaller.
491  IndexType* lanczos_size = new IndexType[num_inquiries];
492 
493  // Number of parameters of linear operator A
494  IndexType num_parameters = A->get_num_parameters();
495 
496  // Lanczos iterations, computes theta and tau for each inquiry parameter
497  for (j=0; j < required_num_inquiries; ++j)
498  {
499  // If trace is already converged, do not compute on the new sample.
500  // However, exclude the case where required_num_inquiries is not the
501  // same as num_inquiries, since in this case, we compute one inquiry
502  // for multiple parameters.
503  if ((converged[j] == 1) && (required_num_inquiries == num_inquiries))
504  {
505  continue;
506  }
507 
508  // Set parameter of linear operator A
509  A->set_parameters(&parameters[j*num_parameters]);
510 
511  if (gram)
512  {
513  // Use Golub-Kahn-Lanczos Bi-diagonalization
514  lanczos_size[j] = c_golub_kahn_bidiagonalization(
515  A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
516  orthogonalize, alpha, beta);
517 
518  // Allocate matrix of singular vectors (1D array, Fortran ordering)
519  left_singularvectors = \
520  new DataType[lanczos_size[j] * lanczos_size[j]];
521  right_singularvectors_transposed = \
522  new DataType[lanczos_size[j] * lanczos_size[j]];
523 
524  // Note: alpha is written in-place with singular values
526  alpha, beta, left_singularvectors,
527  right_singularvectors_transposed, lanczos_size[j]);
528 
529  // theta and tau from singular values and vectors
530  for (i=0; i < lanczos_size[j]; ++i)
531  {
532  theta[j][i] = alpha[i] * alpha[i];
533  tau[j][i] = right_singularvectors_transposed[i];
534  }
535  }
536  else
537  {
538  // Use Lanczos Tri-diagonalization
539  lanczos_size[j] = c_lanczos_tridiagonalization(
540  A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
541  orthogonalize, alpha, beta);
542 
543  // Allocate eigenvectors matrix (1D array with Fortran ordering)
544  eigenvectors = new DataType[lanczos_size[j] * lanczos_size[j]];
545 
546  // Note: alpha is written in-place with eigenvalues
548  alpha, beta, eigenvectors, lanczos_size[j]);
549 
550  // theta and tau from singular values and vectors
551  for (i=0; i < lanczos_size[j]; ++i)
552  {
553  theta[j][i] = alpha[i];
554  tau[j][i] = eigenvectors[i * lanczos_size[j]];
555  }
556  }
557  }
558 
559  // If an eigenvalue relation is known, compute the rest of eigenvalues
560  // using the eigenvalue relation given in the operator A for its
561  // eigenvalues. If no eigenvalue relation is not known, the rest of
562  // eigenvalues were already computed in the above loop and no other
563  // computation is needed.
564  if (A->is_eigenvalue_relation_known() && num_inquiries > 1)
565  {
566  // When the code execution reaches this function, at least one of the
567  // inquiries is not converged, but some others might have been
568  // converged already. Here, we force-update those that are even
569  // converged already by setting converged to false. The extra update is
570  // free of charge when a relation for the eigenvalues are known.
571  for (j=0; j < num_inquiries; ++j)
572  {
573  converged[j] = 0;
574  }
575 
576  // Compute theta and tau for the rest of inquiry parameters
577  for (j=1; j < num_inquiries; ++j)
578  {
579  // Only j=0 was iterated before. Set the same size for other j-s
580  lanczos_size[j] = lanczos_size[0];
581 
582  for (i=0; i < lanczos_size[j]; ++i)
583  {
584  // Shift eigenvalues by the old and new parameters
585  theta[j][i] = A->get_eigenvalue(
586  &parameters[0],
587  theta[0][i],
588  &parameters[j*num_parameters]);
589 
590  // tau is the same (at least for the affine operator)
591  tau[j][i] = tau[0][i];
592  }
593  }
594  }
595 
596  // Estimate trace using quadrature method
597  DataType quadrature_sum;
598  for (j=0; j < num_inquiries; ++j)
599  {
600  // If the j-th inquiry is already converged, skip.
601  if (converged[j] == 1)
602  {
603  continue;
604  }
605 
606  // Initialize sum for the integral of quadrature
607  quadrature_sum = 0.0;
608 
609  // Important: This loop should iterate till lanczos_size[j], but not
610  // lanczos_degree. Otherwise the computation is wrong for certain
611  // matrices, such as if the input matrix is identity, or rank
612  // deficient. By using lanczos_size[j] instead of lanczos_degree, all
613  // issues with special matrices will resolve.
614  for (i=0; i < lanczos_size[j]; ++i)
615  {
616  quadrature_sum += tau[j][i] * tau[j][i] * \
617  matrix_function->function(pow(theta[j][i], exponent));
618  }
619 
620  trace_estimate[j] = matrix_size * quadrature_sum;
621  }
622 
623  // Release dynamic memory
624  delete[] alpha;
625  delete[] beta;
626  delete[] lanczos_size;
627 
628  for (j=0; j < required_num_inquiries; ++j)
629  {
630  delete[] theta[j];
631  }
632  delete[] theta;
633 
634  for (j=0; j < required_num_inquiries; ++j)
635  {
636  delete[] tau[j];
637  }
638  delete[] tau;
639 
640  if (eigenvectors != NULL)
641  {
642  delete[] eigenvectors;
643  }
644 
645  if (left_singularvectors != NULL)
646  {
647  delete[] left_singularvectors;
648  }
649 
650  if (right_singularvectors_transposed != NULL)
651  {
652  delete[] right_singularvectors_transposed;
653  }
654 }
IndexType c_golub_kahn_bidiagonalization(cLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
Bi-diagonalizes the positive-definite matrix A using Golub-Kahn-Lanczos method.
IndexType c_lanczos_tridiagonalization(cLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
Tri-diagonalizes matrix A to T using the start vector v. is the Lanczos degree, which will be the siz...
static int eigh_tridiagonal(DataType *diagonals, DataType *subdiagonals, DataType *eigenvectors, IndexType matrix_size)
Computes all eigenvalues and eigenvectors of a real and symmetric tridiagonal matrix.
static int svd_bidiagonal(DataType *diagonals, DataType *subdiagonals, DataType *U, DataType *Vt, IndexType matrix_size)
Computes all singular-values and left and right eigenvectors of a real and symmetric upper bi-diagona...
static void generate_random_array(RandomNumberGenerator &random_number_generator, DataType *array, const LongIndexType array_size, const IndexType num_threads)
Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.
virtual DataType get_eigenvalue(const DataType *known_parameters, const DataType known_eigenvalue, const DataType *inquiry_parameters) const =0
void set_parameters(DataType *parameters_)
Sets the scalar parameter this->parameters. Parameter is initialized to NULL. However,...
LongIndexType get_num_rows() const
FlagType is_eigenvalue_relation_known() const
Returns a flag that determines whether a relation between the parameters of the operator and its eige...
IndexType get_num_parameters() const
int IndexType
Definition: types.h:65

References c_golub_kahn_bidiagonalization(), c_lanczos_tridiagonalization(), Diagonalization< DataType >::eigh_tridiagonal(), RandomArrayGenerator< DataType >::generate_random_array(), cLinearOperator< DataType >::get_eigenvalue(), cLinearOperator< DataType >::get_num_parameters(), cLinearOperator< DataType >::get_num_rows(), cLinearOperator< DataType >::is_eigenvalue_relation_known(), cLinearOperator< DataType >::set_parameters(), and Diagonalization< DataType >::svd_bidiagonal().

Referenced by cTraceEstimator< DataType >::c_trace_estimator().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ c_trace_estimator()

template<typename DataType >
FlagType cTraceEstimator< DataType >::c_trace_estimator ( cLinearOperator< DataType > *  A,
DataType *  parameters,
const IndexType  num_inquiries,
const Function matrix_function,
const FlagType  gram,
const DataType  exponent,
const FlagType  orthogonalize,
const int64_t  seed,
const IndexType  lanczos_degree,
const DataType  lanczos_tol,
const IndexType  min_num_samples,
const IndexType  max_num_samples,
const DataType  error_atol,
const DataType  error_rtol,
const DataType  confidence_level,
const DataType  outlier_significance_level,
const IndexType  num_threads,
DataType *  trace,
DataType *  error,
DataType **  samples,
IndexType processed_samples_indices,
IndexType num_samples_used,
IndexType num_outliers,
FlagType converged,
float &  alg_wall_time 
)
static

Stochastic Lanczos quadrature method to estimate trace of a function of a linear operator. Both function and the linear operator can be defined with parameters.

Multiple batches of parameters of the linear operator can be passed to this function. In such a case, the output trace is an array of the of the number of the inquired parameters.

The stochastic estimator computes multiple samples of trace and the final result is the average of the samples. This function outputs both the samples of estimated trace values (in samples array) and their average (in trace array).

Parameters
[in]AAn instance of a class derived from LinearOperator class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]parametersThe parameters of the linear operator A. The size of this array is num_parameters*num_inquiries where num_parameters is the number of parameters that define the linear operator A, and num_inquiries is the number of different set of parameters to compute trace on different parametrized operators. The j-th set of parameters are stored in parameters[j*num_parameters:(j+1)*num_parameters]. That is, this array is contiguous over each batch of parameters.
[in]num_inquiriesThe number of batches of parameters. This function computes num_inquiries values of trace corresponding to different batch of parameters of the linear operator A. Hence, the number of output trace is num_inquiries. Hence, it is the number of columns of the output array samples.
[in]matrix_functionAn instance of Function class which has the function function. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]gramFlag indicating whether the linear operator A is Gramian. If the linear operator is:
  • Gramian, then, Lanczos tridiagonalization method is employed. This method requires only matrix-vector multiplication.
  • not Gramian, then, Golub-Kahn bidiagonalization method is employed. This method requires both matrix and transposed-matrix vector multiplications.
[in]exponentThe exponent parameter p in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to 1.0.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]seedA non-negative integer to be used as seed to initiate the generation of sequences of peudo-random numbers in the algorithm. This is useful to make the result of the randomized algorithm to be reproducible. If a negative integer is given, the given seed value is ignored and the current processor time is used as the seed to initiate he generation random number sequences. In this case, the result is not reproducible, rather, is pseudo-random.
[in]lanczos_degreeThe number of Lanczos recursive iterations. The operator A is reduced to a square tridiagonal (or bidiagonal) matrix of the size lanczos_degree. The eigenvalues (or singular values) of this reduced matrix is computed and used in the stochastic Lanczos quadrature method. The larger Lanczos degree leads to a better estimation. The computational cost is quadratically increases with the lanczos degree.
[in]lanczos_tolThe tolerance to stop the Lanczos recursive iterations before the end of iterations reached. If the tolerance is not met, the iterations (total of lanczos_degree iterations) continue till end.
[in]min_num_samplesMinimum number of times that the trace estimation is repeated. Within the min number of samples, the Monte-Carlo continues even if convergence is reached.
[in]max_num_samplesThe number of times that the trace estimation is repeated. The output trace value is the average of the samples. Hence, this is the number of rows of the output array samples. Larger number of samples leads to a better trace estimation. The computational const linearly increases with number of samples.
[in]error_atolAbsolute tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of max_num_samples) proceed till end.
[in]error_rtolRelative tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of max_num_samples) proceed till end.
[in]confidence_levelThe confidence level of the error, which is a number between 0 and 1. This affects the scale of error.
[in]outlier_significance_levelOne minus the confidence level of the uncertainty band of the outlier. This is a number between 0 and 1. Confidence level of outleir and significance level of outlier are complement of each other.
[in]num_threadsNumber of OpenMP parallel processes. The parallelization is implemented over the Monte-Carlo iterations.
[out]traceThe output trace of size num_inquiries. These values are the average of the rows of samples array.
[out]errorThe error of estimation of trace, which is the standard deviation of the rows of samples array. The size of this array is num_inquiries.
[out]samples2D array of all estimated trace samples. The shape of this array is (max_num_samples*num_inquiries). The average of the rows is also given in trace array.
[out]processed_samples_indicesA 1D array indicating the processing order of rows of the samples. In parallel processing, this order of processing the rows of samples is not necessarly sequential.
[out]num_samples_used1D array of the size of the number of columns of samples. Each element indicates how many iterations were used till convergence is reached for each column of the samples. The number of iterations should be a number between min_num_samples and max_num_samples.
[out]num_outliers1D array with the size of number of columns of samples. Each element indicates how many rows of the samples array were outliers and were removed during averaging rows of samples.
[out]converged1D array of the size of the number of columns of samples. Each element indicates which column of samples has converged to the tolerance criteria. Normally, if the num_samples used is less than max_num_samples, it indicates that the convergence has reached.
[out]alg_wall_timeThe elapsed time that takes for the SLQ algorithm. This does not include array allocation/deallocation.
Returns
A signal to indicate the status of computation:
  • 1 indicates successful convergence within the given tolerances was met. Convergence is achieved when all elements of convergence array are below convergence_atol or convergence_rtol times trace.
  • 0 indicates the convergence criterion was not met for at least one of the trace inquiries.

Definition at line 192 of file c_trace_estimator.cpp.

218 {
219  // Matrix size
220  IndexType matrix_size = A->get_num_rows();
221 
222  // Set the number of threads
223  omp_set_num_threads(num_threads);
224 
225  // Allocate 1D array of random vectors We only allocate a random vector
226  // per parallel thread. Thus, the total size of the random vectors is
227  // matrix_size*num_threads. On each iteration in parallel threads, the
228  // allocated memory is reused. That is, in each iteration, a new random
229  // vector is generated for that specific thread id.
230  IndexType random_vectors_size = matrix_size * num_threads;
231  DataType* random_vectors = new DataType[random_vectors_size];
232 
233  // Initialize random number generator to generate in parallel threads
234  // independently.
235  RandomNumberGenerator random_number_generator(num_threads, seed);
236 
237  // The counter of filled size of processed_samples_indices array
238  // This scalar variable is defined as array to be shared among al threads
239  IndexType num_processed_samples = 0;
240 
241  // Criterion for early termination of iterations if convergence reached
242  // This scalar variable is defined as array to be shared among al threads
243  FlagType all_converged = 0;
244 
245  // Using square-root of max possible chunk size for parallel schedules
246  unsigned int chunk_size = static_cast<int>(
247  sqrt(static_cast<DataType>(max_num_samples) / num_threads));
248  if (chunk_size < 1)
249  {
250  chunk_size = 1;
251  }
252 
253  // Timing elapsed time of algorithm
254  Timer timer;
255  timer.start();
256 
257  // Shared-memory parallelism over Monte-Carlo ensemble sampling
258  IndexType i;
259  #pragma omp parallel for schedule(dynamic, chunk_size)
260  for (i=0; i < max_num_samples; ++i)
261  {
262  if (!static_cast<bool>(all_converged))
263  {
264  int thread_id = omp_get_thread_num();
265 
266  // Perform one Monte-Carlo sampling to estimate trace
268  A, parameters, num_inquiries, matrix_function, gram,
269  exponent, orthogonalize, lanczos_degree, lanczos_tol,
270  random_number_generator,
271  &random_vectors[matrix_size*thread_id], converged,
272  samples[i]);
273 
274  // Critical section
275  #pragma omp critical
276  {
277  // Store the index of processed samples
278  processed_samples_indices[num_processed_samples] = i;
279  ++num_processed_samples;
280 
281  // Check whether convergence criterion has been met to stop.
282  // This check can also be done after another parallel thread
283  // set all_converged to "1", but we continue to update error.
285  samples, min_num_samples, num_inquiries,
286  processed_samples_indices, num_processed_samples,
287  confidence_level, error_atol, error_rtol, error,
288  num_samples_used, converged);
289  }
290  }
291  }
292 
293  // Elapsed wall time of the algorithm (computation only, not array i/o)
294  timer.stop();
295  alg_wall_time = timer.elapsed();
296 
297  // Remove outliers from trace estimates and average trace estimates
299  confidence_level, outlier_significance_level, num_inquiries,
300  max_num_samples, num_samples_used, processed_samples_indices,
301  samples, num_outliers, trace, error);
302 
303  // Deallocate memory
304  delete[] random_vectors;
305 
306  return all_converged;
307 }
static FlagType check_convergence(DataType **samples, const IndexType min_num_samples, const IndexType num_inquiries, const IndexType *processed_samples_indices, const IndexType num_processed_samples, const DataType confidence_level, const DataType error_atol, const DataType error_rtol, DataType *error, IndexType *num_samples_used, FlagType *converged)
Checks if the standard deviation of the set of the cumulative averages of trace estimators converged ...
static void average_estimates(const DataType confidence_level, const DataType outlier_significance_level, const IndexType num_inquiries, const IndexType max_num_samples, const IndexType *num_samples_used, const IndexType *processed_samples_indices, DataType **samples, IndexType *num_outliers, DataType *trace, DataType *error)
Averages the estimates of trace. Removes outliers and reevaluates the error to take into account for ...
Generates 64-bit integers on multiple parallel threads.
Records elasped wall time between two events.
Definition: timer.h:50
void start()
Starts the timer.
Definition: timer.cpp:66
void stop()
Stops the timer.
Definition: timer.cpp:79
double elapsed() const
Returns the elapsed time in seconds.
Definition: timer.cpp:92
static void _c_stochastic_lanczos_quadrature(cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const IndexType lanczos_degree, const DataType lanczos_tol, RandomNumberGenerator &random_number_generator, DataType *random_vector, FlagType *converged, DataType *trace_estimate)
For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quad...
int FlagType
Definition: types.h:68

References cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature(), ConvergenceTools< DataType >::average_estimates(), ConvergenceTools< DataType >::check_convergence(), Timer::elapsed(), cLinearOperator< DataType >::get_num_rows(), Timer::start(), and Timer::stop().

Here is the call graph for this function:

The documentation for this class was generated from the following files: