imate
C++/CUDA Reference
Loading...
Searching...
No Matches
cTraceEstimator< DataType > Class Template Reference

A static class to compute the trace of implicit matrix functions using stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods is public and static. The internal private member functions are also static. More...

#include <c_trace_estimator.h>

Static Public Member Functions

static FlagType c_trace_estimator (cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const int64_t seed, const IndexType lanczos_degree, const DataType lanczos_tol, const IndexType min_num_samples, const IndexType max_num_samples, const DataType error_atol, const DataType error_rtol, const DataType confidence_level, const DataType outlier_significance_level, const IndexType num_threads, DataType *trace, DataType *error, DataType **samples, IndexType *processed_samples_indices, IndexType *num_samples_used, IndexType *num_outliers, FlagType *converged, double &alg_wall_time)
 Stochastic Lanczos quadrature method to estimate trace of a function of a linear operator. Both function and the linear operator can be defined with parameters.
 

Static Private Member Functions

static void _c_stochastic_lanczos_quadrature (cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const IndexType lanczos_degree, const DataType lanczos_tol, RandomNumberGenerator &random_number_generator, DataType *random_vector, FlagType *converged, DataType *trace_estimate)
 For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quadrature method.
 

Detailed Description

template<typename DataType>
class cTraceEstimator< DataType >

A static class to compute the trace of implicit matrix functions using stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods is public and static. The internal private member functions are also static.

See also
Diagonalization

Definition at line 39 of file c_trace_estimator.h.

Member Function Documentation

◆ _c_stochastic_lanczos_quadrature()

template<typename DataType >
void cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature ( cLinearOperator< DataType > *  A,
DataType *  parameters,
const IndexType  num_inquiries,
const Function matrix_function,
const FlagType  gram,
const DataType  exponent,
const FlagType  orthogonalize,
const IndexType  lanczos_degree,
const DataType  lanczos_tol,
RandomNumberGenerator random_number_generator,
DataType *  random_vector,
FlagType converged,
DataType *  trace_estimate 
)
staticprivate

For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quadrature method.

Note
In special case when an eigenvalue relation is known, this function sets the converged inquiries to "not" converged in order to continue updating those inquiries. This is because in this special case, computing for other inquiries is free.
Parameters
[in]AAn instance of a class derived from LinearOperator class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]parametersThe parameters of the linear operator A. The size of this array is num_parameters*num_inquiries where num_parameters is the number of parameters that define the linear operator A, and num_inquiries is the number of different set of parameters to compute trace on different parametrized operators. The j-th set of parameters are stored in parameters[j*num_parameters:(j+1)*num_parameters]. That is, this array is contiguous over each batch of parameters.
[in]num_inquiriesThe number of batches of parameters. This function computes num_inquiries values of trace corresponding to different batch of parameters of the linear operator A. Hence, the number of output trace is num_inquiries. Hence, it is the number of columns of the output array samples.
[in]matrix_functionAn instance of Function class which has the function function. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]gramFlag indicating whether the linear operator A is Gramian. If the linear operator is:
  • Gramian, then, Lanczos tridiagonalization method is employed. This method requires only matrix-vector multiplication.
  • not Gramian, then, Golub-Kahn bidiagonalization method is employed. This method requires both matrix and transposed-matrix vector multiplications.
[in]exponentThe exponent parameter p in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to 1.0.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]lanczos_degreeThe number of Lanczos recursive iterations. The operator A is reduced to a square tridiagonal (or bidiagonal) matrix of the size lanczos_degree. The eigenvalues (or singular values) of this reduced matrix is computed and used in the stochastic Lanczos quadrature method. The larger Lanczos degre leads to a better estimation. The computational cost is quadratically increases with the lanczos degree.
[in]lanczos_tolThe tolerance to stop the Lanczos recursive iterations before the end of iterations reached. If the tolerance is not met, the iterations (total of lanczos_degree iterations) continue till end.
[in]random_number_generatorGenerates random numbers that fills random_vector. In each parallel thread, an independent sequence of random numbers are generated. This object should be initialized by num_threads.
[in]random_vectorA 1D vector of the size of matrix A. The Lanczos iterations start off from this random vector. Each given random vector is used per a Monte-Carlo computation of the SLQ method. In the Lanczos iterations, other vectors are generated orthogonal to this initial random vector. This array is filled inside this function.
[out]converged1D array of the size of the number of columns of samples. Each element indicates which column of samples has converged to the tolerance criteria. Normally, if the num_samples used is less than max_num_samples, it indicates that the convergence has reached.
[out]trace_estimate1D array of the size of the number of columns of samples array. This array constitures each row of samples array. Each element of trace_estimates is the estimated trace for each parameter inquiry.

Definition at line 428 of file c_trace_estimator.cpp.

442{
443 // Matrix size
444 IndexType matrix_size = A->get_num_rows();
445
446 // Fill random vectors with Rademacher distribution (+1, -1), normalized
447 // but not orthogonalized. Setting num_threads to zero indicates to not
448 // create any new threads in RandomNumbrGenerator since the current
449 // function is inside a parallel thread.
450 IndexType num_threads = 0;
452 random_number_generator, random_vector, matrix_size, num_threads);
453
454 // Allocate diagonals (alpha) and supdiagonals (beta) of Lanczos matrix
455 DataType* alpha = new DataType[lanczos_degree];
456 DataType* beta = new DataType[lanczos_degree];
457
458 // Define 2D arrays needed to decomposition. All these arrays are
459 // defined as 1D array with Fortran ordering
460 DataType* eigenvectors = NULL;
461 DataType* left_singularvectors = NULL;
462 DataType* right_singularvectors_transposed = NULL;
463
464 // Actual number of inquiries
465 IndexType required_num_inquiries = num_inquiries;
467 {
468 // When a relation between eigenvalues and the parameters of the linear
469 // operator is known, to compute eigenvalues of for each inquiry, only
470 // computing one inquiry is enough. This is because an eigenvalue for
471 // one parameter setting is enough to compute eigenvalue of another set
472 // of parameters.
473 required_num_inquiries = 1;
474 }
475
476 // Allocate and initialize theta
477 IndexType i;
478 IndexType j;
479 DataType** theta = new DataType*[num_inquiries];
480 for (j=0; j < num_inquiries; ++j)
481 {
482 theta[j] = new DataType[lanczos_degree];
483
484 // Initialize components to zero
485 for (i=0; i < lanczos_degree; ++i)
486 {
487 theta[j][i] = 0.0;
488 }
489 }
490
491 // Allocate and initialize tau
492 DataType** tau = new DataType*[num_inquiries];
493 for (j=0; j < num_inquiries; ++j)
494 {
495 tau[j] = new DataType[lanczos_degree];
496
497 // Initialize components to zero
498 for (i=0; i < lanczos_degree; ++i)
499 {
500 tau[j][i] = 0.0;
501 }
502 }
503
504 // Allocate lanczos size for each inquiry. This variable keeps the non-zero
505 // size of the tri-diagonal (or bi-diagonal) matrix. Ideally, this matrix
506 // is of the size lanczos_degree. But, due to the early termination, this
507 // size might be smaller.
508 IndexType* lanczos_size = new IndexType[num_inquiries];
509
510 // Number of parameters of linear operator A
511 IndexType num_parameters = A->get_num_parameters();
512
513 // Lanczos iterations, computes theta and tau for each inquiry parameter
514 for (j=0; j < required_num_inquiries; ++j)
515 {
516 // If trace is already converged, do not compute on the new sample.
517 // However, exclude the case where required_num_inquiries is not the
518 // same as num_inquiries, since in this case, we compute one inquiry
519 // for multiple parameters.
520 if ((converged[j] == 1) && (required_num_inquiries == num_inquiries))
521 {
522 continue;
523 }
524
525 // Set parameter of linear operator A
526 A->set_parameters(&parameters[j*num_parameters]);
527
528 if (gram)
529 {
530 // Use Golub-Kahn-Lanczos Bi-diagonalization
531 lanczos_size[j] = c_golub_kahn_bidiagonalization(
532 A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
533 orthogonalize, alpha, beta);
534
535 // Allocate matrix of singular vectors (1D array, Fortran ordering)
536 left_singularvectors = \
537 new DataType[lanczos_size[j] * lanczos_size[j]];
538 right_singularvectors_transposed = \
539 new DataType[lanczos_size[j] * lanczos_size[j]];
540
541 // Note: alpha is written in-place with singular values
543 alpha, beta, left_singularvectors,
544 right_singularvectors_transposed, lanczos_size[j]);
545
546 // theta and tau from singular values and vectors
547 for (i=0; i < lanczos_size[j]; ++i)
548 {
549 theta[j][i] = alpha[i] * alpha[i];
550 tau[j][i] = right_singularvectors_transposed[i];
551 }
552 }
553 else
554 {
555 // Use Lanczos Tri-diagonalization
556 lanczos_size[j] = c_lanczos_tridiagonalization(
557 A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
558 orthogonalize, alpha, beta);
559
560 // Allocate eigenvectors matrix (1D array with Fortran ordering)
561 eigenvectors = new DataType[lanczos_size[j] * lanczos_size[j]];
562
563 // Note: alpha is written in-place with eigenvalues
565 alpha, beta, eigenvectors, lanczos_size[j]);
566
567 // theta and tau from singular values and vectors
568 for (i=0; i < lanczos_size[j]; ++i)
569 {
570 theta[j][i] = alpha[i];
571 tau[j][i] = eigenvectors[i * lanczos_size[j]];
572 }
573 }
574 }
575
576 // If an eigenvalue relation is known, compute the rest of eigenvalues
577 // using the eigenvalue relation given in the operator A for its
578 // eigenvalues. If no eigenvalue relation is not known, the rest of
579 // eigenvalues were already computed in the above loop and no other
580 // computation is needed.
581 if (A->is_eigenvalue_relation_known() && num_inquiries > 1)
582 {
583 // When the code execution reaches this function, at least one of the
584 // inquiries is not converged, but some others might have been
585 // converged already. Here, we force-update those that are even
586 // converged already by setting converged to false. The extra update is
587 // free of charge when a relation for the eigenvalues are known.
588 for (j=0; j < num_inquiries; ++j)
589 {
590 converged[j] = 0;
591 }
592
593 // Compute theta and tau for the rest of inquiry parameters
594 for (j=1; j < num_inquiries; ++j)
595 {
596 // Only j=0 was iterated before. Set the same size for other j-s
597 lanczos_size[j] = lanczos_size[0];
598
599 for (i=0; i < lanczos_size[j]; ++i)
600 {
601 // Shift eigenvalues by the old and new parameters
602 theta[j][i] = A->get_eigenvalue(
603 &parameters[0],
604 theta[0][i],
605 &parameters[j*num_parameters]);
606
607 // tau is the same (at least for the affine operator)
608 tau[j][i] = tau[0][i];
609 }
610 }
611 }
612
613 // Estimate trace using quadrature method
614 DataType quadrature_sum;
615 for (j=0; j < num_inquiries; ++j)
616 {
617 // If the j-th inquiry is already converged, skip.
618 if (converged[j] == 1)
619 {
620 continue;
621 }
622
623 // Initialize sum for the integral of quadrature
624 quadrature_sum = 0.0;
625
626 // Important: This loop should iterate till lanczos_size[j], but not
627 // lanczos_degree. Otherwise the computation is wrong for certain
628 // matrices, such as if the input matrix is identity, or rank
629 // deficient. By using lanczos_size[j] instead of lanczos_degree, all
630 // issues with special matrices will resolve.
631 for (i=0; i < lanczos_size[j]; ++i)
632 {
633 quadrature_sum += tau[j][i] * tau[j][i] * \
634 matrix_function->function(std::pow(theta[j][i], exponent));
635 }
636
637 trace_estimate[j] = \
638 static_cast<DataType>(matrix_size) * quadrature_sum;
639 }
640
641 // Release dynamic memory
642 delete[] alpha;
643 delete[] beta;
644 delete[] lanczos_size;
645
646 for (j=0; j < required_num_inquiries; ++j)
647 {
648 delete[] theta[j];
649 }
650 delete[] theta;
651
652 for (j=0; j < required_num_inquiries; ++j)
653 {
654 delete[] tau[j];
655 }
656 delete[] tau;
657
658 if (eigenvectors != NULL)
659 {
660 delete[] eigenvectors;
661 }
662
663 if (left_singularvectors != NULL)
664 {
665 delete[] left_singularvectors;
666 }
667
668 if (right_singularvectors_transposed != NULL)
669 {
670 delete[] right_singularvectors_transposed;
671 }
672}
IndexType c_golub_kahn_bidiagonalization(cLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
Bi-diagonalizes the positive-definite matrix A using Golub-Kahn-Lanczos method.
IndexType c_lanczos_tridiagonalization(cLinearOperator< DataType > *A, const DataType *v, const LongIndexType n, const IndexType m, const DataType lanczos_tol, const FlagType orthogonalize, DataType *alpha, DataType *beta)
Tri-diagonalizes matrix A to T using the start vector v. is the Lanczos degree, which will be the siz...
static int eigh_tridiagonal(DataType *diagonals, DataType *subdiagonals, DataType *eigenvectors, IndexType matrix_size)
Computes all eigenvalues and eigenvectors of a real and symmetric tridiagonal matrix.
static int svd_bidiagonal(DataType *diagonals, DataType *subdiagonals, DataType *U, DataType *Vt, IndexType matrix_size)
Computes all singular-values and left and right eigenvectors of a real and symmetric upper bi-diagona...
virtual float function(const float lambda_) const =0
static void generate_random_array(RandomNumberGenerator &random_number_generator, DataType *array, const LongIndexType array_size, const IndexType num_threads)
Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.
IndexType get_num_parameters() const
Returns the number of parameters of the linear operator.
LongIndexType get_num_rows() const
Returns the number of rows of the matrix.
FlagType is_eigenvalue_relation_known() const
Returns a flag that determines whether a relation between the parameters of the operator and its eige...
virtual DataType get_eigenvalue(const DataType *known_parameters, const DataType known_eigenvalue, const DataType *inquiry_parameters) const =0
void set_parameters(DataType *parameters_)
Sets the scalar parameter this->parameters. Parameter is initialized to NULL. However,...
int IndexType
Definition types.h:65

References c_golub_kahn_bidiagonalization(), c_lanczos_tridiagonalization(), Diagonalization< DataType >::eigh_tridiagonal(), Function::function(), RandomArrayGenerator< DataType >::generate_random_array(), cLinearOperator< DataType >::get_eigenvalue(), cLinearOperatorBase::get_num_parameters(), cLinearOperatorBase::get_num_rows(), cLinearOperatorBase::is_eigenvalue_relation_known(), cLinearOperator< DataType >::set_parameters(), and Diagonalization< DataType >::svd_bidiagonal().

Referenced by cTraceEstimator< DataType >::c_trace_estimator().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ c_trace_estimator()

template<typename DataType >
FlagType cTraceEstimator< DataType >::c_trace_estimator ( cLinearOperator< DataType > *  A,
DataType *  parameters,
const IndexType  num_inquiries,
const Function matrix_function,
const FlagType  gram,
const DataType  exponent,
const FlagType  orthogonalize,
const int64_t  seed,
const IndexType  lanczos_degree,
const DataType  lanczos_tol,
const IndexType  min_num_samples,
const IndexType  max_num_samples,
const DataType  error_atol,
const DataType  error_rtol,
const DataType  confidence_level,
const DataType  outlier_significance_level,
const IndexType  num_threads,
DataType *  trace,
DataType *  error,
DataType **  samples,
IndexType processed_samples_indices,
IndexType num_samples_used,
IndexType num_outliers,
FlagType converged,
double &  alg_wall_time 
)
static

Stochastic Lanczos quadrature method to estimate trace of a function of a linear operator. Both function and the linear operator can be defined with parameters.

Multiple batches of parameters of the linear operator can be passed to this function. In such a case, the output trace is an array of the of the number of the inquired parameters.

The stochastic estimator computes multiple samples of trace and the final result is the average of the samples. This function outputs both the samples of estimated trace values (in samples array) and their average (in trace array).

Parameters
[in]AAn instance of a class derived from LinearOperator class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]parametersThe parameters of the linear operator A. The size of this array is num_parameters*num_inquiries where num_parameters is the number of parameters that define the linear operator A, and num_inquiries is the number of different set of parameters to compute trace on different parametrized operators. The j-th set of parameters are stored in parameters[j*num_parameters:(j+1)*num_parameters]. That is, this array is contiguous over each batch of parameters.
[in]num_inquiriesThe number of batches of parameters. This function computes num_inquiries values of trace corresponding to different batch of parameters of the linear operator A. Hence, the number of output trace is num_inquiries. Hence, it is the number of columns of the output array samples.
[in]matrix_functionAn instance of Function class which has the function function. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]gramFlag indicating whether the linear operator A is Gramian. If the linear operator is:
  • Gramian, then, Lanczos tridiagonalization method is employed. This method requires only matrix-vector multiplication.
  • not Gramian, then, Golub-Kahn bidiagonalization method is employed. This method requires both matrix and transposed-matrix vector multiplications.
[in]exponentThe exponent parameter p in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to 1.0.
[in]orthogonalizeIndicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations.
  • If set to 0, no orthogonalization is performed.
  • If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization).
  • If set to a positive integer, say q less than lanczos_degree, the newly computed eigenvector is orthogonalized against the last q previous eigenvectors (partial reorthogonalization).
  • If set to an integer larger than lanczos_degree, it is cut to lanczos_degree, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]seedA non-negative integer to be used as seed to initiate the generation of sequences of peudo-random numbers in the algorithm. This is useful to make the result of the randomized algorithm to be reproducible. If a negative integer is given, the given seed value is ignored and the current processor time is used as the seed to initiate he generation random number sequences. In this case, the result is not reproducible, rather, is pseudo-random.
[in]lanczos_degreeThe number of Lanczos recursive iterations. The operator A is reduced to a square tridiagonal (or bidiagonal) matrix of the size lanczos_degree. The eigenvalues (or singular values) of this reduced matrix is computed and used in the stochastic Lanczos quadrature method. The larger Lanczos degree leads to a better estimation. The computational cost is quadratically increases with the lanczos degree.
[in]lanczos_tolThe tolerance to stop the Lanczos recursive iterations before the end of iterations reached. If the tolerance is not met, the iterations (total of lanczos_degree iterations) continue till end.
[in]min_num_samplesMinimum number of times that the trace estimation is repeated. Within the min number of samples, the Monte-Carlo continues even if convergence is reached.
[in]max_num_samplesThe number of times that the trace estimation is repeated. The output trace value is the average of the samples. Hence, this is the number of rows of the output array samples. Larger number of samples leads to a better trace estimation. The computational const linearly increases with number of samples.
[in]error_atolAbsolute tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of max_num_samples) proceed till end.
[in]error_rtolRelative tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of max_num_samples) proceed till end.
[in]confidence_levelThe confidence level of the error, which is a number between 0 and 1. This affects the scale of error.
[in]outlier_significance_levelOne minus the confidence level of the uncertainty band of the outlier. This is a number between 0 and 1. Confidence level of outleir and significance level of outlier are complement of each other.
[in]num_threadsNumber of OpenMP parallel processes. The parallelization is implemented over the Monte-Carlo iterations.
[out]traceThe output trace of size num_inquiries. These values are the average of the rows of samples array.
[out]errorThe error of estimation of trace, which is the standard deviation of the rows of samples array. The size of this array is num_inquiries.
[out]samples2D array of all estimated trace samples. The shape of this array is (max_num_samples*num_inquiries). The average of the rows is also given in trace array.
[out]processed_samples_indicesA 1D array indicating the processing order of rows of the samples. In parallel processing, this order of processing the rows of samples is not necessarly sequential.
[out]num_samples_used1D array of the size of the number of columns of samples. Each element indicates how many iterations were used till convergence is reached for each column of the samples. The number of iterations should be a number between min_num_samples and max_num_samples.
[out]num_outliers1D array with the size of number of columns of samples. Each element indicates how many rows of the samples array were outliers and were removed during averaging rows of samples.
[out]converged1D array of the size of the number of columns of samples. Each element indicates which column of samples has converged to the tolerance criteria. Normally, if the num_samples used is less than max_num_samples, it indicates that the convergence has reached.
[out]alg_wall_timeThe elapsed time that takes for the SLQ algorithm. This does not include array allocation/deallocation.
Returns
A signal to indicate the status of computation:
  • 1 indicates successful convergence within the given tolerances was met. Convergence is achieved when all elements of convergence array are below convergence_atol or convergence_rtol times trace.
  • 0 indicates the convergence criterion was not met for at least one of the trace inquiries.

Definition at line 195 of file c_trace_estimator.cpp.

221{
222 // Matrix size
223 IndexType matrix_size = A->get_num_rows();
224
225 // Set the number of threads
226 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
227 omp_set_num_threads(num_threads);
228 #endif
229
230 // Allocate 1D array of random vectors We only allocate a random vector
231 // per parallel thread. Thus, the total size of the random vectors is
232 // matrix_size*num_threads. On each iteration in parallel threads, the
233 // allocated memory is reused. That is, in each iteration, a new random
234 // vector is generated for that specific thread id.
235 IndexType random_vectors_size = matrix_size * num_threads;
236 DataType* random_vectors = new DataType[random_vectors_size];
237
238 // Initialize random number generator to generate in parallel threads
239 // independently.
240 RandomNumberGenerator random_number_generator(num_threads, seed);
241
242 // The counter of filled size of processed_samples_indices array
243 // This scalar variable is defined as array to be shared among al threads
244 IndexType num_processed_samples = 0;
245
246 // Criterion for early termination of iterations if convergence reached
247 // This scalar variable is defined as array to be shared among al threads
248 FlagType all_converged = 0;
249
250 // Using square-root of max possible chunk size for parallel schedules
251 unsigned int chunk_size = static_cast<int>(
252 std::sqrt(static_cast<DataType>(max_num_samples) / \
253 static_cast<DataType>(num_threads)));
254 if (chunk_size < 1)
255 {
256 chunk_size = 1;
257 }
258
259 // Timing elapsed time of algorithm
260 Timer timer;
261 timer.start();
262
263 // Shared-memory parallelism over Monte-Carlo ensemble sampling
264 IndexType i;
265 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
266 #pragma omp parallel for \
267 schedule(dynamic, chunk_size) \
268 if (max_num_samples >= num_threads)
269 #endif
270 for (i=0; i < max_num_samples; ++i)
271 {
272 if (!static_cast<bool>(all_converged))
273 {
274 int thread_id;
275 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
276 thread_id = omp_get_thread_num();
277 #else
278 thread_id = 0;
279 #endif
280
281 // Perform one Monte-Carlo sampling to estimate trace
283 A, parameters, num_inquiries, matrix_function, gram,
284 exponent, orthogonalize, lanczos_degree, lanczos_tol,
285 random_number_generator,
286 &random_vectors[matrix_size*thread_id], converged,
287 samples[i]);
288
289 // Critical section
290 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
291 #pragma omp critical
292 #endif
293 {
294 // Store the index of processed samples
295 processed_samples_indices[num_processed_samples] = i;
296 ++num_processed_samples;
297
298 // Check whether convergence criterion has been met to stop.
299 // This check can also be done after another parallel thread
300 // set all_converged to "1", but we continue to update error.
302 samples, min_num_samples, num_inquiries,
303 processed_samples_indices, num_processed_samples,
304 confidence_level, error_atol, error_rtol, error,
305 num_samples_used, converged);
306 }
307 }
308 }
309
310 // Elapsed wall time of the algorithm (computation only, not array i/o)
311 timer.stop();
312 alg_wall_time = timer.elapsed();
313
314 // Remove outliers from trace estimates and average trace estimates
316 confidence_level, outlier_significance_level, num_inquiries,
317 max_num_samples, num_samples_used, processed_samples_indices,
318 samples, num_outliers, trace, error);
319
320 // Deallocate memory
321 delete[] random_vectors;
322
323 return all_converged;
324}
static FlagType check_convergence(DataType **samples, const IndexType min_num_samples, const IndexType num_inquiries, const IndexType *processed_samples_indices, const IndexType num_processed_samples, const DataType confidence_level, const DataType error_atol, const DataType error_rtol, DataType *error, IndexType *num_samples_used, FlagType *converged)
Checks if the standard deviation of the set of the cumulative averages of trace estimators converged ...
static void average_estimates(const DataType confidence_level, const DataType outlier_significance_level, const IndexType num_inquiries, const IndexType max_num_samples, const IndexType *num_samples_used, const IndexType *processed_samples_indices, DataType **samples, IndexType *num_outliers, DataType *trace, DataType *error)
Averages the estimates of trace. Removes outliers and reevaluates the error to take into account for ...
Generates 64-bit integers on multiple parallel threads.
Records elasped wall time between two events.
Definition timer.h:50
void start()
Starts the timer.
Definition timer.cpp:66
void stop()
Stops the timer.
Definition timer.cpp:79
double elapsed() const
Returns the elapsed time in seconds.
Definition timer.cpp:92
static void _c_stochastic_lanczos_quadrature(cLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const IndexType lanczos_degree, const DataType lanczos_tol, RandomNumberGenerator &random_number_generator, DataType *random_vector, FlagType *converged, DataType *trace_estimate)
For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quad...
void omp_set_num_threads(int num_threads)
int omp_get_thread_num()
int FlagType
Definition types.h:68

References cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature(), ConvergenceTools< DataType >::average_estimates(), ConvergenceTools< DataType >::check_convergence(), Timer::elapsed(), cLinearOperatorBase::get_num_rows(), omp_get_thread_num(), omp_set_num_threads(), Timer::start(), and Timer::stop().

Here is the call graph for this function:

The documentation for this class was generated from the following files: