A static class to compute the trace of implicit matrix functions using the stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods are public and static. The internal private member functions are also static.

#include <cu_trace_estimator.h>

Static Public Member Functions
static FlagType	cu_trace_estimator (cuLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const int64_t seed, const IndexType lanczos_degree, const DataType lanczos_tol, const IndexType min_num_samples, const IndexType max_num_samples, const DataType error_atol, const DataType error_rtol, const DataType confidence_level, const DataType outlier_significance_level, const IndexType num_threads, const IndexType num_gpu_devices, DataType *trace, DataType *error, DataType **samples, IndexType *processed_samples_indices, IndexType *num_samples_used, IndexType *num_outliers, FlagType *converged, float &alg_wall_time)
	Stochastic Lanczos quadrature method to estimate the trace of a function of a linear operator. Both the function and the linear operator can be defined with parameters.

Static Private Member Functions
static void	_cu_stochastic_lanczos_quadrature (cuLinearOperator< DataType > *A, DataType *parameters, const IndexType num_inquiries, const Function *matrix_function, const FlagType gram, const DataType exponent, const FlagType orthogonalize, const IndexType lanczos_degree, const DataType lanczos_tol, RandomNumberGenerator &random_number_generator, DataType *random_vector, FlagType *converged, DataType *trace_estimate)
	For a given random input vector, computes one Monte-Carlo sample to estimate the trace using the Lanczos quadrature method.

Detailed Description

template<typename DataType>
class cuTraceEstimator< DataType >

A static class to compute the trace of implicit matrix functions using the stochastic Lanczos quadrature method. This class acts as a templated namespace, where the member methods are public and static. The internal private member functions are also static.

See also: Diagonalization

Definition at line 39 of file cu_trace_estimator.h.

Member Function Documentation

◆ _cu_stochastic_lanczos_quadrature()

template<typename DataType >

void cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature	(	cuLinearOperator< DataType > *	A,
		DataType *	parameters,
		const IndexType	num_inquiries,
		const Function *	matrix_function,
		const FlagType	gram,
		const DataType	exponent,
		const FlagType	orthogonalize,
		const IndexType	lanczos_degree,
		const DataType	lanczos_tol,
		RandomNumberGenerator &	random_number_generator,
		DataType *	random_vector,
		FlagType *	converged,
		DataType *	trace_estimate
	)

static private

For a given random input vector, computes one Monte-Carlo sample to estimate trace using Lanczos quadrature method.

Note: In the special case when an eigenvalue relation is known, this function resets inquiries that have already converged to "not converged" so that they continue to be updated. This is because in this special case, computing the other inquiries is free.
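To illustrate why the extra inquiries cost nothing in this special case, consider a hypothetical operator of the form A + t I with a single parameter t. Its eigenvalues are those of A shifted by t, so the eigenvalues found at one parameter value give the eigenvalues at any other value by simple arithmetic. The helper `shift_eigenvalue` below is an assumption made for illustration only; it is not the library's `get_eigenvalue`, and a real operator may use a different relation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical relation for an operator A + t*I with one parameter t:
// if lambda_old is an eigenvalue at t_old, then lambda_old - t_old + t_new
// is the corresponding eigenvalue at t_new.
double shift_eigenvalue(double t_old, double lambda_old, double t_new)
{
    return lambda_old - t_old + t_new;
}

// Reuse the eigenvalues computed for the first inquiry (t[0]) to obtain the
// eigenvalues of every inquiry without running another decomposition.
std::vector<std::vector<double>> eigenvalues_for_all_inquiries(
        const std::vector<double>& theta_first,
        const std::vector<double>& t)
{
    assert(!t.empty());
    std::vector<std::vector<double>> theta(t.size());
    for (std::size_t j = 0; j < t.size(); ++j)
    {
        for (double lambda : theta_first)
        {
            theta[j].push_back(shift_eigenvalue(t[0], lambda, t[j]));
        }
    }
    return theta;
}
```

Since each additional inquiry costs only one pass over the already computed eigenvalues, refreshing inquiries that had converged earlier adds negligible work.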

Parameters

[in]	A	An instance of a class derived from `LinearOperator` class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]	parameters	The parameters of the linear operator `A`. The size of this array is `num_parameters*num_inquiries`, where `num_parameters` is the number of parameters that define the linear operator `A`, and `num_inquiries` is the number of different sets of parameters for which the trace is computed. The j-th set of parameters is stored in `parameters[j*num_parameters:(j+1)*num_parameters]`. That is, this array is contiguous over each batch of parameters.
[in]	num_inquiries	The number of batches of parameters. This function computes `num_inquiries` values of trace, one for each batch of parameters of the linear operator `A`. Hence, the number of output trace values is `num_inquiries`, which is also the number of columns of the output array `samples`.
[in]	matrix_function	An instance of `Function` class which has the function `function`. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]	gram	Flag indicating whether the computation is carried out for the Gramian of the linear operator `A`. In `_cu_stochastic_lanczos_quadrature()`, if `gram` is non-zero, the Golub-Kahn bidiagonalization method is employed and the squared singular values are used; this method requires both matrix and transposed-matrix vector multiplications. If `gram` is zero, the Lanczos tridiagonalization method is employed; this method requires only matrix-vector multiplication.
[in]	exponent	The exponent parameter `p` in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to `1.0`.
[in]	orthogonalize	Indicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations. If set to `0`, no orthogonalization is performed. If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization). If set to a positive integer, say `q` less than `lanczos_degree`, the newly computed eigenvector is orthogonalized against the last `q` previous eigenvectors (partial reorthogonalization). If set to an integer larger than `lanczos_degree`, it is cut to `lanczos_degree`, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]	lanczos_degree	The number of Lanczos recursive iterations. The operator `A` is reduced to a square tridiagonal (or bidiagonal) matrix of size `lanczos_degree`. The eigenvalues (or singular values) of this reduced matrix are computed and used in the stochastic Lanczos quadrature method. A larger Lanczos degree leads to a better estimation. The computational cost increases quadratically with the Lanczos degree.
[in]	lanczos_tol	The tolerance to stop the Lanczos recursive iterations before the end of the iterations is reached. If the tolerance is not met, the iterations (total of `lanczos_degree` iterations) continue till the end.
[in]	random_number_generator	Generates the random numbers that fill `random_vector`. In each parallel thread, an independent sequence of random numbers is generated. This object should be initialized with `num_threads`.
[in]	random_vector	A 1D vector of the size of matrix `A`. The Lanczos iterations start off from this random vector. One random vector is used per Monte-Carlo computation of the SLQ method. In the Lanczos iterations, other vectors are generated orthogonal to this initial random vector. This array is filled inside this function.
[out]	converged	1D array of the size of the number of columns of `samples`. Each element indicates whether the corresponding column of `samples` has converged to the tolerance criteria. Normally, if the number of samples used is less than `max_num_samples`, it indicates that convergence has been reached.
[out]	trace_estimate	1D array of the size of the number of columns of the `samples` array. This array constitutes one row of the `samples` array. Each element of `trace_estimate` is the estimated trace for one parameter inquiry.
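The contiguous layout of `parameters` can be sketched as follows. This is a minimal illustration, assuming `double` data; the helper name `parameter_batch` is hypothetical and is not part of the library. It returns the same address that the listing passes to `set_parameters`.

```cpp
#include <cassert>
#include <cstddef>

// Return a pointer to the first parameter of the j-th inquiry. The array is
// assumed to hold num_inquiries consecutive batches of num_parameters values.
const double* parameter_batch(const double* parameters,
                              std::size_t num_parameters,
                              std::size_t num_inquiries,
                              std::size_t j)
{
    assert(j < num_inquiries);
    return &parameters[j * num_parameters];
}
```

For example, with two parameters per operator and three inquiries, the batch for `j = 2` starts at element 4 of the array.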

Definition at line 419 of file cu_trace_estimator.cu.

 {
     // Matrix size
     IndexType matrix_size = A->get_num_rows();
  
     // Fill random vectors with Rademacher distribution (+1, -1), normalized
     // but not orthogonalized. Setting num_threads to zero indicates to not
     // create any new threads in RandomNumberGenerator since the current
     // function is inside a parallel thread.
     IndexType num_threads = 0;
     RandomArrayGenerator<DataType>::generate_random_array(
             random_number_generator, random_vector, matrix_size, num_threads);
  
     // Allocate diagonals (alpha) and superdiagonals (beta) of Lanczos matrix
     DataType* alpha = new DataType[lanczos_degree];
     DataType* beta = new DataType[lanczos_degree];
  
     // Define 2D arrays needed for the decomposition. All these arrays are
     // defined as 1D arrays with Fortran ordering
     DataType* eigenvectors = NULL;
     DataType* left_singularvectors = NULL;
     DataType* right_singularvectors_transposed = NULL;
  
     // Actual number of inquiries
     IndexType required_num_inquiries = num_inquiries;
     if (A->is_eigenvalue_relation_known())
     {
         // When a relation between eigenvalues and the parameters of the linear
         // operator is known, computing the eigenvalues for only one inquiry is
         // enough. This is because an eigenvalue for one parameter setting is
         // enough to compute the eigenvalue for another set of parameters.
         required_num_inquiries = 1;
     }
  
     // Allocate and initialize theta
     IndexType i;
     IndexType j;
     DataType** theta = new DataType*[num_inquiries];
     for (j=0; j < num_inquiries; ++j)
     {
         theta[j] = new DataType[lanczos_degree];
  
         // Initialize components to zero
         for (i=0; i < lanczos_degree; ++i)
         {
             theta[j][i] = 0.0;
         }
     }
  
     // Allocate and initialize tau
     DataType** tau = new DataType*[num_inquiries];
     for (j=0; j < num_inquiries; ++j)
     {
         tau[j] = new DataType[lanczos_degree];
  
         // Initialize components to zero
         for (i=0; i < lanczos_degree; ++i)
         {
             tau[j][i] = 0.0;
         }
     }
  
     // Allocate lanczos size for each inquiry. This variable keeps the non-zero
     // size of the tri-diagonal (or bi-diagonal) matrix. Ideally, this matrix
     // is of the size lanczos_degree. But, due to the early termination, this
     // size might be smaller.
     IndexType* lanczos_size = new IndexType[num_inquiries];
  
     // Number of parameters of linear operator A
     IndexType num_parameters = A->get_num_parameters();
  
     // Lanczos iterations, computes theta and tau for each inquiry parameter
     for (j=0; j < required_num_inquiries; ++j)
     {
         // If trace is already converged, do not compute on the new sample.
         // However, exclude the case where required_num_inquiries is not the
         // same as num_inquiries, since in this case, we compute one inquiry
         // for multiple parameters.
         if ((converged[j] == 1) && (required_num_inquiries == num_inquiries))
         {
             continue;
         }
  
         // Set parameter of linear operator A
         A->set_parameters(&parameters[j*num_parameters]);
  
         if (gram)
         {
             // Use Golub-Kahn-Lanczos Bi-diagonalization
             lanczos_size[j] = cu_golub_kahn_bidiagonalization(
                     A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
                     orthogonalize, alpha, beta);
  
             // Allocate matrix of singular vectors (1D array, Fortran ordering)
             left_singularvectors = \
                 new DataType[lanczos_size[j] * lanczos_size[j]];
             right_singularvectors_transposed = \
                 new DataType[lanczos_size[j] * lanczos_size[j]];
  
             // Note: alpha is written in-place with singular values
             Diagonalization<DataType>::svd_bidiagonal(
                     alpha, beta, left_singularvectors,
                     right_singularvectors_transposed, lanczos_size[j]);
  
             // theta and tau from singular values and vectors
             for (i=0; i < lanczos_size[j]; ++i)
             {
                 theta[j][i] = alpha[i] * alpha[i];
                 tau[j][i] = right_singularvectors_transposed[i];
             }
         }
         else
         {
             // Use Lanczos Tri-diagonalization
             lanczos_size[j] = cu_lanczos_tridiagonalization(
                     A, random_vector, matrix_size, lanczos_degree, lanczos_tol,
                     orthogonalize, alpha, beta);
  
             // Allocate eigenvectors matrix (1D array with Fortran ordering)
             eigenvectors = new DataType[lanczos_size[j] * lanczos_size[j]];
  
             // Note: alpha is written in-place with eigenvalues
             Diagonalization<DataType>::eigh_tridiagonal(
                     alpha, beta, eigenvectors, lanczos_size[j]);
  
             // theta and tau from eigenvalues and eigenvectors
             for (i=0; i < lanczos_size[j]; ++i)
             {
                 theta[j][i] = alpha[i];
                 tau[j][i] = eigenvectors[i * lanczos_size[j]];
             }
         }
     }
  
     // If an eigenvalue relation is known, compute the rest of eigenvalues
     // using the eigenvalue relation given in the operator A for its
     // eigenvalues. If no eigenvalue relation is known, the rest of
     // eigenvalues were already computed in the above loop and no other
     // computation is needed.
     if (A->is_eigenvalue_relation_known() && num_inquiries > 1)
     {
         // When the code execution reaches this point, at least one of the
         // inquiries is not converged, but some others might have
         // converged already. Here, we force-update even those that have
         // already converged by setting converged to false. The extra update
         // is free of charge when a relation for the eigenvalues is known.
         for (j=0; j < num_inquiries; ++j)
         {
             converged[j] = 0;
         }
  
         // Compute theta and tau for the rest of inquiry parameters
         for (j=1; j < num_inquiries; ++j)
         {
             // Only j=0 was iterated before. Set the same size for other j-s
             lanczos_size[j] = lanczos_size[0];
  
             for (i=0; i < lanczos_size[j]; ++i)
             {
                 // Shift eigenvalues by the old and new parameters
                 theta[j][i] = A->get_eigenvalue(
                         &parameters[0],
                         theta[0][i],
                         &parameters[j*num_parameters]);
  
                 // tau is the same (at least for the affine operator)
                 tau[j][i] = tau[0][i];
             }
         }
     }
  
     // Estimate trace using quadrature method
     DataType quadrature_sum;
     for (j=0; j < num_inquiries; ++j)
     {
         // If the j-th inquiry is already converged, skip.
         if (converged[j] == 1)
         {
             continue;
         }
  
         // Initialize sum for the integral of quadrature
         quadrature_sum = 0.0;
  
         // Important: This loop should iterate till lanczos_size[j], but not
         // lanczos_degree. Otherwise the computation is wrong for certain
         // matrices, such as if the input matrix is identity, or rank
         // deficient. By using lanczos_size[j] instead of lanczos_degree, all
         // issues with special matrices will resolve.
         for (i=0; i < lanczos_size[j]; ++i)
         {
             quadrature_sum += tau[j][i] * tau[j][i] * \
                     matrix_function->function(pow(theta[j][i], exponent));
         }
  
         trace_estimate[j] = matrix_size * quadrature_sum;
     }
  
     // Release dynamic memory
     delete[] alpha;
     delete[] beta;
     delete[] lanczos_size;
  
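     // theta and tau were allocated with num_inquiries rows, so release all
     // of them (not only the required_num_inquiries rows that were iterated)
     for (j=0; j < num_inquiries; ++j)
     {
         delete[] theta[j];
     }
     delete[] theta;
  
     for (j=0; j < num_inquiries; ++j)
     {
         delete[] tau[j];
     }
     delete[] tau;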
  
     if (eigenvectors != NULL)
     {
         delete[] eigenvectors;
     }
  
     if (left_singularvectors != NULL)
     {
         delete[] left_singularvectors;
     }
  
     if (right_singularvectors_transposed != NULL)
     {
         delete[] right_singularvectors_transposed;
     }
 }
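The final quadrature step of the listing above can be isolated as a small host-side sketch: one Monte-Carlo sample is the matrix size times the sum of tau_i^2 * f(theta_i^p) over the computed Lanczos size. The function name `quadrature_sample` and the use of `std::function` and `std::vector` are assumptions made for illustration; the library itself works with raw arrays and its own `Function` class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One Monte-Carlo sample: n * sum_i tau_i^2 * f(theta_i^p), where theta holds
// the eigenvalues of the reduced matrix and tau the first components of its
// eigenvectors. Only the computed Lanczos size (theta.size()) is summed.
double quadrature_sample(const std::vector<double>& theta,
                         const std::vector<double>& tau,
                         const std::function<double(double)>& f,
                         double exponent,
                         std::size_t matrix_size)
{
    assert(theta.size() == tau.size());
    double quadrature_sum = 0.0;
    for (std::size_t i = 0; i < theta.size(); ++i)
    {
        quadrature_sum += tau[i] * tau[i] * f(std::pow(theta[i], exponent));
    }
    return static_cast<double>(matrix_size) * quadrature_sum;
}
```

With `theta = {2, 4}`, `tau = {0.6, 0.8}`, the identity function, exponent 1 and a matrix size of 3, the weights are 0.36 and 0.64 and the sample is 3 * (0.72 + 2.56) = 9.84.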

References cu_golub_kahn_bidiagonalization(), cu_lanczos_tridiagonalization(), Diagonalization< DataType >::eigh_tridiagonal(), RandomArrayGenerator< DataType >::generate_random_array(), cLinearOperator< DataType >::get_eigenvalue(), cLinearOperator< DataType >::get_num_parameters(), cLinearOperator< DataType >::get_num_rows(), cLinearOperator< DataType >::is_eigenvalue_relation_known(), cLinearOperator< DataType >::set_parameters(), and Diagonalization< DataType >::svd_bidiagonal().

Referenced by cuTraceEstimator< DataType >::cu_trace_estimator().

◆ cu_trace_estimator()

template<typename DataType >

FlagType cuTraceEstimator< DataType >::cu_trace_estimator	(	cuLinearOperator< DataType > *	A,
		DataType *	parameters,
		const IndexType	num_inquiries,
		const Function *	matrix_function,
		const FlagType	gram,
		const DataType	exponent,
		const FlagType	orthogonalize,
		const int64_t	seed,
		const IndexType	lanczos_degree,
		const DataType	lanczos_tol,
		const IndexType	min_num_samples,
		const IndexType	max_num_samples,
		const DataType	error_atol,
		const DataType	error_rtol,
		const DataType	confidence_level,
		const DataType	outlier_significance_level,
		const IndexType	num_threads,
		const IndexType	num_gpu_devices,
		DataType *	trace,
		DataType *	error,
		DataType **	samples,
		IndexType *	processed_samples_indices,
		IndexType *	num_samples_used,
		IndexType *	num_outliers,
		FlagType *	converged,
		float &	alg_wall_time
	)

static

Stochastic Lanczos quadrature method to estimate trace of a function of a linear operator. Both function and the linear operator can be defined with parameters.

Multiple batches of parameters of the linear operator can be passed to this function. In such a case, the output trace is an array whose size is the number of inquired parameter batches.

The stochastic estimator computes multiple samples of trace and the final result is the average of the samples. This function outputs both the samples of estimated trace values (in samples array) and their average (in trace array).
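The relation between `samples` and `trace` can be sketched as a plain column average. This is a simplified illustration under stated assumptions: the helper name `column_mean` is hypothetical, rows are taken in storage order rather than through `processed_samples_indices`, and the outlier removal performed during averaging is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average the first num_used rows of column j of a row-major samples table.
// Each row is one Monte-Carlo sample; each column is one parameter inquiry.
// Outlier removal is intentionally left out of this sketch.
double column_mean(const std::vector<std::vector<double>>& samples,
                   std::size_t num_used,
                   std::size_t j)
{
    assert(num_used > 0 && num_used <= samples.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < num_used; ++i)
    {
        sum += samples[i][j];
    }
    return sum / static_cast<double>(num_used);
}
```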

Parameters

[in]	A	An instance of a class derived from `LinearOperator` class. This object will perform the matrix-vector operation and/or transposed matrix-vector operation for a linear operator. The linear operator can represent a fixed matrix, or a combination of matrices together with some given parameters.
[in]	parameters	The parameters of the linear operator `A`. The size of this array is `num_parameters*num_inquiries`, where `num_parameters` is the number of parameters that define the linear operator `A`, and `num_inquiries` is the number of different sets of parameters for which the trace is computed. The j-th set of parameters is stored in `parameters[j*num_parameters:(j+1)*num_parameters]`. That is, this array is contiguous over each batch of parameters.
[in]	num_inquiries	The number of batches of parameters. This function computes `num_inquiries` values of trace, one for each batch of parameters of the linear operator `A`. Hence, the number of output trace values is `num_inquiries`, which is also the number of columns of the output array `samples`.
[in]	matrix_function	An instance of `Function` class which has the function `function`. This function defines the matrix function, and operates on scalar eigenvalues of the matrix.
[in]	gram	Flag indicating whether the computation is carried out for the Gramian of the linear operator `A`. In `_cu_stochastic_lanczos_quadrature()`, if `gram` is non-zero, the Golub-Kahn bidiagonalization method is employed and the squared singular values are used; this method requires both matrix and transposed-matrix vector multiplications. If `gram` is zero, the Lanczos tridiagonalization method is employed; this method requires only matrix-vector multiplication.
[in]	exponent	The exponent parameter `p` in the trace of the expression \( f((\mathbf{A} + t \mathbf{B})^p) \). The exponent is a real number and by default it is set to `1.0`.
[in]	orthogonalize	Indicates whether to orthogonalize the orthogonal eigenvectors during Lanczos recursive iterations. If set to `0`, no orthogonalization is performed. If set to a negative integer, a newly computed eigenvector is orthogonalized against all the previous eigenvectors (full reorthogonalization). If set to a positive integer, say `q` less than `lanczos_degree`, the newly computed eigenvector is orthogonalized against the last `q` previous eigenvectors (partial reorthogonalization). If set to an integer larger than `lanczos_degree`, it is cut to `lanczos_degree`, which effectively orthogonalizes against all previous eigenvectors (full reorthogonalization).
[in]	seed	A non-negative integer to be used as the seed to initiate the generation of sequences of pseudo-random numbers in the algorithm. This is useful to make the result of the randomized algorithm reproducible. If a negative integer is given, the given seed value is ignored and the current processor time is used as the seed to initiate the generation of random number sequences. In this case, the result is not reproducible.
[in]	lanczos_degree	The number of Lanczos recursive iterations. The operator `A` is reduced to a square tridiagonal (or bidiagonal) matrix of size `lanczos_degree`. The eigenvalues (or singular values) of this reduced matrix are computed and used in the stochastic Lanczos quadrature method. A larger Lanczos degree leads to a better estimation. The computational cost increases quadratically with the Lanczos degree.
[in]	lanczos_tol	The tolerance to stop the Lanczos recursive iterations before the end of the iterations is reached. If the tolerance is not met, the iterations (total of `lanczos_degree` iterations) continue till the end.
[in]	min_num_samples	Minimum number of times that the trace estimation is repeated. Until the minimum number of samples is reached, the Monte-Carlo sampling continues even if convergence is reached.
[in]	max_num_samples	The maximum number of times that the trace estimation is repeated. The output trace value is the average of the samples. Hence, this is the number of rows of the output array `samples`. A larger number of samples leads to a better trace estimation. The computational cost increases linearly with the number of samples.
[in]	error_atol	Absolute tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of `max_num_samples`) proceed till end.
[in]	error_rtol	Relative tolerance criterion for early termination during the computation of trace samples. If the tolerance is not met, then all iterations (total of `max_num_samples`) proceed till end.
[in]	confidence_level	The confidence level of the error, which is a number between `0` and `1`. This affects the scale of `error`.
[in]	outlier_significance_level	One minus the confidence level of the uncertainty band of the outlier. This is a number between `0` and `1`. The confidence level of the outlier and the significance level of the outlier are complements of each other.
[in]	num_threads	Number of OpenMP parallel processes. The parallelization is implemented over the Monte-Carlo iterations.
[in]	num_gpu_devices	Number of GPU devices to use. One CPU thread is created per GPU device, and the threads drive their devices in parallel.
[out]	trace	The output trace of size `num_inquiries`. These values are the average of the rows of `samples` array.
[out]	error	The error of estimation of trace, which is the standard deviation of the rows of `samples` array. The size of this array is `num_inquiries`.
[out]	samples	2D array of all estimated trace samples. The shape of this array is (max_num_samples*num_inquiries). The average of the rows is also given in `trace` array.
[out]	processed_samples_indices	A 1D array indicating the processing order of the rows of `samples`. In parallel processing, the order of processing the rows of `samples` is not necessarily sequential.
[out]	num_samples_used	1D array of the size of the number of columns of `samples`. Each element indicates how many iterations were used until convergence was reached for the corresponding column of `samples`. The number of iterations should be a number between `min_num_samples` and `max_num_samples`.
[out]	num_outliers	1D array with the size of number of columns of `samples`. Each element indicates how many rows of the `samples` array were outliers and were removed during averaging rows of `samples`.
[out]	converged	1D array of the size of the number of columns of `samples`. Each element indicates whether the corresponding column of `samples` has converged to the tolerance criteria. Normally, if the number of samples used is less than `max_num_samples`, it indicates that convergence has been reached.
[out]	alg_wall_time	The elapsed wall time of the SLQ algorithm. This does not include array allocation/deallocation.
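The rules given for `orthogonalize` can be summarized as a small decision function. This is a sketch of the described behavior only; the name `num_reorthogonalized` is hypothetical and does not appear in the library.

```cpp
#include <cassert>

// Upper bound on the number of previous vectors that a newly computed vector
// is orthogonalized against, following the rules stated for orthogonalize.
int num_reorthogonalized(int orthogonalize, int lanczos_degree)
{
    assert(lanczos_degree > 0);
    if (orthogonalize == 0)
    {
        return 0;                  // no reorthogonalization
    }
    if (orthogonalize < 0 || orthogonalize > lanczos_degree)
    {
        return lanczos_degree;     // full reorthogonalization
    }
    return orthogonalize;          // partial: the last q previous vectors
}
```

For `lanczos_degree = 20`, the values `0`, `-1`, `5` and `50` give 0, 20, 5 and 20 respectively.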

Returns

A signal to indicate the status of computation:

1 indicates that convergence within the given tolerances was reached. Convergence is achieved when all elements of the `error` array are below `error_atol` or `error_rtol` times `trace`.
0 indicates the convergence criterion was not met for at least one of the trace inquiries.
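One possible reading of this criterion is sketched below. It is an assumption, not the library's `check_convergence`: the way the absolute and relative tolerances are combined (the larger of the two bounds is used) and the function name `all_inquiries_converged` are both chosen here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns 1 when every inquiry satisfies error <= atol or
// error <= rtol * |trace|, and 0 otherwise. Combining the two tolerances
// with max() is an assumption of this sketch.
int all_inquiries_converged(const std::vector<double>& error,
                            const std::vector<double>& trace,
                            double error_atol,
                            double error_rtol)
{
    assert(error.size() == trace.size());
    for (std::size_t j = 0; j < error.size(); ++j)
    {
        const double tolerance =
            std::max(error_atol, error_rtol * std::fabs(trace[j]));
        if (error[j] > tolerance)
        {
            return 0;
        }
    }
    return 1;
}
```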

Definition at line 197 of file cu_trace_estimator.cu.

 {
     // Matrix size
     IndexType matrix_size = A->get_num_rows();
  
     // Set the number of threads
     omp_set_num_threads(num_gpu_devices);
  
     // Allocate 1D array of random vectors. We only allocate one random vector
     // per parallel thread. Thus, the total size of the random vectors is
     // matrix_size*num_gpu_devices. On each iteration in parallel threads, the
     // allocated memory is reused. That is, in each iteration, a new random
     // vector is generated for that specific thread id.
     IndexType random_vectors_size = matrix_size * num_gpu_devices;
     DataType* random_vectors = new DataType[random_vectors_size];
  
     // Initialize random number generator to generate in parallel threads
     // independently.
     RandomNumberGenerator random_number_generator(num_gpu_devices, seed);
  
     // The counter of filled size of processed_samples_indices array
     // This scalar variable is shared among all threads
     IndexType num_processed_samples = 0;
  
     // Criterion for early termination of iterations if convergence reached
     // This scalar variable is shared among all threads
     FlagType all_converged = 0;
  
     // Using square-root of max possible chunk size for parallel schedules
     unsigned int chunk_size = static_cast<int>(
             sqrt(static_cast<DataType>(max_num_samples) / num_gpu_devices));
     if (chunk_size < 1)
     {
         chunk_size = 1;
     }
  
     // Timing elapsed time of algorithm
     CudaTimer cuda_timer;
     cuda_timer.start();
  
     // Shared-memory parallelism over Monte-Carlo ensemble sampling
     IndexType i;
     #pragma omp parallel for schedule(dynamic, chunk_size)
     for (i=0; i < max_num_samples; ++i)
     {
         if (!static_cast<bool>(all_converged))
         {
             // Switch to a device with the same device id as the cpu thread id
             unsigned int thread_id = omp_get_thread_num();
             CudaInterface<DataType>::set_device(thread_id);
  
             // Perform one Monte-Carlo sampling to estimate trace
             cuTraceEstimator<DataType>::_cu_stochastic_lanczos_quadrature(
                     A, parameters, num_inquiries, matrix_function, gram,
                     exponent, orthogonalize, lanczos_degree, lanczos_tol,
                     random_number_generator,
                     &random_vectors[matrix_size*thread_id], converged,
                     samples[i]);
  
             // Critical section
             #pragma omp critical
             {
                 // Store the index of processed samples
                 processed_samples_indices[num_processed_samples] = i;
                 ++num_processed_samples;
  
                 // Check whether convergence criterion has been met to stop.
                 // This check can also be done after another parallel thread
                 // set all_converged to "1", but we continue to update error.
                 all_converged = ConvergenceTools<DataType>::check_convergence(
                         samples, min_num_samples, num_inquiries,
                         processed_samples_indices, num_processed_samples,
                         confidence_level, error_atol, error_rtol, error,
                         num_samples_used, converged);
             }
         }
     }
  
     // Elapsed wall time of the algorithm (computation only, not array i/o)
     cuda_timer.stop();
     alg_wall_time = cuda_timer.elapsed();
  
     // Remove outliers from trace estimates and average trace estimates
     ConvergenceTools<DataType>::average_estimates(
             confidence_level, outlier_significance_level, num_inquiries,
             max_num_samples, num_samples_used, processed_samples_indices,
             samples, num_outliers, trace, error);
  
     // Deallocate memory
     delete[] random_vectors;
  
     return all_converged;
 }
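The chunk size used for the dynamic schedule in the listing above can be isolated as a small helper. The name `dynamic_chunk_size` is hypothetical; the arithmetic follows the listing: the square root of the number of samples per worker, truncated to an integer and clamped to at least one.

```cpp
#include <cassert>
#include <cmath>

// Chunk size for the dynamic schedule: floor(sqrt(max_num_samples / workers)),
// but never smaller than one.
unsigned int dynamic_chunk_size(unsigned int max_num_samples,
                                unsigned int num_workers)
{
    assert(num_workers > 0);
    const double ratio =
        static_cast<double>(max_num_samples) / num_workers;
    unsigned int chunk_size = static_cast<unsigned int>(std::sqrt(ratio));
    return (chunk_size < 1) ? 1 : chunk_size;
}
```

For 100 samples on 4 devices this gives chunks of 5 iterations. A single sample on 4 devices would truncate to 0 and is clamped to 1.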

References cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature(), ConvergenceTools< DataType >::average_estimates(), ConvergenceTools< DataType >::check_convergence(), CudaTimer::elapsed(), cLinearOperator< DataType >::get_num_rows(), CudaInterface< ArrayType >::set_device(), CudaTimer::start(), and CudaTimer::stop().

The documentation for this class was generated from the following files:

/home/runner/work/imate/imate/imate/_cu_trace_estimator/cu_trace_estimator.h
/home/runner/work/imate/imate/imate/_cu_trace_estimator/cu_trace_estimator.cu
