imate
C++/CUDA Reference
Loading...
Searching...
No Matches
RandomArrayGenerator< DataType > Class Template Reference

A static class to generate random set of vectors. This class acts as a templated namespace, where all member methods are public and static. More...

#include <random_array_generator.h>

Static Public Member Functions

static void generate_random_array (RandomNumberGenerator &random_number_generator, DataType *array, const LongIndexType array_size, const IndexType num_threads)
 Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.
 

Detailed Description

template<typename DataType>
class RandomArrayGenerator< DataType >

A static class to generate random set of vectors. This class acts as a templated namespace, where all member methods are public and static.

See also
Orthogonalization

Definition at line 37 of file random_array_generator.h.

Member Function Documentation

◆ generate_random_array()

template<typename DataType >
void RandomArrayGenerator< DataType >::generate_random_array ( RandomNumberGenerator random_number_generator,
DataType *  array,
const LongIndexType  array_size,
const IndexType  num_threads 
)
static

Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.

The Rademacher distribution is obtained from the Bernoulli distribution consisting of 0 and 1. To generate such distribution, a sequence of array_size/64 intergers, each with 64-bit, is generated using Xoshiro256** algorithm. The 64 bits of each integer are used to fill 64 elements of the array as follows. If the bit is 0, the array element is set to -1, and if the bit is 1, the array element is set to +1.

Thus, in this function, we use Xoshiro256** algorithm to generate 64 bits and use bits, not the integer itself. This approach is about ten times faster than convertng the random integer to double between [0,1] and then map them to +1 and -1.

Also, this function is more than a hundered times faster than using rand() function.

Parameters
[in]random_number_generatorThe random number generator object. This object should be initialized with num_threads by its constructor. On each parallel thread, an independent sequence of random numbers are generated.
[out]array1D array of the size array_size.
[out]array_sizeThe size of the array.
[in]num_threadsNumber of OpenMP parallel threads. If num_threads is zero then no paralel thread is created inside this function, rather it is assumed that this functon is called inside a parallel region from the caller.

Definition at line 66 of file random_array_generator.cpp.

71{
72 // Set the number of threads only if num_threads is non-zero. If
73 // num_threads is zero, it indicates to not create new threads in this
74 // function, rather, this function was called from another caller function
75 // that was already parallelized and this function should be executed in
76 // one of those threads.
77 if (num_threads > 0)
78 {
79 // num_threads parallel threads will be created in this function.
80 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
81 omp_set_num_threads(num_threads);
82 #endif
83 }
84
85 // Finding the thread id. It depends where we call omp_get_thread_num. If
86 // we call it here (below), it gives the thread id of the parent function.
87 // But if we call it inside a #pragma omp parallel, it gives the newer
88 // thread id that is "created" by the pragma.
89 int thread_id = 0;
90 if (num_threads == 0)
91 {
92 // If num_threads is zero (which means we will not create a new thread
93 // in this function), we get the thread id of the parent function.
94 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
95 thread_id = omp_get_thread_num();
96 #else
97 thread_id = 0;
98 #endif
99 }
100
101 // Number of bits to generate in each call of the random generator. This is
102 // the number of bits in a uint64_t integer.
103 const int bits_per_byte = 8;
104 const int num_bits = sizeof(uint64_t) * bits_per_byte;
105
106 // Number of chunks of divided array each in num_bits length
107 const LongIndexType num_chunks = \
108 static_cast<LongIndexType>(array_size/num_bits);
109
110 // Shared-memory parallelism over individual row vectors. The parallel
111 // section only works if num_threads is non-zero. Otherwise it runs in
112 // serial order.
113 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
114 #pragma omp parallel if (num_threads > 0)
115 #endif
116 {
117 // If num_threads is zero, the following thread_id is the thread id
118 // that the parent (caller) function created outside of this function.
119 // But, if num_thread is non-zero, the following thread id is the
120 // thread id that is created inside this parallel loop.
121 if (num_threads > 0)
122 {
123 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
124 thread_id = omp_get_thread_num();
125 #else
126 thread_id = 0;
127 #endif
128 }
129
130 #if defined(USE_OPENMP) && (USE_OPENMP == 1)
131 #pragma omp for schedule(static)
132 #endif
133 for (LongIndexType i=0; i < num_chunks; ++i)
134 {
135 // Generate 64 bits (one integer)
136 uint64_t bits = random_number_generator.next(thread_id);
137
138 // Fill 64 elements of array with +1 or -1 depending on the bits
139 for (IndexType j=0; j < num_bits; ++j)
140 {
141 // Check if the j-th bit (from right to left) is 1 or 0
142 if (bits & ( uint64_t(1) << j))
143 {
144 // Bit is 1. Write +1.0 in array
145 array[i*num_bits + j] = 1.0;
146 }
147 else
148 {
149 // Bit is 0. Write -1.0 in array
150 array[i*num_bits + j] = -1.0;
151 }
152 }
153 }
154 }
155
156 // The previous for loop (the above) does not fill all elements of array
157 // since it only iterates on the multiples of 64 (num_bits). We fill the
158 // rest of the array. There are less than 64 elements remained to fill. So,
159 // it suffice to generate only 64 bits (one integer) for the rest of the
160 // elements of the array
161 uint64_t bits = random_number_generator.next(thread_id);
162 IndexType j = 0;
163
164 // This loop should have less than 64 iterations.
165 for (LongIndexType i = num_chunks * num_bits; i < array_size; ++i)
166 {
167 // Check if the j-th bit (from right to left) is 1 or 0
168 if (bits & ( uint64_t(1) << j))
169 {
170 // Bit is 1. Write +1.0 in array
171 array[i] = 1.0;
172 }
173 else
174 {
175 // Bit is 0. Write -1.0 in array
176 array[i] = -1.0;
177 }
178
179 // Increment bit counter
180 ++j;
181 }
182}
uint64_t next(const int thread_id)
Generates the next random number in the sequence, depending on the thread id.
void omp_set_num_threads(int num_threads)
int omp_get_thread_num()
int LongIndexType
Definition types.h:60
int IndexType
Definition types.h:65

References RandomNumberGenerator::next(), omp_get_thread_num(), and omp_set_num_threads().

Referenced by cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature(), cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature(), cuOrthogonalization< DataType >::orthogonalize_vectors(), and cOrthogonalization< DataType >::orthogonalize_vectors().

Here is the call graph for this function:
Here is the caller graph for this function:

The documentation for this class was generated from the following files: