imate
C++/CUDA Reference
RandomArrayGenerator< DataType > Class Template Reference

A static class to generate random set of vectors. This class acts as a templated namespace, where all member methods are public and static. More...

#include <random_array_generator.h>

Static Public Member Functions

static void generate_random_array (RandomNumberGenerator &random_number_generator, DataType *array, const LongIndexType array_size, const IndexType num_threads)
 Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1. More...
 

Detailed Description

template<typename DataType>
class RandomArrayGenerator< DataType >

A static class to generate random set of vectors. This class acts as a templated namespace, where all member methods are public and static.

See also
Orthogonalization

Definition at line 37 of file random_array_generator.h.

Member Function Documentation

◆ generate_random_array()

template<typename DataType >
void RandomArrayGenerator< DataType >::generate_random_array ( RandomNumberGenerator random_number_generator,
DataType *  array,
const LongIndexType  array_size,
const IndexType  num_threads 
)
static

Generates a pseudo-random array with Rademacher distribution where elements are either +1 or -1.

The Rademacher distribution is obtained from the Bernoulli distribution consisting of 0 and 1. To generate such distribution, a sequence of array_size/64 intergers, each with 64-bit, is generated using Xoshiro256** algorithm. The 64 bits of each integer are used to fill 64 elements of the array as follows. If the bit is 0, the array element is set to -1, and if the bit is 1, the array element is set to +1.

Thus, in this function, we use Xoshiro256** algorithm to generate 64 bits and use bits, not the integer itself. This approach is about ten times faster than convertng the random integer to double between [0,1] and then map them to +1 and -1.

Also, this function is more than a hundered times faster than using rand() function.

Parameters
[in]random_number_generatorThe random number generator object. This object should be initialized with num_threads by its constructor. On each parallel thread, an independent sequence of random numbers are generated.
[out]array1D array of the size array_size.
[out]array_sizeThe size of the array.
[in]num_threadsNumber of OpenMP parallel threads. If num_threads is zero then no paralel thread is created inside this function, rather it is assumed that this functon is called inside a parallel region from the caller.

Definition at line 63 of file random_array_generator.cpp.

68 {
69  // Set the number of threads only if num_threads is non-zero. If
70  // num_threads is zero, it indicates to not create new threads in this
71  // function, rather, this function was called from another caller function
72  // that was already parallelized and this function should be executed in
73  // one of those threads.
74  if (num_threads > 0)
75  {
76  // num_threads parallel threads will be created in this function.
77  omp_set_num_threads(num_threads);
78  }
79 
80  // Finding the thread id. It depends where we call omp_get_thread_num. If
81  // we call it here (below), it gives the thread id of the parent function.
82  // But if we call it inside a #pragma omp parallel, it gives the newer
83  // thread id that is "created" by the pragma.
84  int thread_id = 0;
85  if (num_threads == 0)
86  {
87  // If num_threads is zero (which means we will not create a new thread
88  // in this function), we get the thread id of the parent function.
89  thread_id = omp_get_thread_num();
90  }
91 
92  // Number of bits to generate in each call of the random generator. This is
93  // the number of bits in a uint64_t integer.
94  const int bits_per_byte = 8;
95  const int num_bits = sizeof(uint64_t) * bits_per_byte;
96 
97  // Number of chunks of divided array each in num_bits length
98  const LongIndexType num_chunks = \
99  static_cast<LongIndexType>(array_size/num_bits);
100 
101  // Shared-memory parallelism over individual row vectors. The parallel
102  // section only works if num_threads is non-zero. Otherwise it runs in
103  // serial order.
104  #pragma omp parallel if (num_threads > 0)
105  {
106  // If num_threads is zero, the following thread_id is the thread id
107  // that the parent (caller) function created outside of this function.
108  // But, if num_thread is non-zero, the following thread id is the
109  // thread id that is created inside this parallel loop.
110  if (num_threads > 0)
111  {
112  thread_id = omp_get_thread_num();
113  }
114 
115  #pragma omp for schedule(static)
116  for (LongIndexType i=0; i < num_chunks; ++i)
117  {
118  // Generate 64 bits (one integer)
119  uint64_t bits = random_number_generator.next(thread_id);
120 
121  // Fill 64 elements of array with +1 or -1 depending on the bits
122  for (IndexType j=0; j < num_bits; ++j)
123  {
124  // Check if the j-th bit (from right to left) is 1 or 0
125  if (bits & ( uint64_t(1) << j))
126  {
127  // Bit is 1. Write +1.0 in array
128  array[i*num_bits + j] = 1.0;
129  }
130  else
131  {
132  // Bit is 0. Write -1.0 in array
133  array[i*num_bits + j] = -1.0;
134  }
135  }
136  }
137  }
138 
139  // The previous for loop (the above) does not fill all elements of array
140  // since it only iterates on the multiples of 64 (num_bits). We fill the
141  // rest of the array. There are less than 64 elements remained to fill. So,
142  // it suffice to generate only 64 bits (one integer) for the rest of the
143  // elements of the array
144  uint64_t bits = random_number_generator.next(thread_id);
145  IndexType j = 0;
146 
147  // This loop should have less than 64 iterations.
148  for (LongIndexType i = num_chunks * num_bits; i < array_size; ++i)
149  {
150  // Check if the j-th bit (from right to left) is 1 or 0
151  if (bits & ( uint64_t(1) << j))
152  {
153  // Bit is 1. Write +1.0 in array
154  array[i] = 1.0;
155  }
156  else
157  {
158  // Bit is 0. Write -1.0 in array
159  array[i] = -1.0;
160  }
161 
162  // Increment bit counter
163  ++j;
164  }
165 }
uint64_t next(const int thread_id)
Generates the next random number in the sequence, depending on the thread id.
int LongIndexType
Definition: types.h:60
int IndexType
Definition: types.h:65

References RandomNumberGenerator::next().

Referenced by cTraceEstimator< DataType >::_c_stochastic_lanczos_quadrature(), cuTraceEstimator< DataType >::_cu_stochastic_lanczos_quadrature(), cuOrthogonalization< DataType >::orthogonalize_vectors(), and cOrthogonalization< DataType >::orthogonalize_vectors().

Here is the call graph for this function:
Here is the caller graph for this function:

The documentation for this class was generated from the following files: