3. Using GPU Devices#
pyrand can run on CUDA-capable GPU devices, provided that the following are installed:
NVIDIA graphics driver,
CUDA libraries.
CUDA Version
The version of the CUDA libraries installed on the user's machine should match the version of the CUDA libraries that the pyrand package was compiled with. Both the major and minor parts of the version numbers must match; the patch numbers do not need to match.
Note
The pyrand package installed with either pip or conda already has built-in support for the CUDA Toolkit. The latest version of pyrand is compatible with CUDA 11.7.x, which should match the CUDA version installed on the user's machine.
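Once pyrand is installed, one way to confirm which CUDA installation it detects, and its version, is through the inquiry functions described in Section 3.4 below; for example:
>>> import pyrand
>>> # Version of the CUDA Toolkit that pyrand detects on this machine
>>> pyrand.device.locate_cuda()['version']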
The following sections describe these options in order: installing the NVIDIA CUDA Toolkit (for use with the pip or conda packages), compiling pyrand from source with CUDA, and using the pyrand docker container on GPU.
3.1. Install NVIDIA CUDA Toolkit#
The following instructions describe installing CUDA 11.7 for Ubuntu 22.04, CentOS 7, and Red Hat 9 (RHEL 9) on the x86_64 platform. Refer to the CUDA installation guide in the NVIDIA developer documentation for other operating systems and platforms.
Attention
The NVIDIA CUDA Toolkit does not support macOS; it can be installed on Linux and Windows only.
3.1.1. Install NVIDIA Graphics Driver#
Register the NVIDIA CUDA repository as follows.
Ubuntu 22.04:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
CentOS 7:
sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
sudo yum clean all
RHEL 9:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf clean all
Install the NVIDIA graphics driver as follows.
Ubuntu 22.04:
export DEBIAN_FRONTEND=noninteractive
sudo -E apt install cuda-drivers -y
CentOS 7:
sudo yum -y install nvidia-driver-latest-dkms
RHEL 9:
sudo dnf -y module install nvidia-driver:latest-dkms
The above step might require a reboot afterwards to properly load the NVIDIA graphics driver. Confirm the driver installation with
nvidia-smi
3.1.2. Install CUDA Toolkit#
Installing the entire CUDA Toolkit (2.6 GB) is not required. Only the CUDA runtime, cuBLAS, and cuSparse libraries are needed (about 700 MB in total). Install them as follows.
Ubuntu 22.04:
sudo apt install cuda-cudart-11-7 libcublas-11-7 libcusparse-11-7 -y
CentOS 7:
sudo yum install --setopt=obsoletes=0 -y \
    cuda-cudart-11-7.x86_64 \
    libcublas-11-7.x86_64 \
    libcusparse-11-7.x86_64
RHEL 9:
sudo dnf install --setopt=obsoletes=0 -y \
    cuda-cudart-11-7.x86_64 \
    libcublas-11-7.x86_64 \
    libcusparse-11-7.x86_64
Update the PATH and CUDA_HOME environment variables with the CUDA installation location:
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
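After opening a new shell (or sourcing ~/.bashrc), you can confirm from a Python session that the variables are set; a minimal check using only the standard library:
>>> import os
>>> # Confirm the exported variables are visible in the new shell session
>>> os.environ.get('CUDA_HOME')
'/usr/local/cuda'
>>> '/usr/local/cuda/bin' in os.environ['PATH'].split(':')
True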
3.1.3. Install OpenMP#
In addition to the CUDA Toolkit, make sure the OpenMP library is installed.
Ubuntu 22.04:
sudo apt install libgomp1 -y
CentOS 7:
sudo yum install libgomp -y
RHEL 9:
sudo dnf install libgomp -y
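To verify from Python that the OpenMP runtime is visible to the dynamic linker, a minimal check (standard library only; it may return None if the library is installed in a location unknown to ldconfig):
>>> from ctypes.util import find_library
>>> # Should resolve to something like 'libgomp.so.1' if the GNU OpenMP runtime is installed
>>> find_library('gomp')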
3.2. Compile pyrand from Source with CUDA#
3.2.1. Install C++ Compiler and OpenMP#
Compile pyrand with GCC, Clang/LLVM, or the Intel C++ compiler.
Install GNU GCC Compiler
Ubuntu 22.04:
sudo apt install build-essential libomp-dev
CentOS 7:
sudo yum group install "Development Tools"
RHEL 9:
sudo dnf group install "Development Tools"
Then, export the CC and CXX environment variables, adjusting the paths to where your compiler is installed (for instance, /usr/bin/gcc and /usr/bin/g++ on Ubuntu):
export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++
Install Clang/LLVM Compiler
Ubuntu 22.04:
sudo apt install clang libomp-dev
CentOS 7:
sudo yum install yum-utils
sudo yum-config-manager --enable extras
sudo yum makecache
sudo yum install clang
RHEL 9:
sudo dnf install yum-utils
sudo dnf config-manager --enable extras
sudo dnf makecache
sudo dnf install clang
Then, export the CC and CXX environment variables, adjusting the paths to your Clang installation (for instance, /usr/bin/clang on Ubuntu):
export CC=/usr/local/bin/clang
export CXX=/usr/local/bin/clang++
Install Intel oneAPI Compiler
To install the Intel compiler, see the Intel oneAPI Base Toolkit.
3.2.2. Install CUDA Compiler and Development Libraries#
Attention
The minimum version of CUDA to compile pyrand is CUDA 10.0.
If the CUDA Toolkit is already installed, skip this part. Otherwise, make sure the CUDA compiler and the development libraries of cuBLAS and cuSparse are installed as follows.
Ubuntu 22.04:
sudo apt install -y \
    cuda-nvcc-11-7 \
    libcublas-11-7 \
    libcublas-dev-11-7 \
    libcusparse-11-7 \
    libcusparse-dev-11-7
CentOS 7:
sudo yum install --setopt=obsoletes=0 -y \
    cuda-nvcc-11-7.x86_64 \
    cuda-cudart-devel-11-7.x86_64 \
    libcublas-11-7.x86_64 \
    libcublas-devel-11-7.x86_64 \
    libcusparse-11-7.x86_64 \
    libcusparse-devel-11-7.x86_64
RHEL 9:
sudo dnf install --setopt=obsoletes=0 -y \
    cuda-nvcc-11-7.x86_64 \
    cuda-cudart-devel-11-7.x86_64 \
    libcublas-11-7.x86_64 \
    libcublas-devel-11-7.x86_64 \
    libcusparse-11-7.x86_64 \
    libcusparse-devel-11-7.x86_64
Update the PATH environment variable with the CUDA installation location:
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
source ~/.bashrc
Check that the CUDA compiler is available with which nvcc.
3.2.3. Load CUDA Compiler on GPU Cluster#
If you are compiling pyrand on a GPU cluster, chances are the CUDA Toolkit is already installed. If the cluster uses the module interface, load CUDA as follows.
First, check if a CUDA module is available by
module avail
Load both CUDA and GCC by
module load cuda gcc
You may specify the CUDA version if multiple CUDA versions are available, for instance:
module load cuda/11.7 gcc/6.3
You may check that the CUDA compiler is available with which nvcc.
3.2.4. Configure Compile-Time Environment Variables#
Specify the home directory of the CUDA Toolkit by setting one of the variables CUDA_HOME, CUDA_ROOT, or CUDA_PATH. The home directory should be a path containing the executable /bin/nvcc. For instance, if /usr/local/cuda/bin/nvcc exists, export the following:
export CUDA_HOME=/usr/local/cuda
To permanently set this variable, place the above line in a profile file, such as ~/.bashrc or ~/.profile, and source that file, for instance by
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
To compile pyrand with CUDA, export the following flag variable
export USE_CUDA=1
3.2.5. Enable Dynamic Loading (optional)#
When pyrand is compiled, the CUDA libraries are bundled with the final installation of the pyrand package, making it over 700 MB. While this is generally not an issue, a smaller package is often preferable if the installed package has to be distributed to other machines. To this end, enable the dynamic loading feature of pyrand. In this case, the CUDA libraries are not bundled with the pyrand installation; instead, pyrand loads the existing CUDA libraries of the host machine at runtime. To enable dynamic loading, simply set:
export CUDA_DYNAMIC_LOADING=1
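With dynamic loading enabled, the machine that runs the compiled package must be able to find the CUDA shared libraries at runtime. A quick, pyrand-independent check from Python (standard library only; names may resolve to None if the CUDA libraries are not registered with the dynamic linker):
>>> from ctypes.util import find_library
>>> # Each name should resolve to a shared library, e.g. 'libcudart.so.11.0'
>>> for lib in ('cudart', 'cublas', 'cusparse'):
...     print(lib, find_library(lib))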
3.2.6. Compile and Install#
Get the source code of pyrand with
git clone https://github.com/ameli/pyrand.git
Compile and install by
cd pyrand
python setup.py install
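After the installation completes, a quick sanity check of the GPU build is to call the inquiry functions described in Section 3.4, for example:
>>> import pyrand
>>> # Print overall configuration, including the CUDA version and GPU devices
>>> pyrand.info()
>>> # Number of GPU devices that pyrand can use
>>> pyrand.device.get_num_gpu_devices()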
3.3. Use pyrand Docker Container on GPU#
This method requires installing neither CUDA nor pyrand, as both are pre-installed in the docker image.
3.3.1. Install Docker#
First, install docker. Briefly:
Ubuntu 22.04:
sudo apt update
sudo apt install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
    https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin
CentOS 7:
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
sudo systemctl start docker
RHEL 9:
sudo dnf install -y yum-utils
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
sudo systemctl start docker
Configure docker so that it can be run without sudo:
sudo groupadd docker
sudo usermod -aG docker $USER
Then, log out and log back in. If docker is installed on a virtual machine, restart the virtual machine for the changes to take effect.
3.3.2. Install NVIDIA Container Toolkit#
To access the host's GPU device from a docker container, install the NVIDIA Container Toolkit as follows.
Add the NVIDIA Container Toolkit package repository.
Ubuntu 22.04:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
CentOS 7 / RHEL 9:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
Install nvidia-container-toolkit:
Ubuntu 22.04:
sudo apt update
sudo apt install -y nvidia-container-toolkit
CentOS 7:
sudo yum install -y nvidia-container-toolkit
RHEL 9:
sudo dnf install -y nvidia-container-toolkit
Restart docker:
sudo systemctl restart docker
3.3.3. Get pyrand Docker image#
Get the pyrand docker image by
docker pull sameli/pyrand
The docker image has the following pre-installed:
CUDA, in /usr/local/cuda
Python 3.9, in /usr/bin/python3
Python interpreters: ipython, jupyter
Editor: vim
3.3.4. Use pyrand Docker Container on GPU#
To use the host's GPU from the docker container, add the --gpus all option to any of the docker run commands, for instance:
docker run --gpus all -it sameli/pyrand
The following are some examples of using docker run with various options:
To check the host's NVIDIA driver version, CUDA runtime library version, and the list of available GPU devices, run the nvidia-smi command:
docker run --gpus all sameli/pyrand nvidia-smi
To run the container and open the Python interpreter directly at startup:
docker run -it --gpus all sameli/pyrand
This also imports the pyrand package automatically.
To run the container and open the IPython interpreter directly at startup:
docker run -it --gpus all sameli/pyrand ipython
This also imports the pyrand package automatically (see the example after this list).
To open Bash shell only:
docker run -it --gpus all --entrypoint /bin/bash sameli/pyrand
To mount a host directory, such as /home/user/project, onto a directory of the docker container, such as /root, use:
docker run -it --gpus all -v /home/user/project:/root sameli/pyrand
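Once inside the container's Python or IPython interpreter, you can confirm that the host's GPUs are visible to pyrand using the inquiry functions described in Section 3.4, for example:
>>> import pyrand
>>> # Name and number of GPU devices visible inside the container
>>> pyrand.device.get_gpu_name()
>>> pyrand.device.get_num_gpu_devices()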
3.4. Inquire About GPU and CUDA with pyrand#
First, make sure pyrand recognizes the CUDA libraries and the GPU device. A number of functions are available in the pyrand.device module to query the GPU device.
3.4.1. Locate CUDA Toolkit#
Use the pyrand.device.locate_cuda() function to find the location of the CUDA home directory.
>>> import pyrand
>>> # Get the location and version of CUDA Toolkit
>>> pyrand.device.locate_cuda()
{
'home': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2',
'include': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/include',
'lib': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/lib64',
'nvcc': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/bin/nvcc',
'version':
{
'major': 11,
'minor': 2,
'patch': 0
}
}
If the above function does not return an output similar to the above, either the CUDA Toolkit is not installed or its location is not set. In the latter case, set the directory of the CUDA Toolkit in one of the environment variables CUDA_HOME, CUDA_ROOT, or CUDA_PATH, for instance by
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
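Since pyrand.device.locate_cuda() returns a plain dictionary, it can also be used programmatically, for instance to check for the minimum CUDA version mentioned in Section 3.2.2 (a small sketch based on the output shown above):
>>> import pyrand
>>> # Assert the minimum CUDA version required by pyrand (CUDA 10.0)
>>> cuda = pyrand.device.locate_cuda()
>>> version = cuda['version']
>>> assert (version['major'], version['minor']) >= (10, 0), 'CUDA 10.0 or newer is required'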
3.4.2. Detect NVIDIA Graphics Driver#
Use the pyrand.device.get_nvidia_driver_version() function to make sure pyrand can detect the NVIDIA graphics driver.
>>> # Get the version of NVIDIA graphic driver
>>> pyrand.device.get_nvidia_driver_version()
460.84
3.4.3. Detect GPU Devices#
Use pyrand.device.get_processor_name() and pyrand.device.get_gpu_name() to find the names of the CPU and GPU devices, respectively.
>>> # Get the name of CPU processor
>>> pyrand.device.get_processor_name()
'Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz'
>>> # Get the name of GPU devices
>>> pyrand.device.get_gpu_name()
'GeForce GTX 1080 Ti'
Note
If the name of the GPU device is empty, either no GPU device was detected, the NVIDIA graphics driver is not installed, or the nvidia-smi executable is not on the PATH. In the latter case, add the location of the nvidia-smi executable to the PATH environment variable. On UNIX, this executable is usually in the /usr/bin directory and should already be on the PATH by default.
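A quick way to check from Python whether nvidia-smi is discoverable on the PATH (standard library only):
>>> import shutil
>>> # Returns the full path, e.g. '/usr/bin/nvidia-smi', or None if it is not on the PATH
>>> shutil.which('nvidia-smi')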
The number of CPU threads and the number of GPU devices can be obtained by the pyrand.device.get_num_cpu_threads() and pyrand.device.get_num_gpu_devices() functions, respectively.
>>> # Get number of processor threads
>>> pyrand.device.get_num_cpu_threads()
8
>>> # Get number of GPU devices
>>> pyrand.device.get_num_gpu_devices()
4
The pyrand.info() function also reports general information about the pyrand configuration and devices.
>>> pyrand.info()
pyrand version : 0.13.0
processor : Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
num threads : 8
gpu device : 'GeForce GTX 1080 Ti'
num gpu devices : 4
cuda version : 11.2.0
nvidia driver : 460.84
process memory : 1.7 (Gb)
Alternatively, one may directly use the nvidia-smi command to query the GPU devices.
nvidia-smi
Output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 33% 57C P2 62W / 250W | 147MiB / 11178MiB | 25% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 27% 48C P2 61W / 250W | 147MiB / 11178MiB | 23% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:81:00.0 Off | N/A |
| 18% 32C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:82:00.0 Off | N/A |
| 18% 32C P0 59W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 654 C python 145MiB |
| 1 N/A N/A 839 C python 145MiB |
+-----------------------------------------------------------------------------+
The above output of nvidia-smi shows that there are four GPU devices available on the machine. For more complete information on the GPU devices, use
nvidia-smi -q
3.5. Run pyrand Functions on GPU#
All functions in pyrand that accept the SLQ method (via the method='slq' argument) can perform computations on GPU devices. To do so, pass the gpu=True argument to the function. The following examples use multiple GPU devices to compute the log-determinant of a large matrix.
3.5.1. A Simple Example#
First, create a sample Toeplitz matrix of size ten million using the pyrand.toeplitz() function.
>>> # Import toeplitz matrix
>>> from pyrand import toeplitz
>>> # Generate a sample matrix (a toeplitz matrix)
>>> n = 10000000
>>> A = toeplitz(2, 1, size=n, gram=True)
Next, create a pyrand.Matrix object from the matrix A:
>>> # Import Matrix class
>>> from pyrand import Matrix
>>> # Create a matrix operator object from matrix A
>>> Aop = Matrix(A)
Compute the log-determinant of the above matrix on the GPU by passing gpu=True to the pyrand.logdet() function. Recall that the GPU can only be employed with the SLQ method, selected by the method='slq' argument.
>>> # Import logdet function
>>> from pyrand import logdet
>>> # Compute log-determinant of Aop
>>> logdet(Aop, method='slq', gpu=True)
13862193.020813728
3.5.2. Get Process Information#
It is useful to pass the argument return_info=True to obtain information about the computation process.
>>> # Compute log-determinant of Aop
>>> ld, info = logdet(Aop, method='slq', gpu=True, return_info=True)
Information about the GPU devices used during the computation can be found under the info['device'] key:
>>> from pprint import pprint
>>> pprint(info['device'])
{
'num_cpu_threads': 8,
'num_gpu_devices': 4,
'num_gpu_multiprocessors': 28,
'num_gpu_threads_per_multiprocessor': 2048
}
The processing time can be obtained from the info['time'] key:
>>> pprint(info['time'])
{
'alg_wall_time': 1.7192635536193848,
'cpu_proc_time': 3.275628339,
'tot_wall_time': 3.5191736351698637
}
3.5.3. Verbose Output#
Alternatively, to print verbose information, including information about the GPU devices, pass verbose=True to the function:
>>> # Compute log-determinant of Aop
>>> logdet(Aop, method='slq', gpu=True, verbose=True)
The above call prints a summary table whose last section shows the device information.
3.5.4. Set Number of GPU Devices#
By default, pyrand employs the maximum number of available GPU devices. To employ a specific number of GPU devices, set the num_gpu_devices argument. For instance
>>> # Import logdet function
>>> from pyrand import logdet
>>> # Compute log-determinant of Aop
>>> ld, info = logdet(Aop, method='slq', gpu=True, return_info=True,
... num_gpu_devices=2)
>>> # Check how many GPU devices used
>>> pprint(info['device'])
{
'num_cpu_threads': 8,
'num_gpu_devices': 2,
'num_gpu_multiprocessors': 28,
'num_gpu_threads_per_multiprocessor': 2048
}
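Independently of the num_gpu_devices argument, the standard CUDA_VISIBLE_DEVICES environment variable can restrict which physical GPUs are visible to the process. This is a general CUDA runtime mechanism rather than a pyrand-specific option, and it must be set before the GPU is first used, for example:
>>> import os
>>> # Expose only the first two physical GPUs to this process; must be set
>>> # before the CUDA runtime is initialized (general CUDA behavior)
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'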
3.6. Deploy pyrand on GPU Clusters#
On GPU clusters, the NVIDIA graphics driver and CUDA libraries are pre-installed and only need to be loaded.
3.6.1. Load Modules#
Check which modules are available on the machine
module avail
Load python and a compatible CUDA version by
module load python/3.9
module load cuda/11.7
Check which modules are loaded
module list
3.6.2. Interactive Session with SLURM#
There are two ways to work with GPU on a cluster. The first method is to ssh
to a GPU node and for hands-on interaction with the GPU device. If the GPU cluster uses SLURM manager, use srun
to initiate a session as follows
srun -A fc_biome -p savio2_gpu --gres=gpu:1 --ntasks 2 -t 2:00:00 --pty bash -i
In the above example:
-A fc_biome sets the group account associated with the user.
-p savio2_gpu sets the name of the GPU partition.
--gres=gpu:1 requests one GPU device on the node.
--ntasks 2 requests two parallel CPU threads on the node.
-t 2:00:00 requests a two-hour session.
--pty bash starts a Bash shell.
-i redirects standard input to the user's terminal for interactive use.
See the list of options of srun for details. As another example, to request a node in the savio2_1080ti partition with 4 GPU devices and 8 CPU threads for 10 hours, run
srun -A fc_biome -p savio2_1080ti --gres=gpu:4 --ntasks 8 -t 10:00:00 --pty bash -i
Note
Replace the partition and account names in the above examples with yours. The names used here are from the SAVIO cluster (an institutional cluster at UC Berkeley).
3.6.3. Submit Jobs to GPU with SLURM#
To submit a parallel job to GPU nodes on a cluster with the SLURM manager, use the sbatch command, for instance
sbatch jobfile.sh
See the list of options of sbatch for details. A sample job file, jobfile.sh, is shown below. The --gres option in the file instructs SLURM to request the desired number of GPU devices.
#!/bin/bash
#SBATCH --job-name=your_project
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_email
#SBATCH --partition=savio2_1080ti
#SBATCH --account=fc_biome
#SBATCH --qos=savio_normal
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64gb
#SBATCH --output=output.log
# Point to where Python is installed
PYTHON_DIR=$HOME/programs/miniconda3
# Point to where a script should run
SCRIPTS_DIR=$(dirname $PWD)/scripts
# Directory of log files
LOG_DIR=$PWD
# Load modules
module load cuda/11.2
# Export OpenMP variables
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Run the script
$PYTHON_DIR/bin/python ${SCRIPTS_DIR}/script.py > ${LOG_DIR}/output.txt
In the above job file, modify --partition, --account, and --qos according to your user account allowance on the cluster.
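For reference, a minimal sketch of what ${SCRIPTS_DIR}/script.py could contain, reusing the log-determinant example of Section 3.5 (the file name and location are only those assumed by the job file above):
# script.py: compute the log-determinant of a sample matrix on the GPU
from pyrand import toeplitz, Matrix, logdet

# Generate a sample Gramian Toeplitz matrix of size ten million
A = toeplitz(2, 1, size=10000000, gram=True)
Aop = Matrix(A)

# Compute the log-determinant on the GPU using the SLQ method
ld, info = logdet(Aop, method='slq', gpu=True, return_info=True)
print('logdet:', ld)
print('num gpu devices used:', info['device']['num_gpu_devices'])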