1. Query GPU and CUDA with g-learn

First, make sure g-learn recognizes the CUDA libraries and the GPU device. The glearn.device module provides several functions to query the GPU device.

1.1. Locate CUDA Toolkit

Use the glearn.device.locate_cuda() function to find the CUDA home directory.

>>> import glearn

>>> # Get the location and version of CUDA Toolkit
>>> glearn.device.locate_cuda()
{
    'home': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2',
    'include': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/include',
    'lib': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/lib64',
    'nvcc': '/global/software/sl-7.x86_64/modules/langs/cuda/11.2/bin/nvcc',
    'version':
    {
        'major': 11,
        'minor': 2,
        'patch': 0
    }
}

If the function does not return output like the example above, either the CUDA Toolkit is not installed or its directory is not set. In the latter case, set any one of the environment variables CUDA_HOME, CUDA_ROOT, or CUDA_PATH to the CUDA Toolkit directory, for example by

echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
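To confirm from Python that one of these variables is actually visible to the interpreter, a minimal sketch using only the standard library is shown here. The helper names find_cuda_env and has_nvcc are hypothetical, and the lookup order CUDA_HOME, CUDA_ROOT, CUDA_PATH is an assumption; this is not necessarily how g-learn itself searches.

```python
import os

# Environment variables named in the text above.
# The lookup order is an assumption, not g-learn's documented behavior.
CUDA_ENV_VARS = ("CUDA_HOME", "CUDA_ROOT", "CUDA_PATH")


def find_cuda_env(environ=None):
    """Return (variable, path) for the first CUDA variable that is set, else None."""
    environ = os.environ if environ is None else environ
    for name in CUDA_ENV_VARS:
        path = environ.get(name)
        if path:
            return name, path
    return None


def has_nvcc(cuda_home):
    """Check that <cuda_home>/bin/nvcc exists (nvcc.exe on Windows)."""
    exe = "nvcc.exe" if os.name == "nt" else "nvcc"
    return os.path.isfile(os.path.join(cuda_home, "bin", exe))


if __name__ == "__main__":
    found = find_cuda_env()
    if found is None:
        print("None of CUDA_HOME, CUDA_ROOT, CUDA_PATH is set")
    else:
        name, path = found
        print(f"{name}={path} (nvcc found: {has_nvcc(path)})")
```

If the script reports that no variable is set even after editing ~/.bashrc, the shell that launched Python probably has not re-read that file yet.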

1.2. Detect NVIDIA Graphics Driver

Use the glearn.device.get_nvidia_driver_version() function to check that g-learn can detect the NVIDIA driver.

>>> # Get the version of the NVIDIA graphics driver
>>> glearn.device.get_nvidia_driver_version()
460.84

1.3. Detect GPU Devices

Use glearn.device.get_processor_name() and glearn.device.get_gpu_name() to find the names of the CPU and GPU devices, respectively.

>>> # Get the name of CPU processor
>>> glearn.device.get_processor_name()
'Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz'

>>> # Get the name of GPU devices
>>> glearn.device.get_gpu_name()
'GeForce GTX 1080 Ti'

Note

If the name of the GPU device is empty, either no GPU device was detected, the NVIDIA graphics driver is not installed, or the nvidia-smi executable is not on the PATH. In the last case, add the location of the nvidia-smi executable to the PATH environment variable. On UNIX, this executable is usually in the /usr/bin directory, which is on the PATH by default.
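One way to check whether nvidia-smi is reachable is the standard-library shutil.which function, as in the rough sketch here. The helper names find_nvidia_smi and nvidia_smi_status are hypothetical and are not part of g-learn.

```python
import shutil


def find_nvidia_smi(path=None):
    """Return the full path of nvidia-smi, or None if it is not found.

    When path is None, the PATH environment variable is searched.
    """
    return shutil.which("nvidia-smi", path=path)


def nvidia_smi_status(path=None):
    """Return a one-line, human-readable status message."""
    exe = find_nvidia_smi(path)
    if exe is None:
        return "nvidia-smi not found on PATH"
    return f"nvidia-smi found at {exe}"


if __name__ == "__main__":
    print(nvidia_smi_status())
```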

The number of CPU threads and the number of GPU devices can be obtained with the glearn.device.get_num_cpu_threads() and glearn.device.get_num_gpu_devices() functions, respectively.

>>> # Get number of processor threads
>>> glearn.device.get_num_cpu_threads()
8

>>> # Get number of GPU devices
>>> glearn.device.get_num_gpu_devices()
4
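The four queries above can be gathered into a single dictionary, for example to decide at startup whether to use the GPU. The sketch here takes the device module as an argument so that it also runs without g-learn installed. The names summarize_devices, has_gpu, and example_device are hypothetical, and treating an empty name or a zero count as "no GPU" is an assumption based on the note above.

```python
from types import SimpleNamespace


def summarize_devices(device):
    """Collect CPU and GPU information from an object shaped like glearn.device."""
    return {
        "cpu_name": device.get_processor_name(),
        "cpu_threads": device.get_num_cpu_threads(),
        "gpu_name": device.get_gpu_name(),
        "num_gpus": device.get_num_gpu_devices(),
    }


def has_gpu(summary):
    """Assumption: an empty GPU name or a zero device count means no usable GPU."""
    return bool(summary["gpu_name"]) and bool(summary["num_gpus"])


# Stand-in for glearn.device, filled with the example values from this page.
# With g-learn installed, pass glearn.device instead.
example_device = SimpleNamespace(
    get_processor_name=lambda: "Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz",
    get_num_cpu_threads=lambda: 8,
    get_gpu_name=lambda: "GeForce GTX 1080 Ti",
    get_num_gpu_devices=lambda: 4,
)

if __name__ == "__main__":
    summary = summarize_devices(example_device)
    print(summary)
    print("GPU available:", has_gpu(summary))
```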

The glearn.info() function also prints general information about the g-learn configuration and devices.

 >>> glearn.info()
 glearn version  : 0.17.0
 imate version   : 0.18.0
 processor       : Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
 num threads     : 8
 gpu device      : 'GeForce GTX 1080 Ti'
 num gpu devices : 4
 cuda version    : 11.2.0
 nvidia driver   : 460.84
 process memory  : 1.7 (Gb)

Alternatively, use the nvidia-smi command directly to query the GPU devices.

nvidia-smi

Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84       Driver Version: 460.84       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 33%   57C    P2    62W / 250W |    147MiB / 11178MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   48C    P2    61W / 250W |    147MiB / 11178MiB |     23%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:81:00.0 Off |                  N/A |
| 18%   32C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 18%   32C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       654      C   python                            145MiB |
|    1   N/A  N/A       839      C   python                            145MiB |
+-----------------------------------------------------------------------------+

The nvidia-smi output above shows that four GPU devices are available on the machine. For more complete information on the GPU devices, use

nvidia-smi -q
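When the GPU list is needed inside a script, the machine-readable CSV mode of nvidia-smi is easier to parse than the table above. The sketch here calls nvidia-smi with the --query-gpu and --format options and returns an empty list if the executable is missing or fails. The function names parse_gpu_csv and list_gpus are hypothetical, and the choice of fields (index, name, memory.total) is just one possibility.

```python
import shutil
import subprocess

# CSV query options of nvidia-smi; memory.total is reported in MiB with nounits
QUERY = ["--query-gpu=index,name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse lines such as '0, GeForce GTX 1080 Ti, 11178' into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the first and last fields so the name may contain commas
        index, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        memory = memory.strip()
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            # Memory may be reported as a non-numeric placeholder on some devices
            "memory_mib": int(memory) if memory.isdigit() else None,
        })
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run([exe, *QUERY], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```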