Checking GPU Usage

You can obtain basic information on the NVIDIA GPUs and their current usage with NVIDIA’s “System Management Interface” program, nvidia-smi. For details see its man page (man nvidia-smi), or run it with the -h option (nvidia-smi -h) for help. See Checking Device Capability to find out detailed information on the card.

Please Note: Do not use the nvidia-smi command with the -l option, nor with the watch command. Continuously running nvidia-smi in a loop consumes GPU resources and will slow down everyone else's jobs.

You need to be logged into a GPU node to run this command.

Here is an example of its output.

GPUNode $ 
$ nvidia-smi 

Mon Jun 28 14:13:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   61C    P0   155W / 250W |  17289MiB / 32510MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   62C    P0   152W / 250W |  17289MiB / 32510MiB |     89%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     57573      C   python                          17283MiB |
|    1   N/A  N/A     57573      C   python                          17283MiB |
+-----------------------------------------------------------------------------+
$ 

You can see that this node has two Tesla V100 GPUs installed. Both are running at about 50% memory usage and 90% GPU utilisation.
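The "about 50%" figure comes from dividing the used memory by the total memory shown in the Memory-Usage column. As a small sketch of that arithmetic, the helper below (mem_percent is a hypothetical name, not part of nvidia-smi) takes the two MiB values and prints the rounded percentage:

```shell
# Hypothetical helper: percentage of GPU memory in use, rounded to a whole number.
# Arguments: used MiB, total MiB (read off the Memory-Usage column).
mem_percent() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.0f\n", used / total * 100 }'
}

# Values for GPU 0 in the output above.
mem_percent 17289 32510   # prints 53
```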

You can get a list of the two GPUs and their UUIDs with this:

hpcnode10 $ nvidia-smi --list-gpus
GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70)
GPU 1: Tesla V100-PCIE-32GB (UUID: GPU-151b0546-4c5b-039a-e1e2-0acaa0098909)
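If you need a UUID in a script, one possible approach is to pick it out of this listing with sed. The sketch below assumes the line format shown above; gpu_uuid is a hypothetical helper name, and the sample line is copied from the listing so the snippet can be tried anywhere:

```shell
# Hypothetical helper: read `nvidia-smi --list-gpus` output on stdin and
# print the UUID of the GPU whose index is given as the first argument.
# Assumes lines of the form "GPU <n>: <name> (UUID: <uuid>)".
gpu_uuid() {
    sed -n "s/^GPU $1: .*(UUID: \(.*\))\$/\1/p"
}

# Demonstration using a line copied from the listing above.
# On a GPU node you would instead run: nvidia-smi --list-gpus | gpu_uuid 0
printf '%s\n' 'GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70)' | gpu_uuid 0
```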

You can specify what information you would like to see by using the --query (-q) option together with the --display (-d) option, e.g.:

$ nvidia-smi -q -d MEMORY,COMPUTE,UTILIZATION

The above will show the data for both GPUs. If you only wish to see the information for a specific GPU then you can specify the UUID to query:

$ nvidia-smi -q -d MEMORY,COMPUTE,UTILIZATION -i GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70 --loop=5

Notice in the above example I have also used the --loop option, which repeats the query at the given interval in seconds. This can be very useful, but please do not use it continuously with small time intervals.
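If a single snapshot is enough, a one-off query avoids looping altogether. The sketch below uses nvidia-smi's --query-gpu and --format=csv options to print one short line per GPU; the field names are the ones I would expect most drivers to support, but check nvidia-smi --help-query-gpu on your node. summarise_gpus is a hypothetical helper, and the query only runs where nvidia-smi is installed:

```shell
# Hypothetical helper: turn CSV lines of the form
# "index, utilisation %, used MiB, total MiB" into a readable summary.
summarise_gpus() {
    awk -F', *' '{ printf "GPU %s: %s%% utilisation, %s of %s MiB used\n", $1, $2, $3, $4 }'
}

# Take one snapshot and exit; skipped on machines without nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | summarise_gpus
fi
```

Run by hand when you want to check on a job, this gives the same utilisation and memory figures as the full table without leaving anything running on the GPU node.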

For all the details on this command see the manual page: man nvidia-smi.
