Checking GPU Usage
You can obtain basic information on the NVIDIA GPUs and their current usage using NVIDIA’s “System Management Interface” program, nvidia-smi. Look at its man page for details (man nvidia-smi), or run it with the -h option, i.e. nvidia-smi -h, for help.
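For example, once you are logged into a GPU node, either of these will show the documentation:

GPUNode $ man nvidia-smi     # full manual page
GPUNode $ nvidia-smi -h      # summary of the available options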
See Checking Device Capability for detailed information on the card itself.
Note
Please do not use the nvidia-smi command with the -l option, nor with the watch command. Continuously running nvidia-smi in a loop consumes GPU resources and will slow down everyone else's jobs.
You need to be logged into a GPU node to run this command.
Here is an example of its output.
GPUNode $ nvidia-smi
Mon Jun 28 14:13:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   61C    P0   155W / 250W |  17289MiB / 32510MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   62C    P0   152W / 250W |  17289MiB / 32510MiB |     89%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                    Usage     |
|=============================================================================|
|    0   N/A  N/A     57573      C   python                          17283MiB |
|    1   N/A  N/A     57573      C   python                          17283MiB |
+-----------------------------------------------------------------------------+
$
You can see that this node has two Tesla V100 GPUs installed. Both are running at about 50% memory usage and 90% GPU utilisation.
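If you want the same information in a form that is easier to parse from a script, nvidia-smi can print selected fields as comma-separated values. The fields below are just a small selection; run nvidia-smi --help-query-gpu for the full list. For the node above the output would look something like this:

GPUNode $ nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv
index, name, utilization.gpu [%], memory.used [MiB], memory.total [MiB]
0, Tesla V100-PCIE-32GB, 91 %, 17289 MiB, 32510 MiB
1, Tesla V100-PCIE-32GB, 89 %, 17289 MiB, 32510 MiB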
You can get a list of the two GPUs and their UUIDs with this:
hpcnode10 $ nvidia-smi --list-gpus
GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70)
GPU 1: Tesla V100-PCIE-32GB (UUID: GPU-151b0546-4c5b-039a-e1e2-0acaa0098909)
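If you need the UUIDs inside a script, for example to pass to the -i option shown further below, an alternative to parsing the --list-gpus output is to query just those fields and suppress the header:

hpcnode10 $ nvidia-smi --query-gpu=index,uuid --format=csv,noheader
0, GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70
1, GPU-151b0546-4c5b-039a-e1e2-0acaa0098909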
You can specify what information you would like to see by using the --query (-q) option with --display (-d) parameters, e.g.:
$ nvidia-smi -q -d MEMORY,COMPUTE,UTILIZATION
The above will show the data for both GPUs. If you only wish to see the information for a specific GPU then you can specify the UUID to query:
$ nvidia-smi -q -d MEMORY,COMPUTE,UTILIZATION -i GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70 --loop=600
Notice that in the above example I have also used the --loop option. This can be very useful, but please do not run it continuously with small time intervals.
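If you do need to sample over a longer period, you can also redirect the output to a file with the -f (--filename) option instead of watching it in the terminal; the log file name below is just an example:

$ nvidia-smi -q -d UTILIZATION -i GPU-37f061b1-7948-e188-56a7-d30f5e0ffc70 --loop=600 -f gpu_usage.log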
For all the details on this command see the manual page: man nvidia-smi.