On the cluster some nodes have GPUs installed. The HPC Status page will indicate what nodes are GPU capable. These GPU nodes each have two Nvidia V100 GPUs installed.
In addition to the number of CPU cores required (e.g.
#PBS -l ncpus=1), specify
how many GPUs your job will require with
#PBS -l ngpus=1. Usually just one GPU will be required.
When you submit this job the PBS scheduler will allocate it to a node that
has a GPU.
Please don’t request a GPU unless you are actually running your code.
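As a sketch, a minimal GPU job script might look like the following. The queue name gpuq and the 5GB default memory are taken from the examples later on this page; my_cuda_program is a placeholder for your own binary, and you should adjust the resources to your needs.

```shell
#!/bin/bash
#PBS -q gpuq
#PBS -l ncpus=1
#PBS -l ngpus=1
#PBS -l mem=5GB
#PBS -l walltime=01:00:00

# Run your GPU program (my_cuda_program is a hypothetical name).
cd "$PBS_O_WORKDIR"
./my_cuda_program
```

You would submit this in the usual way with qsub my_job_script.sh.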
Normally, to run a job on the cluster you log in to the login node and run
qsub my_job_script.sh, where the script contains PBS directives specifying the
CPU, memory and time resources that you need. PBS then allocates your job to the most appropriate
node for your requirements. Generally you can’t log in to nodes other than the login node.
(See though Logging into Other Nodes.)
But in the case of GPUs, to compile your CUDA program you might need to log in to a GPU node, as you need access to the CUDA libraries. You do this by submitting an interactive PBS job from the login node that logs you in to a GPU node for a specified period of time. From there you can compile and test.
This command will log you into a GPU node for 30 minutes.
$ qsub -I -q gpuq -l walltime=00:30:00
hpcnode10 ~$
This command will log you into a specific GPU node for 30 minutes.
$ qsub -I -q gpuq -l host=hpcnode10 -l walltime=00:30:00
hpcnode10 ~$
You will get 1 CPU core and 5GB of RAM as that is the default. The default walltime is 1 hour.
Notice that the above did not request any GPUs. You do not need to use them if you are just compiling your code. You just need to be logged into a GPU node for your compiler to detect the CUDA libraries.
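For instance, once on the GPU node you might compile with the NVIDIA compiler like this. The file and program names here are hypothetical, and the exact way CUDA is made available (module load, or a path already on the node) depends on the cluster setup.

```shell
# Compile a CUDA source file into an executable (hypothetical names).
$ nvcc -O2 -o my_cuda_program my_cuda_program.cu
```

You can then submit my_cuda_program as a normal batch job that requests a GPU.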
If you actually need to use a GPU for testing you can add that request to your interactive session request like this:
$ qsub -I -q gpuq -l select=1:ngpus=1 -l walltime=00:30:00
hpcnode10 ~$
In the above example we didn’t pick a specific node; we just asked for one GPU and the PBS scheduler selected a node for us. If you need more CPU cores and memory for testing, then specify what you require like this:
$ qsub -I -q gpuq -l select=1:ngpus=1:ncpus=2:mem=15gb -l walltime=00:30:00
hpcnode10 ~$
If you really do need a specific node then include a host resource:
$ qsub -I -q gpuq -l select=1:host=hpcnode12:ngpus=1:ncpus=2:mem=15gb -l walltime=00:30:00
hpcnode12 ~$
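If your interactive session did request a GPU, you can check which GPU has been allocated with the standard NVIDIA driver tool, assuming it is installed on the GPU nodes:

```shell
# Show the GPUs visible to this session and their current usage.
$ nvidia-smi
```

If no GPU was requested, this command may show no devices available to your job.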
When you are finished compiling or testing, just type
exit to log out of that shell.
Interactive logins will provide you with resources from the “Interactive Queue”.
This queue has a limited wall time. See https://hpc.research.uts.edu.au/status/
For testing please use a minimal test data set and minimal cores, memory and time.
You can also place any of the PBS directives for an interactive job as described above
into a script, just like your normal job scripts. Create a script called gpu_access.sh;
see the Example Interactive PBS Script at the end of this page. In this script
set a walltime sufficient for you to do your task of compiling or testing.
For instance if you set walltime=01:00:00 you will have 1 hour.
Now submit this script as a PBS interactive job (note the -I to invoke interactive mode).
$ qsub -I gpu_access.sh
You will now have a shell on a GPU node. To exit the GPU node, just type “exit” and you will drop back to the login node. You will also be dropped back to the login node once your wall time is reached.
Here is an example gpu_access.sh script.
Note that we have not requested any GPUs, i.e. no
#PBS -l ngpus=1
(it’s commented out in the example below), because we just require
an interactive session for compiling our program. This leaves the GPUs free for others,
and also means we don’t have to wait for access if the GPUs are busy.
#!/bin/bash
# This is a simple PBS script that uses the "interactive" mode of PBS.
# See "Interactive-batch Jobs" in the "PBS User Guide".
#
# For interactive use you must submit this job with -I
#   qsub -I this_script.sh

#PBS -l ncpus=1
###PBS -l ngpus=1
#PBS -l mem=5GB
#PBS -q gpuq

# Set your interactive wall time limit, hh:mm:ss
#PBS -l walltime=01:00:00

# Note: don't have any other commands below here!