Logging into Other Nodes¶
Generally you cannot log in directly via ssh to the other nodes in the HPC cluster. However, you can log in to any node by submitting an interactive PBS job, and you can ssh to any node on which you have a job running. Typical use cases are logging in to the node running your job to debug a problem with your PBS job, or logging in to a GPU node to compile your code if it uses CUDA. See also Accessing GPU Nodes. Interactive jobs are not for running long computations; hence the maximum walltime is 8 hours.
Submitting an Interactive PBS Job¶
Submitting an interactive PBS job will allow you to log in to any node for a specified period of time. Use the -I option (this stands for “Interactive”) to request an interactive session. This will place you on an available node with an allocation of 1 CPU core and a default amount of RAM and walltime.
$ qsub -I
hpcnode03 $
You can also use the -l option (this stands for “resource list”) to specify the host or the resources that you require. As an example, the command below will place you on hpcnode07 with an allocation of 1 CPU core and a default amount of RAM for 30 minutes walltime. You might use this just to log in to that node to monitor a big job running there.
$ qsub -I -l select=1:host=hpcnode07 -l walltime=00:30:00
hpcnode07 $
If you need more resources, you can specify the number of CPUs and the amount of memory. The command below will place you on a node and allocate you 4 cores, 120 GB of RAM and a walltime of 30 minutes.
$ qsub -I -l select=1:ncpus=4:mem=120G -l walltime=00:30:00
hpcnode05 $
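Once the interactive session starts you can confirm what you were allocated. This is a quick sketch using the standard PBS environment variables PBS_JOBID and PBS_NODEFILE; the exact output will depend on your job:
hpcnode05 $ echo $PBS_JOBID
hpcnode05 $ cat $PBS_NODEFILE
hpcnode05 $ qstat -f $PBS_JOBID | grep -i resource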
For some nodes that have restricted queues, you will also need to add the queue name to the command with the -q option.
$ qsub -I -l select=1:host=c3node03 -l walltime=00:30:00 -q c3b
c3node03 $
Type exit to leave the interactive shell on the execution host. You will be dropped back to the login node.
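For example (assuming, as in the ssh example further down this page, that the login node is hpcnode01):
hpcnode07 $ exit
$ hostname
hpcnode01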
You will see that I have used a "select=1" specification in the above examples. This is how you specify one chunk of consumable resources: the select statement is followed by a list of consumable resources, each separated by a colon. Walltime needs to be specified with a separate -l option as it is a non-consumable resource; you can't place consumable and non-consumable resources within the one -l option.
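For example, the request below keeps the consumable resources (ncpus and mem) together in one select chunk and gives walltime its own -l option; the values here are purely illustrative:
$ qsub -I -l select=1:ncpus=2:mem=16G -l walltime=01:00:00
A request such as "-l select=1:ncpus=2:walltime=01:00:00" would typically be rejected by PBS, because walltime cannot appear inside the select chunk.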
Note
Interactive logins provide you with resources from the “Interactive Queue”, which is limited in walltime.
See https://hpc.research.uts.edu.au/status/
For testing, please use a minimal test data set and request minimal cores, memory and time.
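If you want to check queue limits yourself, qstat can display them. This is a minimal sketch; “interactive” below is only a placeholder for whatever the interactive queue is actually called on this cluster:
$ qstat -Q
$ qstat -Qf interactive | grep -i walltime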
Logging into a Node which is Running Your Job¶
In this case you don’t need to submit an interactive PBS job; just ssh into the node. First, check which node your job is running on:
$ qstat -n1
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
215426.hpcnode0 876543 smallq primes_job 95025 1 1 5gb 00:10 R 00:00 hpcnode09/1
In this example it’s running on hpcnode09, so we can ssh directly to this node.
$ ssh hpcnode09
Last login: Wed Jul 17 16:19:13 2019 from hpcnode01
hpcnode09 $
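Once on the node you can inspect your job’s processes with the usual Linux tools, for example:
hpcnode09 $ top -u $USER
hpcnode09 $ ps -u $USER -f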
Note: When your job finishes on that node, your ssh connection to that node will be automatically disconnected.
If you ssh to a node on which you do not have a job running, within a few seconds you will be disconnected and get a “Connection to ..... closed.” message.