Logging into Other Nodes

Generally you cannot log in directly via ssh to the other nodes in the HPC cluster. However, you can log in to any node by submitting an interactive PBS job, or you can ssh to any node on which you have a job running. Typical use cases are logging in to the node running your job to debug a problem with your PBS job, or logging in to a GPU node to compile code that uses CUDA. See also Accessing GPU Nodes. Interactive jobs are not for running long computations, so the maximum walltime is 8 hours.

Submitting an Interactive PBS Job

Submitting an interactive PBS job lets you log in to a node for a specified period of time. Use the -I option (this stands for Interactive) to request an interactive session. This will place you on an available node with an allocation of 1 CPU core and default amounts of RAM and walltime.

$ qsub -I
hpcnode03 $

You can also use the -l option (this stands for resource list) to specify the host or the resources that you require. For example, this command will place you on hpcnode07 with an allocation of 1 CPU core and a default amount of RAM for 30 minutes of walltime.

$ qsub -I -l select=1:host=hpcnode07 -l walltime=00:30:00
hpcnode07 $

This will place you on hpcnode05 (which has a lot of memory) with 4 cores and 120 GB of RAM for 30 minutes of walltime.

$ qsub -I -l select=1:host=hpcnode05:ncpus=4:mem=120G -l walltime=00:30:00
hpcnode05 $

For some nodes which have restricted queues you will also need to specify the queue name with the -q option.

$ qsub -I -l select=1:host=c3node03 -l walltime=00:30:00 -q c3b
c3node03 $

Type exit to exit the interactive shell on the execution host. You will be dropped back to the login node.

You will see that I have used a "select=1" specification in the above examples. This is how you request one chunk of consumable resources: the select statement is followed by each consumable resource (ncpus, mem, etc.), separated by colons. Walltime must be specified with a separate -l option because it is a non-consumable resource; you can't mix consumable and non-consumable resources within the one -l option.
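As a sketch of how such a command line is assembled (the core, memory and walltime values here are illustrative, not recommendations):

```shell
# One chunk of consumable resources: the select= statement, with each
# consumable resource (ncpus, mem, ...) chained on with a colon.
chunk="select=1:ncpus=2:mem=8G"

# Walltime is non-consumable, so it gets its own -l option.
wall="walltime=01:00:00"

# The resulting interactive job request:
echo qsub -I -l "$chunk" -l "$wall"
# qsub -I -l select=1:ncpus=2:mem=8G -l walltime=01:00:00
```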

Interactive logins draw resources from the “Interactive Queue”, which is limited in walltime. See https://hpc.research.uts.edu.au/status/
For testing, please use a minimal test data set and minimal cores, memory and time.

Logging into a Node which is Running Your Job

In this case you don’t need to submit an interactive PBS job; just ssh into the node. First, check which node your job is running on:

$ qstat -n1
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
215426.hpcnode0 876543   smallq   primes_job  95025   1   1    5gb 00:10 R 00:00 hpcnode09/1

In this example it’s running on hpcnode09 so we can ssh directly into this node.

$ ssh hpcnode09
Last login: Wed Jul 17 16:19:13 2019 from hpcnode01
hpcnode09 $ 

Note: When your job on a node finishes, your ssh connection to that node will be automatically disconnected.

If you ssh to a node on which you do not have a job running, you will be disconnected within a few seconds and get a “Connection to ….. closed.” message.
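The node name can also be read from the exec_host attribute of the full job listing, which can be handy in scripts. A minimal sketch, using the job ID and node name from the example above (substitute your own):

```shell
# On the cluster you would run:
#   qstat -f 215426.hpcnode0 | grep exec_host
# which prints a line like the sample below. The trailing "/1" is the
# vnode index, not part of the hostname.
sample="    exec_host = hpcnode09/1"

# Strip everything but the node name:
printf '%s\n' "$sample" | awk -F'[ =/]+' '/exec_host/ {print $3}'
# hpcnode09
```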
