Logging into Other Nodes

In general, you cannot ssh directly to the other nodes in the HPC cluster. There are two exceptions: you can log in to any node by submitting an interactive PBS job, and you can ssh to any node on which you already have a job running. Typical reasons are debugging a problem with a running PBS job, or compiling CUDA code on a GPU node. See also Accessing GPU Nodes. Interactive jobs are not intended for long computations, so their maximum walltime is 8 hours.

Submitting an Interactive PBS Job

Submitting an interactive PBS job lets you log in to any node for a specified period of time. The example below gives you a login on host hpcnode07 for 30 minutes. Use the “-I” option and select the host and walltime. The “-I” option starts a PBS job in the interactive queue. If you do not request resources, you get the defaults: 1 CPU core, 5 GB of RAM and a walltime of 1 hour.

$ qsub -I -l select=1:host=hpcnode07 -l walltime=00:30:00
qsub: waiting for job 123402.hpcnode0 to start
qsub: job 123402.hpcnode0 ready
hpcnode07 $

Some nodes belong to restricted queues. For these nodes you must also add the queue name to the command with the “-q” option.

$ qsub -I -l select=1:host=c3node03 -l walltime=00:30:00 -q c3b
c3node03 $

Type exit to exit the interactive shell on the execution host. You will be dropped back to the login node.
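
The qsub commands above differ only in host, walltime and an optional queue, so they can be assembled by a small helper. The following is a minimal sketch, not a site-provided tool: the function name interactive_cmd and its 30-minute fallback walltime are assumptions made for illustration. It only prints the command so that you can check it before running it.

```shell
# Hypothetical helper (not part of PBS or this cluster's tooling):
# print, but do not run, the qsub command for an interactive login.
# Usage: interactive_cmd HOST [WALLTIME] [QUEUE]
interactive_cmd() {
    host=$1
    # 00:30:00 is this sketch's own fallback, not the PBS default walltime.
    walltime=${2:-00:30:00}
    queue=${3:-}

    cmd="qsub -I -l select=1:host=${host} -l walltime=${walltime}"

    # Only nodes in restricted queues need the queue name.
    if [ -n "$queue" ]; then
        cmd="$cmd -q $queue"
    fi

    printf '%s\n' "$cmd"
}
```

For example, "interactive_cmd c3node03 00:30:00 c3b" prints the second command shown above.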

Interactive logins use resources from the “Interactive Queue”, which has a limited walltime. See https://hpc.research.uts.edu.au/status/
For testing, please use a minimal test data set and minimal cores, memory and time.
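
Because interactive jobs are capped at 8 hours of walltime, it can help to check a requested walltime before submitting. The following is a minimal sketch under that assumption: the function name walltime_ok is hypothetical, the 8-hour limit is taken from the top of this page, and the walltime is assumed to be written as HH:MM:SS.

```shell
# Hypothetical helper: succeed (exit status 0) if an HH:MM:SS walltime
# is within the 8-hour interactive limit stated on this page.
walltime_ok() {
    # 8 hours = 28800 seconds. awk reads fields such as "08" as decimal,
    # which avoids the octal pitfall of shell arithmetic.
    printf '%s\n' "$1" |
        awk -F: -v max=28800 '{ exit !($1 * 3600 + $2 * 60 + $3 <= max) }'
}
```

For example, "walltime_ok 00:30:00 && echo fine" prints "fine", while "walltime_ok 09:00:00" returns a non-zero status.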

Logging into a Node which is Running Your Job

In this case you don’t need to submit an interactive PBS job; just ssh into the node. First check which node your job is running on:

$ qstat -n1
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
215426.hpcnode0 876543   smallq   primes_job  95025   1   1    5gb 00:10 R 00:00 hpcnode09/1

In this example the job is running on hpcnode09 (shown in the last column), so we can ssh directly into that node.

$ ssh hpcnode09
Last login: Wed Jul 17 16:19:13 2019 from hpcnode01
hpcnode09 $ 
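
Finding the node by eye works for one job. With several jobs it may be easier to extract the node names from the qstat output. The following is a minimal sketch, assuming the column layout shown above (job state in the tenth column, execution host last). The function name job_nodes is hypothetical, and the handling of multi-node jobs assumes that hosts are joined with “+”.

```shell
# Hypothetical helper: read `qstat -n1` style output on stdin and print
# "JOBID NODE" for every running job.
job_nodes() {
    awk '$10 == "R" {
        # Assumption: multi-node jobs list hosts as "nodeA/0+nodeB/0".
        n = split($NF, parts, "+")
        for (i = 1; i <= n; i++) {
            sub(/\/.*/, "", parts[i])   # drop the "/1" style suffix
            print $1, parts[i]
        }
    }'
}

# Usage on the login node:
#   qstat -n1 | job_nodes
```

For the job listed above this prints “215426.hpcnode0 hpcnode09”. A job with several chunks on the same node would list that node more than once.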

Note: When your job finishes on that node your ssh connection to that node will be automatically disconnected.

If you ssh to a node on which you do not have a job running, you will be disconnected within a few seconds with the message “Connection to ….. closed.”
