Generally, you cannot log in directly via ssh to the other nodes in the HPC cluster. However, you can log in to any node by submitting an interactive PBS job, or you can ssh to any node on which you have a job running. For example, you might need to log in to the node running your job to debug a problem with it, or log in to a GPU node to compile code that uses CUDA. See also Accessing GPU Nodes. Interactive jobs are not for running long computations, so the maximum walltime is 8 hours.
Submitting an interactive PBS job lets you log in to any node for a specified period of time. The example below gives you a login on host hpccnode07 for 30 minutes. Use the “-I” option and select the host and walltime. The “-I” option starts a PBS job in the interactive queue. Unless you request otherwise, you get the defaults: 1 CPU core, 5GB of RAM and a walltime of 1 hour.
$ qsub -I -l select=1:host=hpccnode07 -l walltime=00:30:00
hpccnode07 $
For some nodes which have restricted queues you will also need to add the queue name to the command.
$ qsub -I -l select=1:host=c3node03 -l walltime=00:30:00 -q c3b
c3node03 $
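The two commands above differ only in the host, the walltime and the optional queue name. As a minimal sketch, the helper below assembles the command string so you can inspect it before running it. The function name interactive_qsub_cmd is hypothetical and is not part of PBS or of this cluster's tooling. It only prints the command and does not submit anything.

```shell
# Hypothetical helper: print the qsub command for an interactive login.
# Usage: interactive_qsub_cmd HOST WALLTIME [QUEUE]
interactive_qsub_cmd() {
    host=$1
    walltime=$2
    queue=${3:-}   # optional; only needed for nodes with restricted queues

    cmd="qsub -I -l select=1:host=${host} -l walltime=${walltime}"
    if [ -n "$queue" ]; then
        cmd="${cmd} -q ${queue}"
    fi
    printf '%s\n' "$cmd"
}

# Reproduce the two examples from this page.
interactive_qsub_cmd hpccnode07 00:30:00
interactive_qsub_cmd c3node03 00:30:00 c3b
```

Printing the command rather than running it keeps the sketch safe to try on any machine. Copy the printed line to the login node prompt when you are ready to submit.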
Type exit to leave the interactive shell on the execution host. You will be
dropped back to the login node.
Interactive logins will provide you with resources from the “Interactive Queue”.
This is limited in wall time. See https://hpc.research.uts.edu.au/status/
For testing please use a minimal test data set and minimal cores, memory and time.
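Because the interactive queue is limited in walltime, it can help to check a requested walltime before submitting. The sketch below assumes the 8 hour maximum stated at the top of this page; the current limit is on the status page and may differ. The names MAX_INTERACTIVE_SECONDS, walltime_to_seconds and walltime_ok are hypothetical and are not part of PBS.

```shell
# Assumption: 8 hours, taken from the limit stated on this page.
MAX_INTERACTIVE_SECONDS=$((8 * 3600))

# Convert a walltime in HH:MM:SS form to seconds.
# awk is used so that fields such as "08" are read as decimal numbers.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed (exit status 0) if the walltime is within the assumed limit.
walltime_ok() {
    [ "$(walltime_to_seconds "$1")" -le "$MAX_INTERACTIVE_SECONDS" ]
}

for wt in 00:30:00 08:00:00 12:00:00; do
    if walltime_ok "$wt"; then
        echo "$wt is within the interactive limit"
    else
        echo "$wt exceeds the interactive limit"
    fi
done
```

The check runs locally and does not query PBS, so it is only as accurate as the limit you set in MAX_INTERACTIVE_SECONDS.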
If you already have a job running on a node, you don’t need to submit an interactive PBS job. Just ssh into the node. First check what node your job is running on:
$ qstat -n1
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
215426.hpcnode0 876543   smallq   primes_job  95025   1   1    5gb 00:10 R 00:00 hpccnode09/1
In this example it’s running on hpccnode09, so we can ssh directly into this node.
$ ssh hpccnode09
Last login: Wed Jul 17 16:19:13 2019 from hpccnode01
hpccnode09 $
Note: When your job finishes on that node your ssh connection to that node will be automatically disconnected.
If you ssh to a node on which you do not have a job running, you will be disconnected within a few seconds and get a “Connection to ….. closed.” message.
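Since ssh only works on nodes where you have a running job, it can be convenient to pull the node name out of the qstat -n1 output rather than read it by eye. The sketch below assumes the column layout shown in the sample output on this page, with the job state in the tenth field and the execution host in the last field. The function name running_job_nodes is hypothetical.

```shell
# Read `qstat -n1` output on stdin and print the first execution host
# of every job in the running (R) state. Header and separator lines are
# skipped because their tenth field is not "R".
# For a multi-node job only the first host in the list is printed.
running_job_nodes() {
    awk '$10 == "R" && NF >= 12 { split($NF, parts, "/"); print parts[1] }'
}

# On the cluster you would run:  qstat -n1 | running_job_nodes
# Here the sample row from this page is fed in directly.
printf '%s\n' "215426.hpcnode0 876543 smallq primes_job 95025 1 1 5gb 00:10 R 00:00 hpccnode09/1" | running_job_nodes
```

For the sample row this prints hpccnode09, which is the name to pass to ssh.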