Running Array Jobs
In this section we look at running PBS array jobs. Array jobs are useful when you have a large number of similar jobs to run, each with different but calculable parameters.
Copy the Example Scripts
There is an example script you can use to practise submitting a short array job in /shared/eresearch/pbs_job_examples/job_arrays/.
Copy this into your own directory using the following commands:
$ cd                # Takes you to the top of your home directory.
$ mkdir jobs        # Creates the directory "jobs".
$ cd jobs           # Changes into the directory "jobs".
$ cp -r /shared/eresearch/pbs_job_examples/job_arrays .    # The dot means "the current directory"; don't leave it out.
$ cd job_arrays     # Changes into the new "job_arrays" directory.
This recursively copies the job_arrays directory and its contents into your own directory. You will now be in that directory and can look at the scripts there.
Submitting an Array Job
An array job is submitted either by passing a -J start-end:step specification to qsub, or by including a #PBS -J start-end:step directive in your PBS submission script.
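As a minimal sketch, assuming a typical PBS Pro setup, a submission script can carry the array specification itself. The job name, the resource request and the echo line are illustrative assumptions and are not the contents of the example job_array_script. Because bash treats the #PBS lines as ordinary comments, the script can also be run by hand, in which case PBS_ARRAY_INDEX is unset and the script falls back to 1.

```bash
#!/bin/bash
#PBS -N array_demo
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=00:05:00
#PBS -J 1-10:2

# PBS sets PBS_ARRAY_INDEX for each sub-job (1, 3, 5, 7, 9 for this range).
# The fallback value of 1 exists only so the script can be tried outside PBS.
index="${PBS_ARRAY_INDEX:-1}"
echo "This is sub-job ${index}"
```

Submitting such a script with plain qsub (no -J option on the command line) should give the same five sub-jobs as passing -J 1-10:2 to qsub.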
In this example we specify the array on the qsub command line. Submitting the script to PBS returns the PBS_JOBID of the array job. For example, the following submits an array job with indices 1, 3, 5, 7 and 9:
$ qsub -J 1-10:2 job_array_script
28846[].hpcnode0
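The following bash sketch reproduces the start-end:step arithmetic so you can check which indices a range yields before submitting. The function array_indices is a hypothetical helper, not a PBS command; it assumes the end value is inclusive and that the step defaults to 1, which matches the 1, 3, 5, 7, 9 example.

```bash
#!/bin/bash
# Hypothetical helper (not a PBS command): print the indices that
# "-J start-end:step" would produce, one per line.
array_indices() {
    local start=$1 end=$2 step=${3:-1}   # a missing step is treated as 1
    local i=$start
    while [ "$i" -le "$end" ]; do
        echo "$i"
        i=$(( i + step ))
    done
}

# Same range as "qsub -J 1-10:2"
array_indices 1 10 2
```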
The “array job” consists of “sub-jobs”. Each sub-job ID is the array's PBS_JOBID with that sub-job's unique PBS_ARRAY_INDEX value inside the brackets. For the submission above, the sub-jobs are:
28846[1].hpcnode0
28846[3].hpcnode0
28846[5].hpcnode0
28846[7].hpcnode0
28846[9].hpcnode0
When a sub-job starts, its PBS_ARRAY_INDEX value is available within your job submission script. How you use it is up to you: for instance, you can pass it as a parameter to your scripts, or use it to build the names of your input and output files.
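One possible pattern, sketched under assumptions: each sub-job reads its parameters from line N of a text file, where N is its array index, and names its input and output files after the same index. The file contents, the input_N.dat/output_N.txt naming, the helper params_for_index and the my_program command are all hypothetical. With a stepped range such as 1-10:2, only the odd-numbered lines of the file would be used.

```bash
#!/bin/bash
# Hypothetical helper: print line number $2 of file $1.
params_for_index() {
    sed -n "${2}p" "$1"
}

# In a real job the parameter file would already exist next to the script;
# here a throwaway copy is created so the sketch runs anywhere.
params_file=$(mktemp)
printf '%s\n' "alpha=0.1" "alpha=0.2" "alpha=0.3" > "$params_file"

index="${PBS_ARRAY_INDEX:-3}"        # PBS sets this; 3 is a stand-in for testing
params=$(params_for_index "$params_file" "$index")
input_file="input_${index}.dat"      # hypothetical input naming pattern
output_file="output_${index}.txt"    # hypothetical output naming pattern

echo "sub-job ${index}: ${params}, reading ${input_file}, writing ${output_file}"
rm -f "$params_file"
# A real job would now run something like:
#   my_program ${params} < "$input_file" > "$output_file"
```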
Note
The PBS directives that specify resources apply to EACH individual sub-job, not to all the sub-jobs together.
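A rough illustration of this rule. The directives below are assumptions chosen to resemble the array in the qstat listings on this page (three sub-jobs, each showing TSK 3, 16gb and 00:12), assuming TSK corresponds to ncpus. The arithmetic shows the combined footprint if all sub-jobs happened to run at the same time.

```bash
#!/bin/bash
# Assumed per-sub-job request: every sub-job gets ALL of these resources.
#PBS -l select=1:ncpus=3:mem=16gb
#PBS -l walltime=00:12:00
#PBS -J 1-3

# Illustrative arithmetic only (not needed in a real job script).
n_subjobs=3
ncpus_per_subjob=3
mem_gb_per_subjob=16
total_cpus=$(( n_subjobs * ncpus_per_subjob ))      # 3 x 3
total_mem_gb=$(( n_subjobs * mem_gb_per_subjob ))   # 3 x 16
echo "If all sub-jobs run together: ${total_cpus} cores, ${total_mem_gb} GB"
```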
Checking the Status of Your Array Jobs
The status of the sub-jobs is not displayed by default. For example, the qstat -a and qstat -J options show the job array as a single job:
$ qstat -a
Req'd Req'd Elap
Job ID Username Queue Jobname NDS TSK Memory Time S Time
----------------- -------- -------- ---------- --- --- ------ ----- - -----
230008[].hpcnode0 999777 defaultq test 1 3 16gb 00:12 Q --
As with non-array jobs, a “Q” in the status (“S”) column means the job is queued.
A “B” means that at least one sub-job has left the “Q” (queued) state and is running or has run, but not all sub-jobs have run yet.
To check the status of the sub-jobs, use either the -Jt option, or the -t option followed by the array job ID, for example:
$ qstat -Jt
Req'd Req'd Elap
Job ID Username Queue Jobname NDS TSK Memory Time S Time
----------- -------- -------- ---------- --- --- ------ ----- - -----
230008[1].hpcnod* 999777 defaultq test 1 3 16gb 00:12 Q --
230008[2].hpcnod* 999777 defaultq test 1 3 16gb 00:12 Q --
230008[3].hpcnod* 999777 defaultq test 1 3 16gb 00:12 Q --
or
$ qstat -t 230008[].hpcnode0
Req'd Req'd Elap
Job ID Username Queue Jobname NDS TSK Memory Time S Time
--------------- -------- -------- ---------- --- --- ------ ----- - -----
230008[].hpcnod0 119966 defaultq cooc. 1 3 16gb 00:12 Q --
230008[1].hpcno* 119966 defaultq cooc. 1 3 16gb 00:12 Q --
230008[2].hpcno* 119966 defaultq cooc. 1 3 16gb 00:12 Q --
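If you have many sub-jobs, a small awk filter can tally them by state. This is a sketch that assumes the column layout shown above, with the state in the ninth whitespace-separated field; the function name summarise_subjobs is hypothetical. Rows whose ID has empty brackets (the array itself) are skipped, so only sub-jobs are counted.

```bash
#!/bin/bash
# Hypothetical helper: read qstat -t / qstat -Jt style output on stdin and
# print "STATE COUNT" for the sub-job rows, sorted by state.
summarise_subjobs() {
    awk '$1 ~ /^[0-9]+\[[0-9]+\]/ { count[$9]++ }
         END { for (state in count) print state, count[state] }' | sort
}
```

Usage: qstat -t '230008[].hpcnode0' | summarise_subjobs. The single quotes stop the shell from treating the brackets as a glob pattern.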
Deleting an Array Job
To delete an array job, use the qdel command with either the array job ID (to delete the whole array) or a sub-job ID (to delete just that sub-job), e.g.:
$ qdel 28846[].hpcnode0
or
$ qdel 28846[5].hpcnode0
Using PBS Environment Variables
When writing output files into the /scratch directory, we may need a different directory or filename for each sub-job.
We could use the PBS environment variable $PBS_JOBID to name our output directories, like this: mkdir /scratch/your_login/$PBS_JOBID. This would give us directories like these:
/scratch/your_login/230008[1].hpcnode0
/scratch/your_login/230008[2].hpcnode0
/scratch/your_login/230008[3].hpcnode0
That is a problem. We do not want the .hpcnode0 at the end, and we definitely do not want square brackets in a directory name or filename.
If a filename contains such brackets, you will need to “backslash escape” them whenever you access it, like this: 230008\[2\].hpcnode0
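The sketch below shows why the brackets cause trouble. In a throwaway directory it creates two illustrative directory names that differ only by the brackets, then lets bash expand each name. Without escaping, bash treats [2] as a glob bracket expression matching the single character 2, and silently substitutes the other directory's name.

```bash
#!/bin/bash
# Throwaway directory holding two names that differ only by the brackets.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/230008[2].hpcnode0" "$demo_dir/2300082.hpcnode0"

# Unquoted: [2] is a glob pattern, so the word expands to "2300082.hpcnode0".
unquoted=$(cd "$demo_dir" && echo 230008[2].hpcnode0)

# Backslash-escaped brackets are taken literally.
escaped=$(cd "$demo_dir" && echo 230008\[2\].hpcnode0)

echo "unquoted: $unquoted"
echo "escaped:  $escaped"
rm -rf "$demo_dir"
```

Single quotes, as in '230008[2].hpcnode0', make the brackets literal in the same way.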
To solve this, instead of PBS_JOBID we will use PBS_ARRAY_ID and PBS_ARRAY_INDEX (see the reference at the bottom of this page).
For the example above, these would have the following values for array index 9:
$PBS_JOBID would be 230008[9].hpcnode0
$PBS_ARRAY_ID would be 230008[].hpcnode0, i.e. with no index number in the brackets
$PBS_ARRAY_INDEX would be just 9
This is better. We just need to remove the [].hpcnode0 from the end of $PBS_ARRAY_ID.
This can be done with the bash shell’s “Parameter Expansion” features (type “man bash” and search for “Parameter Expansion”).
The expansion we want is ${parameter%word}, which removes the shortest suffix of “parameter” that matches the pattern “word”.
So ${PBS_ARRAY_ID%[].hpcnode0} will be just 230008 in this example.
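A short sketch of the expansion in action, using the example value from this page in place of the one PBS would set. Quoting the suffix guarantees bash matches the brackets literally. The second form is an alternative not taken from the original text: ${array_id%%\[*} removes the longest suffix matching the pattern \[*, i.e. everything from the first [, so it does not depend on the server name. The variable names are illustrative.

```bash
#!/bin/bash
# Example value; inside a real array sub-job PBS sets PBS_ARRAY_ID for you.
array_id="${PBS_ARRAY_ID:-230008[].hpcnode0}"

# ${parameter%word}: remove the shortest suffix matching "word".
# The quotes make "[].hpcnode0" a literal string rather than a pattern.
base_id=${array_id%"[].hpcnode0"}

# ${parameter%%word}: remove the longest suffix matching "word";
# "\[*" matches from the first "[" to the end, whatever the server name.
base_id_any_host=${array_id%%\[*}

echo "$base_id"            # 230008
echo "$base_id_any_host"   # 230008
```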
Putting this all together we can do this in our submission script:
# Create directories like this: 230008_1, 230008_2 etc
mkdir /scratch/your_login/${PBS_ARRAY_ID%[].hpcnode0}_${PBS_ARRAY_INDEX}
This will give you directories like this:
/scratch/your_login/230008_1/
/scratch/your_login/230008_2/
/scratch/your_login/230008_3/
etc.
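A slightly more defensive version of the same idea, as a sketch: it strips everything from the first [ so the server name (hpcnode0 here) is not hard-coded, quotes the path, uses mkdir -p, changes into the new directory, and only does so when running as a PBS array sub-job. The helper subjob_tag and the scratch_base variable are hypothetical; replace your_login with your own login name.

```bash
#!/bin/bash
# Hypothetical scratch location; replace your_login with your login name.
scratch_base="/scratch/your_login"

# Hypothetical helper: turn "230008[].hpcnode0" and "2" into "230008_2".
subjob_tag() {
    local base=${1%%\[*}    # drop everything from the first "["
    echo "${base}_$2"
}

# Only create and enter the directory inside a PBS array sub-job.
if [ -n "${PBS_ARRAY_ID:-}" ] && [ -n "${PBS_ARRAY_INDEX:-}" ]; then
    work_dir="${scratch_base}/$(subjob_tag "$PBS_ARRAY_ID" "$PBS_ARRAY_INDEX")"
    mkdir -p "$work_dir"    # -p: no error if the directory already exists
    cd "$work_dir" || exit 1
fi
```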
You can do something similar if all your data is in one directory, say /scratch/your_login/, but you need a unique filename for each sub-job, such as 230008_1.data, 230008_2.data, 230008_3.data and so on.
Just use ${PBS_ARRAY_ID%[].hpcnode0}_${PBS_ARRAY_INDEX}.data for your data filenames.
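A minimal sketch of building such a filename, with example fallback values so it can be tried outside PBS. The data_dir path and the my_program command are hypothetical; the name is built from everything before the first [ rather than a hard-coded hpcnode0 suffix.

```bash
#!/bin/bash
# Hypothetical shared data directory; replace your_login with your login name.
data_dir="/scratch/your_login"

# Example values for trying this outside PBS; a real sub-job gets them from PBS.
array_id="${PBS_ARRAY_ID:-230008[].hpcnode0}"
index="${PBS_ARRAY_INDEX:-2}"

base_id=${array_id%%\[*}                        # "230008"
data_file="${data_dir}/${base_id}_${index}.data"
echo "$data_file"
# A real job would then write to it, e.g.:  my_program > "$data_file"
```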
References
PBS Environment Variables: see Section 16, “PBS Environment Variables”, page RG-399, of the PBS Reference Guide, available at:
/shared/eresearch/pbs_manuals/PBSReferenceGuide2020.1.1.pdf
Bash Parameter Expansion: type “man bash” and search for “Parameter Expansion”.