Running Array Jobs

In this section we will look at running PBS array jobs. Array jobs are useful if you have a large number of jobs to run whose parameters differ but can be calculated, for example from a running index.

Copy the Example Scripts

There is an example script that you can use to practice submitting a short array job in /shared/eresearch/pbs_job_examples/job_arrays/. Copy this into your own directory using the following commands:

$ cd            # This takes you to the top of your home directory.
$ mkdir jobs    # This creates the directory "jobs".
$ cd jobs       # This changes into the directory "jobs".
$ cp -r /shared/eresearch/pbs_job_examples/job_arrays .   # Don't forget the dot at the end.
$ cd job_arrays     # This changes into the new "job_arrays" directory.

This will have recursively copied the directory job_arrays and its contents to your own directory. You will now be in that directory and you can have a look at the scripts there.

Submitting an Array Job

An array job is submitted either by passing a -J start-end:step specification to qsub or by including a #PBS -J start-end:step directive in your PBS submission script. In this example we will specify the array range on the qsub command line.

Submitting the script to PBS returns the PBS_JOBID of the array job. For example, the following submits an array job with indices 1, 3, 5, 7 and 9:

$ qsub -J 1-10:2  job_array_script
28846[].hpcnode0
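
To preview which indices a start-end:step specification describes before submitting, the range can be expanded locally. The following is only a minimal sketch: the function name array_indices is hypothetical and is not part of PBS, and it assumes the standard seq utility is available.

```shell
# Hypothetical helper (not a PBS command): expand a "start-end:step"
# specification into the list of sub-job indices it describes.
array_indices() {
    spec=$1
    range=${spec%%:*}             # part before the colon, e.g. "1-10"
    case $spec in
        *:*) step=${spec##*:} ;;  # part after the colon, e.g. "2"
        *)   step=1 ;;            # no step given: every index in the range
    esac
    start=${range%%-*}
    end=${range##*-}
    # seq prints one index per line; the unquoted substitution joins
    # them with single spaces.
    echo $(seq "$start" "$step" "$end")
}

array_indices 1-10:2    # prints: 1 3 5 7 9
```

Note that the last index is 9, not 10, because stepping by 2 from 1 never lands on 10.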

The “array job” will consist of “sub-jobs”. Each sub-job ID is the PBS_JOBID of the array job with a unique PBS_ARRAY_INDEX value within the brackets. For the submission above, these are:

28846[1].hpcnode0
28846[3].hpcnode0
28846[5].hpcnode0
28846[7].hpcnode0
28846[9].hpcnode0

When a sub-job starts, its PBS_ARRAY_INDEX value is available within your job submission script. It is up to you how you use it. For instance, you can pass it as a parameter to a program, or use it to build the names of your input and output files.
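
As an illustration of both points, here is a minimal sketch of a submission script that sets the array range with a #PBS -J directive and uses PBS_ARRAY_INDEX to pick file names and a parameter. The job name, the file naming scheme and the threshold calculation are all hypothetical; they are not taken from the example scripts.

```shell
#!/bin/bash
#PBS -N array_example
#PBS -J 1-10:2
# The -J directive above is the in-script equivalent of "qsub -J 1-10:2".

# PBS sets PBS_ARRAY_INDEX for each sub-job. The fallback value of 1 is
# only there so that the script can be tried outside PBS.
index=${PBS_ARRAY_INDEX:-1}

# Hypothetical naming scheme: one input and one output file per index.
input_file="input_${index}.txt"
output_file="result_${index}.txt"

# Hypothetical derived parameter: scale the index into a numeric setting.
threshold=$(( index * 10 ))

echo "Sub-job ${index}: reading ${input_file}, writing ${output_file}, threshold=${threshold}"
```

With the range in the script, the job would be submitted with a plain qsub and no -J option on the command line.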

Note

The PBS directives that specify resources apply to each individual sub-job, not to all the sub-jobs together.

Checking your Array Jobs Status

The status of the sub-jobs is not displayed by default. For example, qstat -a and qstat -J both show the job array as a single job:

$ qstat -a
                                                        Req'd  Req'd   Elap
Job ID             Username Queue    Jobname    NDS TSK Memory Time  S Time
-----------------  -------- -------- ---------- --- --- ------ ----- - -----
230008[].hpcnode0  999777   defaultq test         1   3   16gb 00:12 Q   --

When the status (“S” column) shows “Q” then, like non-array jobs, the job is queued.
If the status (“S” column) shows “B” then this indicates that at least one sub-job has left the “Q” (queued) state and is running or has run, but not all sub-jobs have run.

To check the status of the sub-jobs, use either the -Jt option or the -t option with the array job ID specified, for example:

$ qstat -Jt
                                                        Req'd  Req'd   Elap
Job ID             Username Queue    Jobname    NDS TSK Memory Time  S Time
-----------------  -------- -------- ---------- --- --- ------ ----- - -----
230008[1].hpcnod*  999777   defaultq test         1   3   16gb 00:12 Q   -- 
230008[2].hpcnod*  999777   defaultq test         1   3   16gb 00:12 Q   -- 
230008[3].hpcnod*  999777   defaultq test         1   3   16gb 00:12 Q   --

or

$ qstat -t 230008[].hpcnode0
                                                      Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    NDS TSK Memory Time  S Time
---------------  -------- -------- ---------- --- --- ------ ----- - -----
230008[].hpcnod0 119966   defaultq cooc.        1   3   16gb 00:12 Q   -- 
230008[1].hpcno* 119966   defaultq cooc.        1   3   16gb 00:12 Q   -- 
230008[2].hpcno* 119966   defaultq cooc.        1   3   16gb 00:12 Q   --
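
For a large array it can be handy to summarise how many sub-jobs are in each state. The following is a rough sketch, not a PBS feature: count_subjob_states is a hypothetical helper, and it assumes the ten-column layout shown above, with the state in the ninth whitespace-separated field and no spaces in the job name. The sample lines are made up for illustration.

```shell
# Hypothetical helper: count sub-jobs per state from "qstat -t" style
# output read on standard input. Only lines whose job ID has a number
# inside the brackets are counted, so headers and the parent array
# entry are skipped.
count_subjob_states() {
    awk '$1 ~ /\[[0-9]+\]/ { count[$9]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# On the cluster this might be used as:
#   qstat -t "230008[].hpcnode0" | count_subjob_states
# Here a few made-up lines stand in for real qstat output.
count_subjob_states <<'EOF'
230008[].hpcnod*  999777   defaultq test         1   3   16gb 00:12 B   --
230008[1].hpcnod* 999777   defaultq test         1   3   16gb 00:12 R 00:00
230008[2].hpcnod* 999777   defaultq test         1   3   16gb 00:12 Q   --
230008[3].hpcnod* 999777   defaultq test         1   3   16gb 00:12 Q   --
EOF
```

For the sample input this prints one line per state: "Q 2" and "R 1".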

Deleting an Array Job

To delete an array job, use the qdel command with the array job ID. To delete a single sub-job, specify the sub-job ID instead. For example:

$ qdel 28846[].hpcnode0

or

$ qdel 28846[5].hpcnode0

Using PBS Environment Variables

When we write files into the /scratch directory we might need a different directory or filename for each sub-job. We could use the PBS environment variable $PBS_JOBID to create our output directories, like this: mkdir /scratch/your_login/$PBS_JOBID. This will give us directories like this:

/scratch/your_login/230008[1].hpcnode0
/scratch/your_login/230008[2].hpcnode0
/scratch/your_login/230008[3].hpcnode0

That is going to be a problem. We do not want the .hpcnode0 at the end, and we definitely do not want square brackets in the name of a directory or file. If you have such brackets in a filename, you will need to “backslash escape” them whenever you access the file, like this: 230008\[2\].hpcnode0.
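
The reason brackets are awkward is that the shell treats [ ] as a glob character class. The sketch below illustrates this in a throwaway directory; the second directory name is a hypothetical one chosen only to trigger the problem.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

mkdir '230008[1].hpcnode0'    # directory named after a sub-job ID
mkdir '2300081.hpcnode0'      # hypothetical unrelated directory

# Unquoted, [1] is a character class matching the single character "1",
# so the shell expands the word to the second directory.
unquoted=$(echo 230008[1].hpcnode0)

# Quoted or backslash-escaped, the brackets are taken literally.
quoted=$(echo '230008[1].hpcnode0')
escaped=$(echo 230008\[1\].hpcnode0)

echo "unquoted: $unquoted"    # prints: unquoted: 2300081.hpcnode0
echo "quoted:   $quoted"      # prints: quoted:   230008[1].hpcnode0
echo "escaped:  $escaped"     # prints: escaped:  230008[1].hpcnode0

# Return to the previous directory and clean up.
cd - >/dev/null
rm -r "$demo_dir"
```

When no file happens to match the pattern, the shell passes the unquoted word through unchanged, which is why the problem can go unnoticed for a long time.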

To solve this, instead of using PBS_JOBID we will use PBS_ARRAY_ID and PBS_ARRAY_INDEX. (See the reference at the bottom of this page.) For the example above, these would look like this for array index 9:

$PBS_JOBID would be 230008[9].hpcnode0
$PBS_ARRAY_ID would be 230008[].hpcnode0 i.e. no index number in the brackets
$PBS_ARRAY_INDEX would be just 9.

This is better. We just need to remove the [].hpcnode0 from the end of $PBS_ARRAY_ID. This can be done with the bash shell's “Parameter Expansion” feature (type “man bash” and search for “Parameter Expansion”). The expansion we want is ${parameter%word}, which removes the shortest trailing match of “word” from the value of “parameter”. So ${PBS_ARRAY_ID%[].hpcnode0} will be just 230008 in this example.
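
The expansion can be tried at an ordinary shell prompt by setting the two variables by hand. This is a small sketch with simulated values. The brackets are backslash-escaped here as a defensive assumption, so that they are matched literally rather than read as the start of a pattern.

```shell
# Simulated values for sub-job 9, so this can be tried outside PBS.
PBS_ARRAY_ID='230008[].hpcnode0'
PBS_ARRAY_INDEX=9

# ${parameter%word} removes the shortest suffix matching "word".
job_number=${PBS_ARRAY_ID%\[\].hpcnode0}

echo "$job_number"                         # prints 230008
echo "${job_number}_${PBS_ARRAY_INDEX}"    # prints 230008_9
```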

Putting this all together we can do this in our submission script:

# Create directories like this: 230008_1, 230008_2 etc
mkdir /scratch/your_login/${PBS_ARRAY_ID%[].hpcnode0}_${PBS_ARRAY_INDEX}

This will give you directories like this:

/scratch/your_login/230008_1/
/scratch/your_login/230008_2/
/scratch/your_login/230008_3/ etc.

You can do a similar thing if you keep all your data in one directory, say /scratch/your_login/, but need a unique filename for each sub-job, like 230008_1.data, 230008_2.data, 230008_3.data etc.

Just use ${PBS_ARRAY_ID%[].hpcnode0}_${PBS_ARRAY_INDEX}.data for your data filenames.
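
The directory and filename ideas can be combined in one place in a submission script. The sketch below is one possible arrangement, not the only one. It uses ${parameter%%word} with the pattern \[* to drop everything from the first bracket onwards, so it does not depend on the server name. The SCRATCH_ROOT variable and the temporary directory fallback are hypothetical, added only so the sketch can be tried away from the cluster.

```shell
# Fall back to simulated values so the sketch can be tried outside PBS.
default_id='230008[].hpcnode0'
array_id=${PBS_ARRAY_ID:-$default_id}
index=${PBS_ARRAY_INDEX:-1}

# ${parameter%%word} removes the longest matching suffix; here that is
# everything from the first "[" onwards, whatever the server is called.
job_number=${array_id%%\[*}
label="${job_number}_${index}"

# Hypothetical scratch root; on the cluster this would be
# /scratch/your_login rather than a temporary directory.
scratch_root=${SCRATCH_ROOT:-$(mktemp -d)}

work_dir="${scratch_root}/${label}"
data_file="${work_dir}/${label}.data"

# -p avoids an error if the directory already exists from an earlier run.
mkdir -p "$work_dir"
echo "results for sub-job ${index}" > "$data_file"
echo "wrote ${data_file}"
```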

References

PBS Environment Variables: Download the PBS Reference Guide from here:
/shared/eresearch/pbs_manuals/PBSReferenceGuide2020.1.1.pdf
Look for Section 16 “PBS Environment Variables”, page: RG-399.

Bash Parameter Expansion: Type “man bash” and search for “Parameter Expansion”.