Slurm Job Arrays
Why Use a Job Array?
If you have the same job that you want to run tens, hundreds, or even thousands of times, with perhaps only the initial inputs and/or setup differing between runs, then a Job Array allows you to submit a single job rather than submitting each one individually.
Restrictions:
Initially, every job in the array is given the same allocation of nodes, tasks, cores, memory and wall time. Although these options can be modified once the array is running, this is an advanced topic (see the Slurm page above) and you might not have the privileges to perform this.
Job Arrays can only be used when submitting a job with sbatch.
There is a Slurm configuration setting that defines the maximum size of an array, currently set to 10,000. If you require a larger limit then please contact arcchelp@uwyo.edu to discuss your requirements.
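If you want to check the configured limit for yourself, the value comes from the cluster-wide MaxArraySize setting, which you can query from a login node, for example:
scontrol show config | grep -i MaxArraySize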
Example 01
This first example demonstrates a basic bash script that uses a job array of nine elements, each of which calls the same python script.
The main things to notice in this script are:
The use of #SBATCH --array=0-8 to define the size of the job array, which is indexed from 0 to 8 giving a total of nine individual jobs.
Job outputs are written to a file named with the overall parent job id (%A) followed by the unique job array index (%a): #SBATCH --output=arrays_ex01_%A_%a.out
The use of the $SLURM_ARRAY_TASK_ID environment variable to get the job array index of each specific job in the array.
run.sh
#!/bin/bash
#SBATCH --job-name arrays01
#SBATCH --time=00:01:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your-email-address>
#SBATCH --account=<your-project>
#SBATCH --output=arrays_ex01_%A_%a.out
#SBATCH --array=0-8
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_JOB_NAME:" $SLURM_JOB_NAME
echo "SLURM_JOB_NODELIST:" $SLURM_JOB_NODELIST
echo "SLURM_ARRAY_TASK_ID:" $SLURM_ARRAY_TASK_ID
module load swset/2018.05 gcc/7.3.0 python/3.6.3
python task.py $SLURM_ARRAY_TASK_ID
task.py
import sys
# The job array index is passed in as the first command-line argument.
array_task_id = sys.argv[1]
print("Running Task: Using SLURM_ARRAY_TASK_ID: ", str(array_task_id))
Submit Job
As noted, job arrays can only be used with sbatch, so call this script from the command line using:
sbatch run.sh
Slurm will process this call and add as many copies as defined by the array into its queue. The example below shows all nine copies of the array running at the same time:
[salexan5@tlog2 example01]$ sbatch run.sh
Submitted batch job 11949673
[salexan5@tlog2 example01]$ squeue -u salexan5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
11949673_0 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_1 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_2 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_3 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_4 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_5 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_6 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_7 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
11949673_8 inv-arcc arrays01 salexan5 R 0:02 1 mtest2
Limit Simultaneous Jobs
If you have 1000s of jobs in the array then Slurm will add them all to the queue and try to start/allocate as many of them as it can, as soon as it can.
For example, with the job above, if we inspect it using sacct we can see that all nine jobs were started at the same time.
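One way of inspecting this yourself (the exact fields you request are up to you) is to ask sacct for the start time of each task in the array:
sacct -j 11949673 --format=JobID,JobName,Start,End,State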
Although this is how Slurm is expected to function, and we want cluster usage to be as high as possible, this has to be balanced against fair sharing so that one user's array does not swamp the entire cluster. Slurm has some quality of service mechanisms in place, and you will see jobs sitting in a pending state. One thing you can do as a user is limit the number of simultaneous jobs by modifying the array option:
#SBATCH --array=0-8%3
Using the % separator limits the maximum number of jobs within the array that can run simultaneously. In the sample above this limits the array to 3 jobs at a time. If you watch squeue while the array runs, you will notice that there are never more than three of the jobs running at any one time.
And, again using sacct, we can see that only the first three started at the same time; the rest started as slots became available.
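If the array has already been submitted, Slurm also lets you change the throttle on the queued job with scontrol (whether you are permitted to do this depends on your Slurm version and site policy); for example, to allow 4 tasks at a time, replacing 11949673 with your own parent job id:
scontrol update JobId=11949673 ArrayTaskThrottle=4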
Identifying an Individual Array Job
In most cases you won't want to run each of the jobs in exactly the same way: although you want to run the exact same simulation code, you'll probably want to change the input data, or the initial configuration and/or setup. You could achieve this by adding some form of randomness directly into your code, but if you want more control you can identify the array index of a specific job using the $SLURM_ARRAY_TASK_ID environment variable. In the above example, this will take the value 0, 1, 2 … or 8.
Notice in the example above that this value is passed into the python program that is called for every job, and is then simply printed out from within the task. Within the output you will see something like: Running Task: Using SLURM_ARRAY_TASK_ID: 0, where the number printed changes to match the associated array index.
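As a sketch of one way the index can be used to vary the inputs (the script name my_simulation.py and the input_*.dat file names below are hypothetical, not part of the example above), the relevant lines of a batch script might look like:
# Hypothetical: each array task reads its own input file, input_0.dat ... input_8.dat
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.dat"
python my_simulation.py "$INPUT_FILE"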
Logging the Output
Capturing the output of a job works in much the same way as for single jobs.
You can either log everything into a single output file using: #SBATCH --output=arrays_ex01_%A.out, which will create a single output file called arrays_ex01_11949673.out. The only issue with this is that every job in the array writes into this one file, so you'll need to implement something within your code to identify which job each line came from.
Alternatively, you can create an individual output file for each job in the array using: #SBATCH --output=arrays_ex01_%A_%a.out.
In this case you'd see something like the following:
arrays_ex01_11949673_0.out
arrays_ex01_11949673_1.out
…
arrays_ex01_11949673_8.out
Each output file is identified with the parent job id followed by its specific array index.
Emailing Results
You can have the status of a job emailed to you using the slurm options shown in the example script above:
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your-email-address>
The above will email you a message when the parent job starts, finishes, is preempted etc. But it will only send a message from the perspective of the parent job, meaning you’ll only get a single message on completion once all the jobs in the array have finished.
If you want the same set of messages, but one for each array job, then add ARRAY_TASKS to the mail-type list:
#SBATCH --mail-type=ALL,ARRAY_TASKS
But be warned: if you have 100s/1000s of array jobs, you'll get 100s/1000s of emails, so choose whichever option is most appropriate for you.
Cancelling a Job
You can cancel the entire job array using: scancel 11949677
or a single job within the array by appending the job array index: scancel 11949677_8
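scancel also accepts a range of array indices (check the scancel man page for your Slurm version), so you should be able to cancel just part of an array, for example tasks 0 to 4:
scancel 11949677_[0-4]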
Example 02
This second example demonstrates just one way of using the $SLURM_ARRAY_TASK_ID environment variable to pass more specific/tailored inputs to your task:
The parent bash script reads a file called Sample.IDs, selects the line corresponding to the job array index, and passes that value into the python task; a sketch of how this might be done is shown below.
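The original script for this example is not reproduced here; a minimal sketch of the idea, assuming Sample.IDs contains one sample ID per line and that the python task takes that ID as its argument, might look like the following (the exact script used on the cluster may differ):
#!/bin/bash
#SBATCH --job-name arrays02
#SBATCH --time=00:01:00
#SBATCH --account=<your-project>
#SBATCH --output=arrays_ex02_%A_%a.out
#SBATCH --array=0-8
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_ARRAY_TASK_ID:" $SLURM_ARRAY_TASK_ID
# Pick the line of Sample.IDs matching this array task
# (sed line numbers start at 1, the array index starts at 0, hence the +1).
SAMPLE_ID=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" Sample.IDs)
python task.py "$SAMPLE_ID"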
If you look at one of the output files, you'll see that each array task was given the line of Sample.IDs corresponding to its array index.
As stated, this is just one example of how you could do this, but it hopefully demonstrates the idea.
Parent Job ID vs Array Job ID
If you look closely at the above you'll notice that the log file was called arrays_ex02_11948266_2.out, but within the output itself it reports SLURM_JOB_ID: 11948269. The 11948266 is the id of the original parent job that was submitted via sbatch, while the 11948269 is the job id of that specific array job. Each array job can still be considered an independent job in its own right.
This can be further highlighted by using the option: #SBATCH --output=arrays_ex02_%A_%a_%j.out, which generates one output file per array task, each named with the same parent job id, a unique job array index, and that task's own unique job id.
These three options are specifically defined as:
%A: Job array's master job allocation number.
%a: Job array ID (index) number.
%j: Job id of the running job.
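Inside a running array task, the parent id and the task's own id are also exposed as separate environment variables, so you can record both from within your batch script if that helps with bookkeeping, for example:
echo "Parent (array) job id: $SLURM_ARRAY_JOB_ID"
echo "This task's job id: $SLURM_JOB_ID"
echo "This task's array index: $SLURM_ARRAY_TASK_ID"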
This is only an introduction to using Job Arrays; there is a lot more functionality available, such as adding dependencies between jobs (having one job wait until another has completed). To explore further, read the job array link at the top of this page and Slurm's sbatch page.
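As a small taster of dependencies (the job id and script name below are placeholders, not taken from the examples above), a follow-up job can be held until every task in an array has completed successfully with something like:
sbatch --dependency=afterok:11949673 postprocess.sh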
If you have any questions please don’t hesitate to contact arcchelp@uwyo.edu.