Slurm: Jobs within a Job
Overview
Typically, when a user submits a job, that job is a single, self-contained piece of work: it starts, it runs, it finishes.
Depending on your use case, however, you can also run other child jobs, in parallel, within this parent job.
There are a number of ways this can be done. For example, if all the child jobs use exactly the same script, just with different inputs, then Slurm Job Arrays might offer a solution.
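For comparison, a minimal job array sketch is shown below; my_script.sh and its input files are hypothetical placeholders, and each array element runs as its own independent job rather than as a step inside a parent job:
#!/bin/bash
#SBATCH --account=<your-project>
#SBATCH --time=00:05:00
#SBATCH --job-name=array_example
# Two array elements, numbered 0 and 1; each gets its own job and its own GPU.
#SBATCH --array=0-1
#SBATCH --gres=gpu:1
# The array task ID selects which input this element works on (illustrative only).
./my_script.sh input_${SLURM_ARRAY_TASK_ID}.dat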
Example
Here is an example of launching two jobs, in parallel, each using a GPU, from a single parent job.
Scenario: A user submits a single job in which they want to run two independent scripts in parallel. They want to request two GPUs overall and have each script use one of them. This example uses only a single node.
Considerations:
Although only a single job is being submitted, there are actually three tasks involved: the parent job itself, plus two additional tasks, one for each of the two parallel scripts that will be called.
We want the parent job to request all the required resources (including the GPUs) and then have the two child scripts use their own subset of these resources.
These scripts will run independently and will each require their own tasks/cores/memory and GPU. They do not share resources with each other.
First Step: Create Parent Submission
Script: run.sh
#!/bin/bash
#SBATCH --account=<your-project>
#SBATCH --time=00:05:00
#SBATCH --job-name=multi_job_gpu
#SBATCH --nodes=1
# We need to request 3 tasks: One for the parent and two for the children.
#SBATCH --ntasks=3
# We only need to request two GPUs since the parent does not explicitly use them, only the two child tasks do.
#SBATCH --gres=gpu:2
#SBATCH --output=multi_job_gpu_%A.log
# Let's see what the parent job has been allocated.
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
echo "Check GPU Allocation:"
nvidia-smi -L
echo "- - - - - - - - - - - - - - - -"
# Each job step will use one of the three tasks allocated overall, and one of the GPUs.
# You have to explicitly request one of the allocated GPUs; they will not be automatically assigned to the job steps.
srun --ntasks=1 --gres=gpu:1 --exclusive -u ./gpu_check.sh &
srun --ntasks=1 --gres=gpu:1 --exclusive -u ./gpu_check.sh &
# From: https://slurm.schedmd.com/srun.html
# --exclusive: This option can also be used when initiating more than one job step within an existing resource allocation (default),
# where you want separate processors to be dedicated to each job step. If sufficient processors are not available to initiate the job
# step, it will be deferred. This can be thought of as providing a mechanism for resource management to the job within its allocation.
# -u: By default, the connection between slurmstepd and the user-launched application is over a pipe. The stdio output written by the
# application is buffered by the glibc until it is flushed or the output is set as unbuffered. If this option is specified the tasks are
# executed with a pseudo terminal so that the application output is unbuffered. This option applies to step allocations.
wait
echo "Done"
The dummy script that is called for each job step simply queries which GPU has been allocated to it.
Script: gpu_check.sh
#!/bin/bash
echo $SLURM_STEPID": SLURM_JOB_ID:" $SLURM_JOB_ID
echo $SLURM_STEPID": SLURM_STEPID:" $SLURM_STEPID
echo $SLURM_STEPID": CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
echo $SLURM_STEPID": Check GPU Allocation:"
echo $SLURM_STEPID":" $(nvidia-smi -L)
echo $SLURM_STEPID": Sleep for 10 seconds:"
sleep 10
echo $SLURM_STEPID": Done"
Run the Script:
[]$ sbatch run.sh
Submitted batch job 2338899
Example Output:
Using the -u option, the output from all three tasks (the parent and the two job steps) is written to the single output file as the messages occur.
The numbers on the left-hand side are the step IDs of the two job steps, and help us to see that we have three tasks running and which one each line of output comes from.
From the above, we can see that the parent job requests, and is allocated, two GPUs (remember, this is two GPUs on a single node).
Notice that all three tasks share the same job ID, but the two child job steps each have a unique job step ID (0 and 1), with each one separately using one of the two allocated GPUs.
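While the job is still running, the individual job steps can also be listed with squeue (a sketch; the exact output columns depend on your Slurm version and configuration):
[]$ squeue -s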
Running in Parallel
We can confirm the job steps are running in parallel by using the sacct command:
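For example, using the job ID returned by sbatch above (the format fields shown are just one reasonable selection):
[]$ sacct -j 2338899 --format=JobID,JobName,Start,End,Elapsed,State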
Notice that the two job steps start and end at the same time.
Error Message
If your parent job has not requested enough overall resources for your particular use case, look out for messages of the following form:
Note: This is just one example of many potential use cases, but it demonstrates the basics from which you can develop your own scripts.