Run a Job: Batch Script Basics and Common Terms
SLURM
At ARCC, we use the SLURM job scheduler for cluster resource management and job scheduling. SLURM is responsible for allocating resources to users and provides a framework for starting, executing, and monitoring work on the requested and allocated resources. It also allows users to schedule work for execution at a later time.
To learn more about SLURM, check out their documentation at: https://slurm.schedmd.com/documentation.html
Jobs
A job is an allocation of resources, such as compute nodes, GPUs, or cores, that is assigned to a user for a specific amount of time. Jobs may be interactive, or they may be submitted as a batch script for scheduled execution at a later time.
When a job is assigned a specific set of hardware (a collection of nodes, cores, GPUs, etc.), it can run commands that initiate parallel work in the form of job steps, arranged in any configuration that fits within the allocated hardware.
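For example, an interactive job might be requested with SLURM's salloc command. A minimal sketch (the account name and time limit are placeholders; substitute your own project's values):

# Ask the scheduler for an interactive allocation: 1 node, 1 task, 10 minutes.
salloc --account=arcc --nodes=1 --ntasks=1 --time=00:10:00

# Once the allocation is granted, launch job steps on it with srun:
srun hostname

# Give the allocation back when finished:
exit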
Batch Scripts
Batch scripts are used to submit jobs to the cluster with the sbatch command. Your batch script will likely contain one or more srun commands to launch parallel tasks. Examples for running different batch scripts may be found here.
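For example, submitting a script is a single command (the script name here is a placeholder):

# Submit the batch script to the scheduler.
sbatch my_job.sh

# sbatch responds with the assigned job ID, e.g.:
#   Submitted batch job 1234567
# By default, the job's output is written to slurm-<jobid>.out in the
# directory the job was submitted from.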
The Anatomy of a Batch Script
Below is an example job script with common commands. What follows is a breakdown of each line (referenced by its line number within the script) and the corresponding directives:
#!/bin/bash

#SBATCH --account=arcc
#SBATCH --qos=debug
#SBATCH --time=0-00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --mail-user=cowboyjoe@uwyo.edu
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --get-user-env

export OMP_PROC_BIND=true
export OMP_PLACES=threads
export OMP_NUM_THREADS=4

module load miniconda3
module load gcc/14.2.0

srun echo "Start Job Process"
srun hostname
srun sleep 30
srun --cpu-bind=cores check-hybrid.gnu.pm
srun echo "End Job Process"
Directive Type | Subdirective / Command | What it Does | Corresponding Line #(s)
---|---|---|---
Shebang: | #!/bin/bash | Tells the system to use the bash interpreter found at the path /bin/bash. | 1
Sbatch: | | Request your resources with SBATCH directives. SBATCH directives are always preceded by a # sign and written in uppercase. | 3-12
 | --account=arcc | Tells the system which account/project you're running your job under; in the example, the job is associated with the arcc project. If arcc were not a valid project/account, or the submitter were not a member of it, the job would not run. Account is a mandatory directive that must be included in your submission script; jobs that don't specify a time or a QOS won't run either. | 3
 | --qos=debug | Tells SLURM to submit the job to the debug QOS/queue. | 4
 | --time=0-00:10:00 | Requests a job time limit of 0 days, 00 hours, 10 minutes, 00 seconds. | 5
 | --nodes=1 | Tells SLURM the entire job will run on a single node of the cluster. | 6
 | --ntasks=24 | Tells SLURM the job will run 24 tasks at a time. | 7
 | --cpus-per-task=4 | Tells SLURM each task needs 4 CPUs to run. | 8
 | --mem-per-cpu=2G | Tells SLURM to allocate 2 GB of RAM per CPU, so each 4-CPU task gets 8 GB. | 9
 | --mail-user=cowboyjoe@uwyo.edu | Tells SLURM to e-mail the specified user (cowboyjoe@uwyo.edu) for all events listed under --mail-type. | 10
 | --mail-type=BEGIN,END,FAIL | Tells SLURM to e-mail the above address on the following events: job start (BEGIN), job completion (END), and job failure (FAIL). | 11
 | --get-user-env | Tells SLURM to load the user's login environment variables. Be aware that environment variables already set by #SBATCH directives take precedence over the login environment; clear any environment variables you don't want applied to the spawned program before submitting with sbatch. | 12
Open MP: | | Specifies threads and how they're set up/distributed using the OpenMP framework. | 14-16
 | export OMP_PROC_BIND=true | Enables thread binding with OpenMP, so threads stay on the cores they're assigned to. | 14
 | export OMP_PLACES=threads | Binds each OpenMP thread to a single hardware thread/core. | 15
 | export OMP_NUM_THREADS=4 | Specifies the number of OpenMP threads per task; this should match the --cpus-per-task value in the SBATCH directives above (4 in this example). | 16
Module(s): | | Load any required software. | 18-19
 | module load miniconda3 | Loads miniconda3 for use during the course of the job. | 18
 | module load gcc/14.2.0 | Loads the gcc compiler, specifically version 14.2.0, for use during the course of the job. | 19
Slurm Run: | | Executes whatever follows the srun command as a job step on the allocated resources. | 21-25
 | srun echo "Start Job Process" | Prints "Start Job Process" to the job output. | 21
 | srun hostname | Prints the hostname of the node the job is running on to the job output. The next command doesn't start until this finishes. | 22
 | srun sleep 30 | Pauses execution for 30 seconds. The next command doesn't start until this finishes. | 23
 | srun --cpu-bind=cores check-hybrid.gnu.pm | Runs the executable check-hybrid.gnu.pm, with --cpu-bind=cores binding each task to cores. | 24
 | srun echo "End Job Process" | Prints "End Job Process" to the job output. | 25
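Taken together, the directives above request 24 tasks × 4 CPUs per task = 96 CPUs and 96 × 2 GB = 192 GB of memory on a single node for 10 minutes. After the script is submitted with sbatch, standard SLURM commands can be used to keep an eye on the job; a quick sketch, using a placeholder job ID:

# List your own pending and running jobs.
squeue -u $USER

# Show status/accounting information for a specific job
# (1234567 stands in for the ID that sbatch printed).
sacct -j 1234567

# Cancel the job if it's no longer needed.
scancel 1234567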
What’s the difference between sbatch and srun?
sbatch and srun are both SLURM commands that accept similar parameters, so it's easy to be confused about how each should be used.
The main difference is that srun is interactive and blocking: the results appear in your terminal, and you cannot run anything else in that terminal until the srun command finishes. sbatch is batch processing and non-blocking: the job is submitted to the queue, the results are written to a file when the job eventually runs, and you get your prompt back right away to submit other commands.
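As a quick illustration (the account and time values below are placeholders), here is the same command launched both ways:

# With srun the command runs interactively: your terminal is blocked until
# the step finishes, and the output prints straight to the terminal.
srun --account=arcc --time=00:05:00 hostname

# With sbatch the job is queued and your prompt returns immediately;
# --wrap turns the single command into a minimal batch script, and the
# output is written to slurm-<jobid>.out once the job actually runs.
sbatch --account=arcc --time=00:05:00 --wrap="hostname"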
Another difference between srun and sbatch is that sbatch allows you to run job arrays while srun does not. Additionally, srun can be, and often is, run from within an sbatch script.
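As a sketch of the job-array case (the account, time, and file names are placeholders), the --array directive asks SLURM to launch several copies of the same script, each with its own SLURM_ARRAY_TASK_ID:

#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=00:10:00
#SBATCH --array=1-5

# SLURM runs five copies of this script and sets SLURM_ARRAY_TASK_ID
# to a different value (1 through 5) in each one, so the same script
# can work on different inputs.
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"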