Introduction

The Slurm page introduces the basics of creating a batch script that is submitted on the command line with the sbatch command to request a job on the cluster. This page extends that introduction, focusing in more detail on the use of the following four options:

  1. nodes

  2. ntasks-per-node

  3. cpus-per-task

  4. ntasks

and how they can be used and combined to request specific configurations of nodes, tasks and cores.

A complete list of options can be found on the Slurm: sbatch manual page or by typing man sbatch from the command line when logged onto teton.
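For context, these options sit alongside any others at the top of a batch script. Below is a minimal sketch of a complete script; the account, job name and time limit values are placeholders that you would replace with your own:

#!/bin/bash
#SBATCH --account=your-project        # placeholder: your project/account name
#SBATCH --job-name=example            # placeholder job name
#SBATCH --time=00:10:00               # placeholder time limit
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

echo "Running on $(hostname)"

You would then submit it with, for example: sbatch example.sh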

Aims: The aim of this page is to show how these four options can be used, individually and in combination, to request specific configurations of nodes, tasks and cores.


Note:

Please share with ARCC your experiences of various configurations for whatever application you use so we can share it with the wider UW (and beyond) research community.

Prerequisites: You should already be familiar with creating and submitting a basic batch script using sbatch, as covered on the Slurm page.

Resources

UW's HPC cluster is made up of over 500 nodes, as described on the Teton Overview page, divided into common hardware configurations using partitions. As this is an introduction, we will only demonstrate examples using the moran (16 cores per node) and teton (32 cores per node) partitions. When you submit a job with no partition defined, Slurm will first try to allocate the job using the resources available on moran and then teton.

Diagram Key: Within the diagrams that follow:

Node: A node is made up of cores, and depending on the type of node it might have 16, 32, 40 or even more cores. Any allocated node will by default have a single task.

Task: A task runs on a single node. You can have multiple tasks running on a single node, but you cannot have a single task running across multiple nodes. By default a task is allocated one core, but depending on your options it can have multiple cores. All the cores associated with a task are shown enclosed within the task's black boundary.

Core: A single core. Nodes are made up of cores. If a node is made up of 16 cores, then there will be 16 core icons within the node.

So, in the following diagram:

We have a single node made up of sixteen cores. There is one task running on that node, with that task using one core.

As an aside, this also means that there are fifteen cores not being used, which can be allocated to other jobs.

Nodes

If you do not use any of the four options, by default Slurm will allocate a single node, with a single task, using a single core. This default is equivalent to requesting:

#SBATCH --nodes=1

If you require more nodes, for example four, then use:

#SBATCH --nodes=4

Note that although we have allocated four nodes, each node is still only running a single task using a single core.
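If you want to confirm what Slurm actually allocated, one quick sketch is to print some of the environment variables Slurm sets inside every batch job (these are standard Slurm variables, not specific to this cluster):

#SBATCH --nodes=4

echo "Job ID:          $SLURM_JOB_ID"
echo "Nodes allocated: $SLURM_JOB_NUM_NODES"
echo "Node list:       $SLURM_JOB_NODELIST"

With the request above, SLURM_JOB_NUM_NODES would report 4.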

Tasks

In the last example we can see that running four nodes, each with one task (using one core), is not the most efficient use of resources.
If you require multiple tasks, or your application requires tasks to be grouped on a single node (there are many potential scenarios), then you can use the ntasks-per-node option.
By default, only one task is allocated per node:

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=1

If you require say four tasks, use:

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=4

Or maybe 16 tasks, use:

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=16
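Within the job script itself, srun launches one copy of a program per allocated task. For example, continuing the 16-task request above (./my_program is a hypothetical executable standing in for whatever you actually run):

srun ./my_program    # launches 16 copies of ./my_program, one per task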


But what if you require more than 16 tasks on an individual node? Say 17?

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=17


Although there are only 16 cores on the moran nodes, Slurm will automatically try to allocate your job on the teton nodes, which have 32 cores.

But what happens if you require more than 32 tasks on a single node? Say 33?

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=33


Without going into detail, as this is only an introduction, Slurm is aware of other hardware configurations across the cluster and will do its best to allocate your job on nodes that can accommodate your request.
But, if you get the following error message on the command line:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

This means you are asking for too many cores (per node) on the partition you are using. To solve this you need to reduce the overall number of cores being allocated per individual node.
The following will cause such an error, as it tries to allocate 17 tasks (each using a single core) on a partition whose nodes only have 16 cores.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=17
#SBATCH --partition=moran
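One way to resolve it, sketched below, is to keep the task count within the node's core count (or drop the explicit partition so Slurm can consider larger nodes, as described above):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --partition=moran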

Cores per Node/Task

By default a node will be allocated a single task using a single core. Depending on how your application is parallelised it might be able to use multiple cores per task. The cpus-per-task option allows you to define the number of cores that will be allocated to each task.

#SBATCH --nodes=1
#SBATCH --cpus-per-task=4

Remember that, by default, a single task is allocated per node, so the above is the same as the following:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
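cpus-per-task is typically used for multithreaded applications (for example OpenMP). A common pattern, assuming a hypothetical threaded executable ./threaded_app, is to tie the thread count to the allocation rather than hard-coding it:

#SBATCH --nodes=1
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # 4 threads, one per allocated core
./threaded_app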

The following is an example of requesting a single node, but running two tasks, with each task using four cores. In total the node will use 2 * 4 = 8 cores.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4

A node has a maximum number of cores that can be allocated (moran nodes have 16, teton nodes have 32) and you cannot request more than that maximum. Within your options, the value of ntasks-per-node * cpus-per-task cannot exceed the maximum number of cores for the type of node you are requesting. If you specifically request the teton partition, where each node has 32 cores, you could request:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5
#SBATCH --cpus-per-task=6
#SBATCH --partition=teton

ntasks-per-node (5) * cpus-per-task (6) = 30 cores in total, which is less than the maximum of 32.

But, if you tried the following:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --partition=teton

The job submission would fail with the following:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

This is because ntasks-per-node (8) * cpus-per-task (6) = 48 cores in total, which is more than is available on that type of node.

Nodes, Tasks and Cores

Again, depending on how your application is parallelised, you can request multiple nodes, running multiple tasks, each using multiple cores.
This first example illustrates requesting two nodes, with each node running two tasks, with each task using three cores. So, a total of six cores on each node, and an overall total of twelve cores for your job.

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=3

This second example illustrates requesting two nodes, with each node running three tasks, with each task using four cores. So, a total of twelve cores on each node, and an overall total of twenty four cores for your job.

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=4
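As a sketch of how such an allocation is typically used, a hybrid MPI/OpenMP style job would launch one process per task with srun and give each process its allocated cores for threads. Continuing the 2 nodes * 3 tasks * 4 cores request above (./hybrid_app is a hypothetical executable, and exact core binding can depend on the Slurm configuration):

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # 4 threads per task
srun ./hybrid_app                              # 2 nodes * 3 tasks = 6 processes in total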

Note: For a job, each node will have the same configuration of tasks and cores. You cannot request different task/core configurations across nodes within a specific job.
There are usually several different configurations that request the same overall number of cores for a job. For example, the following two configurations both use a total of 30 cores:

#SBATCH --nodes=3
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=5

3 nodes * 2 tasks per node * 5 cores per task = 30

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=5

2 nodes * 3 tasks per node * 5 cores per task = 30


Which To Use? Well, there is no right or wrong answer; it depends on your application and how it is parallelised.

But ARCC is here to help and work with you to come up with the best configuration.

Not Defining Nodes: ntasks

If you do not need to explicitly request a specific number of nodes, you can use the ntasks option. This will try to allocate (depending on the resources available at the time) an appropriate number of nodes that your configuration can fit onto.
This first example, explicitly using the moran partition, requests 16 tasks (each using, by default, one core). These can all be allocated onto a single node.

#SBATCH --ntasks=16
#SBATCH --partition=moran


If we require 17 tasks, that will not fit on a single (16 core) node, then these will be allocated across two nodes, the first with 16 tasks, the second with 1.

#SBATCH --ntasks=17

If we require 24 tasks then these again will be allocated across two nodes, the first with 16 tasks, the second with 8.

#SBATCH --ntasks=24
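If you are curious where the tasks actually landed, a quick sketch is to run hostname once per task from inside the job:

srun hostname    # prints one line per task; for the 24-task example on moran you would typically see 16 lines for one node and 8 for the other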


If we asked for 40 tasks then we'd get three nodes (16 + 16 + 8).
If we had instead defined the teton partition, then the above three examples of 16, 17 and 24 tasks would each fit on a single teton node, which can accommodate a total of 32 cores.

ntasks and cpus

We can also combine the ntasks and cpus-per-task options.
In the following example, since we are explicitly requesting the moran partition, where each node has 16 cores, we know we can only fit one task (using 16 cores) on each node, so three nodes will be allocated.

#SBATCH --ntasks=3
#SBATCH --cpus-per-task=16
#SBATCH --partition=moran

If we only require eight cores per task, we can now fit two tasks per node, so only two nodes are required to accommodate our allocation.

#SBATCH --ntasks=3
#SBATCH --cpus-per-task=8
#SBATCH --partition=moran

Finally, we ask to use the teton nodes (32 cores per node), which can fit all three tasks onto a single node.

#SBATCH --ntasks=3
#SBATCH --cpus-per-task=8
#SBATCH --partition=teton

Notice that in the previous three examples we did not define the nodes option. The scheduler will automatically try to allocate the appropriate number of nodes that our required configuration can fit across.

nodes and ntasks

Although there is nothing wrong with using the nodes and ntasks options together, ideally you'd use one or the other. So a common question is: which option should you use? Again, this depends on your requirements, but here are some final examples to illustrate the differences.
The first shows five nodes, each running a single task, with each task using four cores.

#SBATCH --nodes=5
#SBATCH --cpus-per-task=4

The second illustrates using the ntasks option which still allocates five tasks each with four cores, but now distributed across only two nodes.

#SBATCH --ntasks=5
#SBATCH --cpus-per-task=4

If you use nodes and ntasks together, then you will only get the number of nodes required to fulfill the number of tasks. So, although we've asked for five nodes, we've only asked for four ntasks. The job will thus only be allocated four (not five) nodes.

#SBATCH --nodes=5
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

Slurm will notify you of such cases with the following warning message:

sbatch: Warning: can't run 4 processes on 5 nodes, setting nnodes to 4


Finally, if we hadn't requested any nodes, and only asked for four tasks, then this can actually fit on a single node.

#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

Note:
Slurm will select the best allocation across the cluster for a submitted job with respect to the resources available at the time. So, depending on the cluster's current load, some of the allocations when using ntasks might have different configurations from those represented in the previous diagrams. If for some reason you specifically need a certain number of nodes, then use the nodes option.
Do not expect an even distribution of tasks across nodes. For example:

#SBATCH --nodes=10
#SBATCH --ntasks=100

This will not evenly distribute ten tasks per node. Instead, if using the moran partition you will likely get a distribution such as 16, 16, 16, 16, 16, 16, 1, 1, 1, 1 (total of 100), or on the teton partition a distribution such as 32, 32, 29, 1, 1, 1, 1, 1, 1, 1 (total of 100). So, although you get ten nodes allocated, the number of tasks on each node is not the same. This can significantly affect the amount of memory being used on a node.
If you require an even distribution then use:

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=10
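To verify the distribution you actually received, one sketch is to run hostname once per task and count how many tasks landed on each node:

srun hostname | sort | uniq -c    # one count per node; here, ten tasks on each of the ten nodes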

Shortcuts

Many of the Slurm options have shortcuts:

Here is a comparison of two requests asking for the same allocation, the first using the long-form options and the second using shortcuts. Notice that shortcuts do not use an equals sign '=' between the flag and its value, and that a shortcut is preceded by only a single dash character '-', not two '--'.

#SBATCH --nodes=5
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --partition=teton

#SBATCH -N 5
#SBATCH -n 4
#SBATCH -c 4
#SBATCH -p teton
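These flags (long or short form) can also be given directly on the sbatch command line instead of inside the script, and command-line values override those in the script. For example, with run.sh standing in for any batch script:

sbatch -N 5 -n 4 -c 4 -p teton run.sh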

How Many Cores and/or Memory Should I Request?

You will need to perform and track some analysis of your own jobs to understand what works for your data and analysis. Do not just use a hugemem node!
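One way to inform that analysis is to look back at what completed jobs actually used. For example, using sacct with a placeholder job ID (and seff, a commonly installed efficiency summary script that may or may not be available on your cluster):

sacct -j 1234567 --format=JobID,Elapsed,NCPUS,MaxRSS,State
seff 1234567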

Summary

In this introduction we've looked at using the four sbatch options nodes, ntasks-per-node, cpus-per-task and ntasks, and various combinations of them.

There are no hard and fast rules, but we would recommend:


Finally: We welcome feedback, and if anything isn't clear, or something is missing, or in fact you think there is a mistake, please don't hesitate to contact us.