Introduction

The Slurm page introduces the basics of creating a batch script that is submitted from the command line with the sbatch command to request a job on the cluster. This page extends that introduction and goes into a little more detail on the use of the following Slurm options:

  1. mem: used to request memory per node.

  2. mem-per-cpu: used to request memory per CPU core.

  3. gres: used to request GPUs.

A complete list of options can be found on the Slurm: sbatch manual page or by typing man sbatch from the command line when logged onto teton.

Aims: The aims of this page are to extend the user's knowledge of how to request memory and GPUs appropriately within batch scripts.

Note:

Please share with ARCC your experiences of various configurations for whatever applications you use, so we can share them with the wider UW (and beyond) research community.

Prerequisites: You are familiar with creating a basic batch script and submitting it with sbatch, as covered on the Slurm page.

Memory Allocation

Previously we've talked about nodes having a maximum number of cores that can be allocated; they also have a maximum amount of memory that can be requested and allocated. Looking at the RAM (GB) column on the Beartooth Hardware Summary page, you can see that the RAM available across partitions varies from 64GB up to 1024GB.
NOTE: Just because a node has 1024GB, please do not try to grab it all for your job. Remember: the more memory you request, the fewer nodes can satisfy the request and the longer your job is likely to wait in the queue.


Using the mem option you can request the memory required on a node.
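For context, here is a minimal sketch of a complete batch script using mem (the account name, walltime, job name, and program are placeholders you will need to change):

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --time=01:00:00
#SBATCH --job-name=mem_example
#SBATCH --nodes=1
#SBATCH --mem=8G                 # request 8G of memory on the node

# Replace with the actual work you want to run.
srun ./my_program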

Options

Allocation

Comments

#SBATCH --nodes=1
#SBATCH --mem=8G

Allocated one node requiring 8G of memory.

Remember 1G = 1024M, so you have 8192M.

#SBATCH --nodes=1
#SBATCH --mem=8

Allocated one node requiring 8M of memory.

Megabytes is the default unit, so this requests only 8M.

#SBATCH --nodes=3
#SBATCH --mem=32G

Allocated three nodes, each requiring 32G

Each node is allocated the same amount of memory.

#SBATCH --nodes=1
#SBATCH --mem=132G
#SBATCH --partition=teton
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

You cannot request more memory than is actually available on the node.
Teton nodes have a maximum of 128G available, and you must actually request less than that.

#SBATCH --nodes=1
#SBATCH --mem=125G
#SBATCH --partition=teton

One node allocated, requiring 125G.

On teton nodes this is the maximum that can be requested.

#SBATCH --nodes=1
#SBATCH --mem=126G
#SBATCH --partition=teton
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

You're trying to request too much memory.
If you really need this amount of memory either select an appropriate partition, or remove the partition option and see what you get.

#SBATCH --nodes=1
#SBATCH --mem=120.1G
#SBATCH --partition=teton
sbatch: error: invalid memory constraint 120.1G

You must specify a whole number; decimal values are not accepted.

Using the seff jobid command you can check the amount of memory that was allocated versus actually used. Running the command will display something like the following on the command line:

Job ID: jobid
Cluster: teton
User/Group: userid/groupid
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:01
CPU Efficiency: 50.00% of 00:00:02 core-walltime
Job Wall-clock time: 00:00:02
Memory Utilized: 3.89 MB
Memory Efficiency: 48.58% of 8.00 MB
...
Memory Efficiency: 0.03% of 8.00 GB


Using the mem-per-cpu option you can request that each CPU (core) allocated to your job has this amount of memory available to it.

Remember that you need to check the overall total amount of memory you're trying to allocate on a node: calculate the total number of cores you're requesting on a node (ntasks-per-node * cpus-per-task) and then multiply that by mem-per-cpu, as sketched below.
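A quick worked sketch of that calculation (the values here are purely illustrative):

#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8G
# Total cores on the node:  2 * 4 = 8
# Total memory on the node: 8 * 8G = 64G (this total must be less than the node's maximum)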

In the following examples, I am using the default ntasks-per-node value of 1:

Options

Total memory

Comments

#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=8G
#SBATCH --partition=moran

8 * 8G = 64G

Some moran nodes have 128G available. Job submitted.

#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=12G
#SBATCH --partition=moran

8 * 12G = 96G

Job submitted.

#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=16G
#SBATCH --partition=moran

8 * 16G = 128G

sbatch: error: Batch job submission failed: Requested node configuration is not available

What happened here? We requested 128G, and don't the moran nodes have 128G? Yes, but your total memory allocation has to be less than what the node allows.

Options

Total memory

Comments

#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#SBATCH --partition=teton

16 * 8G = 128G

sbatch: error: Batch job submission failed: Requested node configuration is not available

Same problem as before: teton nodes have a maximum of 128G.

#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3G
#SBATCH --partition=teton

32 * 3G = 96G

Job Submitted

#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3.5G
#SBATCH --partition=teton

32 * 3.5G = 112G

sbatch: error: invalid memory constraint 3.5G

What happened here? Can't I request three and a half gigs? You can, but values have to be integers; you can't use a decimal number.
What you can do is convert from G into M. But remember that 1G does not equal 1000M, it equals 1024M. So 3.5G equals 3.5 * 1024 = 3584M.

Options

Total memory

Comments

#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3584M
#SBATCH --partition=teton

32 * 3584M = 112G

Job Submitted

#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4000M
#SBATCH --partition=teton

Less than 128G: 32 * 4000M = 128000M, just under 131072M (128G).

Job Submitted

#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4096M
#SBATCH --partition=teton

Equals 128G: 32 * 4096M = 131072M.

sbatch: error: Batch job submission failed: Requested node configuration is not available
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4000M
#SBATCH --partition=teton

Less than 128G: 4 * 8 * 4000M = 128000M.

Job Submitted

#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4096M
#SBATCH --partition=teton

Equals 128G: 4 * 8 * 4096M = 131072M.

sbatch: error: Batch job submission failed: Requested node configuration is not available

Note: Yes, this does contradict the mem option examples, where the request had to be no more than 125GB. This is due to the nuances of the Slurm allocation implementation.

Some Considerations

Shouldn't I always just request the best nodes? Consider the following: teton nodes have 32 cores and a maximum of 128G, so if you wanted exclusive use of a node and all of its cores, the most you could request for each core is about 4G. In comparison, the moran nodes with 128G have only 16 cores, and can therefore offer a higher maximum of about 8G per core. You could request two moran nodes (32 cores in total) with each core having close to 8G, rather than a single teton node with each core limited to 4G. This is a slightly contrived example, but hopefully it gets you thinking: the popular, newer nodes are not always the best option, and your job might actually be allocated resources more quickly rather than sitting in the queue. A sketch of the two-node alternative is shown below.
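A minimal sketch of that two-node moran alternative (note that, as the tables above show, each node's total must stay below its 128G maximum, so the per-core value here is rounded down slightly from 8G):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16      # all 16 cores on each moran node
#SBATCH --mem-per-cpu=7500M       # just under 8G per core, keeping each node's total (120000M) below 128G
#SBATCH --partition=moran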

Out-of-Memory Errors: Although you can allocate 'appropriate' resources, there is nothing stopping the actual application (behind the scenes, so to speak) from trying to allocate and use more. In some cases the application will try to use more memory than is available on the node, causing an out-of-memory error. Check the job .out/.err files for a message of the form:

slurmstepd: error: Detected 2 oom-kill event(s) in step 3280189.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: m121: task 32: Out Of Memory
srun: Terminating job step 3280189.0

For commercial applications there's nothing we can directly do, and even for open source software trying to track down the memory leak can be very time consuming.

Can we predict if this is going to happen? At this moment in time, no. But we can suggest that you check completed jobs with seff, and if a job is killed out-of-memory, resubmit it with a larger memory request or on a partition with higher-memory nodes.
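For example (a hypothetical sketch; the original value and how much to increase it will depend entirely on your application), if a job requesting 8G was killed out-of-memory, you might resubmit with:

#SBATCH --nodes=1
#SBATCH --mem=16G                # doubled from the original 8G request; tune this to your application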

Requesting GPUs

The first step in using GPUs is to understand which partition and what type of GPU hardware you want to use; see the Beartooth Hardware page.
Updated: 20230206

Combinations:

Example

Partition

Comments

#SBATCH --gres=gpu:1

blank

sbatch: error: Batch job submission failed: Requested node configuration is not available.
#SBATCH --gres=gpu:2

blank

sbatch: error: Batch job submission failed: Requested node configuration is not available.
#SBATCH --gres=gpu:p100:1

blank

sbatch: error: Batch job submission failed: Requested node configuration is not available.

With no partition defined, the job goes to the default partition, which has no GPU devices, so none of these three requests can be satisfied.
#SBATCH --gres=gpu:1
#SBATCH --partition=teton-gpu

teton-gpu

Allocates a single p100 on the teton-gpu partition.

#SBATCH --gres=gpu:3
#SBATCH --partition=teton-gpu

teton-gpu

sbatch: error: Batch job submission failed: Requested node configuration is not available

There are no nodes in the teton-gpu partition that have 3 GPUs: each node has only two GPU devices available.

#SBATCH --gres=gpu:2
#SBATCH --partition=beartooth-gpu

beartooth-gpu

Allocates two a30 devices on the beartooth-gpu partition.

#SBATCH --partition=dgx
#SBATCH --gres=gpu:4
#SBATCH --nodelist=tdgx01

dgx

GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-b9ac5945-6494-eedd-795b-6eec42ab3e8c)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-3143b8f5-a348-cce9-4ad4-91c01618d7fd)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-5f01803f-6231-4241-41c9-8ca05dadf881)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-f1646ebd-75a2-53a9-b9df-8b7fc51fc26c)
#SBATCH --partition=dgx
#SBATCH --gres=gpu:v100:4
#SBATCH --nodelist=mdgx01

dgx

GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-64dc6369-4c36-824d-182c-8e8f9c33f587)
GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-d4adb339-0dba-47db-e766-96b9cbc302b4)
GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-f7637d82-f0c0-15e6-da23-21216b9b8f33)
GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-5afe7813-6b87-b667-e0a3-8e04662357e8)

If you just want V100s, and are not concerned whether they are the 16GB or 32GB versions, then you do not need to define nodelist.
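For example, the following is the same dgx request as above, just without pinning a particular node:

#SBATCH --partition=dgx
#SBATCH --gres=gpu:v100:4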

Notes: Within your batch script you can check which GPU devices have been allocated to your job using:

echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L

If you request more than one GPU, you'll get something of the form:

CUDA_VISIBLE_DEVICES: 0,1
GPU 0: NVIDIA A30 (UUID: GPU-b9614d02-bcc7-e75c-4c9c-ba3515f8c082)
GPU 1: NVIDIA A30 (UUID: GPU-4b8746a4-4f7f-93dc-0cd5-a8b166100bbd)

If no value appears for the environment variable, then no GPU has been allocated to your job.
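Putting these pieces together, a minimal GPU batch script might look like the following (the account name and walltime are placeholders you will need to change):

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:1                 # a single p100 on the teton-gpu partition

# Confirm the GPU allocation (same checks as shown above).
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L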

Interactive Jobs:

You can request GPUs via an interactive job:

salloc --account=<account> --time=01:00:00 -N 1 -c 1 --partition=teton-gpu --gres=gpu:1
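Once the session starts, you can verify the allocation with the same commands shown in the notes above (depending on configuration you may need to run them via srun on the allocated node):

echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L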