...

Previously we talked about nodes having a maximum number of cores that can be allocated; they also have a maximum amount of memory that can be requested and allocated. Looking at the RAM (GB) column on the Beartooth Hardware Summary page, you can see that the RAM available across partitions varies from 64GB up to 1024GB.
NOTE: Just because a node has 1024GB, please do not try grabbing it for your job. Remember:

...


Using the --mem option, you can request the memory required on a node.

Options

Allocation

Comments

Code Block
#SBATCH --nodes=1
#SBATCH --mem=8G

Allocated one node requiring 8G of memory.

Remember 1G = 1024M, so you have 8192M.

Code Block
#SBATCH --nodes=1
#SBATCH --mem=8

Allocated one node requiring 8M of memory.

Megabytes (M) is the default unit.

Code Block
#SBATCH --nodes=3
#SBATCH --mem=32G

Allocated three nodes, each requiring 32G

Each node is allocated the same amount of memory.

Code Block
#SBATCH --nodes=1
#SBATCH --mem=132G
#SBATCH --partition=teton
Code Block
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

You cannot request more memory than is actually on the node.
Teton nodes have a maximum of 128G available, and you must actually request less than that.

Code Block
#SBATCH --nodes=1
#SBATCH --mem=125G
#SBATCH --partition=teton

Allocated one node requiring 125G of memory.

On teton nodes this is the maximum that can be requested.

Code Block
#SBATCH --nodes=1
#SBATCH --mem=126G
#SBATCH --partition=teton
Code Block
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

You're trying to request too much memory.
If you really need this amount of memory, either select an appropriate partition or remove the partition option and see what you get.

Code Block
#SBATCH --nodes=1
#SBATCH --mem=120.1G
#SBATCH --partition=teton
Code Block
sbatch: error: invalid memory constraint 120.1G

You must define a whole number.
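
To put the --mem option into context, here is a minimal sketch of a complete batch script using it. The job name, account, walltime, and the program being run are placeholders and are not taken from this page.

Code Block
#!/bin/bash
#SBATCH --job-name=mem_example        # placeholder job name
#SBATCH --account=<your-project>      # placeholder: replace with your own project/account
#SBATCH --time=00:10:00               # placeholder walltime
#SBATCH --nodes=1
#SBATCH --mem=8G                      # one node with 8G (8192M) of memory

srun ./my_program                     # placeholder executable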

Using the seff jobid command, you can check the amount of memory that was allocated and used. Running the command will display something like the following on the command line:

...

In the following examples, I am using the default ntasks-per-node value of 1:

Options

Total memory

Comments

Code Block
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=8G
#SBATCH --partition=moran

8 * 8G = 64G

Some moran nodes have 128G available. Job submitted.

Code Block
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=12G
#SBATCH --partition=moran

8 * 12G = 96G

Job submitted.

Code Block
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=16G
#SBATCH --partition=moran

8 * 16G = 128G

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available

What happened here? We requested 128G, and don't the moran nodes have 128G? Your total memory allocation has to be less than what the node allows.
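
As a sketch of the arithmetic Slurm applies here (matching the examples on this page, with the default ntasks-per-node of 1), the per-node request is the product of the task, CPU, and per-CPU memory values:

Code Block
# per-node memory = ntasks-per-node * cpus-per-task * mem-per-cpu
NTASKS_PER_NODE=1   # default used in these examples
CPUS_PER_TASK=8
MEM_PER_CPU_G=16
echo "$((NTASKS_PER_NODE * CPUS_PER_TASK * MEM_PER_CPU_G))G per node"   # prints: 128G per node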

Options

Total memory

Comments

Code Block
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#SBATCH --partition=teton

16 * 8G = 128G

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available

Same problem as before; teton nodes have a maximum of 128G.

Code Block
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3G
#SBATCH --partition=teton

32 * 3G = 96G

Job Submitted

Code Block
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3.5G
#SBATCH --partition=teton

32 * 3.5G = 112G

Code Block
sbatch: error: invalid memory constraint 3.5G

What happened here? Can't I request three and a half gigs? You can, but values have to be integers; you can't define a decimal number.
What you can do is convert from G into M. But remember that 1G does not equal 1000M; it actually equals 1024M. So 3.5G equals 3.5 * 1024 = 3584M.
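
As a rough sketch of doing that conversion on the command line (awk is just one way to do the multiplication; nothing here is Slurm-specific):

Code Block
# Convert a fractional GB value into whole MB for --mem-per-cpu
awk 'BEGIN { printf "%dM\n", 3.5 * 1024 }'   # prints: 3584M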

Options

Total memory

Comments

Code Block
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=3584M
#SBATCH --partition=teton

32 * 3.5G = 112G

Job Submitted

Code Block
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4000M
#SBATCH --partition=teton

32 * 4000M = 128000M, which is less than 128G (131072M), but not by much.

Job Submitted

Code Block
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4096M
#SBATCH --partition=teton

32 * 4096M = 131072M, which equals 128G exactly.

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available
Code Block
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4000M
#SBATCH --partition=teton

4 * 8 * 4000M = 128000M, which is less than 128G.

Job Submitted

Code Block
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4096M
#SBATCH --partition=teton

4 * 8 * 4096M = 131072M, which equals 128G.

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available

Note: Yes, this does seem to contradict the --mem option topping out at 125G. This is due to the nuances of Slurm's allocation implementation.
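
If you want to see what Slurm itself reports as allocatable memory, the standard sinfo and scontrol commands can show it. This is a sketch; memory values are reported in MB, and <nodename> is a placeholder to replace with a real node name.

Code Block
# Memory (MB) configured on nodes in the teton partition
sinfo -p teton -o "%n %m" | sort -u

# Full detail for a single node, including RealMemory
scontrol show node <nodename> | grep -i mem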

...

The first step in using GPUs is to understand which partition, and which type of GPU hardware (see the Beartooth Hardware page), you want to use.
Updated: 20230206

Combinations:

Example

Partition

Comments

Code Block
#SBATCH --gres=gpu:1

blank

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available.
Code Block
#SBATCH --gres=gpu:2

blank

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available.
Code Block
#SBATCH --gres=gpu:p100:1

blank

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available.
Code Block
#SBATCH --gres=gpu:1
#SBATCH --partition=teton-gpu

teton-gpu

Allocates a single p100 on the teton-gpu partition.

Code Block
#SBATCH --gres=gpu:3
#SBATCH --partition=teton-gpu

teton-gpu

Code Block
sbatch: error: Batch job submission failed: Requested node configuration is not available

There are no nodes in the teton-gpu partition that have three GPUs; there are only two GPU devices available per node.

Code Block
#SBATCH --gres=gpu:2
#SBATCH --partition=beartooth-gpu

beartooth-gpu

Allocates two a30 devices on the beartooth-gpu partition.

Code Block
#SBATCH --partition=dgx
#SBATCH --gres=gpu:4
#SBATCH --nodelist=tdgx01

dgx

Code Block
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-b9ac5945-6494-eedd-795b-6eec42ab3e8c)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-3143b8f5-a348-cce9-4ad4-91c01618d7fd)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-5f01803f-6231-4241-41c9-8ca05dadf881)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-f1646ebd-75a2-53a9-b9df-8b7fc51fc26c)
Code Block
#SBATCH --partition=dgx
#SBATCH --gres=gpu:v100:4
#SBATCH --nodelist=mdgx01

dgx

Code Block
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-64dc6369-4c36-824d-182c-8e8f9c33f587)
GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-d4adb339-0dba-47db-e766-96b9cbc302b4)
GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-f7637d82-f0c0-15e6-da23-21216b9b8f33)
GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-5afe7813-6b87-b667-e0a3-8e04662357e8)

If you just want the V100s and are not concerned whether it's the 16GB or 32GB model, then you do not need to define nodelist.
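
For example, if either DGX node is acceptable, a sketch of the same request without the nodelist line:

Code Block
#SBATCH --partition=dgx
#SBATCH --gres=gpu:v100:4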

Notes:

  • To request a specific type of GPU device, you need to explicitly define the partition on which that device can be found.

  • To check that your submission is GPU enabled and which type of GPU you have, use the $CUDA_VISIBLE_DEVICES environment variable and nvidia-smi -L within your batch script (see the sketch after these notes):

    • Note: This environment variable is only set when submitting scripts using sbatch; it is not set in an interactive session started with salloc.
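
As a sketch of that check inside a batch script (the job name, account, and walltime are placeholders, not values from this page):

Code Block
#!/bin/bash
#SBATCH --job-name=gpu_check          # placeholder job name
#SBATCH --account=<your-project>      # placeholder: replace with your own project/account
#SBATCH --time=00:05:00               # placeholder walltime
#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:1

# Confirm the allocation is GPU enabled and list the allocated device(s):
echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-<not set>}"
nvidia-smi -L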

...