...
Previously we talked about nodes having a maximum number of cores that can be allocated; they also have a maximum amount of memory that can be requested and allocated. Looking at the RAM (GB) column on the Beartooth Hardware Summary page, you can see that the RAM available across partitions varies from 64GB up to 1024GB.
NOTE: Just because a node has 1024GB, please do not try grabbing all of it for your job. Remember:
...
Using the `--mem` option you can request the memory required on a node.
| Options | Allocation | Comments |
|---|---|---|
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=8G` | One node allocated, with 8G of memory. | Remember 1G = 1024M, so you have 8192M. |
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=8` | One node allocated, with 8M of memory. | Megabytes is the default unit. |
| `#SBATCH --nodes=3`<br>`#SBATCH --mem=32G` | Three nodes allocated, each with 32G. | Each node is allocated the same amount of memory. |
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=132G`<br>`#SBATCH --partition=teton` | `sbatch: error: Memory specification can not be satisfied`<br>`sbatch: error: Batch job submission failed: Requested node configuration is not available` | You cannot request more memory than is actually on the node. Teton nodes have a maximum of 128G available, and you must actually request less than that. |
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=125G`<br>`#SBATCH --partition=teton` | One node allocated, with 125G. | On teton nodes this is the maximum that can be requested. |
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=126G`<br>`#SBATCH --partition=teton` | `sbatch: error: Memory specification can not be satisfied`<br>`sbatch: error: Batch job submission failed: Requested node configuration is not available` | You are trying to request too much memory. If you really need this much memory, either select an appropriate partition, or remove the partition option and see what you get. |
| `#SBATCH --nodes=1`<br>`#SBATCH --mem=102.1G`<br>`#SBATCH --partition=teton` | `sbatch: error: invalid memory constraint 102.1G` | You must define a whole number. |
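For context, each set of `#SBATCH` directives above sits inside a complete batch script. A minimal sketch, assuming a short walltime for illustration (the echoed Slurm variable is only set once the job is running under Slurm):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --mem=8G
#SBATCH --time=00:10:00   # hypothetical walltime; set what your job needs

# Slurm exports the per-node memory grant (in MB) to the job environment
echo "Memory allocated per node: ${SLURM_MEM_PER_NODE} MB"
```

Submitting this with `sbatch` and then checking the output file is a quick way to confirm what you actually received.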
Using the `seff <jobid>` command you can check the amount of memory that was allocated and used. Running the command will display something like the following on the command line:
...
In the following examples, I am using the default `--ntasks-per-node` value of 1:
| Options | Total memory | Comments |
|---|---|---|
| `#SBATCH --nodes=1`<br>`#SBATCH --cpus-per-task=8`<br>`#SBATCH --mem-per-cpu=8G`<br>`#SBATCH --partition=moran` | 8 * 8G = 64G | Some moran nodes have 128G available. Job submitted. |
| `#SBATCH --cpus-per-task=8`<br>`#SBATCH --mem-per-cpu=12G`<br>`#SBATCH --partition=moran` | 8 * 12G = 96G | Job submitted. |
| `#SBATCH --cpus-per-task=8`<br>`#SBATCH --mem-per-cpu=16G`<br>`#SBATCH --partition=moran` | 8 * 16G = 128G | `sbatch: error: Batch job submission failed: Requested node configuration is not available` |
What happened here? We requested 128G, and don't the moran nodes have 128G? They do, but your total memory allocation has to be less than what the node allows.
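As a sanity check before submitting, you can do the per-node arithmetic yourself; the request is rejected as soon as the total reaches the node's full memory:

```shell
# Per-node total = cpus-per-task * mem-per-cpu (with the default one task per node)
echo $(( 8 * 8 ))G    # 64G  - fits on a 128G moran node
echo $(( 8 * 12 ))G   # 96G  - fits
echo $(( 8 * 16 ))G   # 128G - equals the node total, so the request fails
```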
| Options | Total memory | Comments |
|---|---|---|
| `#SBATCH --cpus-per-task=16`<br>`#SBATCH --mem-per-cpu=8G`<br>`#SBATCH --partition=teton` | 16 * 8G = 128G | `sbatch: error: Batch job submission failed: Requested node configuration is not available`<br>Same problem as before: teton nodes have a maximum of 128G. |
| `#SBATCH --cpus-per-task=32`<br>`#SBATCH --mem-per-cpu=3G`<br>`#SBATCH --partition=teton` | 32 * 3G = 96G | Job submitted. |
| `#SBATCH --cpus-per-task=32`<br>`#SBATCH --mem-per-cpu=3.5G`<br>`#SBATCH --partition=teton` | 32 * 3.5G = 112G | `sbatch: error: invalid memory constraint 3.5G` |
What happened here? Can't I request three and a half gigs? You can, but the values have to be whole numbers; you cannot define a decimal number.
What you can do is convert from G into M. But remember 1G does not equal 1000M; it actually equals 1024M. So 3.5G equals 3.5 * 1024 = 3584M.
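The conversion is easy to check in the shell; writing 3.5 as 7/2 keeps the arithmetic in whole numbers:

```shell
# 1G = 1024M, so 3.5G = 3.5 * 1024M; 7/2 avoids floating point
echo $(( 7 * 1024 / 2 ))M   # prints 3584M
```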
| Options | Total memory | Comments |
|---|---|---|
| `#SBATCH --cpus-per-task=32`<br>`#SBATCH --mem-per-cpu=3584M`<br>`#SBATCH --partition=teton` | 32 * 3.5G = 112G | Job submitted. |
| `#SBATCH --cpus-per-task=32`<br>`#SBATCH --mem-per-cpu=4000M`<br>`#SBATCH --partition=teton` | less than 128G (not by much) | Job submitted. |
| `#SBATCH --cpus-per-task=32`<br>`#SBATCH --mem-per-cpu=4096M`<br>`#SBATCH --partition=teton` | equals 128G | `sbatch: error: Batch job submission failed: Requested node configuration is not available` |
| `#SBATCH --ntasks-per-node=4`<br>`#SBATCH --cpus-per-task=8`<br>`#SBATCH --mem-per-cpu=4000M`<br>`#SBATCH --partition=teton` | less than 128G | Job submitted. |
| `#SBATCH --ntasks-per-node=4`<br>`#SBATCH --cpus-per-task=8`<br>`#SBATCH --mem-per-cpu=4096M`<br>`#SBATCH --partition=teton` | equals 128G | `sbatch: error: Batch job submission failed: Requested node configuration is not available` |
Note: Yes, this does contradict the `--mem` option having a maximum of 125G, since with `--mem-per-cpu` any total below 128G is accepted. This is due to the nuances of the Slurm allocation implementation.
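With multiple tasks per node the same arithmetic applies; a quick check in megabytes (128G = 131072M) shows why one request succeeds and the other fails:

```shell
# Per-node total = ntasks-per-node * cpus-per-task * mem-per-cpu
ntasks=4; cpus=8
echo $(( ntasks * cpus * 4000 ))M   # 128000M - just under 128G (131072M), accepted
echo $(( ntasks * cpus * 4096 ))M   # 131072M - exactly 128G, rejected
```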
...
The first step in using GPUs is to understand, from the Beartooth Hardware page, which partition provides the type of GPU hardware you want to use.
Updated: 20230206
Combinations:
| Example | Partition | Comments |
|---|---|---|
| `#SBATCH --gres=gpu:1` | (blank) | `sbatch: error: Batch job submission failed: Requested node configuration is not available.` |
| `#SBATCH --gres=gpu:2` | (blank) | `sbatch: error: Batch job submission failed: Requested node configuration is not available.` |
| `#SBATCH --gres=gpu:p100:1` | (blank) | `sbatch: error: Batch job submission failed: Requested node configuration is not available.` |
| `#SBATCH --gres=gpu:1`<br>`#SBATCH --partition=teton-gpu` | teton-gpu | Allocates a single p100 on the teton-gpu partition. |
| `#SBATCH --gres=gpu:3`<br>`#SBATCH --partition=teton-gpu` | teton-gpu | `sbatch: error: Batch job submission failed: Requested node configuration is not available`<br>There are no nodes in the teton-gpu partition that have 3 GPUs; there are only two GPU devices available per node. |
| `#SBATCH --gres=gpu:2`<br>`#SBATCH --partition=beartooth-gpu` | beartooth-gpu | Allocates two a30 devices on the beartooth-gpu partition. |
| `#SBATCH --partition=dgx`<br>`#SBATCH --gres=gpu:4`<br>`#SBATCH --nodelist=tdgx01` | dgx | `GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-b9ac5945-6494-eedd-795b-6eec42ab3e8c)`<br>`GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-3143b8f5-a348-cce9-4ad4-91c01618d7fd)`<br>`GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-5f01803f-6231-4241-41c9-8ca05dadf881)`<br>`GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-f1646ebd-75a2-53a9-b9df-8b7fc51fc26c)` |
| `#SBATCH --partition=dgx`<br>`#SBATCH --gres=gpu:v100:4`<br>`#SBATCH --nodelist=mdgx01` | dgx | `GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-64dc6369-4c36-824d-182c-8e8f9c33f587)`<br>`GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-d4adb339-0dba-47db-e766-96b9cbc302b4)`<br>`GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-f7637d82-f0c0-15e6-da23-21216b9b8f33)`<br>`GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-5afe7813-6b87-b667-e0a3-8e04662357e8)`<br>If you just want the V100s, and are not concerned whether it is the 16GB or 32GB model, you do not need to define `--nodelist`. |
Notes:
To request a specific type of GPU device you need to explicitly define the partition that that device can be found on.
To check that your submission is GPU enabled, and which type of GPU you received, use the `$CUDA_VISIBLE_DEVICES` environment variable and `nvidia-smi -L` within your batch script:
...