...
The first step in using GPUs is to understand which Beartooth hardware partition, and which type of GPU, you want to use.
Updated: 20230206
Combinations:

Example:
    #SBATCH --gres=gpu:1
Partition: (none)
Comments:
    sbatch: error: Batch job submission failed: Requested node configuration is not available.

Example:
    #SBATCH --gres=gpu:2
Partition: (none)
Comments:
    sbatch: error: Batch job submission failed: Requested node configuration is not available.

Example:
    #SBATCH --gres=gpu:p100:1
Partition: (none)
Comments:
    sbatch: error: Batch job submission failed: Requested node configuration is not available.

Example:
    #SBATCH --gres=gpu:1
    #SBATCH --partition=teton-gpu
Partition: teton-gpu
Comments: Allocates a single P100 on the teton-gpu partition.

Example:
    #SBATCH --gres=gpu:3
    #SBATCH --partition=teton-gpu
Partition: teton-gpu
Comments:
    sbatch: error: Batch job submission failed: Requested node configuration is not available
There are no nodes in the teton-gpu partition that have three GPUs; each node has only two GPU devices.

Example:
    #SBATCH --gres=gpu:2
    #SBATCH --partition=beartooth-gpu
Partition: beartooth-gpu
Comments: Allocates two A30 devices on the beartooth-gpu partition.
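Putting one of the working combinations into a complete batch script might look like the sketch below. The account placeholder and the ten-minute walltime are illustrative assumptions, not ARCC-mandated values; the snippet only writes the job file, which you would then submit separately.

```shell
# Sketch: write a minimal GPU batch script for the beartooth-gpu partition.
# <account> is a placeholder; replace it with your own project account.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=<account>
#SBATCH --time=00:10:00
#SBATCH --partition=beartooth-gpu
#SBATCH --gres=gpu:2

# Confirm the allocation actually contains GPUs before doing real work.
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L
EOF
```

Submit it with: sbatch gpu_job.sh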
Notes:
To request a specific type of GPU device you need to explicitly define the partition on which that device can be found.
To check that your submission is GPU enabled, and which type of GPU was allocated, use the $CUDA_VISIBLE_DEVICES environment variable and nvidia-smi -L within your batch script:
    echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
    nvidia-smi -L
If you request more than one GPU, you'll get something of the form:
...
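Within a batch script the same variable can also be checked programmatically, for example to fail fast when no GPUs were granted. Below is a minimal POSIX-shell sketch; the count_gpus helper name is our own invention, not an ARCC or SLURM tool.

```shell
# count_gpus: number of GPU devices visible to this job, derived from the
# comma-separated CUDA_VISIBLE_DEVICES list that SLURM sets for GPU jobs.
count_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0    # unset or empty: the job is not GPU enabled
    else
        echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l
    fi
}

# Simulated two-GPU allocation for illustration:
CUDA_VISIBLE_DEVICES="0,1"
count_gpus    # prints 2
```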
Example:
    #SBATCH --partition=dgx
    #SBATCH --gres=gpu:4
    #SBATCH --nodelist=tdgx01
Partition: dgx
Comments:
    GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-b9ac5945-6494-eedd-795b-6eec42ab3e8c)
    GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-3143b8f5-a348-cce9-4ad4-91c01618d7fd)
    GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-...)
    GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-...)
If no value appears for this environment variable, then the job is not GPU enabled.
DGX nodes: These are on a special partition that requires access to be granted directly to user accounts or project accounts, and will likely require additional approval. There are two specific nodes:
mdgx01 using V100-SXM2-16GB
tdgx01 using V100-SXM2-32GB
To request these, first set the partition to dgx. To explicitly request a specific node, use the nodelist option; without this option you could be allocated either node. The example above uses four GPUs on node tdgx01.
To ask for a specific GPU on a specific node, you'll need to define --nodelist with the name of the node:
...
Example:
    #SBATCH --partition=dgx
    #SBATCH --gres=gpu:v100:4
    #SBATCH --nodelist=mdgx01
Partition: dgx
Comments:
    GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-64dc6369-4c36-824d-182c-8e8f9c33f587)
    GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-...96b9cbc302b4)
    GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-...21216b9b8f33)
    GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-...)
If you just want V100s, and are not concerned whether they are the 16GB or 32GB models, then you do not need to define nodelist.
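When scripting job generation, the node-to-GPU-memory mapping above can be captured in a small helper. The dgx_node_for function below is our own sketch, not an ARCC tool; it simply encodes the two nodes listed in this section.

```shell
# dgx_node_for: map a desired V100 memory size (in GB) to the DGX node
# that hosts it, per the node list above (mdgx01: 16GB, tdgx01: 32GB).
dgx_node_for() {
    case "$1" in
        16) echo mdgx01 ;;
        32) echo tdgx01 ;;
        *)  echo "no DGX node with V100-${1}GB" >&2; return 1 ;;
    esac
}

dgx_node_for 32    # prints tdgx01, for use with --nodelist
```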
If you request more than one GPU, you'll get something of the form:
    CUDA_VISIBLE_DEVICES: 0,1
    GPU 0: NVIDIA A30 (UUID: GPU-...)
    GPU 1: NVIDIA A30 (UUID: GPU-...)
Additionally, batch jobs for the DGX nodes need to be submitted using sbatch_dgx.
Interactive Jobs:
You can request GPUs via an interactive job:
    salloc --account=<account> --time=01:00:00 -N 1 -c 1 --partition=teton-gpu --gres=gpu:1
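If you request interactive GPU sessions often, the salloc invocation can be generated from a few parameters. This wrapper is our own sketch (the helper name and the myproject account in the usage line are illustrative assumptions); it only prints the command line shown above with the blanks filled in.

```shell
# gpu_salloc_cmd: build the salloc command line for a one-hour, one-core
# interactive GPU session; account, partition, and GPU count are parameters.
gpu_salloc_cmd() {
    printf 'salloc --account=%s --time=01:00:00 -N 1 -c 1 --partition=%s --gres=gpu:%s\n' \
        "$1" "$2" "$3"
}

gpu_salloc_cmd myproject teton-gpu 1
```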
Summary
...