New MedicineBow Hardware
Slurm Partition name | Requestable features | Nodes | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor (x86_64) | Local Disks | OS | Use Case | Key Attributes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
medicinebow | amd, epyc | 25 | 2 | 48 | 1 | 96 | 1024 | 2x 48-Core/96-Thread 4th Gen AMD EPYC 9454 | 4TB SSD | RHEL 9.3 | For compute jobs running on the latest and greatest MedicineBow hardware | MB Compute with 1TB RAM |
medicinebow-a30 | amd, epyc | 8 | | | | | 768 | | | | DL Inference, AI, Mainstream Acceleration | MB Compute with 24GB RAM/GPU & A30 GPU |
medicinebow-l40s | amd, epyc | 5 | | | | | 768 | | | | DL Inference, Omniverse/Rendering, Mainstream Acceleration | MB Compute with 48GB RAM/GPU & L40S GPU |
medicinebow-h100 | amd, epyc | 6 | | | | | 1228 | | | | DL Training and Inference, DA, AI, Mainstream Acceleration | MB Compute with 80GB RAM/GPU & Nvidia SXM5 H100 GPU |
Former Beartooth Hardware (to be consolidated into MedicineBow - pending)
Slurm Partition name | Requestable features | Nodes | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor (x86_64) | Local Disks | OS | Use Case | Key Attributes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
moran | fdr, intel, sandy, ivy, community | 273 | 2 | 8 | 1 | 16 | 64 or 128 | Intel Ivybridge/ | 1 TB HD | RHEL 8.8 | For compute jobs not needing the latest and greatest hardware. | Original Moran compute |
moran-bigmem | fdr, intel, haswell | 2 | 2 | 8 | 1 | 16 | 512 | Intel Haswell | 1 TB HD | RHEL 8.8 | For jobs not needing the latest hardware, w/ above average memory requirements. | Moran compute w/ 512G of RAM |
moran-hugemem | fdr, intel, haswell, community | 2 | 2 | 8 | 1 | 16 | 1024 | Intel Haswell | 1 TB HD | RHEL 8.8 | For jobs that don’t need the latest hardware, w/ escalated memory requirements. | Moran compute w/ 1TB of RAM |
dgx | edr, intel, broadwell | 2 | 2 | 20 | 2 | 40 | 512 | Intel Broadwell | 7 TB SSD | RHEL 8.8 | For GPU and AI-enabled workloads. | Special DGX GPU compute nodes |
teton | edr, intel, broadwell, community | 175 | 2 | 16 | 1 | 32 | 128 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For regular compute jobs. | Teton compute |
teton-cascade | edr, intel, cascade, community | 56 | 2 | 20 | 1 | 40 | 192 or 768 | Intel Cascade Lake | 240 GB SSD | RHEL 8.8 | For compute jobs on the newer of the prior-generation hardware, w/ somewhat higher memory requirements. | Teton compute w/ Cascade Lake CPUs |
teton-gpu | edr, intel, broadwell, community | 6 | 2 | 16 | 1 | 32 | 512 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs utilizing GPUs on prior cluster hardware. | Teton GPU compute |
teton-hugemem | edr, intel, broadwell | 8 | 2 | 16 | 1 | 32 | 1024 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs w/ large memory requirements, running on fast prior cluster hardware. | Teton compute w/ 1TB of RAM |
teton-massmem | edr, amd, epyc | 2 | 2 | 24 | 1 | 48 | 4096 | AMD/EPYC | 4096 GB SSD | RHEL 8.6 | For compute jobs w/ exceedingly demanding memory requirements | Teton compute w/ 4TB of RAM |
teton-knl | edr, intel, knl | 12 | 1 | 72 | 4 | 72 | 384 | Intel Knights Landing | 240 GB SSD | RHEL 8.8 | For jobs that use many cores on a single node and where speed isn’t critical. | Teton compute w/ Intel Knights Landing CPUs |
beartooth | edr, intel, icelake | 2 | 2 | 28 | 1 | 56 | 256 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For general compute jobs running on newer hardware. | Beartooth compute |
beartooth-gpu | edr, intel, icelake | 4 | 2 | 28 | 1 | 56 | 250 or 1024 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For compute jobs needing GPU. | Beartooth GPU compute |
beartooth-bigmem | edr, intel, icelake | 6 | 2 | 28 | 1 | 56 | 512 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For jobs w/ above average memory requirements, on newer hardware. | Beartooth compute w/ 512G of RAM |
beartooth-hugemem | edr, intel, icelake | 8 | 2 | 28 | 1 | 56 | 1024 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For jobs w/ large memory requirements, on newer hardware. | Beartooth compute w/ 1TB of RAM |
Feature | Description of Feature |
---|---|
fdr | Requests nodes that are connected with an Infiniband cable with a signaling rate of 14.0625 Gbit/s |
edr | Requests nodes that are connected with an Infiniband cable with a signaling rate of 25.78125 Gbit/s |
intel | Requests a processor that is based on an Intel processor. Includes all Intel CPU versions in Beartooth. |
ivy | Requests an Intel Ivy Bridge CPU. |
sandy | Requests an Intel Sandy Bridge CPU. |
broadwell | Requests an Intel Broadwell CPU. |
haswell | Requests an Intel Haswell CPU. |
knl | Requests an Intel Knights Landing CPU. This is a specialized chip and is not well suited to all workloads. |
icelake | Requests an Intel Icelake CPU. |
amd | Requests a processor that is based on an AMD processor. Includes all AMD CPU versions in Beartooth. |
epyc | Requests an AMD EPYC CPU. |
community | This feature indicates a node shared equally among the research community. Jobs on these nodes can’t be pre-empted, but can be queued up for far longer. |
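The features in this table are node attributes, and in Slurm such attributes are typically requested with the `--constraint` option of `sbatch`. The following is a minimal sketch, assuming the feature names above are exposed as standard Slurm node features; the job name, time limit, and the choice of the `teton` partition with the `broadwell` feature are placeholders for illustration only.

```shell
#!/bin/bash
# Hypothetical job script: ask for a Broadwell node in the teton partition.
# The job name and time limit below are placeholders.
#SBATCH --job-name=feature-demo
#SBATCH --partition=teton
#SBATCH --constraint=broadwell
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Features can be combined in one constraint expression:
#   --constraint="intel&edr"    both features required (AND)
#   --constraint="ivy|sandy"    either feature accepted (OR)

# Outside of a Slurm allocation these variables are unset,
# so fall back to a default value instead of printing nothing.
job_id="${SLURM_JOB_ID:-none}"
partition="${SLURM_JOB_PARTITION:-none}"
echo "job=${job_id} partition=${partition} host=${HOSTNAME:-unknown}"
```

Submitted with `sbatch`, the script should only start on nodes that carry the requested feature; if no node in the partition has it, Slurm rejects or holds the job rather than silently picking other hardware.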
GPUs and Accelerators
The ARCC Beartooth cluster has a number of compute nodes that contain GPUs. The following table lists each GPU type, the number of nodes that carry it, and an example Slurm request.
GPU Type | Partition | Example slurm value to request | # of Nodes | GPU devices per node | CUDA Cores | Tensor Cores | GPU Memory Size (GB) | Compute Capability |
---|---|---|---|---|---|---|---|---|
Tesla P100 | (all available on | #SBATCH --partition=teton-gpu #SBATCH --gres=gpu:? | 8 | 2 | 3584 | 0 | 16 | 6.0 |
V100 | (both available on | #SBATCH --partition=dgx #SBATCH --gres=gpu:? | 2 | 8 | 5120 | 640 | 16/32 | 7.0 |
A30 | | #SBATCH --partition=beartooth-gpu #SBATCH --gres=gpu:? | 15 | 7 on BT/non-investor, 8 on MedicineBow | 3584 | 224 | 24 | 8.0 |
T4 | | | 2 | 3 | 2560 or | 320 or 224 TC/GPU on MB | 16 | 7.5 |
L40S | | | 5 | 8 | | 568 TC/GPU on MB | 48 | |
H100 | | | 6 | 8 | 16896 FP32 CUDA/GPU | 528 TC/GPU on MB | 80 | |
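The `?` in the `--gres=gpu:?` examples stands for the number of GPUs to request per node. As a rough sketch, assuming a two-GPU request on the `beartooth-gpu` partition (the count and time limit are illustrative) and assuming the cluster exports `CUDA_VISIBLE_DEVICES` for GPU allocations, as Slurm commonly does, a job script could confirm what it was given like this:

```shell
#!/bin/bash
# Hypothetical job script: request two GPUs on one beartooth-gpu node.
# The GPU count and time limit are placeholders.
#SBATCH --partition=beartooth-gpu
#SBATCH --gres=gpu:2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Assumption: Slurm sets CUDA_VISIBLE_DEVICES to a comma-separated
# list of device indices for the allocation. It is empty elsewhere.
gpus="${CUDA_VISIBLE_DEVICES:-}"

# Count the comma-separated entries without relying on bash arrays.
gpu_count=0
old_ifs=$IFS
IFS=','
for id in $gpus; do
  gpu_count=$((gpu_count + 1))
done
IFS=$old_ifs

echo "visible GPUs: ${gpu_count}"
```

Requesting more devices than the "GPU devices per node" column allows for a partition will leave the job unable to be scheduled, so it is worth checking the table before choosing the count.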
Specialty Partitions
In some cases you need to name the partition explicitly to reach certain compute nodes; requesting the associated resources alone is not enough. For example:
Teton Massmem Nodes:
    #SBATCH --mem=4000G
    # Fails with: sbatch: error: Memory specification can not be satisfied

    #SBATCH --mem=4000G
    #SBATCH --partition=teton-massmem
    # Job is allocated.
Teton KNL nodes:
    #SBATCH --cpus-per-task=70
    # Fails with: sbatch: error: CPU count per node can not be satisfied

    #SBATCH --cpus-per-task=70
    #SBATCH --partition=teton-knl
    # Job is allocated.
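Before submitting, it can help to check what a specialty partition actually offers. The sketch below uses `sinfo` with a custom output format and assumes it is run on a cluster login node where the Slurm client tools are on the `PATH`; the two partition names are simply the ones from the examples above.

```shell
#!/bin/bash
# Partitions taken from the examples above.
partitions="teton-massmem,teton-knl"

# sinfo format fields:
#   %P partition name      %D number of nodes
#   %c CPUs per node       %m memory per node (MB)
#   %f node features
format="%P %D %c %m %f"

if command -v sinfo >/dev/null 2>&1; then
  sinfo --partition="$partitions" --format="$format"
else
  # Not on a Slurm host: report it instead of failing.
  echo "sinfo not found; run this on a cluster login node"
fi
```

Comparing the reported CPUs and memory per node with the `--cpus-per-task` and `--mem` values in a job script shows quickly whether the request can only be satisfied by one of these partitions.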