MedicineBow Hardware Summary Table

MedicineBow Hardware

The MedicineBow cluster was developed and released to the UW campus community in the summer of 2024 and is operated under a condo model (detailed further below). It currently hosts UW ARCC nodes and UW Researcher Investment nodes. Any MedicineBow user can access any node on the cluster on a preemptable basis, and 15 non-investor CPU nodes in the non-investor partition are available to all MedicineBow users without being subject to preemption. Users who need specific hardware features can request them in their SBATCH directives with --constraint=<reqfeature>; see the examples under Requesting Specific Features below.
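
For example, a minimal batch script targeting the shared non-investor partition might look like the sketch below. The account name, job name, and resource values are placeholders, not cluster defaults.

```bash
#!/bin/bash
#SBATCH --account=<your_project>    # placeholder: replace with your ARCC project/account name
#SBATCH --partition=non-investor    # shared CPU nodes that are not subject to preemption
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --job-name=example_job      # placeholder job name

# Illustrative workload; replace with your own commands
srun hostname
```

Submit the script with `sbatch <script_name>.sh` once the placeholders are filled in.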

New MedicineBow Hardware

| Slurm Partition Name | Requestable Features | Node Count | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor (x86_64) | Local Disks | OS | Use Case | Key Attributes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mb | amd, epyc | 25 | 2 | 48 | 1 | 96 | 1024 | 2x 48-Core/96-Thread 4th Gen AMD EPYC 9454 | 4 TB SSD | RHEL 9.3 | For compute jobs running on the latest MedicineBow hardware | MB Compute with 1 TB RAM |
| mb-a30 | amd, epyc | 8 |  |  |  |  | 768 |  |  |  | DL Inference, AI, Mainstream Acceleration | MB Compute with 24 GB RAM/GPU & A30 GPU |
| mb-l40s | amd, epyc | 5 |  |  |  |  |  |  |  |  | DL Inference, Omniverse/Rendering, Mainstream Acceleration | MB Compute with 48 GB RAM/GPU & L40S GPU |
| mb-h100 | amd, epyc | 6 |  |  |  |  | 1228 |  |  |  | DL Training and Inference, DA, AI, Mainstream Acceleration | MB Compute with 80 GB RAM/GPU & NVIDIA SXM5 H100 GPU |
| beartooth | edr, intel, icelake | 2 | 2 | 28 | 1 | 56 | 256 | Intel Ice Lake | 436 GB SSD | RHEL 8.8 | For general compute jobs running on newer hardware | Beartooth compute |
| beartooth |  | 6 |  |  |  |  | 515 |  | 436 GB SSD | RHEL 8.8 | For jobs w/ above-average memory requirements on newer hardware | Beartooth compute w/ 512 GB of RAM |
| beartooth |  | 8 |  |  |  |  | 1024 |  | 436 GB SSD | RHEL 8.8 | For jobs w/ large memory requirements on newer hardware | Beartooth compute w/ 1 TB of RAM |
| beartooth-gpu | edr, intel, icelake | 4 | 2 | 28 | 1 | 56 | 250 or 1024 |  | 436 GB SSD | RHEL 8.8 | For compute jobs needing GPUs | Beartooth GPU compute |
| teton | edr, intel, broadwell, community | 175 | 2 | 16 | 1 | 32 | 128 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For regular compute jobs | Teton compute |
| teton | edr, intel, cascade, community | 56 |  | 20 |  | 40 | 192 or 768 | Intel Cascade Lake | 240 GB SSD | RHEL 8.8 | For compute jobs on newer prior-generation hardware w/ somewhat higher memory requirements | Teton compute w/ Cascade Lake CPUs |
| teton | edr, intel, broadwell | 8 |  | 16 |  | 32 | 1024 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs w/ large memory requirements, running on fast prior-cluster hardware | Teton compute w/ 1 TB of RAM |
| teton | edr, amd, epyc | 2 |  | 24 |  | 48 | 4096 | AMD EPYC | 4096 GB SSD | RHEL 8.6 | For compute jobs w/ exceedingly demanding memory requirements | Teton compute w/ 4 TB of RAM |
| teton | edr, intel, broadwell | 2 |  | 20 | 2 | 40 | 512 | Intel Broadwell | 7 TB SSD | RHEL 8.8 | For GPU and AI-enabled workloads | Special DGX GPU compute nodes |
| teton-gpu | edr, intel, broadwell, community | 6 | 2 | 16 | 1 | 32 | 512 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs utilizing GPUs on prior cluster hardware | Teton GPU compute |

Hardware Feature Descriptors

| Feature | Description of Feature |
|---|---|
| edr | Requests nodes connected with EDR InfiniBand (25.78125 Gbit/s signaling rate). |
| intel | Requests a node with an Intel processor. Includes all Intel CPU generations in Beartooth. |
| broadwell | Requests an Intel Broadwell CPU. |
| cascade | Requests an Intel Cascade Lake CPU. |
| icelake | Requests an Intel Ice Lake CPU. |
| amd | Requests a node with an AMD processor. Includes all AMD CPU versions in Beartooth and MedicineBow. |
| epyc | Requests an AMD EPYC CPU. |
| non-investor/community | Indicates a node shared equally among the research community. Jobs on these nodes cannot be preempted, but may wait in the queue far longer. |
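
To see which features the nodes in each partition currently advertise, you can query Slurm directly rather than relying on the table above. The following is a generic Slurm sketch; the live output reflects the current cluster configuration.

```bash
# List each partition, its node count, node names, and the features those nodes advertise
sinfo --format="%P %D %N %f"
```

The features reported in the last column are the values accepted by --constraint.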

GPUs and Accelerators

The MedicineBow and Beartooth clusters include a number of compute nodes that contain GPUs. The following table lists each GPU type, the partitions in which it is available, and an example Slurm request.

(Some partitions may be in the process of migration to MB. Run sinfo for current partitions.)

| GPU Type | Partition (# of nodes) | Example Slurm Request | # of Nodes | GPU Devices per Node | CUDA Cores (per GPU) | Tensor Cores (per GPU) | GPU Memory (GB per GPU) | Compute Capability |
|---|---|---|---|---|---|---|---|---|
| A30 | beartooth-gpu (4), mb-a30 (8), non-investor (3) | `#SBATCH --partition=beartooth-gpu` `#SBATCH --gres=gpu:<#_gpu_requested>` | 15 | 7 on Beartooth/non-investor, 8 on MedicineBow | 3584 | 224 | 24 | 8.0 |
| L40S | mb-l40s (5) | `#SBATCH --partition=mb-l40s` `#SBATCH --gres=gpu:<#_gpu_requested>` | 5 | 8 |  | 568 | 48 |  |
| H100 | mb-h100 (6) | `#SBATCH --partition=mb-h100` `#SBATCH --gres=gpu:<#_gpu_requested>` | 6 | 8 | 16896 (FP32) | 528 | 80 |  |
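
Tying the table together, a minimal GPU batch script might look like the following sketch. The account name, partition choice, and GPU count are placeholders to adapt to your job, and nvidia-smi simply reports the GPUs allocated to the job.

```bash
#!/bin/bash
#SBATCH --account=<your_project>   # placeholder: replace with your ARCC project/account name
#SBATCH --partition=mb-l40s        # any GPU partition from the table above
#SBATCH --gres=gpu:2               # number of GPUs to allocate on the node
#SBATCH --time=00:30:00

# Confirm which GPUs were allocated to this job
nvidia-smi
```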

Requesting Specific Features

Since several partitions have been consolidated, users needing specific features may request them by adding a --constraint flag to their SLURM directives, as shown in the examples below:

Requesting a Teton node with a Cascade Lake processor

```bash
#SBATCH --partition=teton       ## SLURM directive requesting node(s) from the teton partition
#SBATCH --constraint=cascade    ## SLURM directive restricting the job to nodes that were previously in the teton-cascade partition on the Beartooth cluster
```

Requesting a Teton node with a large amount of RAM/memory

```bash
#SBATCH --partition=teton    ## SLURM directive requesting node(s) from the teton partition
#SBATCH --mem=4000G          ## SLURM directive restricting the job to nodes with 4 TB of RAM, previously in the teton-massmem partition on the Beartooth cluster
```
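
Constraints can also be combined using standard Slurm syntax ("&" for AND, "|" for OR). The sketch below is illustrative only; the account and time values are placeholders.

```bash
#!/bin/bash
#SBATCH --account=<your_project>       # placeholder: replace with your ARCC project/account name
#SBATCH --partition=teton
#SBATCH --constraint="broadwell&edr"   # require nodes that advertise both the broadwell and edr features
#SBATCH --time=02:00:00

srun ./my_program                      # placeholder executable
```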

Specialty Hardware

ARCC also offers some specialty hardware outside of MedicineBow for unique workloads. This hardware is currently still in a development and testing phase.

| GPU Type | Node Count | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor | Local Disks | Use Case | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| GH200 | 2 | 1 | 72 | 1 | 72 | 480 (+96 GB HBM3 shared w/ GPU) | NVIDIA Grace™, 72 Arm® Neoverse V2 cores (aarch64) | 1 TB SSD | Specialty nodes designed for LLM inference, vector database search, and large data processing | Not generally available to the public. Please contact us if you have a specialty workload. |