MedicineBow Hardware Summary Table

MedicineBow Hardware

The MedicineBow cluster was developed and released to the UW campus community in the summer of 2024 and is operated under a condo model (detailed further below). It currently hosts UW ARCC nodes and UW Researcher Investment nodes. Any MedicineBow user can access any node on the cluster on a preemptable basis, and 15 non-investor CPU nodes in the non-investor partition are available to all MedicineBow users without being subject to preemption. Users who need specific hardware features can request them in their SBATCH directives with --constraint=<reqfeature>; see the examples under Requesting Specific Features below.
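
For example, a minimal batch script targeting the shared non-investor partition might look like the sketch below. The account name, job name, and resource values are placeholders, not cluster defaults.

```bash
#!/bin/bash
#SBATCH --account=<your_project>    # placeholder: replace with your ARCC project/account name
#SBATCH --partition=non-investor    # shared CPU nodes that are not subject to preemption
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --job-name=example_job      # placeholder job name

# Illustrative workload; replace with your own commands
srun hostname
```

Submit the script with `sbatch <script_name>.sh` once the placeholders are filled in.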

New MedicineBow Hardware

| Slurm Partition Name | Requestable Features | Node Count | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor (x86_64) | Local Disks | OS | Use Case | Key Attributes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mb | amd, epyc | 25 | 2 | 48 | 1 | 96 | 1024 | 2x 48-Core/96-Thread 4th Gen AMD EPYC 9454 | 4 TB SSD | RHEL 9.3 | For compute jobs running on the latest MedicineBow hardware | MB Compute with 1 TB RAM |
| mb-a30 | amd, epyc | 8 |  |  |  |  | 768 |  |  |  | DL Inference, AI, Mainstream Acceleration | MB Compute with 24 GB RAM/GPU & A30 GPU |
| mb-l40s | amd, epyc | 5 |  |  |  |  |  |  |  |  | DL Inference, Omniverse/Rendering, Mainstream Acceleration | MB Compute with 48 GB RAM/GPU & L40S GPU |
| mb-h100 | amd, epyc | 6 |  |  |  |  | 1228 |  |  |  | DL Training and Inference, DA, AI, Mainstream Acceleration | MB Compute with 80 GB RAM/GPU & NVIDIA SXM5 H100 GPU |
| beartooth | edr, intel, icelake | 2 | 2 | 28 | 1 | 56 | 256 | Intel Ice Lake | 436 GB SSD | RHEL 8.8 | For general compute jobs running on newer hardware | Beartooth compute |
| beartooth |  | 6 |  |  |  |  | 515 |  | 436 GB SSD | RHEL 8.8 | For jobs w/ above-average memory requirements on newer hardware | Beartooth compute w/ 512 GB of RAM |
| beartooth |  | 8 |  |  |  |  | 1024 |  | 436 GB SSD | RHEL 8.8 | For jobs w/ large memory requirements on newer hardware | Beartooth compute w/ 1 TB of RAM |
| beartooth-gpu | edr, intel, icelake | 4 | 2 | 28 | 1 | 56 | 250 or 1024 |  | 436 GB SSD | RHEL 8.8 | For compute jobs needing GPUs | Beartooth GPU compute |
| teton | edr, intel, broadwell, community | 175 | 2 | 16 | 1 | 32 | 128 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For regular compute jobs | Teton compute |
| teton | edr, intel, cascade, community | 56 |  | 20 |  | 40 | 192 or 768 | Intel Cascade Lake | 240 GB SSD | RHEL 8.8 | For compute jobs on newer prior-generation hardware w/ somewhat higher memory requirements | Teton compute w/ Cascade Lake CPUs |
| teton | edr, intel, broadwell | 8 |  | 16 |  | 32 | 1024 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs w/ large memory requirements, running on fast prior-cluster hardware | Teton compute w/ 1 TB of RAM |
| teton | edr, amd, epyc | 2 |  | 24 |  | 48 | 4096 | AMD EPYC | 4096 GB SSD | RHEL 8.6 | For compute jobs w/ exceedingly demanding memory requirements | Teton compute w/ 4 TB of RAM |
| teton | edr, intel, broadwell | 2 |  | 20 | 2 | 40 | 512 | Intel Broadwell | 7 TB SSD | RHEL 8.8 | For GPU and AI-enabled workloads | Special DGX GPU compute nodes |
| teton-gpu | edr, intel, broadwell, community | 6 | 2 | 16 | 1 | 32 | 512 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs utilizing GPUs on prior cluster hardware | Teton GPU compute |

Hardware Feature Descriptors

| Feature | Description of Feature |
|---|---|
| edr | Requests nodes connected with EDR InfiniBand (25.78125 Gbit/s signaling rate). |
| intel | Requests a node with an Intel processor. Includes all Intel CPU generations in Beartooth. |
| broadwell | Requests an Intel Broadwell CPU. |
| cascade | Requests an Intel Cascade Lake CPU. |
| icelake | Requests an Intel Ice Lake CPU. |
| amd | Requests a node with an AMD processor. Includes all AMD CPU versions in Beartooth and MedicineBow. |
| epyc | Requests an AMD EPYC CPU. |
| non-investor/community | Indicates a node shared equally among the research community. Jobs on these nodes cannot be preempted, but may wait in the queue far longer. |
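
To see which features the nodes in each partition currently advertise, you can query Slurm directly rather than relying on the table above. The following is a generic Slurm sketch; the live output reflects the current cluster configuration.

```bash
# List each partition, its node count, node names, and the features those nodes advertise
sinfo --format="%P %D %N %f"
```

The features reported in the last column are the values accepted by --constraint.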

GPUs and Accelerators

The MedicineBow and Beartooth clusters include a number of compute nodes that contain GPUs. The following table lists each GPU type, the partitions in which it is available, and an example Slurm request.

(Some partitions may be in the process of migration to MB. Run sinfo for current partitions.)

| GPU Type | Partition (# of nodes) | Example Slurm Request | # of Nodes | GPU Devices per Node | CUDA Cores (per GPU) | Tensor Cores (per GPU) | GPU Memory (GB per GPU) | Compute Capability |
|---|---|---|---|---|---|---|---|---|
| A30 | beartooth-gpu (4), mb-a30 (8), non-investor (3) | `#SBATCH --partition=beartooth-gpu` `#SBATCH --gres=gpu:<#_gpu_requested>` | 15 | 7 on Beartooth/non-investor, 8 on MedicineBow | 3584 | 224 | 24 | 8.0 |
| L40S | mb-l40s (5) | `#SBATCH --partition=mb-l40s` `#SBATCH --gres=gpu:<#_gpu_requested>` | 5 | 8 |  | 568 | 48 |  |
| H100 | mb-h100 (6) | `#SBATCH --partition=mb-h100` `#SBATCH --gres=gpu:<#_gpu_requested>` | 6 | 8 | 16896 (FP32) | 528 | 80 |  |
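
Tying the table together, a minimal GPU batch script might look like the following sketch. The account name, partition choice, and GPU count are placeholders to adapt to your job, and nvidia-smi simply reports the GPUs allocated to the job.

```bash
#!/bin/bash
#SBATCH --account=<your_project>   # placeholder: replace with your ARCC project/account name
#SBATCH --partition=mb-l40s        # any GPU partition from the table above
#SBATCH --gres=gpu:2               # number of GPUs to allocate on the node
#SBATCH --time=00:30:00

# Confirm which GPUs were allocated to this job
nvidia-smi
```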

Requesting Specific Features

Since several partitions have been consolidated, users needing specific features may request them by adding a --constraint flag to their SLURM directives, as shown in the examples below:

Requesting a Teton node with a Cascade Lake processor

```bash
#SBATCH --partition=teton       ## SLURM directive requesting node(s) from the teton partition
#SBATCH --constraint=cascade    ## SLURM directive restricting the job to nodes that were previously in the teton-cascade partition on the Beartooth cluster
```

Requesting a Teton node with a large amount of RAM/memory

```bash
#SBATCH --partition=teton    ## SLURM directive requesting node(s) from the teton partition
#SBATCH --mem=4000G          ## SLURM directive restricting the job to nodes with 4 TB of RAM, previously in the teton-massmem partition on the Beartooth cluster
```
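
Constraints can also be combined using standard Slurm syntax ("&" for AND, "|" for OR). The sketch below is illustrative only; the account and time values are placeholders.

```bash
#!/bin/bash
#SBATCH --account=<your_project>       # placeholder: replace with your ARCC project/account name
#SBATCH --partition=teton
#SBATCH --constraint="broadwell&edr"   # require nodes that advertise both the broadwell and edr features
#SBATCH --time=02:00:00

srun ./my_program                      # placeholder executable
```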

Specialty Hardware

ARCC also offers some specialty hardware outside of MedicineBow for unique workloads. This hardware is currently still in a development and testing phase.

| GPU Type | Node Count | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor | Local Disks | Use Case | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| GH200 | 2 | 1 | 72 | 1 | 72 | 480 (+96 GB HBM3 shared w/ GPU) | NVIDIA Grace™, 72 Arm® Neoverse V2 cores (aarch64) | 1 TB SSD | Specialty nodes designed for LLM inference, vector database search, and large data processing | Not generally available to the public. Please contact us if you have a specialty workload. |