Hardware - Teton
Overview
This wiki section describes the hardware used for research on the high performance computing system at UWyo.
Available Nodes
Type | Scheduler Partition | Series | Arch | Count | Sockets | Cores | Threads / Core | Clock (GHz) | RAM (GB) | GPU Type | GPU Count | Local Disk Type | Local Disk Capacity (GB) | IB Network | Operating System | Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Teton Regular | teton | Intel Broadwell | x86_64 | 180 | 2 | 32 | 1 | 2.1 | 128 | N/A | N/A | SSD | 240 | EDR | RHEL 7.9 | No longer available for purchase |
Teton Cascade | teton-cascade | Intel Cascade Lake | x86_64 | 44 | 2 | 40 | 1 | 2.1 | 192/768 | N/A | N/A | SSD | 240 | EDR | RHEL 7.9 | Current Model |
Teton BigMem GPU | teton-gpu | Intel Broadwell | x86_64 | 8 | 2 | 32 | 1 | 2.1 | 512 | NVIDIA P100 16G | 2 | SSD | 240 | EDR | RHEL 7.9 | No longer available for purchase |
Teton HugeMem | teton-hugemem | Intel Broadwell | x86_64 | 10 | 2 | 32 | 1 | 2.1 | 1024 | N/A | N/A | SSD | 240 | EDR | RHEL 7.9 | No longer available for purchase |
Teton Massive Memory | teton-massmem | AMD/EPYC | x86_64 | 2 | 2 | 48 | 1 | | 4096 | N/A | N/A | SSD | 4096 | EDR | RHEL 7.9 | Current Model |
Teton KNL | teton-knl | Intel Knights Landing | x86_64 | 12 | 1 | 72 | 4 | 1.5 | 384 + 16 | N/A | N/A | SSD | 240 | EDR | RHEL 7.9 | No longer available for purchase |
Teton DGX | dgx | Intel Broadwell | x86_64 | 1 | 2 | 40 | 2 | 2.2 | 512 | NVIDIA V100 32G | 8 | SSD | 7 TB | EDR | Ubuntu 18.04.2 LTS | Available as special order |
Teton Test | arcc | Intel Broadwell | x86_64 | 8 | 2 | 14 | 1 | 2.4 | 128 | N/A | N/A | HD | 240G | EDR | RHEL 7.9 | No longer available for purchase |
Moran Regular | moran | Intel Sandy Bridge/Ivy Bridge | x86_64 | 280 | 2 | 16 | 1 | 2.6 | 64 or 128 | K20 on some | 2 | HD | 1T | FDR | RHEL 7.9 | No longer available for purchase |
Moran BigMem | moran-bigmem-gpu | Intel Haswell | x86_64 | 2 | 2 | 16 | 1 | 2.6 | 512 | K80 | 8 | HD | 1T | FDR | RHEL 7.9 | No longer available for purchase |
Moran Debug | moran | Intel Ivy Bridge | x86_64 | 2 | 2 | 16 | 1 | 2.6 | 64 | K20m | 2 | HD | 1T | FDR | RHEL 7.9 | No longer available for purchase |
Moran HugeMem | moran-hugemem | Intel Haswell | x86_64 | 2 | 2 | 16 | 1 | 2.6 | 1024 | K20 | 2 | HD | 1T | FDR | RHEL 7.9 | No longer available for purchase |
Moran DGX | dgx | Intel Broadwell | x86_64 | 1 | 2 | 40 | 2 | 2.2 | 512 | NVIDIA V100 16G | 8 | SSD | 7 TB | EDR | Ubuntu 18.04.2 LTS | Available as special order |
Moran Test | arcc | Intel Haswell | x86_64 | 1 | 2 | 20 | 1 | 2.6 | 64 | N/A | N/A | HD | 300G | FDR | RHEL 7.9 | No longer available for purchase |
TOTAL Nodes | | | | 553 | | | | | | | | | | | | |
GPUs and Accelerators
The ARCC Teton cluster has a number of compute nodes that contain GPUs. This section describes the hardware, as well as access and usage of the GPU nodes.
Teton GPU Hardware
The following tables list each node that has GPUs and the type of GPU installed.
Partition | GPU | Devices | Nodes | CUDA Cores | GPU Memory Size (GB) | Compute Capability |
---|---|---|---|---|---|---|
moran | GeForce GTX Titan | [1-2] | mdbg01 | 2688 | 6 | 3.5 |
moran | GeForce GTX Titan X | 0 [2-3] | mdbg01 mdbg02 | 3072 | 12 | 5.2 |
moran | Tesla K20m | [0-1] 1 | m[025-32], m[075-82], m086 m268 | 2496 | 4.7 | 3.5 |
moran | Tesla K20Xm | [0-1] 0 | m219/20/27/28, m235/36, m243/4, m251/2/9, m260/7 m268 | 2688 | 5.7 | 3.5 |
moran | Tesla K40c | [0-1] | mdbg02 | 2880 | 11.4 | 3.5 |
moran-bigmem-gpu | Tesla K80 | [0-7] | mbm[01-02] | 2496 | 11.4 | 3.7 |
teton-gpu | Tesla P100 | [0-1] | tbm[03-10] | 3584 | 16 | 6.0 |
Notes:
Review Nvidia’s Compute Capabilities to understand what each version provides.
The CUDA FAQ states that “the compute capability of a GPU determines its general specifications and available features.”
For example, none of the GPUs in the table above have tensor cores, which require compute capability 7.0 or higher.
The above table will be updated as old nodes are decommissioned and new nodes are brought into the cluster.
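The compute capability rule in the notes can be sketched as a small shell check. This is an illustrative sketch only: the helper name meets_tensor_core_minimum is hypothetical, the GPU list is copied from the tables on this page, and meeting the 7.0 minimum does not by itself guarantee that a particular GPU model has tensor cores.

```shell
# Succeed (exit 0) when a compute capability such as "7.0" is at least 7.0.
# Hypothetical helper; assumes the usual "major.minor" form with one minor digit.
meets_tensor_core_minimum() {
    _cc=$(printf '%s' "$1" | tr -d '.')   # "3.5" -> 35, "7.0" -> 70
    [ "$_cc" -ge 70 ]
}

# GPU name and compute capability pairs copied from the tables on this page.
while read -r _gpu _cap; do
    if meets_tensor_core_minimum "$_cap"; then
        echo "$_gpu (compute capability $_cap): meets the tensor core minimum"
    else
        echo "$_gpu (compute capability $_cap): below the tensor core minimum"
    fi
done <<'EOF'
K20m 3.5
K80 3.7
TitanX 5.2
P100 6.0
V100 7.0
EOF
```

Of the GPUs listed on this page, only the V100 (compute capability 7.0) passes the check.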
The following two GPU nodes are reserved for AI use. These are special nodes running Ubuntu and CUDA 11.0.
Partition | GPU | Devices | Nodes | CUDA Cores | Tensor Cores | GPU Memory Size (GB) | Compute Capability |
---|---|---|---|---|---|---|---|
dgx | Tesla V100 | [0-7] | mdgx01 | 5120 | 640 | 16 | 7.0 |
dgx | Tesla V100 | [0-7] | tdgx01 | 5120 | 640 | 32 | 7.0 |
For how to request GPUs in a bash script submitted with sbatch or in an interactive session started with salloc, and how to check the resources requested, please see: Introduction to Job Submission: 02: Memory and GPUs
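As a rough illustration of a GPU request, the batch script below uses standard Slurm options (--partition, --gres, --time). It is a sketch under stated assumptions, not ARCC's official template: the account name myproject and the helper report_gpus are hypothetical, and the linked page remains the authoritative guide.

```shell
#!/bin/bash
#SBATCH --account=myproject        # hypothetical project name, replace with your own
#SBATCH --partition=teton-gpu      # partition name taken from the table above
#SBATCH --gres=gpu:2               # ask for both P100 devices on one node
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Report what the job was given. The defaults keep this runnable outside Slurm.
report_gpus() {
    echo "job=${SLURM_JOB_ID:-none} devices=${CUDA_VISIBLE_DEVICES:-none}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || true      # list the GPUs visible to this job
    fi
}

report_gpus
```

Submitted with sbatch, the script prints the job ID and the device indices assigned to the job. An interactive session can take the same options, for example salloc --account=myproject --partition=teton-gpu --gres=gpu:1 --time=00:30:00, again with a placeholder account name.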
GPU Programming Environment
CUDA
On Teton, the Nvidia CUDA, PGI CUDA Fortran, and OpenACC compilers are installed. Use module spider cuda to see the versions of CUDA available, then module load the version you require.
You can compile CUDA code on any login node, since the CUDA tools are available there.
To compile CUDA code using the CUDA compiler "nvcc" so that it runs on all types of GPUs that ARCC has, use the following compiler flags:
-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70
For more info on the CUDA compilation and linking flags, please view the CUDA C++ Programming Guide.
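The flag list above follows a regular pattern: one -gencode pair per compute capability. A minimal sketch of building it in a POSIX shell follows. The helper name gencode_flags, the variable ARCC_CCS, and the file saxpy.cu are hypothetical, and the script prints the nvcc command instead of running it.

```shell
# Build one "-gencode arch=compute_XY,code=sm_XY" pair per compute capability.
# Hypothetical helper; accepts values written as "6.0" or "60".
gencode_flags() {
    _flags=""
    for _cc in "$@"; do
        _cc=$(printf '%s' "$_cc" | tr -d '.')   # "3.5" -> "35"
        _flags="${_flags:+$_flags }-gencode arch=compute_${_cc},code=sm_${_cc}"
    done
    printf '%s\n' "$_flags"
}

# Compute capabilities taken from the GPU tables on this page.
ARCC_CCS="3.5 3.7 5.2 6.0 7.0"

# saxpy.cu and the output name are placeholders; the command is only printed.
# $ARCC_CCS is left unquoted on purpose so it splits into separate arguments.
echo "nvcc $(gencode_flags $ARCC_CCS) -o saxpy saxpy.cu"
```

Which targets nvcc accepts depends on the CUDA module loaded. Newer CUDA releases drop older architectures (Kepler support, sm_35 and sm_37, was removed in CUDA 12), so trim the list to match the version in use.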
OpenACC
To invoke OpenACC, use the "-acc" flag. More information on OpenACC can be obtained on the OpenACC website.
PGI Compilers - Deprecated
As of October 2020, the PGI compilers are not on the Teton cluster, so the text below is no longer relevant. If you require the PGI compilers, then please contact ARCC.
PGI compilers come with their own CUDA, which is quite recent, and can be accessed by loading the PGI module using "module load pgi".
The PGI compilers specify the GPU architecture with the -ta=tesla flag. If no further option is specified, the flag generates code for all available compute capabilities (at the time of writing cc35, cc37, cc50, cc60, and cc70). To be specific for each GPU:
GPU Type | Compiler Flag |
---|---|
K20m | |
K20Xm | |
Titan | |
Titan X | |
K40c | |
K80 | |
P100 | |
V100 | |