The Teton HPC cluster is the successor to Mount Moran. Teton contains several new compute nodes, and all Mount Moran nodes have been reprovisioned within the Teton HPC cluster. The system is available via SSH using the hostname teton.arcc.uwyo.edu or teton.uwyo.edu. We ask that everyone who uses ARCC resources cite them accordingly; see Citing Teton. Newcomers to research computing should also consider reading the Research Computing Quick Reference.

...

Contents

Table of Contents

Glossary

...

Tip: HPC Training

Tip: Teton Overview

...

Overview

Teton is an Intel x86_64 cluster connected via Mellanox FDR/EDR InfiniBand, with a 1.3 PB IBM Spectrum Scale global parallel filesystem available to all nodes. The system requires UWYO two-factor authentication (2FA) for login via SSH. The default shell is bash, and the Lmod module system provides dynamic user environments so that software stacks can be switched quickly and easily. The Slurm workload manager schedules jobs, enforces submission limits, implements fair share, and provides Quality of Service (QoS) levels for research groups that have invested in the cluster.
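As a quick orientation, a first session typically involves browsing the module tree and checking the Slurm queues. The commands below are standard Lmod and Slurm commands; the module names actually available on Teton will vary:

Code Block
# Browse and load software with Lmod (module names are illustrative)
module avail
module load gcc

# Inspect Slurm partitions and your own jobs
sinfo
squeue -u $USER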

Teton has a Digital Object Identifier (DOI) (https://doi.org/10.15786/M2FY47) and we request that all use of Teton appropriately acknowledges the system. Please see Citing Teton for more information.

Available Nodes

See Partitions for information regarding Slurm Partitions on Teton.

...

  • File spaces within the Teton cluster filesystem may be subject to a purge policy. The policy has not yet been finalized; however, ARCC reserves the right to purge data in this area after 30 to 90 days without access, or 30 to 90 days after creation. Before an actual purge takes place, the owner of any files subject to purging will be notified by email several times.

Storage Increases on Teton

  • Project PIs can purchase additional scratch and/or project space at a cost of $100 / TB / year.

  • Additionally, PIs can request allocation increases for scratch and/or project space at no cost by submitting a proposal, which must be renewed when substantial cluster or storage changes occur. The proposal should describe:

    • the scientific gain and insights that will be or have been obtained by using the system, and

    • how data is organized and accessed in order to maximize performance and usage.

  • Projects are limited to one no-cost increase.

  • For more information, please contact ARCC.

...

Teton has login nodes for users to access the cluster. The login nodes are publicly available at the hostnames teton.arcc.uwyo.edu and teton.uwyo.edu. SSH can be used natively on macOS and Linux from a terminal with the ssh command. X11 forwarding is supported, but if you need graphical support we recommend using FastX whenever possible. Additionally, you may want to configure your OpenSSH client for connection multiplexing if you require multiple terminal sessions. If your network connectivity is unreliable, consider starting tmux or screen after you log in to keep sessions alive during disconnects; you can then reconnect to these sessions later.
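For example, a basic login and an optional OpenSSH multiplexing configuration might look like the following (replace <username> with your UWYO username; the Host alias and socket path are just conventional choices):

Code Block
# Log in to a Teton login node (you will also be prompted for UWYO 2FA)
ssh <username>@teton.arcc.uwyo.edu

# Optional ~/.ssh/config entry enabling connection multiplexing, so extra
# sessions reuse the first connection (create ~/.ssh/sockets beforehand)
Host teton
    HostName teton.arcc.uwyo.edu
    User <username>
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 10m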

...

Teton has several shells available for use. The default is bash. To change your default shell, please submit a request through the standard ARCC request methods.

...

The following tables list each node that has GPUs and the type of GPU installed.

Table #1


| Node | Partition | GPU Type | Number of Devices | GPU Memory Size (GB) | Compute Capability | GRES Flag | Teton Partition | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| m025 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m026 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m027 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m028 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m029 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m030 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m031 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m032 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m075 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m076 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m077 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m078 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m079 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m080 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m081 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m082 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m083 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | Disabled due to memory ECC errors |
| m084 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | Disabled due to memory ECC errors |
| m085 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | Disabled due to memory ECC errors |
| m086 | moran | K20m | 2 | 4 | 3.5 | gpu:k20m:{1-2} | Yes | |
| m219 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m220 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m227 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m228 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m235 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m236 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m243 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m244 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m251 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m252 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m259 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m260 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m267 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| m268 | moran | K20Xm | 2 | 5 | 3.5 | gpu:k20xm:{1-2} | Yes | |
| mdbg01 | moran | GTX Titan X | 1 | 12 | 5.2 | gpu:TitanX:{1-1} | Yes | |
| mdbg01 | moran | GTX Titan | 2 | 6 | 6.0 | gpu:Titan:{1-2} | Yes | |
| mdbg02 | moran | K40c | 2 | 11 | 3.5 | gpu:k40c:{1-2} | Yes | |
| mdbg02 | moran | GTX Titan X | 2 | 12 | 5.2 | gpu:TitanX:{1-2} | Yes | |
| mbm01 | moran-gpu | K80 | 8 | 11 | 3.7 | gpu:k80:{1-8} | No | |
| mbm02 | moran-gpu | K80 | 8 | 11 | 3.7 | gpu:k80:{1-8} | No | |
| tbm03 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm04 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm05 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm06 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm07 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm08 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm09 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |
| tbm10 | teton-gpu | Tesla P100 | 2 | 16 | 6.0 | gpu:P100:{1-2} | No | |

The following two GPU nodes are reserved for AI use.

Table #2

| Node | Partition | GPU Type | Number of Devices | GPU Memory Size (GB) | Compute Capability | GRES Flag | Teton Partition | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mdgx01 | dgx | Tesla V100 | 8 | 16 | 7.0 | gpu:V100-16g:{1-8} | No | |
| tdgx01 | dgx | Tesla V100 | 8 | 32 | 7.0 | gpu:V100-32g:{1-8} | No | |

...

For additional information about CUDA programming, visit Nvidia's CUDA C Programming Guide: [2]

Access and Running Jobs

There are three different types of GPU nodes in the Teton cluster and they are requested in somewhat different ways.

...

Node sharing is enabled by requesting fewer than the full number of GPUs, CPUs, or amount of memory on a node; sharing can be based on any one of these resources, or all three. By default, each job gets 3.5 GB of memory per core requested (the lowest common denominator among our cluster nodes), so to request a different amount of memory you must use the "--mem" flag. To request exclusive use of a node, use "--mem=0".

Example #1

An example script that requests two Teton nodes with 2x K20m GPUs, including all cores and all memory, running one GPU per MPI task, would look like this:

Code Block
#!/bin/bash
#SBATCH --nodes=2
# one MPI task per GPU (2 GPUs per node)
#SBATCH --ntasks-per-node=2
#SBATCH --mem=0
#SBATCH --partition=teton
#SBATCH --account=<account>
#SBATCH --gres=gpu:k20m:2
#SBATCH --time=1:00:00
... Other job prep
srun myprogram.exe


Example #2

To request all 8 K80 GPUs on a Teton node, again using one GPU per MPI task, we would do:

Code Block
#!/bin/bash
#SBATCH --nodes=1
# one MPI task per GPU (8 GPUs on the node)
#SBATCH --ntasks-per-node=8
#SBATCH --mem=0
# the K80 nodes are in the moran-gpu partition (see Table #1)
#SBATCH --partition=moran-gpu
#SBATCH --account=<account>
#SBATCH --gres=gpu:k80:8
#SBATCH --time=1:00:00
... Other job prep
srun myprogram.exe


Example #3

As another example, the job script below will request four GPUs, four CPU cores, and 8 GB of memory. The remaining GPUs, CPUs, and memory on the node will then be available to other jobs.

Code Block
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --nodes=1
# 8G requests 8 GB; a bare "--mem=8" would mean 8 MB
#SBATCH --mem=8G
# the K80 nodes are in the moran-gpu partition (see Table #1)
#SBATCH --partition=moran-gpu
#SBATCH --account=<account>
#SBATCH --gres=gpu:k80:4
#SBATCH --time=00:30:00
... Other job prep
srun myprogram.exe
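Batch scripts such as the examples above are submitted with sbatch; assuming a script saved under an illustrative name such as gpu_job.sh:

Code Block
sbatch gpu_job.sh
# check the job's status in the queue
squeue -u $USER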


Example #4

To run a parallel interactive job with MPI, do not use the usual "srun" command on its own, as this does not work properly with the "--gres" request. Instead, use the "salloc" command, e.g.

...

Code Block
srun -n 1 -N 1 -t 1:00:00 -A <account> -p teton-gpu --gres=gpu:p100:1 --pty /bin/bash -l
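As a sketch of the salloc approach described above (the account, time limit, and GPU count are placeholders to adapt to your allocation):

Code Block
# Request an interactive allocation with two P100 GPUs on one teton-gpu node
salloc -N 1 -n 2 -t 1:00:00 -A <account> -p teton-gpu --gres=gpu:P100:2
# once the allocation is granted, launch the MPI ranks inside it
srun ./myprogram.exe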

GPU Programming Environment

On Teton, the Nvidia CUDA toolkit, PGI CUDA Fortran, and the OpenACC compilers are installed. The default CUDA is 9.2.88, which at the time of writing is the most recent release. You can access it by simply loading the CUDA module with "module load cuda". The PGI compilers ship with their own, fairly recent CUDA and can be accessed by loading the PGI module with "module load pgi".
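For example, compiling a CUDA source file on a login node might look like the following (the source file name is illustrative):

Code Block
# Load the CUDA toolkit and compile with nvcc
module load cuda
nvcc -O2 -o my_kernel my_kernel.cu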

Any login node should work for compiling your CUDA code, as the CUDA tools are available on the login nodes. The PGI compilers come with their own CUDA, so compiling anywhere you can load the PGI module should work.
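Similarly, a minimal OpenACC build with the PGI compilers might look like this (the source file name is illustrative; -acc enables OpenACC and -Minfo=accel reports what was accelerated):

Code Block
# Load PGI and compile C code containing OpenACC directives
module load pgi
pgcc -acc -Minfo=accel -o saxpy_acc saxpy_acc.c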

...

Code Block
userX@m025:~# nvidia-smi -L
GPU 0: Tesla K20m (UUID: GPU-2e23ddef-1d96-7894-102a-0458da3faaa4)
GPU 1: Tesla K20m (UUID: GPU-458a86ec-09cd-64d1-475a-d36dc0a73b4f)

Debugging

Nvidia's CUDA distribution includes a terminal debugger named cuda-gdb. Its operation is similar to the GNU gdb debugger. For details, see the cuda-gdb documentation.
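As a brief sketch (file names are illustrative), a cuda-gdb session typically starts from a build that includes device debug symbols:

Code Block
# Build with host (-g) and device (-G) debug information, then debug
module load cuda
nvcc -g -G -o my_kernel my_kernel.cu
cuda-gdb ./my_kernel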

...

The Allinea DDT debugger, which we currently license, also supports CUDA and OpenACC debugging. Due to its user-friendly graphical interface, we recommend it for GPU debugging. For information on how to use DDT or TotalView, see our debugging page.

Profiling

Profiling can be very useful for finding GPU code performance problems, for example inefficient GPU utilization, suboptimal use of shared memory, etc. Nvidia CUDA provides both a command-line profiler (nvprof) and a visual profiler (nvvp). More information is available in the CUDA profilers' documentation.
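As an illustration (the binary name is a placeholder), a basic command-line profile can be collected with:

Code Block
# Summarize kernel and memory-transfer times for a CUDA binary
module load cuda
nvprof ./my_kernel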

...