...
Teton has a Digital Object Identifier (DOI) (https://doi.org/10.15786/M2FY47) and we request that all use of Teton appropriately acknowledges the system. Please see Citing Teton for more information.
Available Nodes
See Partitions for information regarding Slurm Partitions on Teton.
...
The petaLibrary filesystems are only available from the login nodes, not on the compute nodes. Storage space on the Teton global filesystems does not imply storage space on the ARCC petaLibrary, or vice versa. For more information about the petaLibrary, please see petaLibrary.
...
To request access for instructional use, send an email to arcc-info@uwyo.edu with the course number, section, and student list. If the PI prefers, generic accounts can be created instead of providing a student list. Instructional accounts are usually valid for a single semester, and access to the project is terminated at the beginning of the next semester.
System Access
...
SSH Access
Teton has login nodes for users to access the cluster. Login nodes are available publicly using the hostname teton.arcc.uwyo.edu or teton.uwyo.edu. SSH can be used natively on macOS or Linux-based operating systems through the terminal and the ssh command. X11 forwarding is supported, but if you need graphical support we recommend using FastX if at all possible. Additionally, you may want to configure your OpenSSH client to support connection multiplexing if you require multiple terminal sessions. For those instances where you have unreliable network connectivity, you may want to use either tmux or screen once you log in to keep sessions alive during disconnects. This will allow you to later reconnect to these sessions.
Code Block
ssh USERNAME@teton.arcc.uwyo.edu
ssh -l USERNAME teton.arcc.uwyo.edu
ssh -Y -l USERNAME teton.arcc.uwyo.edu   # For trusted forwarding of X11 displays
ssh -X -l USERNAME teton.arcc.uwyo.edu   # For (untrusted) forwarding of X11 displays
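If you use tmux to keep work alive across disconnects, a minimal sketch looks like the following (the session name is an arbitrary choice):

Code Block
tmux new -s work      # start a named session on the login node
tmux attach -t work   # reattach to the same session after a disconnect

Note that tmux sessions live on a specific login node, so you must reconnect to the same login node to reattach.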
OpenSSH Configuration File (BSD, Linux, macOS)
By default, the OpenSSH user configuration file is $HOME/.ssh/config, which can be edited to enhance your workflow. Since Teton uses round-robin DNS to provide access to two login nodes and requires two-factor authentication, it can be advantageous to add SSH multiplexing to your local environment so that subsequent connections are made to the same login node. This also provides a way to shorten the hostname and simplifies access for SCP/SFTP/rsync. An example entry looks like the following, where USERNAME would be replaced by your actual UWYO username:
...
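A minimal sketch of such an entry (the Host alias and ControlPath location are arbitrary choices; ControlMaster, ControlPath, and ControlPersist are standard OpenSSH multiplexing options):

Code Block
Host teton
    HostName teton.arcc.uwyo.edu
    User USERNAME
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p
    ControlPersist 4h

With this in place, ssh teton opens (or reuses) a multiplexed connection, and scp, sftp, and rsync to the teton alias ride over the same authenticated channel. The ~/.ssh/sockets directory must exist before the first connection.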
Note
WARNING: While ARCC allows SSH multiplexing, other research computing sites may not. Do not assume this will always work on systems not administered by ARCC. |
Access from Microsoft Windows
ARCC currently recommends that users install MobaXterm to access the Teton cluster. It provides appropriate access to the system with SSH and SFTP capability, allowing X11 if required. The home version of MobaXterm should be sufficient. There is also PuTTY if a more minimal application is desired.
Additional options include a Cygwin installation with SSH, or the Windows Subsystem for Linux with an OpenSSH client. Very recent versions of Windows also allow enabling the built-in OpenSSH client. Finally, a great alternative is to use our FastX capability.
FastX Access
If you’re currently on the UW campus, you can also leverage FastX for a more robust remote graphics capability, either through a web browser or via an installable client for Windows, macOS, or Linux. Navigate to https://fastx.arcc.uwyo.edu and log in with your 2FA credentials. The native FastX clients can be downloaded here. For more information, see the documentation on using FastX.
Available Shells
Teton has several shells available for use. The default is bash. To change your default shell, please submit a request through standard ARCC request methods.
...
The ARCC Teton cluster has a number of compute nodes that contain GPUs. This section describes the hardware, as well as access and usage of the GPU nodes.
Teton GPU Hardware
The following tables list each node that has GPUs and the type of GPU installed.
...
...
For additional information about CUDA programming, visit Nvidia's CUDA C Programming Guide: [2]
Access and
...
Running Jobs
There are three different types of GPU nodes in the Teton cluster and they are requested in somewhat different ways.
...
Node sharing can be accessed by requesting fewer than the full number of GPUs on a node; sharing can also be done on the basis of the number of CPUs and/or the amount of memory, or all three. By default, each job gets 3.5 GB of memory per core requested (the lowest common denominator among our cluster nodes), so to request a different amount of memory you must use the "--mem" flag. To request all of the memory on a node, use "--mem=0".
Example #1
An example script that requests two Teton nodes, each with 2x K20m GPUs, including all cores and all memory, running one GPU per MPI task, would look like this:
Code Block
#SBATCH --nodes=2
#SBATCH --mem=0
#SBATCH --partition=teton
#SBATCH --account=<account>
#SBATCH --gres=gpu:k20m:2
#SBATCH --time=1:00:00

... Other job prep

srun myprogram.exe
Example #2
To request all 8 K80 GPUs on a Teton node, again using one GPU per MPI task, we would do:
Code Block
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --partition=teton
#SBATCH --account=<account>
#SBATCH --gres=gpu:k80:8
#SBATCH --time=1:00:00

... Other job prep

srun myprogram.exe
Example #3
As another example, the job script below will get four GPUs, four CPU cores, and 8 GB of memory. The remaining GPUs, CPUs, and memory will then be accessible to other jobs.
Code Block
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --mem=8G
#SBATCH --partition=teton
#SBATCH --account=<account>
#SBATCH --gres=gpu:k80:4
#SBATCH --time=00:30:00

... Other job prep

srun myprogram.exe
Example #4
To run a parallel interactive job with MPI, do not use the usual "srun" command, as this does not work properly with the "gres" request. Instead, use the "salloc" command, e.g.
...
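A minimal sketch of such an interactive allocation (the task count, GPU request, and time limit are placeholders):

Code Block
salloc -N 1 -n 4 -t 1:00:00 -A <account> -p teton-gpu --gres=gpu:p100:1
# once the allocation is granted, launch the MPI program inside it:
srun myprogram.exe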
Code Block
srun -n 1 -N 1 -t 1:00:00 -A <account> -p teton-gpu --gres=gpu:p100:1 --pty /bin/bash -l |
GPU
...
Programming Environment
On Teton, Nvidia CUDA, PGI CUDA Fortran, and the OpenACC compilers are installed. The default CUDA is 9.2.88, which at the time of writing is the most recent. You can access it by simply loading the CUDA module, "module load cuda". The PGI compilers come with their own, quite recent, CUDA and can be accessed by loading the PGI module, "module load pgi".
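As a minimal sketch (the source and output file names are placeholders), compiling a CUDA C source with the toolkit from the module, or a CUDA Fortran source with the PGI compiler, might look like this:

Code Block
module load cuda
nvcc -o saxpy saxpy.cu                  # CUDA C/C++ with the default toolkit

module load pgi
pgfortran -Mcuda -o saxpy_f saxpy.cuf   # CUDA Fortran with the PGI compiler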
...
The PGI compilers specify the GPU architecture with the -ta=tesla flag. If no further option is specified, the flag will generate code for all available compute capabilities (at the time of writing cc35, cc37, cc50, cc60, and cc70). To target a specific GPU:
GPU Type | Compiler Flag
---|---
K20m | -ta=tesla:cc35
K20Xm | -ta=tesla:cc35
Titan | -ta=tesla:cc35
Titan X | -ta=tesla:cc50
K40c | -ta=tesla:cc35
K80 | -ta=tesla:cc37
P100 | -ta=tesla:cc60
V100 | -ta=tesla:cc70
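For example, an OpenACC build targeting the P100 nodes might look like this sketch (the source file name is a placeholder):

Code Block
module load pgi
pgcc -acc -ta=tesla:cc60 -Minfo=accel -o jacobi jacobi.c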
...
Code Block
userX@m025:~# nvidia-smi -L
GPU 0: Tesla K20m (UUID: GPU-2e23ddef-1d96-7894-102a-0458da3faaa4)
GPU 1: Tesla K20m (UUID: GPU-458a86ec-09cd-64d1-475a-d36dc0a73b4f)
Debugging
Nvidia's CUDA distribution includes a terminal debugger named cuda-gdb. Its operation is similar to the GNU gdb debugger. For details, see the cuda-gdb documentation.
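A minimal sketch of starting it (the program name is a placeholder; compile with -g -G so that both host and device debug symbols are available):

Code Block
module load cuda
nvcc -g -G -o myprogram myprogram.cu   # host (-g) and device (-G) debug symbols
cuda-gdb ./myprogram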
...
The Allinea DDT debugger that we currently license also supports CUDA and OpenACC debugging. Due to its user-friendly graphical interface, we recommend it for GPU debugging. For information on how to use DDT or TotalView, see our debugging page.
Profiling
Profiling can be very useful in finding GPU code performance problems, for example inefficient GPU utilization, poor use of shared memory, etc. Nvidia CUDA provides both a command-line profiler (nvprof) and a visual profiler (nvvp). More information is in the CUDA profilers' documentation.
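A minimal sketch of a command-line profiling run (the executable name is a placeholder):

Code Block
module load cuda
nvprof ./myprogram.exe                      # summary of kernel and memcpy times
nvprof --print-gpu-trace ./myprogram.exe    # per-invocation GPU trace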