The Teton HPC cluster is the successor to Mount Moran. Teton adds several new compute nodes, and all Mount Moran nodes have been reprovisioned within the Teton HPC cluster. The system is available via SSH at teton.arcc.uwyo.edu or teton.uwyo.edu. We ask that everybody who uses ARCC resources cite them accordingly; see Citing Teton. Newcomers to research computing should also consider reading the Research Computing Quick Reference.
Overview
Teton is an Intel x86_64 cluster connected via Mellanox FDR/EDR InfiniBand, with a 1.3 PB IBM Spectrum Scale global parallel filesystem available to all nodes. The system requires UWYO two-factor authentication (2FA) for login via SSH. The default shell is BASH, and the Lmod module system is leveraged for dynamic user environments, allowing software stacks to be switched rapidly and easily. The Slurm workload manager is used to schedule jobs, enforce submission limits, implement fair share, and provide Quality of Service (QoS) levels for research groups who have invested in the cluster.
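As a quick orientation, a first session typically looks something like the sketch below. The module name, account name, and script name (gcc, project_name, myjob.sh) are placeholders only; substitute the software and project account that apply to your group.

```bash
# Log in with your UWYO credentials; 2FA is required during SSH login.
ssh username@teton.arcc.uwyo.edu

# Lmod manages the software environment.
module avail                 # list available software stacks
module load gcc              # load a toolchain (module name is a placeholder)
module list                  # confirm what is currently loaded

# Slurm schedules all compute work.
salloc --account=project_name --nodes=1 --time=01:00:00   # interactive allocation
sbatch myjob.sh                                           # submit a batch script
squeue -u $USER                                           # check your jobs
```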
Teton has a Digital Object Identifier (DOI) (https://doi.org/10.15786/M2FY47) and we request that all use of Teton appropriately acknowledges the system. Please see Citing Teton for more information.
Available Nodes
See Partitions for information regarding Slurm Partitions on Teton.
Type | Count | Sockets | Cores | Threads/Core | Clock (GHz) | RAM (GB) |
---|---|---|---|---|---|---|
Teton Regular | 180 | 2 | 32 | 1 | 2.1 | 128 |
 | 15 | | | | 2.1 | 128 |
Teton BigMem GPU | 8 | | | | 2.1 | 512 |
Teton HugeMem | 10 | | | | 2.1 | 1024 |
Teton KNL | 12 | | | | 1.5 | 384 + 16 |
Teton DGX | 1 | | | | 2.2 | 512 |
Moran Regular | 283 | | | | 2.6 | 64 or 128 |
Moran Big Mem | 2 | | | | 2.6 | 512 |
Moran Debug | 2 | | | | 2.6 | 64 |
Moran HugeMem | 2 | | | | 2.6 | 1024 |
Moran DGX | 1 | | | | 2.2 | 512 |
Total Nodes | 516 | | | | | |
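The current node and partition layout can also be queried directly from a login node with standard Slurm commands; the exact partition and node names depend on the configuration at the time you run them.

```bash
# Summarize partitions, node counts, CPUs, memory, and time limits.
sinfo -o "%P %D %c %m %l"

# Show the full hardware details Slurm records for a particular node.
scontrol show node <nodename>
```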
Global Filesystems
The Teton global parallel filesystem is configured with a 160 TB SSD tier for active data and a 1.2 PB HDD capacity tier for less frequently used data. The system policy engine moves data automatically between pools (disks and tiers); data is migrated to the HDD tier when the SSD tier reaches 70% used capacity. Teton provides several file spaces for users, described below.
home - /home/username ($HOME)
Space for configuration files and software installations. This file space is intended to be small and always resides on SSDs. The /home file space is snapshotted so that accidental deletions can be recovered.
project - /project/project_name/[username]
Space to collaborate among project members. Data here is persistent and exempt from the purge policy. No snapshots.
gscratch - /gscratch/username ($SCRATCH)
Space to perform computing for individual users. Data here is subject to the purge policy defined below; warning emails will be sent before any deletions occur. No snapshots.
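The three file spaces can be reached through the paths and environment variables listed above; in the sketch below, project_name, my_run, and results.tar.gz are hypothetical names used only for illustration.

```bash
echo $HOME                   # /home/username     - small, SSD-backed, snapshotted
echo $SCRATCH                # /gscratch/username - large working space, subject to purge
ls /project/project_name/    # shared, persistent project space (no snapshots)

# A common pattern: run jobs in scratch, then keep results in project space.
mkdir -p $SCRATCH/my_run
cp $SCRATCH/my_run/results.tar.gz /project/project_name/
```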
Global Filesystems
Filesystem | Quota (GB) | Snapshots | Backups | Purge Policy | Additional Info |
---|---|---|---|---|---|
home | 25 | Yes | No | No | Always on SSD |
project | 1024 | No | No | No | Aging Data will move to HDD |
gscratch | 5120 | No | No | Yes | Aging Data will move to HDD |
Purge Policy - File spaces within the Teton cluster filesystem may be subject to a purge policy. The exact policy has not yet been defined; however, ARCC reserves the right to purge data in this area after 30 to 90 days without access, or 30 to 90 days from creation time. Before an actual purge occurs, the owner of the file(s) will be notified by email several times about files that are subject to being purged.
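Because the purge window is framed in terms of days without access, a simple way to spot at-risk files is to check access times yourself; the directory names below are hypothetical.

```bash
# List gscratch files that have not been accessed in more than 90 days.
find $SCRATCH -type f -atime +90 -printf '%A+ %10s %p\n' | sort | head -n 20

# Move anything you still need into project space before it is purged.
rsync -av $SCRATCH/important_results/ /project/project_name/important_results/
```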
Storage Increases on Teton
Project PIs can purchase additional scratch and/or project space at a cost of $100 / TB / year.
Additionally, PIs can request no-cost allocation increases for scratch and/or project space by submitting proposals, which must be renewed when substantial cluster or storage changes occur. Proposals should describe:
- the scientific gain and insights that will be or have been obtained by using the system, and
- how data is organized and accessed in order to maximize performance and usage.

Projects are limited to one no-cost increase.
For more information, please contact ARCC.
Special Filesystems
Certain filesystems exist only on specific nodes of the cluster where specialized requirements exist. These specialized filesystems are summarized below.
The node-local scratch or lscratch filesystem is purged at the end of each job.
The memory filesystems can significantly improve the performance of small I/O operations. If you have single-node jobs with very intensive random-access I/O patterns, this filesystem may improve the performance of your compute job.
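A minimal job-script sketch of this staging pattern is shown below. The /lscratch mount point, program name, and file names are assumptions for illustration only; check the actual node-local paths on Teton before relying on them.

```bash
#!/bin/bash
#SBATCH --account=project_name
#SBATCH --nodes=1
#SBATCH --time=02:00:00

# Stage input onto node-local scratch (mount point assumed here).
LOCAL=/lscratch/$USER/$SLURM_JOB_ID
mkdir -p "$LOCAL"
cp "$SCRATCH/input.dat" "$LOCAL/"

# For very small, random-access I/O, a memory filesystem such as /dev/shm
# can be used the same way, within the node's RAM limits:
# LOCAL=/dev/shm/$USER/$SLURM_JOB_ID

./my_program "$LOCAL/input.dat" "$LOCAL/output.dat"

# Copy results back before the job finishes; lscratch is purged after each job.
cp "$LOCAL/output.dat" "$SCRATCH/"
```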
The petaLibrary filesystems are only available from the login nodes, not from the compute nodes. Storage space on the Teton global filesystems does not imply storage space on the ARCC petaLibrary, or vice versa. For more information about the petaLibrary, please see petaLibrary.
The Bighorn filesystems will be provided for a limited time so that researchers can move data to the petaLibrary, Teton storage, or other storage media. The date on which these mounts will be removed is still TBD.
Project and Account Requests
For research projects, UWYO faculty members (Principal Investigators) can request that a Project be created on Teton. PIs can then grant project access to UWYO students, faculty, and external collaborators. User accounts on Teton require a valid UWYO e-mail address and a UWYO-affiliated PI sponsor. UWYO faculty members can sponsor their own accounts, while students, post-doctoral researchers, and research associates must use their PI as their sponsor. Non-UWYO external collaborators must be sponsored by a current UWYO faculty member.
See Account_Policy for additional information and policy statements on account usage. Use the link under "Account Requests" to request that a project or user(s) be created. From the same page, you can also request that users be added to an existing project.
Note that for external collaborators, a special UWYO account must be created by the ASO office before access can be granted to Teton. There is a one-time $10 fee for creating these accounts. Please allow extra time for the ASO office to create the account.
To request that a project be set up, please use the ARCC Access Request Form.
Once the form is submitted, and the information verified, the project and user account(s) will be created. Users will receive an email notification once a project has been created and/or when they are added to a project.
To request access for instructional use, send an email to arcc-info@uwyo.edu with the course number, section, and student list. If the PI prefers, generic accounts can be created instead of providing a student list. Instructional accounts are usually valid for a single semester, and access to the project is terminated at the beginning of the next semester.
Loren is a GPU-based specialty HPC cluster used by Dr. Piri's research group, the High Bay Research Group.
This page contains commonly used words and phrases used in research computing. If you are unsure of any of the terms, please visit the Glossary page to learn more.
Overview
As research becomes more compute-intensive, ARCC has made high-performance computing a core service. This core service is currently provided by the Teton Compute Environment, allowing researchers to perform computation-intensive analysis on large datasets. Using Teton, researchers have control over their data, projects, and collaborators. Built-in tools help users get up and running quickly, and the ability to request custom tools allows users to fine-tune their research procedures.
Condo Model
The model for sustaining the Condo program is premised on faculty and principal investigators using equipment purchase funds from their grants, or other available funds, to purchase compute nodes (individual servers), which are then added to the Teton compute cluster. Condo computing resources are used simultaneously by multiple users. Teton is a condo-model resource, and as such, investors have priority on the resources they have invested in. This is implemented through preemption: jobs not associated with the investment may be preempted when the investor submits jobs to their invested resources. If an investor prefers not to use preemption on their resources, ARCC can disable it and offer next-in-line access instead.
Default concurrent-use limits are in place to prevent individual project accounts and users from saturating the cluster at the expense of others. To incentivize investment in the condo system, investors have these limits increased.
The system also uses a fair-share mechanism, which gives projects that run jobs only occasionally priority over projects that run jobs continuously. To incentivize investment in the condo system, investors have their fair-share value increased as well.
Finally, individual jobs are subject to runtime limits; based on a study performed around 2014, the maximum walltime for a compute job is 7 days. ARCC is currently evaluating whether orthogonal limits on CPU count and walltime are the optimal operational mode, and is considering concurrent usage limits based on a relational combination of CPU count, memory, and walltime that would allow more flexibility for different areas of science. There will likely still be an upper limit on individual job walltime, since ARCC cannot allow unlimited walltime, in part because of possible hardware faults.
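In practice, this means a batch job simply requests a walltime at or below the 7-day cap; the account name and program in the sketch below are placeholders.

```bash
#!/bin/bash
#SBATCH --account=project_name
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --time=7-00:00:00      # request up to, but not beyond, the 7-day walltime cap

srun ./my_long_running_job
```

To see how fair share is affecting your work, `sshare -U` reports your user and account fair-share values, and `sprio -u $USER` breaks down the priority factors of your pending jobs.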