This page explains ARCC policies for high-performance computing.
These policies and procedures are intended to ensure that ARCC HPC facilities are fairly shared, effectively used, and support the University of Wyoming's research programs that rely on computational facilities not available elsewhere at the University.
Cluster
an assembly of computational hardware designed and configured to function together as a single system, much the way neurons work together to form a brain
Condo
a computational resource that is shared among many users; condo compute resources are used simultaneously by multiple users
HPC
high-performance computing generally refers to systems that perform parallel processes at a level above a teraflop, or 10^12 floating-point operations per second
HPS
high-performance storage system, usually a tiered system with media covering a range of speeds to optimize performance while reducing cost
Customer
a person or group to whom ARCC provides a service
For policies that apply to all ARCC resources, see ARCC Policies.
The login nodes are provided for authorized users to access the Teton cluster.
HPC/HPS Accounts
Overview
HPC/HPS accounts are available for all University faculty, staff, and students for the purpose of research.
Account Sponsorship by a PI
General Terms of Use
The following conditions apply to all account types. Additional details on how the different account types work can be found elsewhere on this page.
Account requests
All HPC accounts can be requested through the ARCC Access Request Form. Note that all requests, whether for creating projects or adding users to a project, must be made by the project PI. For questions about HPC account procedures not addressed below, please contact ARCC.
Definitions
Principal Investigator (PI) Account
account of a faculty member who has an extended-term position with UW (e.g. not Adjunct Faculty)
Sponsored Researcher
a member (e.g. Student/Graduate Assistant, Faculty, or Researcher from another Institution) of a research project for which a UW faculty member is the PI
System Account
account of a staff member who has a permanent relationship with UW
Account Types
PI Accounts
PI accounts are for individual PIs only. These accounts are for research only and are not to be shared with anyone else. These accounts are subject to periodic review and can be deleted if the account holders change their University affiliation or fail to comply with UW and ARCC account policies.
Sponsored Accounts
PIs may sponsor any number of accounts, but these accounts must be used for research only. UW faculty are responsible for all of their sponsored account users. These accounts are subject to periodic review and will be deleted if the sponsoring faculty or the account holders change their University affiliation or fail to comply with UW and ARCC account policies.
Instructional Accounts
PIs may sponsor HPC accounts and projects for instructional purposes on the ARCC systems by submitting a request through the ARCC Access Request Form. Instructional requests are subject to denial only when the proposed use is inappropriate for the systems, when the course would require resources that exceed available capacity on the systems, or when it would substantially interfere with research computations. HPC accounts for instructional purposes will be added by the Sponsor into a separate group created with the 'class group' designation. Class group membership is to be sponsored for one semester, and the Sponsor will remove the group at the end of the semester. Class/Instructional group jobs should only be submitted to the 'class' queue, which will be equivalent in priority to the 'windfall' queue and only available on the appropriate nodes of the ARCC systems.
System Accounts
System accounts are for staff members who have a permanent relationship with UW and are responsible for system administration.
Account Lifecycle
Account Creation
ARCC HPC/HPS accounts will be created to match existing UWYO accounts whenever possible. PIs may request accounts for existing projects/allocations or courses.
Account Renewal
Account Transfer
A PI who is leaving the project or the University can request that their project be transferred to a new PI. Any non-PI accounts can be transferred from one PI's zone of control to another as necessary when students move from working with one researcher to working with another. Account transfer requests will also be made by contacting the Help Desk (766-4357).
Account Termination
The VP of Research, the University and UW CIO, and University Provost comprise the University of Wyoming's Research Computing Executive Steering Committee (UW-ESC). The UW-ESC will govern the termination of Research Computing accounts, following other University policies as needed. Non-PI accounts may be terminated at the request of the UW-ESC. Any users found in violation of this Research Computing Allocation Policy or any other UWyo Policies may have access to their accounts suspended for review by the Director of Research Support, IT, and the UW-ESC.
Job Scheduling on ARCC HPC Systems
This section reflects the general ARCC policy for scheduling jobs on all HPC systems administered by ARCC. Since the purpose of HPC systems varies from system to system, please refer to the specific system below for policies particular to that system.
Teton
Overview
This policy reflects the ARCC policy for scheduling jobs on Teton, specifically. Teton won't offer the traditional relationship between users and queues. Rather, Teton offers one all-encompassing pool of nodes and will regulate usage using node reservations and job prioritization.
Definitions/descriptions
QoS
Slurm
Fairshare
Reservations
Check-pointing
Details
Queuing
ARCC will use Slurm to manage Teton. Teton's compute resources will be defined as one large queue. From there ARCC will use Slurm's fairshare, reservations, and prioritization functionality to control Teton's resource utilization. Reservations will be defined for communal and individual invested users. Communal users will have access control settings that will provide preferential access to the communal reservation. Likewise, invested users will have preferential access to purchased resource levels. By default, all reservations will be shared.
Prioritization
Slurm will track resource utilization based on a job's actual consumption of resources and update fairshare resource utilization statistics. These statistics will influence the priority of subsequent jobs submitted by a user. Greater utilization of resources reduces the priority of follow-on jobs submitted by a particular user. Priority decreases with resource (time and compute) usage. Priority increases or "recovers" over time.
Job Preemption
Guest jobs running on a reservation will be preempted when necessary to provide resources to a job submitted by the owner of that reservation. Slurm will wait 5 minutes after an invested user submits a job before terminating a guest job. Preempted jobs are automatically re-queued.
Check-Pointing
Because of the massive resource overhead involved in OS or cluster level check-pointing, ARCC won't offer check-pointing. However, users are strongly encouraged to build check-pointing into their own code. This may affect code performance but will provide a safety net.
Job Submittal Options/Limitations
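As an illustration of the check-pointing recommendation above, the following is a minimal Python sketch of application-level check-pointing. It is not an ARCC-provided tool. The function names, the JSON file format, and the `stop_after` parameter (which only simulates a preemption for demonstration) are all assumptions made for this example; real codes would save whatever state they need to resume.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over path.

    The rename means a job killed mid-write leaves the old checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Sum the integers below n_steps, resuming from path if it exists.

    stop_after is hypothetical: it simulates a preemption by returning
    early after that many steps in this invocation.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        state["total"] += state["next_step"]  # stand-in for real work
        state["next_step"] += 1
        done_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and done_this_run >= stop_after:
            return None  # work since the last checkpoint is lost
    save_checkpoint(path, state)
    return state["total"]
```

When a re-queued job calls `run` again with the same checkpoint path, it resumes from the last saved step instead of starting over. The checkpoint interval trades run-time overhead against the amount of work lost on preemption.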
Example Scenarios
Please be aware that the scenarios below are over-simplified, often glossing over some variables in order to illustrate the spotlighted situation. Some aspects are exaggerated for effect.
Job Scheduling and Termination
Professor X has purchased six nodes; as a result, she has a six-node reservation that may include any six nodes that have the same performance stats as the nodes she purchased. One day Professor Y has a pair of guest jobs running on two of X's nodes. Professor X also has a job running on three of her nodes, then launches another job that requires another four nodes. Only one node of Professor X's reservation is available. Terminating Professor Y's two jobs won't free up enough space, so Slurm looks at the communal reservation and finds three available nodes. These three plus the one remaining in X's reservation meet X's need. Slurm allocates the resources and the job starts running. Later Professor X launches another job that requires one node. Slurm again checks her reservation and finds that terminating one of Professor Y's jobs will free up sufficient resources for the job. Slurm kills one of Professor Y's jobs and allows Professor X's new job to start. Professor Y's job is re-queued, but since it was not check-pointed, it must start over from the beginning when it is once again allowed to run.
Job Priority
Dr. Zed uses fifteen of the twenty nodes in the communal pool for a big, three-day job. When the job completes, he looks at the data and immediately submits another job of similar size but has to wait for the job to be scheduled because the first job reduced his priority below that of most of the other users of the communal pool. Several days pass during which multiple small jobs are submitted and run ahead of Dr. Zed's next big job. Student Fred has been running a job on one node for four weeks and when it finally completes, Fred's priority has dropped below that of Dr. Zed. One node isn't enough for Dr. Zed's job, so now both Fred and Zed are waiting. One day later Zed's priority has increased while the priorities of other users have decreased to the point where Dr. Zed has top priority. Unfortunately, there aren't enough nodes available for his job so he still has to wait. However, because Dr. Zed has top priority, no other jobs are scheduled in front of him. Two days later enough nodes have become available for Dr. Zed's job. He's happy to see his job start and, when it completes three days later, Dr. Zed is back at the bottom of the priority food chain.
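The priority behavior in the Job Priority scenario (priority drops with usage and recovers over time) can be illustrated with a toy model. The Python sketch below is not Slurm's actual fairshare formula and does not reflect ARCC's configuration. The function names, the seven-day half-life, and the 1000 node-hour share are all assumptions chosen only to show the qualitative shape.

```python
def decayed_usage(node_hours, days_since, half_life_days=7.0):
    """Discount past usage: it halves every half_life_days (assumed value)."""
    return node_hours * 0.5 ** (days_since / half_life_days)


def toy_priority(effective_node_hours, share_node_hours=1000.0):
    """Map discounted usage to a priority in (0, 1]; no usage gives 1.0."""
    return 2.0 ** (-effective_node_hours / share_node_hours)


# A three-day job on fifteen nodes, as in the Dr. Zed scenario.
big_job_node_hours = 15 * 24 * 3

for days_waited in (0, 7, 14):
    usage = decayed_usage(big_job_node_hours, days_waited)
    print(f"after {days_waited:2d} days: priority {toy_priority(usage):.3f}")
```

In this model a user who has just finished a large job starts near the bottom, and the printed priority rises as the recorded usage decays, which mirrors how Dr. Zed eventually regains top priority while waiting.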
Software Acquisition, Installation, and Support (AIS) Policy
Overview
This document defines the ARCC's software policy regarding software acquisition, installation, and support. ARCC will help UW users through:
General software usage, functionality, and application questions should be addressed by the research computing community. To support the community support model, ARCC will make available a community-driven Known Error Database in the form of a Wiki with discussion board features. The ARCC will review the software policy annually during the Change Advisory Board meeting with input from the Faculty Advisory Committee (FAC) members.
Software Requests
The ARCC will maintain a list of currently supported software for each HPC system that ARCC supports. Supported software is evaluated biannually. The ARCC reserves the right to discontinue support for underutilized software. Software proposed for discontinued support will be placed in a "deprecated" section of the software modules interface. Movement of applications to a deprecated status will happen twice per year during system upgrades. Faculty can submit a request for continued software support of deprecated applications using the ARCC Resource Request Form. New software requests must be initiated by a faculty member via the ARCC Resource Request Form. The information on this form (e.g. estimates on the number of users, software and licensing details, software cost, and the timeline for installation) will be utilized to determine the degree of support provided by the ARCC.
Software Acquisition
Financial Support
Whenever possible, researchers are encouraged to use open-source software. The costs for discipline-specific, proprietary software will typically be borne by the PIs requesting the software. A small number of one-time 'seed funding' opportunities may be made with input from the FAC. Financial support for the software is dependent on the user base, as follows:
Technical Support
ARCC staff will provide support in the identification of efficient and cost-effective software packages to meet the research objectives of faculty members. Where suitable, the ARCC will coordinate software acquisitions of faculty with similar objectives to better leverage research investment.
Software Installation
Licensing Support
The ARCC, with the help of UW-IT and UW General Counsel, will help provide guidance on research software licensing and suitable installation/controlled access.
Technical Support
ARCC staff will provide software installation services for software to be run on ARCC resources. Department IT consultants will help with the installation of software on researchers' workstations. The ARCC will also provide centralized license server services for software on ARCC resources when needed.
Software Support
Overview
Code-named Teton, the ARCC high-performance storage system (HPS) is a high-speed, tiered storage system designed to maximize performance while minimizing cost. Teton is intended to be used for storing data that is actively being used. The following policies discuss the use of this space. In general, the disk space is intended for support of research using the cluster, and as a courtesy to other users of the cluster, you should try to delete any files that are no longer needed or being used. All data on the HPS is considered to be related to your research and not to be of a personal nature. As such, all data is considered to be owned by the principal investigator for the allocation through which you have access to the cluster. Teton is for the support of active research using the clusters. You should remove data files, etc. from the cluster promptly when you are no longer actively working on the computations requiring them. This is to ensure that all users can avail themselves of these resources.
Storage Allocations
Each individual researcher is assigned a standard storage allocation or quota on
Directory Descriptions
Directory Summary Table
Augmenting Capacity of Disk Allocation
Researchers who work with or generate massive data sets that exceed the default 5 TB allocation, or who have significant I/O needs, should consider the following options:
File Deletion Policy
This describes ARCC's file deletion policy: