Slurm is the scheduler through which all jobs, both batch and interactive, are submitted. Slurm provides several user-facing commands, each of which has a Unix man page that should be consulted. On this page, users will find detailed information about running and submitting jobs, viewing nodes and available partitions, basic Slurm commands, troubleshooting, and configuring Slurm for investments.

...


...

These can be changed by requesting different allocation schemes by modifying the appropriate flags. Please reference our Slurm documentation.

Default Limits

On Mount Moran, the default limits were expressed as the number of cores each project account could use concurrently, and investors received an increase in their concurrent core limit. To facilitate more flexible scheduling for all research groups, ARCC is looking at implementing limits based on concurrent usage of cores, memory, and job walltime. These limits will be defined in the near future and will be subject to FAC review.

...

Partitions

The Slurm configuration on Teton is fairly involved in order to accommodate the layout of hardware, investors, and runtime limits. The following tables describe the partitions on Teton. Some partitions require a QoS, which is auto-assigned during job submission. The tables list Slurm-allocatable units rather than hardware units.
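
As a quick sketch before the tables, a specific partition can be requested with the --partition flag; the account, partition, QoS, and script names below are placeholders, and on Teton any required QoS is typically auto-assigned at submission:

Code Block
# Request a specific general partition for a batch job
sbatch --account=myproject --partition=teton job.sh

# If a special QoS must be supplied explicitly, it can be passed with --qos
sbatch --account=myproject --partition=inv-example --qos=example-qos job.sh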

General Partitions

Teton General Slurm Partitions

| Partition     | Max Walltime | Node Cnt | Core Cnt | Thds / Core | CPUs | Mem (MB) / Node | Req'd QoS |
|---------------|--------------|----------|----------|-------------|------|-----------------|-----------|
| Moran         | 7-00:00:00   | 284      | 4544     | 1           | 4544 | 64000 or 128000 | N/A       |
| teton         | 7-00:00:00   | 180      | 5760     | 1           | 5760 | 128000          | N/A       |
| teton-gpu     | 7-00:00:00   | 8        | 256      | 1           | 256  | 512000          | N/A       |
| teton-hugemem | 7-00:00:00   | 10       | 256      | 1           | 256  | 1024000         | N/A       |
| teton-knl     | 7-00:00:00   | 12       | 864      | 4           | 3456 | 384000          | N/A       |

Investor Partitions

Investor partitions are often heterogeneous; where a partition contains a mix of hardware, this is indicated below. They require a special QoS for access.

Teton Investor Slurm Partitions

| Partition       | Max Walltime | Node Cnt | Core Cnt | Thds / Core | Mem (MB) / Node | Req'd QoS | Preemption | Owner                                        | Associated Projects                                                |
|-----------------|--------------|----------|----------|-------------|-----------------|-----------|------------|----------------------------------------------|--------------------------------------------------------------------|
| inv-arcc        | Unlimited    | 2        | 44       | 1           | 64000 or 192000 | TODO      | Disabled   | Jeffrey Lang                                 | arcc                                                               |
| inv-atmo2grid   | 7-00:00:00   | 31       | 496      | 1           | 64000           | TODO      | Disabled   | Dr. Naughton, Dr. Mavriplis, Dr. Stoellinger | turbmodel, rotarywingcfg                                           |
| inv-chemistry   | 7-00:00:00   | 6        | 96       | 1           | 128000          | TODO      | Disabled   | Dr. Hulley                                   | hulleylab, pahc, chemcal                                           |
| inv-clune       | 7-00:00:00   | 16       | 256      | 1           | Mixed           | TODO      | Disabled   | Dr. Clune                                    | evolvingai, iwctml                                                 |
| inv-compmicrosc | 7-00:00:00   | 6        | 96       | 1           | 128000          | TODO      | Disabled   | Dr. Aidey (Composite Micro Sciences)         | rd-hea                                                             |
| inv-compsci     | 7-00:00:00   | 12       | 288      | 1           | 384999          | TODO      | Disabled   | Dr. Lars Kotthoff                            | mallet                                                             |
| inv-fertig      | 7-00:00:00   | 1        | 16       | 1           | 128000          | TODO      | Disabled   | Dr. Fertig                                   | gbfracture                                                         |
| inv-geology     | 7-00:00:00   | 16       | 256      | 1           | 64000           | TODO      | Disabled   | Dr. Chen, Dr. Mallick                        | inversion, f3dt, geologiccarbonseq, stochasticaquiferinv           |
| inv-inbre       | 7-00:00:00   | 24       | 160      | 1           | 128000          | TODO      | Disabled   | Dr. Blouin                                   | inbre-train, inbreb, inbrev, human_microbiome                      |
| inv-jang-condel | 7-00:00:00   | 2        | 32       | 1           | 128000          | TODO      | Disabled   | Dr. Jang-Condel                              | exoplanets, planets                                                |
| inv-liu         | 7-00:00:00   | 4        | 64       | 1           | 128000          | TODO      | Disabled   | Dr. Liu                                      | gwt                                                                |
| inv-microbiome  | 7-00:00:00   | 85       | 2816     | 1           | 128000          | TODO      | Disabled   | Dr. Ewers                                    | bbtrees, plantanalytics                                            |
| inv-multicfd    | 7-00:00:00   | 11       | 352      | 1           | 128000          | TODO      | Disabled   | Dr. Mousaviraad (Mechanical Engineering)     | multicfd                                                           |
| inv-physics     | 7-00:00:00   | 4        | 128      | 1           | 128000          | TODO      | Disabled   | Dr. Dahnovsky                                | euo, 2dferromagnetism, d0ferromagnetism, microporousmat            |
| inv-wagner      | 7-00:00:00   | 2        | 32       | 1           | 128000          | TODO      | Disabled   | Dr. Wagner                                   | wagnerlab, latesgenomics, ltcichlidgenomics, phylogenref, ysctrout |

Special Partitions

Special partitions require access to be granted directly to user or project accounts and typically require additional approval.

| Partition   | Max Walltime | Node Cnt | Core Cnt | Thds / Core | Mem (MB) / Node | Owner        | Associated Projects             | Notes                                  |
|-------------|--------------|----------|----------|-------------|-----------------|--------------|---------------------------------|----------------------------------------|
| dgx         | 7-00:00:00   | 2        | 40       | 2           | 512000          | Dr. Clune    | See partition inv-clune above   | NVIDIA V100 with NVLink, Ubuntu 16.04  |
| inv-compsci | 7-00:00:00   | 12       | 72       | 4           | 512000          | Dr. Kotthoff | See partition inv-compsci above | This includes the KNL nodes only       |

More Details

Generally, to run a job on a cluster you will need the following:

A handy migration reference to compare MOAB/Torque commands to SLURM commands can be found on the SLURM home site: Batch System Rosetta Stone.


Commands


sacct

  • Query detailed accounting information about jobs that have completed. Use this utility to get information about running or completed jobs.
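
For example, to query a job's accounting record (the jobid below is a placeholder):

Code Block
# Show elapsed time, peak memory, and final state for job 1234567 (placeholder jobid)
sacct -j 1234567 --format=JobID,JobName,Elapsed,MaxRSS,State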

salloc

  • Request an interactive job for debugging and/or interactive computing. ARCC configures the salloc command to launch an interactive shell on individual compute nodes with your current environment carried over from the current session (except in the dgx partition, where the environment is reinitialized for Ubuntu). This command requires specifying a project account (-A or --account=) and walltime (-t or --time=).
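
For example, a minimal interactive request might look like this (the project account name is a placeholder):

Code Block
# Request a one-hour interactive shell charged to the hypothetical 'myproject' account
salloc --account=myproject --time=01:00:00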

sbatch

  • Submit a batch job consisting of a single job or a job array. Several methods can be used to submit batch jobs: a script file can be provided as an argument on the command line, or, more rarely, the job can be read from standard input and created interactively. We recommend writing the batch job as a script so that it can be referenced at a later time.
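
For example, assuming a job script named 'job.sh' (the script name and account are placeholders):

Code Block
# Submit the script as written
sbatch job.sh

# Override the account and walltime directly on the command line
sbatch --account=myproject --time=02:00:00 job.sh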

scancel

  • Cancel jobs after submission. Works on pending and running jobs. By default, provide a jobid or set of jobids to cancel. Alternatively, flags can be used to cancel specific jobs by account, name, partition, QoS, reservation, or nodelist. To cancel all tasks of an array job, specify the parent jobid.
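
For example (the jobid below is a placeholder):

Code Block
# Cancel a single job (or all tasks of an array job) by its parent jobid
scancel 1234567

# Cancel all of your own pending and running jobs in the moran partition
scancel --user=$USER --partition=moran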

sinfo

  • View the status of the Slurm partitions or nodes. The status of drained nodes, along with the reason they were drained, can be seen using the -R flag.
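
For example:

Code Block
# Summarize partition and node states
sinfo

# List drained/down nodes along with the reason they were removed from service
sinfo -R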

squeue

  • View what is running or waiting to run in the job queue. Several modifiers and formats can be supplied to the command. You may be interested in the use of arccq as an alternative. The command arccjobs also provides a summary.
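
For example:

Code Block
# Show the entire job queue
squeue

# Show only your own jobs
squeue -u $USER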

sreport

  • Obtain information regarding usage since the last database roll up (usually around midnight each day). sreport can be used as an interactive tool to see the usage of the clusters.
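
For example, a sketch of a per-user utilization report for one project account (the account name and date range are placeholders):

Code Block
# Core-hours used by each user of 'myproject' during January 2019
sreport cluster AccountUtilizationByUser Accounts=myproject Start=2019-01-01 End=2019-02-01 -t Hours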

srun

  • A front-end launcher for job steps, including serial and parallel jobs. srun can be considered equivalent to mpirun or mpiexec when launching MPI jobs. Running srun inside a job creates a job step, which records accounting information on memory, CPU time, and other parameters that is valuable when a job terminates unexpectedly or when historical information is needed.
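
For example, inside a batch script or an salloc session (the application name is a placeholder):

Code Block
# Launch the application across the allocated tasks as a tracked job step
srun ./mpi_application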

Info

There are additional Slurm commands, but they are not covered here because they are of limited use to general users on our systems. Reading the man pages for the Slurm commands is highly beneficial, and if you have questions about submitting jobs, ARCC encourages you to contact arcc-help@uwyo.edu.


...

The two '#SBATCH' directives above are required for all job submissions, whether interactive or batch. The value given to account should be changed to the appropriate project account, and the time should be changed to an appropriate walltime limit. Note that this is a walltime limit, not CPU time. These values can also be supplied directly on the command line when submitting. By default, Slurm allocates one node, one task per node, and one CPU per task.
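
As a minimal sketch, a script containing only the two required directives might look like this ('myproject' is a placeholder for your project account):

Code Block
#!/bin/bash

#SBATCH --account=myproject   # replace with your project account
#SBATCH --time=01:00:00       # walltime limit (one hour), not CPU time

# With no other directives, Slurm allocates 1 node, 1 task, and 1 CPU
echo "Running on $(hostname)"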

...

Code Block
#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4

# Match the number of OpenMP threads to the CPUs allocated per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./application

Single Node, Multi-Tasks

This could be a multi-task job where the application has its own parallel processing engine or uses MPI but scales poorly across multiple nodes.

Code Block
#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1

### Assuming MPI application
srun ./application

Multi-Node, Non-Multithreaded

...

Code Block
#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1

### Assuming 'application' is on your $PATH environment variable
srun application

Multi-Node, Multithreaded

...

Code Block
#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun application -arg1 -arg2

Checking Status and Canceling

...