Slurm is the scheduler through which all jobs, both batch and interactive, are submitted. Slurm provides several user-facing commands, each with a Unix man page that should be consulted for full details. On this page, users will find information about submitting and running jobs, nodes, viewing available partitions, basic Slurm commands, troubleshooting, and the steps to configure Slurm for investments.
...
Contents
...
These defaults can be changed by requesting a different allocation scheme with the appropriate flags; please see our Slurm documentation for details.
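As an illustrative sketch (the account name and values below are placeholders, and the appropriate values depend on your project and the cluster limits), a different allocation can be requested by overriding the defaults with flags:

```bash
# Request a different allocation by overriding the defaults with flags
# ('yourproject' and the values shown are placeholders)
salloc --account=yourproject --ntasks=4 --cpus-per-task=2 --mem=8G --time=02:00:00
```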
Default Limits
On Mount Moran, the default limits were expressed as the number of cores each project account could use concurrently, and investors received an increase to that concurrent core limit. To allow more flexible scheduling for all research groups, ARCC is looking at implementing limits based on the concurrent use of cores, memory, and job walltime. These limits will be defined in the near future and will be subject to FAC review.
...
The Slurm configuration on Teton is fairly involved in order to accommodate the hardware layout, investor partitions, and runtime limits. The following tables describe the partitions on Teton; some require a QoS, which is auto-assigned during job submission. The tables list Slurm allocatable units rather than hardware units.

General Partitions
Investor Partitions

Investor partitions are likely to be quite heterogeneous and may contain a mix of hardware, indicated below where appropriate. They require a special QoS for access.
Special Partitions

Special partitions require access to be granted directly to user accounts or project accounts, and likely require additional approval.
More Details

Generally, to run a job on the cluster you will need, at minimum, a project account and a walltime limit (see the required #SBATCH directives below). A handy migration reference comparing MOAB/Torque commands to Slurm commands can be found on the Slurm home site: Batch System Rosetta Stone.
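To see which partitions are available and their limits, Slurm can be queried directly; in the sketch below, the partition and QoS names are placeholders rather than actual Teton values.

```bash
# Summarize available partitions, their time limits, and node counts
sinfo --summarize

# Submit to a specific partition that requires a QoS
# ('somepartition' and 'someqos' are placeholder names)
sbatch --partition=somepartition --qos=someqos job.sh
```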
Commands
sacct: report accounting data for running and completed jobs
salloc: obtain an interactive job allocation
sbatch: submit a batch script to the queue
scancel: cancel a pending or running job
sinfo: view information about partitions and nodes
squeue: view the state of jobs in the queue
sreport: generate usage reports from accounting data
srun: launch parallel tasks (job steps) within an allocation
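A few common invocations are sketched below; the job ID shown is an illustrative placeholder.

```bash
# Show your pending and running jobs
squeue -u $USER

# Show accounting details for a job (12345 is a placeholder job ID)
sacct -j 12345 --format=JobID,JobName,Elapsed,State,ExitCode

# Cancel a job
scancel 12345

# Summarize partition and node availability
sinfo -s
```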
...
The two '#SBATCH' directives above are required for all job submissions, whether interactive or batch. The value of --account should be changed to the appropriate project account, and the time should be changed to an appropriate walltime limit; note that this is a walltime limit, not CPU time. These values can also be supplied directly on the command line when submitting a job. By default, Slurm runs jobs with one node, one task per node, and one CPU per task.
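For example, a minimal sketch of supplying the same required values on the command line (the account name and walltimes are placeholders):

```bash
# Batch submission with the account and walltime given on the command line
sbatch --account=yourproject --time=24:00:00 job.sh

# Interactive allocation with the same required values
salloc --account=yourproject --time=01:00:00
```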
...
```bash
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
# Match the OpenMP thread count to the CPUs allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./application
```
Single Node, Multi-Tasks
This could be a multi-task job where the application has its own parallel processing engine or uses MPI, but scales poorly across multiple nodes.
```bash
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
### Assuming MPI application
srun ./application
```
Multi-Node, Non-Multithreaded
...
```bash
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
### Assuming 'application' is on your $PATH environment variable
srun application
```
Multi-Node, Multithreaded
...
```bash
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun application -arg1 -arg2
```
Checking Status and Canceling
...