Slurm: Getting Started - Jobs and Nodes
Slurm is the workload manager through which all jobs are submitted, including batch and interactive jobs. It consists of several user-facing commands, each of which has a Unix man page that should be consulted. On this page, users will find detailed information about running and submitting jobs, nodes, viewing available partitions, basic Slurm commands, troubleshooting, and steps to configure Slurm for investments.
Required Inputs, Default Values, and Limits
There are some default limits set for Slurm jobs. By default the following are required for submission:
Walltime limit: --time=[days-hours:mins:secs]
Project account: --account=account
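As a rough illustration of the walltime format, the sketch below converts a duration in seconds into the days-hours:mins:secs form and prints the two required flags. The helper name format_walltime and the project name myproject are hypothetical, chosen only for this example.

```shell
# Convert a duration in seconds to Slurm's days-hours:mins:secs form.
# format_walltime is a hypothetical helper, not part of Slurm.
format_walltime() {
    total=$1
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    mins=$(( (total % 3600) / 60 ))
    secs=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

account="myproject"               # hypothetical project name
walltime=$(format_walltime 9000)  # 9000 seconds = 2 hours 30 minutes

# Print the two required flags as they would appear on a command line.
echo "--account=$account --time=$walltime"
```

The same two flags can be given on the command line or as #SBATCH lines at the top of a batch script.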
Default Values
Additionally, the default submission has the following characteristics:
Nodes: one node (-N 1, --nodes=1)
Task count: one task (-n 1, --ntasks-per-node=1)
Memory: 1000 MB of RAM per CPU (--mem-per-cpu=1000)
These defaults can be changed by setting the appropriate flags to request a different allocation. Please refer to our Slurm documentation.
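To show how these flags combine, here is a small worked calculation. The values are illustrative only and may not fit any particular node type; the sketch assumes one CPU per task, which is Slurm's default.

```shell
nodes=2             # value for --nodes
tasks_per_node=16   # value for --ntasks-per-node
mem_per_cpu=2000    # value for --mem-per-cpu, in MB

# With one CPU per task, the memory requested on each node scales
# with the number of tasks placed on that node.
total_tasks=$(( nodes * tasks_per_node ))
mem_per_node=$(( tasks_per_node * mem_per_cpu ))

echo "--nodes=$nodes --ntasks-per-node=$tasks_per_node --mem-per-cpu=$mem_per_cpu"
echo "total tasks: $total_tasks, memory per node: $mem_per_node MB"
```

For comparison, the defaults listed above work out to 1 task and 1000 MB on a single node.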
Default Limits
On historic ARCC HPC resources, the default limits were expressed as the number of cores each project account could use concurrently. Investors received an increase in concurrent core usage capability. To facilitate more flexible scheduling for all research groups, ARCC is considering limits based on concurrent usage of cores, memory, and walltime of jobs. These limits will be defined in the near future and will be subject to FAC review.
sacct: Query detailed accounting information about running or completed jobs.
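A minimal sketch of such a query with sacct follows. The job ID is hypothetical, and run is a small helper defined here (not a Slurm command) that prints the command when Slurm is not installed, so the example can be tried anywhere.

```shell
# Run the command if it is installed; otherwise print it.
# run is a hypothetical helper for this example only.
run() {
    if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "$*"; fi
}

job_id=123456   # hypothetical job ID; substitute one of your own

# Report state, elapsed time, peak memory, and exit code for the job and its steps.
run sacct --jobs="$job_id" --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode
```

The --format list can be adjusted; the sacct man page lists the available fields.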
salloc: Request an interactive job for debugging and/or interactive computing. ARCC configures the salloc command to launch an interactive shell on individual compute nodes with your current environment carried over from the current session. This command requires specifying a project account (--account) and walltime (-t).
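An interactive request might be assembled as sketched below. The project name and the one-hour walltime are assumptions for illustration; the command is printed rather than executed because salloc opens an interactive shell.

```shell
account="myproject"   # hypothetical project name
walltime="01:00:00"   # one hour, written as hours:mins:secs

# Build the command line and print it; run the printed line on a login node.
salloc_cmd="salloc --account=$account --time=$walltime"
echo "$salloc_cmd"
```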
sbatch: Submit a batch job consisting of a single job or a job array. Several methods can be used to submit batch jobs. A script file can be provided as an argument on the command line. Alternatively, though less commonly, the script can be supplied on standard input so that the batch job is created interactively. We recommend writing the batch job in a script so that it can be referenced at a later time.
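The script-file approach could look like the following sketch, which writes a small batch script and checks it for shell syntax errors. The file name example_job.sh, the job name, and the project name myproject are hypothetical.

```shell
# Write a small batch script; the #SBATCH lines carry the submission options.
cat > example_job.sh <<'EOF'
#!/bin/bash
# myproject is a hypothetical project name; replace it with your own.
#SBATCH --account=myproject
#SBATCH --time=00:10:00
#SBATCH --job-name=example
echo "Running on $(hostname)"
EOF

# Check the script for shell syntax errors before submitting it.
sh -n example_job.sh && echo "submit with: sbatch example_job.sh"
```

On a login node, sbatch example_job.sh would submit the script, and sbatch --array=1-10 example_job.sh would submit the same script as a ten-task job array.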
scancel: Cancel jobs after submission. Works on pending and running jobs. By default, provide a jobid or set of jobids to cancel. Alternatively, flags can be combined to cancel specific jobs by account, name, partition, QOS, reservation, or nodelist. To cancel all array tasks, specify the parent jobid.
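A few typical cancellations are sketched below. The job ID and project name are hypothetical, and the commands are printed rather than executed because scancel takes effect immediately.

```shell
job_id=123456         # hypothetical job ID
account="myproject"   # hypothetical project name

# One job, or every task of an array when given the parent job ID.
cancel_one="scancel $job_id"
# Only task 7 of a job array.
cancel_task="scancel ${job_id}_7"
# Pending jobs submitted under one account.
cancel_pending="scancel --account=$account --state=PENDING"

printf '%s\n' "$cancel_one" "$cancel_task" "$cancel_pending"
```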
sinfo: View the status of the Slurm partitions or nodes. The -R flag lists nodes that are down or drained, along with the reason.
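Some common sinfo invocations are sketched here. The run helper is defined only for this example (it is not a Slurm command) and prints each command when Slurm is not installed.

```shell
# Run the command if it is installed; otherwise print it (hypothetical helper).
run() { if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "$*"; fi; }

run sinfo         # summary of partitions and node states
run sinfo -N -l   # one line per node, long format
run sinfo -R      # reasons recorded for down or drained nodes
```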
squeue: View what is running or waiting to run in the job queue. Several modifiers and formats can be supplied to the command. You may be interested in the use of arccq as an alternative. The command arccjobs also provides a summary.
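The sketch below shows a few squeue modifiers and one custom format. The run helper is defined only for this example and prints each command when Slurm is not installed; the column widths in the format string are arbitrary choices.

```shell
# Run the command if it is installed; otherwise print it (hypothetical helper).
run() { if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "$*"; fi; }

user="${USER:-$(id -un)}"

run squeue --user="$user"           # only your own jobs
run squeue --user="$user" --start   # estimated start times for pending jobs
# Custom columns: job ID, partition, name, state, time used, reason or node list.
run squeue --format="%.10i %.9P %.20j %.2t %.10M %R"
```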
sreport: Obtain information regarding usage since the last database rollup (usually around midnight each day). sreport can be used as an interactive tool to see the usage of the clusters.
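One possible usage report is sketched below. The project name and date range are hypothetical, and run is a helper defined only for this example that prints the command when Slurm is not installed.

```shell
# Run the command if it is installed; otherwise print it (hypothetical helper).
run() { if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "$*"; fi; }

account="myproject"   # hypothetical project name
start="2024-01-01"    # illustrative reporting window
end="2024-02-01"

# Per-user usage for one account, reported in hours.
run sreport -t Hours cluster AccountUtilizationByUser \
    Accounts="$account" Start="$start" End="$end"
```

Because the report is built from rolled-up data, usage from the current day may not appear yet.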
srun: A front-end launcher for job steps, including serial and parallel jobs. srun can be considered an equivalent to mpirun or mpiexec when launching MPI jobs. Each use of srun inside a job defines a job step, which provides accounting information on memory, CPU time, and other parameters that are valuable when a job terminates unexpectedly or when historical information is needed.
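To illustrate job steps, here is a sketch of a batch script that launches two of them with srun. The project name myproject and the task counts are assumptions; outside a Slurm job the script only prints a reminder, so it can be run safely anywhere.

```shell
#!/bin/bash
# myproject is a hypothetical project name; replace it with your own.
#SBATCH --account=myproject
#SBATCH --time=00:10:00
#SBATCH --ntasks=4

# Inside a job, each srun call becomes a job step with its own accounting record.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun hostname          # first step: one copy per task
    srun --ntasks=1 date   # second step: a single task
    steps_launched=2
else
    echo "Not inside a Slurm job; submit this script with sbatch."
    steps_launched=0
fi
echo "job steps launched: $steps_launched"
```

After the job finishes, each step should appear as its own row in the accounting output for that job.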