Slurm Workshop: Summary

Goal: Provide a summary of concepts and commands covered.


Slurm Commands Covered

Command

Description

salloc -A <project-name> -t <wall-time>

salloc --account <project-name> --time <wall-time>

Create an interactive session.

The two commands above are equivalent: -A/--account and -t/--time are the short and long forms of the same options.

Accepted formats for -t/--time: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

sbatch <submission-script>

Submit a job to the cluster.

scancel <job-id>

Cancel a pending/running job.

Variation:

scancel -u <username>: cancel all jobs belonging to <username>.

squeue -u <username>

View the status of your pending and running jobs.

sacct -u <username>

View accounting information for your jobs, including jobs that have finished.

Use the -S (--starttime) option to set the start of the time window; the default is 00:00:00 of the current day.

seff <job-id>

View the CPU and memory efficiency of a job.

Only accurate if the job completes successfully. If a job fails with an Out-Of-Memory (OOM) error, the reported values will not be accurate.

sinfo

View the status of the Slurm partitions/nodes.

arccjobs

Print a table showing active projects and jobs.

pestat

Print a list of nodes with their allocated jobs; individual nodes can also be queried.
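Because the -t/--time formats are easy to misread, the sketch below converts each accepted format into a total number of seconds. It is a hypothetical bash helper (slurm_time_to_seconds is not a Slurm command) that only mirrors the format list in the salloc entry; Slurm does its own parsing when you call salloc or sbatch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): mirrors the -t/--time formats
# accepted by salloc/sbatch and prints the total wall time in seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 h=0 m=0 s=0 clock
    local -a f
    if [[ $spec == *-* ]]; then
        # A dash means the value starts with days: days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        clock=${spec#*-}
        IFS=: read -r -a f <<< "$clock"
        case ${#f[@]} in
            1) h=${f[0]} ;;                        # days-hours
            2) h=${f[0]}; m=${f[1]} ;;             # days-hours:minutes
            3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;  # days-hours:minutes:seconds
            *) return 1 ;;
        esac
    else
        # Without a dash, the number of colon-separated fields decides the meaning
        IFS=: read -r -a f <<< "$spec"
        case ${#f[@]} in
            1) m=${f[0]} ;;                        # minutes
            2) m=${f[0]}; s=${f[1]} ;;             # minutes:seconds
            3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;  # hours:minutes:seconds
            *) return 1 ;;
        esac
    fi
    # 10# forces base 10 so values such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Example: slurm_time_to_seconds 10:00 prints 600 (ten minutes, not ten hours)
```

The main point is that a single colon means minutes:seconds: -t 10:00 requests ten minutes, not ten hours. For ten hours, write -t 10:00:00 or -t 0-10.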


Summary

In this workshop we looked at:

  • What Slurm is and the core functionality it provides.

  • How to start an interactive session using salloc, and how to submit jobs using sbatch.

  • How to select appropriate resource allocations.

  • How to monitor your jobs using squeue and sacct.

  • What a general workflow looks like: use a small, short interactive session to test and debug, then submit larger, longer jobs with sbatch.

  • Best practices for using HPC: do not perform computation on the login nodes, and be mindful of the resources you actually require compared with what you request.

  • How to be a good cluster citizen with respect to general cluster use and other users.
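The workflow above (test interactively, then submit a batch job) can be sketched as follows. This bash snippet writes a minimal submission script and submits it with sbatch only where sbatch is installed. The file name job.sh, the job name and the resource values are illustrative assumptions; replace <project-name> with your project and size --time, --cpus-per-task and --mem from what you observed during your interactive test.

```bash
#!/usr/bin/env bash
# Sketch only: job.sh, the job name and the resource values are
# illustrative; replace <project-name> with your own project.
# The quoted 'EOF' keeps $SLURM_JOB_ID unexpanded until the job runs.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --job-name=example_job

# Replace this line with the commands you tested in your interactive session.
echo "Job ${SLURM_JOB_ID} running on $(hostname)"
EOF

# Submit only where the Slurm client commands are available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
fi
```

sbatch reports the new job ID, which can then be used with squeue, sacct -j <job-id>, or seff <job-id> once the job has completed.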


To provide feedback on this training, use the following link: Intro to Job Scheduling - Evaluation, or scan the QR code below.

[QR code image: job_scheduling.png]