Slurm Workshop: Summary
Goal: Provide a summary of concepts and commands covered.
Slurm Commands Covered
| Command | Description |
|---|---|
| | Create an interactive session. Notice use of short and long form options. Format for: |
| | Submit a job to the cluster. |
| | Cancel a pending/running job. Variations: |
| | View the status of your currently running jobs. |
| | View jobs that have finished. Use the |
| | View the Only accurate if the job successfully completes. If a job fails with an Out-Of-Memory (OOM) error, this will not be accurate. |
| | View the status of the Slurm partitions/nodes. |
| | Print a table showing active projects and jobs. |
| | Print a node list with allocated jobs - can query individual nodes. |
Summary
Looked at:
What Slurm is and the core functionality it provides.
How to start an interactive session using salloc, and perform job submission using sbatch.
How to select appropriate resource allocations.
How to monitor your jobs using squeue and sacct.
What a general workflow looks like: use a small/short interactive session to test and debug, then submit large/long jobs.
Best practices in using HPC: do not perform computation on the login nodes, and be mindful of the resources you actually require and request.
How to be a good cluster citizen with respect to general cluster use and other users.
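The test-then-submit workflow summarized above can be sketched roughly as follows. This is a minimal illustration, not the workshop's own material: the account name `myproject`, the file name `example_job.sh`, and every resource value are hypothetical placeholders that you would replace with what your job actually needs. The script only writes the batch file unless `sbatch` is found on the machine.

```shell
#!/bin/bash
# Sketch of the test-then-submit workflow. The account name, file names,
# and resource values are hypothetical placeholders, not site defaults.

# 1. Test and debug first in a small, short interactive session, e.g.:
#      salloc --account=myproject --time=00:30:00 --ntasks=1 --mem=2G
#    (short-form equivalent: salloc -A myproject -t 00:30:00 -n 1 --mem=2G)

# 2. Write a batch script that requests only the resources the job needs.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=myproject
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --output=example_%j.out

echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK} CPUs"
EOF

# 3. Submit and monitor, but only where Slurm is actually installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID (followed by ";cluster" on multi-cluster setups).
    job_id=$(sbatch --parsable example_job.sh)
    echo "Submitted job ${job_id}"
    squeue -u "$USER"    # status of your pending/running jobs
    # Once the job has finished:
    #   sacct -j "$job_id" --format=JobID,JobName,State,Elapsed,MaxRSS
    # To cancel it while pending/running:
    #   scancel "$job_id"
else
    echo "sbatch not found; wrote example_job.sh only"
fi
```

Starting from a short interactive test keeps mistakes cheap, and comparing the requested memory and time against what `sacct` reports afterwards is one way to trim the request for the next, larger submission.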