What is Slurm
Goal: Introduce Slurm and show how to start interactive sessions, submit jobs, and monitor them.
- 1 Workload Managers
- 2 Interactive Session: salloc
- 3 Submit Jobs: sbatch
- 4 Submit Jobs: sbatch: Example
- 5 Submit Jobs: squeue: What’s happening?
- 6 More squeue Information
- 7 Submission from your Current Working Directory
- 8 Submit Jobs: scancel: Cancel?
- 9 Submit Jobs: sacct: What happened?
- 10 Submit Jobs: sbatch: Options
- 11 Submit Jobs: sbatch: Options: Applied to Example
- 12 Extended Example: What Does the Run look Like?
Workload Managers
Allocates access to the compute nodes that match your resource requests.
Framework for starting, executing, monitoring, and even canceling your jobs.
Queue management and job state notification.
ARCC: Slurm: Wiki Pages
Slurm Related Commands
ARCC related scripts: Core hour usage: chu_user, chu_account
Interactive Session: salloc
You are logged in to the allocated node and run the commands yourself.
Suitable for developing and testing over a few hours.
[]$ salloc --help
[]$ man salloc
# Lots of options.
# The bare minimum.
# This will provide the defaults of one node, one core and 1G of memory.
[]$ salloc -A <project-name> -t <wall-time>
As with other Linux commands, the options have both short and long forms.
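As a small illustration of the two forms, the sketch below builds the same request both ways. `-A`/`--account` and `-t`/`--time` are the documented salloc option pairs; the variable names and the one-hour wall time are placeholders chosen for this example. The commands are printed rather than executed, so the snippet can be tried on any machine.

```shell
# Placeholder values for illustration; substitute your own project and wall time.
project="<project-name>"
walltime="1:00:00"   # hours:minutes:seconds

# The short form and the long form request the same allocation.
short_cmd="salloc -A $project -t $walltime"
long_cmd="salloc --account=$project --time=$walltime"

# Printed, not run, so nothing is submitted to the scheduler.
echo "$short_cmd"
echo "$long_cmd"
```

The long form is easier to read in scripts, while the short form is quicker to type at the prompt.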
Format for -t/--time: Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
Interactive Session: salloc: workshop
You’ll only use the reservation for this workshop (and possibly other workshops). Once you have your own account, you typically do not need one.
But there are use cases when we can create a specific reservation for you.
[]$ salloc -A <project-name> -t 1:00 --reservation=<reservation-name>
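Note that the `1:00` in the command above is in the "minutes:seconds" form, so it requests one minute, not one hour. To make the accepted `-t/--time` formats concrete, here is a minimal sketch of how each maps to a number of seconds. The function `slurm_time_to_seconds` is a hypothetical helper written for this page, not a Slurm command, and it assumes a well-formed input.

```shell
# Hypothetical helper, not part of Slurm: convert a -t/--time value to seconds.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '
    {
        days = 0; hours = 0; minutes = 0; seconds = 0
        clock = $0
        has_days = index($0, "-") > 0
        if (has_days) {
            # With a days prefix the remainder is hours[:minutes[:seconds]].
            split($0, dh, "-")
            days = dh[1]
            clock = dh[2]
        }
        n = split(clock, part, ":")
        if (has_days) {
            hours = part[1]; minutes = part[2]; seconds = part[3]
        } else if (n == 1) {
            minutes = part[1]
        } else if (n == 2) {
            minutes = part[1]; seconds = part[2]
        } else {
            hours = part[1]; minutes = part[2]; seconds = part[3]
        }
        print days * 86400 + hours * 3600 + minutes * 60 + seconds
    }'
}

slurm_time_to_seconds 1:00       # minutes:seconds, prints 60
slurm_time_to_seconds 1:30:00    # hours:minutes:seconds, prints 5400
slurm_time_to_seconds 2-12       # days-hours, prints 216000
```

A bare number is read as minutes, so `-t 90` and `-t 1:30:00` ask for the same wall time.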
Interactive Session: squeue: What’s happening?
[]$ salloc -A <project-name> -t 1:00 --reservation=<reservation-name>
salloc: Granted job allocation 13526337
salloc: Nodes m233 are ready for job
# Make a note of the job id.
# Notice the server/node name has changed.
[arcc-t05@m233 intro_to_hpc]$ squeue -u arcc-t05
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13526337 moran interact arcc-t05 R 0:19 1 m233
# For an interactive session: Name = interact
# You have the command-line interactively available to you.
[]$
...
[]$ squeue -u arcc-t05
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13526337 moran interact arcc-t05 R 1:03 1 m233
# Session will automatically time out
[]$ salloc: Job 13526337 has exceeded its time limit and its allocation has been revoked.
slurmstepd: error: *** STEP 13526337.interactive ON m233 CANCELLED AT 2024-03-22T09:36:53 DUE TO TIME LIMIT ***
exit
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
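Because the session is revoked as soon as the wall time runs out, it can help to check how much time is left before starting something long. The sketch below is one way to do that: it relies on the `SLURM_JOB_ID` environment variable that Slurm sets inside an allocation and on squeue's `%L` (time left) output field. The function name `time_left` is made up for this example.

```shell
# Hypothetical helper: print the time remaining on the current allocation.
time_left() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        # SLURM_JOB_ID is only set inside a Slurm allocation.
        echo "not inside a Slurm allocation"
        return 1
    fi
    # -h drops the header, -j selects the job, %L prints the time left.
    squeue -h -j "$SLURM_JOB_ID" -o "%L"
}
```

Run `time_left` at the prompt inside the interactive session; on a login node it only reports that there is no allocation.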
Interactive Session: salloc: Finished Early?
Submit Jobs: sbatch
Submit Jobs: sbatch: Example
Submit Jobs: squeue: What’s happening?
Submit Jobs: squeue: What’s happening? Continued
More squeue Information
The main Slurm squeue page.
Submission from your Current Working Directory
Submit Jobs: scancel: Cancel?
Submit Jobs: sacct: What happened?
The main Slurm sacct page.
Submit Jobs: sbatch: Options
Submit Jobs: sbatch: Options: Applied to Example
Extended Example: What Does the Run look Like?