What is Slurm

Goal: Introduce Slurm and show how to start interactive sessions, submit jobs, and monitor them.

Workload Managers 

  1. Allocate access to appropriate compute nodes based on your requests.

  2. Provide a framework for starting, executing, monitoring, and even canceling your jobs.

  3. Manage the queue and notify you of job state changes.

ARCC: Slurm: Wiki Pages 

A quick read can be found under: Slurm: Getting Started-Jobs and Nodes

ARCC also hosts a number of more detailed and specific wiki pages:

Interactive Session: salloc

You are at the command line on the compute node, running the work yourself.

Suitable for developing and testing over a few hours.

[]$ salloc --help
[]$ man salloc
# Lots of options.
# The bare minimum.
# This will provide the defaults of one node, one core and 1G of memory.
[]$ salloc -A <project-name> -t <wall-time>
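When the defaults are not enough, resources can be requested explicitly. The sketch below is a dry-run helper, not part of Slurm: `build_salloc_cmd` is a hypothetical function name, and `my-project` and the resource values are placeholders. It only prints a command using the standard `salloc` options `--nodes`, `--cpus-per-task`, and `--mem`, so the request can be reviewed before it is run on a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): prints a salloc command
# so the request can be checked before it is actually submitted.
build_salloc_cmd() {
    local project="$1"      # value for -A (account/project), placeholder
    local walltime="$2"     # value for -t, e.g. 2:00:00 for two hours
    local cpus="${3:-1}"    # --cpus-per-task, falls back to one core
    local mem="${4:-1G}"    # --mem per node, falls back to 1G
    printf 'salloc -A %s -t %s --nodes=1 --cpus-per-task=%s --mem=%s\n' \
        "$project" "$walltime" "$cpus" "$mem"
}

# A two-hour session with four cores and 8G of memory on one node.
build_salloc_cmd my-project 2:00:00 4 8G
```

Copying the printed line to the prompt (with a real project name) would start the session. The fallback values in the helper simply mirror the defaults described above.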

Interactive Session: salloc: workshop

# CPU only compute node.
[]$ salloc -A <project-name> -t 1:00 --reservation=<reservation-name>
# GPU partition/compute node.
[]$ salloc -A <project-name> -t 1:00 --reservation=<reservation-name> --partition=<partition-name>

Interactive Session: squeue: What’s happening?

[]$ salloc -A <project-name> -t 1:00 --reservation=<reservation-name>
salloc: Granted job allocation 13526337
salloc: Nodes m233 are ready for job
# Make a note of the job id.
# Notice the server/node name has changed.
[arcc-t05@m233 intro_to_hpc]$ squeue -u arcc-t05
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
13526337     moran interact arcc-t05  R  0:19     1 m233
# For an interactive session: Name = interact
# You have the command-line interactively available to you.
[]$ ...
[]$ squeue -u arcc-t05
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
13526337     moran interact arcc-t05  R  1:03     1 m233
# Session will automatically time out
[]$ salloc: Job 13526337 has exceeded its time limit and its allocation has been revoked.
slurmstepd: error: *** STEP 13526337.interactive ON m233 CANCELLED AT 2024-03-22T09:36:53 DUE TO TIME LIMIT ***
exit
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
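The `squeue` listing above can also be read by a script, for example to check how long a session has been running. The following is a minimal sketch, assuming the default column layout shown in the session (job ID first, state fifth, elapsed time sixth, node list eighth) and job names without spaces. `sample_squeue_output` and `job_status` are hypothetical helper names, and the sample text is copied from the session above so the example runs without a cluster.

```shell
#!/usr/bin/env bash
# Sample squeue output copied from the interactive session above.
sample_squeue_output() {
    cat <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13526337 moran interact arcc-t05 R 0:19 1 m233
EOF
}

# Hypothetical helper: reads squeue-style text on standard input and
# prints "state elapsed-time node" for the given job ID.
# Assumes the default column order; the header line is skipped.
job_status() {
    local jobid="$1"
    awk -v id="$jobid" 'NR > 1 && $1 == id { print $5, $6, $8 }'
}

sample_squeue_output | job_status 13526337
```

On a cluster, the same filter could be fed live data, for example `squeue -u arcc-t05 | job_status 13526337`. If the job is no longer in the queue, the helper prints nothing.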

Interactive Session: salloc: Finished Early?

Exercise: salloc: Give It A Go

Submit Jobs: sbatch

Submit Jobs: sbatch: Example
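As an illustration only (this is a generic sketch, not the workshop's own example), a minimal batch script might look like the following. The `#SBATCH` lines carry the same kind of options passed to `salloc`. The account value is a placeholder that must be replaced before submitting, and the job name and output file name are assumptions chosen for this sketch.

```shell
#!/bin/bash
# Placeholder values throughout; replace <project-name> before submitting.
#SBATCH --account=<project-name>
#SBATCH --time=10:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --job-name=hello
#SBATCH --output=hello_%j.out

# Slurm sets these variables inside a job. The fallbacks let the script
# also run outside Slurm as a quick check that it has no syntax errors.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
node_list="${SLURM_JOB_NODELIST:-${HOSTNAME:-localhost}}"

summary="Job ${job_id} running on ${node_list}"
echo "$summary"
echo "Started at: $(date)"
```

Saved under a hypothetical name such as `hello.sh`, the script would be submitted with `sbatch hello.sh`. With the `--output` pattern above, `%j` is replaced by the job ID in the output file name.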

Submit Jobs: squeue: What’s happening?

Submit Jobs: squeue: What’s happening? Continued

More squeue Information

Submission from your Current Working Directory

Submit Jobs: scancel: Cancel?

Submit Jobs: sacct: What happened?

Submit Jobs: sbatch: Options

Submit Jobs: sbatch: Options: Applied to Example

Extended Example: What Does the Run Look Like?

Exercise: sbatch: Give It A Go


