Slurm: Workflows and Best Practices
Goal: Discuss what workflows can look like, being a good cluster citizen, and some best practices.
- 1 Default Resources
- 2 If you don’t ask, you don’t get: GPU Example
- 3 Modules and using salloc and sbatch
- 4 Modules and using salloc and sbatch: Best Practice
- 5 Track Your Job IDs
- 6 What does a general workflow look like?
- 7 What does it mean for an application to be parallel?
- 8 What does it mean for an application to be GPU enabled?
- 9 How can I be a good cluster citizen?
- 10 Being a good Cluster Citizen: Requesting Resources
- 11 Submitting Useful Tickets via the Portal
Default Resources
When you perform an salloc / sbatch you will be provided with a default resource allocation if you do not explicitly request one. This will be:
one node.
one task per node.
one core per task.
no GPU.
default memory (this can differ depending on the partition).
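If you want to see exactly what you were given, a minimal sketch (the account name is a placeholder; add a partition/reservation if your cluster requires one) is to ask Slurm from inside the job:
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=5:00

# Nothing else is requested, so the defaults listed above apply.
# Ask Slurm what this job was actually allocated:
scontrol show job $SLURM_JOB_ID | grep -E "NumNodes|TRES"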
If you don’t ask, you don’t get: GPU Example
Let's look at an example where we want to use a GPU device on a particular partition.
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00
#SBATCH --reservation=<reservation-name>
#SBATCH --partition=mb-l40s
#SBATCH --gres=gpu:1
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_GPUS_ON_NODE:" $SLURM_GPUS_ON_NODE
echo "SLURM_JOB_GPUS:" $SLURM_JOB_GPUS
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L
# Output:
SLURM_JOB_ID: 13517905
SLURM_GPUS_ON_NODE: 1
SLURM_JOB_GPUS: 0
CUDA_VISIBLE_DEVICES: 0
GPU 0: NVIDIA L40S (UUID: GPU-29a5b03e-e8f0-972b-6ae8-be4b3afe4ee0)
The SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES values represent the index (or indices) of the allocated GPU device(s); a value of 0 does not mean zero devices were allocated.
For example if we used: --gres=gpu:2
we would see something of the form:
SLURM_GPUS_ON_NODE: 2
SLURM_JOB_GPUS: 0,1
CUDA_VISIBLE_DEVICES: 0,1
GPU 0: NVIDIA L40S (UUID: GPU-4b274738-2abf-c818-ff97-d7548c769276)
GPU 1: NVIDIA L40S (UUID: GPU-dfab908b-ccd9-27ab-5856-26a46cf6f89e)
If you don’t ask, you don’t get: No GPU device requested
# Comment out the gres option.
##SBATCH --gres=gpu:1
# Output:
SLURM_JOB_ID: 13517906
SLURM_GPUS_ON_NODE:
SLURM_JOB_GPUS:
CUDA_VISIBLE_DEVICES:
No devices found.
Modules and using salloc and sbatch
Modules and using salloc and sbatch: Best Practice
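A minimal sketch of one common pattern (module names and versions are placeholders; check module avail / module spider on your cluster): load the modules a job needs inside the batch script itself, starting from a clean environment, so runs are reproducible:
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00:00

# Start from a clean environment, then load exactly what this job needs.
module purge
module load gcc/12.2.0 openmpi/4.1.4   # placeholder names/versions

srun ./my_mpi_program                  # placeholder executable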
Track Your Job IDs
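As a sketch of one way to do this (my_job.sh and job_history.log are placeholder names; the Slurm commands themselves are standard): capture the ID that sbatch prints and use squeue / sacct to check on the job later:
# Submit and record the job ID alongside a note of what was run.
jobid=$(sbatch --parsable my_job.sh)
echo "$(date) job=$jobid script=my_job.sh note=test-run" >> job_history.log

# Check on the job while it is queued/running, and after it finishes.
squeue -j "$jobid"
sacct -j "$jobid" --format=JobID,JobName,Elapsed,State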
What does a general workflow look like?
Getting Started:
Understand your application / programming language.
What are its capabilities / functionality.
Read the documentation, find examples, online forums – community.
Develop/Try/Test:
Typically use an interactive session (salloc) where you're typing/trying/testing (see the sketch after this list).
Are modules available? If not, submit an HPC Software Consultation request to start the discussion.
Develop code/scripts.
Understand how the command-line works – what commands/scripts to call with options.
Understand if parallelization is available – can you optimize your code/application?
Test against a subset of data. Something that runs quickly – maybe a couple of minutes/hours.
Do the results look correct?
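A sketch of what such an interactive test session might look like (account, time, module, and script names are all placeholders):
# Request a short interactive allocation for developing and testing.
salloc --account=<project-name> --time=1:00:00 --cpus-per-task=2

# Once the allocation is granted:
module load python/3.10                          # placeholder module
python my_analysis.py --input small_subset.csv   # quick test on a subset
exit                                             # release the allocation when done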
What does a general workflow look like? Continued.
Production:
Put it all together within a bash Slurm script (a full sketch follows this list):
Request appropriate resources using #SBATCH options.
Request appropriate wall time – hours, days…
Load modules:
module load …
Run scripts/command-line.
Finally, submit your job to the cluster (sbatch) using a complete set of data.
Use:
sbatch <script-name.sh>
Monitor job(s) progress.
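Putting those production steps together, a minimal sketch of a complete script (account, resources, modules, and file names are placeholders to adapt):
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --job-name=full_run
#SBATCH --time=2-00:00:00            # e.g. two days of wall time
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --output=full_run_%j.log     # %j is replaced with the job ID

module load python/3.10              # placeholder module

python my_analysis.py --input full_dataset.csv
After submission, squeue -u $USER is one way to monitor its progress.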
What does it mean for an application to be parallel?
Read the documentation and look at the command's help: does it mention any of the following? (A resource-request sketch follows this list.)
Threads - multiple cpus/cores: Single node, single task, multiple cores.
Example: Chime
OpenMP: Single task, multiple cores. Set the OMP_NUM_THREADS environment variable.
An application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.
Example: ImageMagick
MPI: Message Passing Interface: Multiple nodes, multiple tasks.
OpenMPI: ARCC Wiki: OpenMPI and oneAPI Compiling.
Hybrid: MPI / OpenMP and/or threads.
Examples: DFTB and Quantum Espresso
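How you request resources follows from which of these models your application supports. A sketch of just the relevant request lines (not complete scripts; program names and counts are placeholders):
# Threads / OpenMP: single node, single task, multiple cores.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program

# MPI: multiple tasks, possibly across nodes, launched with srun.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
srun ./my_mpi_program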
What does it mean for an application to be GPU enabled?
Read the documentation and look at the command’s help: Does it mention:
GPU / Nvidia / CUDA? (A quick check you can run inside a job follows this list.)
Examples:
Applications: AlphaFold and GPU Blast
Via conda-based environments built with GPU libraries and converted to Jupyter kernels:
Examples: TensorFlow and PyTorch
Jupyter Kernels: PyTorch 1.13.1
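A quick way to confirm, from inside a job, that a GPU was allocated and that your framework can see it; a sketch assuming a PyTorch environment is already active (the nvidia-smi check itself needs nothing extra):
# Inside a job that requested a GPU, e.g. with: #SBATCH --gres=gpu:1
nvidia-smi -L          # list the device(s) Slurm allocated to this job

# With a PyTorch environment active (placeholder assumption):
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"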
How can I be a good cluster citizen?
Don’t run intensive applications on the login nodes.
Understand your software/application.
Shared resource - multi-tenancy.
Jobs running on the same node should not affect each other.
Don't ask for everything. Don't use:
--mem=0
the --exclusive flag.
Only ask for a GPU if you know it’ll be used.
Use /lscratch for I/O-intensive tasks rather than accessing /gscratch over the network (see the sketch after this list). You will need to copy files back before the job ends.
Track usage and job performance:
seff <jobid>
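A sketch of the /lscratch pattern mentioned above (paths and program names are placeholders; check your cluster's documentation for the actual scratch locations and per-job directories):
# Stage input data onto the node-local scratch disk.
cp /gscratch/<project-name>/input.dat /lscratch/$SLURM_JOB_ID/

# Run the I/O-intensive work against the local copy.
cd /lscratch/$SLURM_JOB_ID
./my_io_heavy_program input.dat > output.dat

# Copy results back before the job ends - local scratch is cleaned up afterwards.
cp output.dat /gscratch/<project-name>/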
Being a good Cluster Citizen: Requesting Resources
Good Cluster Citizen:
Only request what you need.
Unless you know your application:
can utilize multiple nodes/tasks/cores, request a single node/task/core (the default).
can utilize multiple nodes/tasks/cores, requesting them will not make your code magically run faster.
is GPU enabled, requesting a GPU will not make your code magically run faster.
Within your application/code check that resources are actually being detected and utilized.
Look at the job efficiency / performance:
seff <jobid>
This is emailed to you if you have Slurm email notifications turned on (a sketch for turning them on follows).
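To have those notifications (and the efficiency report, where the cluster sends one) emailed to you, add the mail options to your script; a sketch (the address is a placeholder):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your-email-address>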
Slurm cheatsheet
Submitting Useful Tickets via the Portal