Slurm: Workflows and Best Practices

Goal: Discuss what workflows can look like, how to be a good cluster citizen, and some best practices.


Default Resources

When you run salloc / sbatch without explicitly requesting resources, you will be given a default allocation (see the example after this list). This will be:

  • one node.

  • one task per node.

  • one core per task.

  • no GPU.

  • default memory (this can differ depending on the partition).
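To see what those defaults actually are on your cluster, something like the following should work; the partition name here is only an example (it matches the GPU example below):

# Show the configured defaults (e.g. DefMemPerCPU, DefaultTime) for one partition.
scontrol show partition mb-l40s

# Or list time limit, memory per node, and CPUs per node for all partitions.
sinfo -o "%P %l %m %c"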


If you don’t ask, you don’t get: GPU Example

Let's look at an example where we want to use a GPU device on a particular partition.

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00
#SBATCH --reservation=<reservation-name>
#SBATCH --partition=mb-l40s
#SBATCH --gres=gpu:1

echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_GPUS_ON_NODE:" $SLURM_GPUS_ON_NODE
echo "SLURM_JOB_GPUS:" $SLURM_JOB_GPUS
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L

# Output:
SLURM_JOB_ID: 13517905
SLURM_GPUS_ON_NODE: 1
SLURM_JOB_GPUS: 0
CUDA_VISIBLE_DEVICES: 0
GPU 0: NVIDIA L40S (UUID: GPU-29a5b03e-e8f0-972b-6ae8-be4b3afe4ee0)

The SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES values are the index (or indices) of the allocated GPU device(s); a value of 0 does not mean that zero GPUs were allocated.

For example, if we used --gres=gpu:2, we would see something of the form:

SLURM_GPUS_ON_NODE: 2
SLURM_JOB_GPUS: 0,1
CUDA_VISIBLE_DEVICES: 0,1
GPU 0: NVIDIA L40S (UUID: GPU-4b274738-2abf-c818-ff97-d7548c769276)
GPU 1: NVIDIA L40S (UUID: GPU-dfab908b-ccd9-27ab-5856-26a46cf6f89e)

If you don’t ask, you don’t get: No GPU device requested

# Comment out the gres option.
##SBATCH --gres=gpu:1

# Output:
SLURM_JOB_ID: 13517906
SLURM_GPUS_ON_NODE:
SLURM_JOB_GPUS:
CUDA_VISIBLE_DEVICES:
No devices found.

Modules and using salloc and sbatch


Modules and using salloc and sbatch: Best Practice
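A common best practice is to load modules explicitly inside the job itself (in the sbatch script, or after the salloc session starts) rather than relying on whatever happens to be loaded in your login-node environment. A minimal sketch, where the module and program names are placeholders:

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00:00

# Start from a clean environment, then load exactly what the job needs.
module purge
module load <module-name>/<version>

# Record what was loaded; useful when reproducing or debugging a run.
module list

./my_program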


Track Your Job IDs
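One simple approach is to capture the job ID at submission time and keep a running log; the log file name here is just a suggestion:

# --parsable makes sbatch print only the job ID so it can be captured.
jobid=$(sbatch --parsable my_job.sh)
echo "$(date +%F_%T) ${jobid} my_job.sh" >> ~/slurm_jobs.log

# Later, use the recorded ID to check status, accounting, and efficiency.
squeue -j "${jobid}"
sacct -j "${jobid}"
seff "${jobid}"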


What does a general workflow look like?

Getting Started:

  • Understand your application / programming language.

  • What are its capabilities / functionality.

  • Read the documentation, find examples, online forums – community.

Develop/Try/Test:

  • Typically use an interactive session (salloc) where you’re typing/trying/testing (see the sketch after this list).

  • Are modules available? If not, submit an HPC Software Consultation request to start the discussion.

  • Develop code/scripts.

  • Understand how the command-line works – what commands/scripts to call with options.

  • Understand if parallelization is available – can you optimize your code/application?

  • Test against a subset of data: something that runs quickly – maybe a couple of minutes or hours.

  • Do the results look correct?
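As an illustration, an interactive develop/try/test session might look roughly like this; the account, module, and script names are placeholders:

# Request a small interactive allocation for testing.
salloc --account=<project-name> --time=2:00:00 --ntasks=1 --cpus-per-task=1

# Once the allocation is granted, load what you need and run a quick test.
module load <module-name>
./my_analysis.sh --input subset_of_data.csv

Depending on how the cluster is configured, salloc may drop you into a shell on the allocated node, or you may need srun to run commands on it.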


What does a general workflow look like? Continued.

Production:

  • Put it all together within a bash Slurm script (a sketch follows this list): 

    • Request appropriate resources using #SBATCH

    • Request appropriate wall time – hours, days…

    • Load modules: module load …

    • Run scripts/command-line.

  • Finally, submit your job to the cluster (sbatch) using a complete set of data.

    • Use: sbatch <script-name.sh>

    • Monitor your job’s progress.
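Putting these pieces together, a production script might look roughly like the sketch below; the account, partition, module, and script names are all placeholders, and the resource numbers are examples rather than recommendations:

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --partition=<partition-name>
#SBATCH --time=1-00:00:00            # one day of wall time
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --mail-type=END,FAIL         # optional email notifications

module purge
module load <module-name>/<version>

# Run against the complete data set.
./my_analysis.sh --input full_data.csv --threads ${SLURM_CPUS_PER_TASK}

Submit it with sbatch <script-name.sh> and monitor it with squeue -u $USER (and seff <jobid> once it finishes).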


What does it mean for an application to be parallel? 

Read the documentation and look at the command’s help. Does it mention any of the following? (A sketch of matching resource requests follows this list.)

  • Threads - multiple CPUs/cores: single node, single task, multiple cores.

    • Example: Chime

  • OpenMP: single task, multiple cores. Set the OMP_NUM_THREADS environment variable.

    • An application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.

    • Example: ImageMagick

  • MPI (Message Passing Interface): multiple nodes, multiple tasks.

  • Hybrid: MPI / OpenMP and/or threads.

    • Examples: DFTB and Quantum Espresso
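To make this concrete, here is a rough sketch of how the resource requests differ between these models; the three fragments below are alternatives rather than one script, and the program names and counts are placeholders:

# Threads / OpenMP: single node, single task, multiple cores.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_threaded_program

# MPI: multiple nodes, multiple tasks.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
srun ./my_mpi_program

# Hybrid MPI/OpenMP: multiple tasks, each using several cores.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./my_hybrid_program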


What does it mean for an application to be GPU enabled? 

Read the documentation and look at the command’s help: Does it mention:

  • GPU / NVIDIA / CUDA? (A quick check you can run inside the job follows this list.)

  • Examples:

    • Applications: AlphaFold and GPU Blast

      • Via conda-based environments built with GPU libraries and converted to Jupyter kernels:

      • Examples: TensorFlow and PyTorch 

      • Jupyter Kernels: PyTorch 1.13.1
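A quick way to confirm the GPU is actually visible to your code is to check from inside the job itself; for example, assuming a conda environment with PyTorch has already been activated:

# Inside the GPU job, after loading/activating the environment:
nvidia-smi -L
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"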


How can I be a good cluster citizen?

  • Policies

  • Don’t run intensive applications on the login nodes.

  • Understand your software/application.

  • Shared resource - multi-tenancy.

    • Other jobs running on the same node do not affect each other.

  • Don’t ask for everything. Don’t use:

    • --mem=0 (which requests all of the memory on a node).

    • the --exclusive flag.

    • Only ask for a GPU if you know it’ll be used.

  • Use /lscratch for I/O-intensive tasks rather than accessing /gscratch over the network (see the sketch after this list).

    • You will need to copy files back before the job ends.

  • Track usage and job performance: seff <jobid>
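For the /lscratch point above, a common pattern looks roughly like this; the directory layout is only illustrative and the exact local-scratch path may differ on your cluster:

# Stage input data onto the node-local scratch disk.
workdir=/lscratch/${SLURM_JOB_ID}
mkdir -p "${workdir}"
cp /gscratch/<project-name>/input_data/* "${workdir}/"

# Run the I/O-intensive step against the local copy.
cd "${workdir}"
./my_io_heavy_program

# Copy results back to shared storage before the job ends;
# node-local scratch is typically cleaned up when the job finishes.
cp results_* /gscratch/<project-name>/results/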


Being a Good Cluster Citizen: Requesting Resources

Good Cluster Citizen:

  • Only request what you need.

  • Unless you know your application: 

    • can utilize multiple nodes/tasks/cores, request a single node/task/core (default).

    • can utilize multiple nodes/tasks/cores, requesting them will not make your code magically run faster.

    • is GPU enabled, having a GPU will not make your code magically run faster. 

  • Within your application/code, check that resources are actually being detected and utilized.

    • Look at the job efficiency/performance: seff <jobid>

    • This is emailed out if you have Slurm email notifications turned on.

  • Slurm cheatsheet


Submitting Useful Tickets via the Portal