Slurm: Workflows and Best Practices

Goal: Discuss what workflows can look like, being a good cluster citizen, and some best practices.


If you don’t ask, you don’t get: GPU Example

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00
#SBATCH --reservation=<reservation-name>
#SBATCH --partition=mb-l40s
#SBATCH --gres=gpu:1

echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_GPUS_ON_NODE:" $SLURM_GPUS_ON_NODE
echo "SLURM_JOB_GPUS:" $SLURM_JOB_GPUS
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES
nvidia-smi -L

# Output:
SLURM_JOB_ID: 13517905
SLURM_GPUS_ON_NODE: 1
SLURM_JOB_GPUS: 0
CUDA_VISIBLE_DEVICES: 0
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-c1859587-9722-77f3-1b3a-63e9d4fe9d4f)

If you don’t ask, you don’t get: No GPU device requested

# Comment out the gres option.
##SBATCH --gres=gpu:1

# Output:
SLURM_JOB_ID: 13517906
SLURM_GPUS_ON_NODE:
SLURM_JOB_GPUS:
CUDA_VISIBLE_DEVICES:
No devices found.

Just because a partition/compute node has something does not mean your job gets it: you still need to explicitly request it.


Modules and using salloc and sbatch

Typically, modules loaded and environment variables set on the login nodes will be inherited when you create an interactive salloc session or submit a job with sbatch.

[]$ module purge
[]$ module load gcc/13.2.0 r/4.4.0
[]$ ml

Currently Loaded Modules:
  1) slurm/latest (S)   42) libxau/1.0.8   ...   41) xproto/7.0.31

[]$ salloc -A arcc -t 10:00
salloc: Granted job allocation 1243593
salloc: Nodes mbcpu-025 are ready for job

[@mbcpu-025 ~]$ ml

Currently Loaded Modules:
  1) slurm/latest (S)    15) libxml2/2.10.3   29) perl/5.38.0    43) libxdmcp/1.1.4   57) curl/8.4.0       71) openjdk/11.0.20.1_1
  ...
 14) xz/5.4.1            28) gdbm/1.23        42) libxau/1.0.8   56) nghttp2/1.57.0   70) openblas/0.3.24

Modules and using salloc and sbatch: Best Practice

Although modules and environment variables are typically inherited, relying on this is not good practice: we have observed cases where not everything was inherited.

Also, when ARCC is asked to assist, we typically have no way of knowing (and users often forget) how the environment was set up on the login node.

Best Practice: After performing an salloc, or within the script you submit with sbatch, perform a module purge and then module load only what you explicitly know you need to use (including versions).
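
For example, a minimal sketch of this pattern inside a batch script (the R script name is hypothetical; the modules shown are the ones used in the session above):

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00:00

# Start from a clean environment rather than relying on whatever was
# inherited from the login node.
module purge

# Load only what this job needs, with explicit versions.
module load gcc/13.2.0 r/4.4.0

# Hypothetical analysis script.
Rscript my_analysis.R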


What does a general workflow look like?

Getting Started:

  • Understand your application / programming language.

  • What are its capabilities and functionality?

  • Read the documentation, find examples, and check online forums and community resources.

Develop/Try/Test:

  • Typically use an interactive session (salloc) where you’re typing/trying/testing.

  • Are modules available? If not, submit an HPC Software Consultation request to start the discussion.

  • Develop code/scripts.

  • Understand how the command-line works – what commands/scripts to call with options.

  • Understand if parallelization is available – can you optimize your code/application?

  • Test against a subset of data: something that runs quickly, maybe a couple of minutes or hours.

  • Do the results look correct?
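
As a rough sketch of this try/test loop (the project name, modules, script, and subset file are placeholders for your own work):

# Request a short interactive session for testing.
[]$ salloc -A <project-name> -t 30:00

# On the allocated node, load only what you need and run a quick test
# against a small subset of your data.
[@node ~]$ module purge
[@node ~]$ module load gcc/13.2.0 r/4.4.0
[@node ~]$ Rscript my_analysis.R --input small_subset.csv

# Inspect the output: do the results look correct before scaling up?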


What does a general workflow look like? Continued.

Production:

  • Put it all together within a bash Slurm script: 

    • Request appropriate resources using #SBATCH

    • Request appropriate wall time – hours, days…

    • Load modules: module load …

    • Run scripts/command-line.

  • Finally, submit your job to the cluster (sbatch) using a complete set of data.

    • Use: sbatch <script-name.sh>

    • Monitor the progress of your job(s).
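
Putting those pieces together, a sketch of what a production batch script might look like (the resources, modules, script name, and final command are illustrative placeholders):

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --job-name=full_run
#SBATCH --time=1-00:00:00          # wall time: here, one day
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --output=full_run_%j.out   # %j expands to the job ID

module purge
module load gcc/13.2.0 r/4.4.0

# Run against the complete set of data.
Rscript my_analysis.R --input full_dataset.csv

Submit with sbatch full_run.sh, then monitor with, for example, squeue -u $USER.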


What does it mean for an application to be parallel? 

Read the documentation and look at the command’s help: Does it mention:

  • Threads (multiple CPUs/cores): single node, single task, multiple cores.

    • Example: Chime

  • OpenMP: single task, multiple cores. Set the OMP_NUM_THREADS environment variable.

    • an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.

    • Example: ImageMagick

  • MPI (Message Passing Interface): multiple nodes, multiple tasks.

  • Hybrid: MPI / OpenMP and/or threads.

    • Examples: DFTB and Quantum Espresso
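
As a rough illustration of how these map onto Slurm requests (two separate script fragments, not one script; the executable names are made up and only the resource-request patterns are the point):

# Threaded / OpenMP style: single node, single task, multiple cores.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # tell OpenMP how many cores it was given
./my_threaded_app

# MPI style: multiple tasks, potentially across multiple nodes.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

srun ./my_mpi_app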


What does it mean for an application to be GPU enabled? 

Read the documentation and look at the command’s help: Does it mention:

  • GPU / NVIDIA / CUDA?

  • Examples:

    • Applications: AlphaFold and GPU Blast

    • Conda-based environments built with GPU libraries, which can also be converted to Jupyter kernels:

      • Examples: TensorFlow and PyTorch

      • Jupyter Kernels: PyTorch 1.13.1
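
A sketch of requesting a GPU and verifying that a framework actually sees it (the conda module and environment names are assumptions; torch.cuda.is_available() is a standard PyTorch call):

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --partition=mb-l40s
#SBATCH --gres=gpu:1
#SBATCH --time=10:00

module purge
module load miniconda3            # assumption: use whatever conda module your cluster provides
conda activate my_pytorch_env     # hypothetical GPU-enabled conda environment

nvidia-smi -L                     # what Slurm has exposed to the job
python -c "import torch; print(torch.cuda.is_available())"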


How can I be a good cluster citizen?

  • Policies

  • Don’t run intensive applications on the login nodes.

  • Understand your software/application.

  • Shared resource - multi-tenancy.

    • Jobs from different users can run on the same node and should not affect each other.

  • Don’t ask for everything:

    • Don’t use --mem=0 (which requests all of a node’s memory).

    • Don’t use the --exclusive flag.

    • Only ask for a GPU if you know it’ll be used.

  • Use /lscratch for I/O intensive tasks rather than accessing /gscratch over the network. 

    • You will need to copy files back before the job ends; see the sketch after this list.

  • Track usage and job performance: seff <jobid>
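
A rough sketch of that /lscratch pattern inside a job script (the paths, project name, and application are illustrative):

# Stage input from network storage to node-local scratch.
JOB_SCRATCH=/lscratch/$SLURM_JOB_ID
mkdir -p $JOB_SCRATCH
cp /gscratch/<project-name>/input_data/* $JOB_SCRATCH/

# Do the I/O-intensive work against local disk.
cd $JOB_SCRATCH
./my_io_heavy_app

# Copy results back before the job ends; node-local scratch is not
# reachable once the job has finished.
cp results.out /gscratch/<project-name>/results/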


Being a good Cluster Citizen: Requesting Resources 

Good Cluster Citizen:

  • Only request what you need.

  • Unless you know your application can utilize multiple nodes/tasks/cores:

    • Request a single node/task/core (the default).

    • Requesting more will not make your code magically run faster.

  • Unless you know your application is GPU enabled, having a GPU will not make your code magically run faster.

  • Within your application/code, check that resources are actually being detected and utilized (see the sketch after this list).

    • Look at the job efficiency/performance: seff <jobid>

    • This is emailed out if you have Slurm email notifications turned on.

  • Slurm cheatsheet
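
For instance, a small sketch of checking what a job was actually given and how well it used it (seff is the command mentioned above; the echoed variables are standard Slurm job environment variables):

# Inside the job script: confirm what Slurm actually allocated.
echo "CPUs on node: $SLURM_CPUS_ON_NODE"
echo "GPUs visible: $CUDA_VISIBLE_DEVICES"

# After the job completes: check how efficiently those resources were used.
seff <jobid>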