
Introduction: This workshop will introduce users to job management using the Slurm system, demonstrating how to create interactive sessions and how to submit jobs that follow a basic workflow to the cluster queue. After the workshop, participants will understand:

  • How to create a script that defines their workflow (e.g. loading modules).

  • How to start interactive sessions to work within, as well as how to submit and track jobs on the cluster.

Prerequisites: Participants will require an introductory level of experience using Linux, as well as the ability to use a text editor from the command line.

Course Goals:

  • What is Slurm?

  • How to start an interactive session and perform job submission.

  • How to select appropriate resource allocations.

  • How to monitor your jobs.

  • What does a general workflow look like?

  • Best practices in using HPC.

  • How to be a good cluster citizen?



01: Slurm

Topics:

  • Slurm: 

    • Interactive sessions.

    • Job submission.

    • Resource selection.

    • Monitoring.


Workload Managers: 

  1. Allocates access to appropriate compute nodes specific to your requests.

  2. Framework for starting, executing, monitoring, and even canceling your jobs.

  3. Queue management and job state notification.


ARCC: Slurm: 


Exercises:


Interactive Session: salloc

  • You’re there doing the work.

  • Suitable for developing and testing over a few hours.

[]$ salloc --help
# Lots of options. 
# Notice short and long form options.

[]$ salloc -A <project-name> -t <wall-time>

# Format for --time: acceptable time formats include "minutes", "minutes:seconds",
# "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

Interactive Session: salloc: workshop

  • You’ll only use the reservation for this (and/or other) workshop.

  • Once you have an account, you typically do not need a reservation.

  • But there are use cases when we can create a specific reservation for you.

[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name>

Interactive Session: salloc: What’s happening?

[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name>
salloc: Granted job allocation 13526337
salloc: Nodes m233 are ready for job
# Make a note of the job id.

# Notice the server/node name has changed.
[arcc-t05@m233 intro_to_hpc]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526337     moran interact arcc-t05  R       0:19      1 m233
# For an interactive session: Name = interact
# You have the command-line interactively available to you.
[]$ 
...
[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526337     moran interact arcc-t05  R       1:03      1 m233

# Session will automatically time out
[]$ salloc: Job 13526337 has exceeded its time limit and its allocation has been revoked.
slurmstepd: error: *** STEP 13526337.interactive ON m233 CANCELLED AT 2024-03-22T09:36:53 DUE TO TIME LIMIT ***
exit
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

Interactive Session: salloc: Finished Early?

[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name>
salloc: Granted job allocation 13526338
salloc: Nodes m233 are ready for job

[arcc-t05@m233 ...]$ Do stuff…

[]$ exit
exit
salloc: Relinquishing job allocation 13526338

Submit Jobs: sbatch

  • You submit a job to the queue and walk away.

  • Monitor its progress/state using command-line and/or email notifications.

  • Once complete, come back and analyze results.


Submit Jobs: sbatch: Template:

#!/bin/bash                               # Shebang indicating this is a bash script.
#SBATCH --account=arccanetrain            # Use #SBATCH to define Slurm related values.
#SBATCH --time=10:00                      # Must define an account and wall-time.
#SBATCH --reservation=<reservation-name>

echo "SLURM_JOB_ID:" $SLURM_JOB_ID        # Can access Slurm related Environment variables.


start=$(date +'%D %T')                    # Can call bash commands.
echo "Start:" $start

module load gcc/12.2.0 python/3.10.6      # Load the modules you require for your environment.
python python01.py                        # Call your scripts/commands.
sleep 1m

end=$(date +'%D %T')
echo "End:" $end

Submit Jobs: sbatch: What’s happening?

[]$ sbatch run.sh
Submitted batch job 13526340

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526340     moran   run.sh arcc-t05  R       0:05      1 m233

[]$ ls
python01.py  run.sh  slurm-13526340.out

# You can view this file while the job is still running.
[]$ cat slurm-13526340.out
SLURM_JOB_ID: 13526340
Start: 03/22/24 09:38:36
Python version: 3.10.6 (main, Oct 17 2022, 16:47:32) [GCC 12.2.0]
Version info: sys.version_info(major=3, minor=10, micro=6, releaselevel='final', serial=0)

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526340     moran   run.sh arcc-t05  R       0:17      1 m233

Submit Jobs: sbatch: What’s happening?

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526340     moran   run.sh arcc-t05  R       0:29      1 m233

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
# squeue only shows pending and running jobs.
# If a job is no longer in the queue then it has finished. 
# Finished can mean success, failure, timeout... It’s just no longer running.


[]$ cat slurm-13526340.out
SLURM_JOB_ID: 13526340
Start: 03/22/24 09:38:36
Python version: 3.10.6 (main, Oct 17 2022, 16:47:32) [GCC 12.2.0]
Version info: sys.version_info(major=3, minor=10, micro=6, releaselevel='final', serial=0)
End: 03/22/24 09:39:36

Submit Jobs: sbatch: Cancel?

[]$ sbatch run.sh
Submitted batch job 13526341

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13526341     moran   run.sh arcc-t05  R       0:03      1 m233

[]$ scancel 13526341

[]$ squeue -u arcc-t05
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)


[]$ cat slurm-13526341.out
SLURM_JOB_ID: 13526341
Start: 03/22/24 09:40:09
Python version: 3.10.6 (main, Oct 17 2022, 16:47:32) [GCC 12.2.0]
Version info: sys.version_info(major=3, minor=10, micro=6, releaselevel='final', serial=0)
slurmstepd: error: *** JOB 13526341 ON m233 CANCELLED AT 2024-03-22T09:40:17 ***

Submit Jobs: sacct: What happened?

[]$ sacct -u arcc-t05 -X
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
13526337     interacti+      moran arccanetr+          1    TIMEOUT      0:0
13526338     interacti+      moran arccanetr+          1  COMPLETED      0:0
13526340         run.sh      moran arccanetr+          1  COMPLETED      0:0
13526341         run.sh      moran arccanetr+          1 CANCELLED+      0:0


# Lots more information
[]$ sacct --help

[]$ sacct -u arcc-t05 --format="JobID,Partition,nnodes,NodeList,NCPUS,ReqMem,State,Start,Elapsed" -X
JobID         Partition   NNodes        NodeList      NCPUS     ReqMem      State               Start    Elapsed
------------ ---------- -------- --------------- ---------- ---------- ---------- ------------------- ----------
13526337          moran        1            m233          1      1000M    TIMEOUT 2024-03-22T09:35:25   00:01:28
13526338          moran        1            m233          1      1000M  COMPLETED 2024-03-22T09:37:41   00:00:06
13526340          moran        1            m233          1      1000M  COMPLETED 2024-03-22T09:38:35   00:01:01
13526341          moran        1            m233          1      1000M CANCELLED+ 2024-03-22T09:40:08   00:00:09

Submit Jobs: sbatch: Options:

[]$ sbatch --help
#SBATCH --account=arccanetrain          # Required: account/time
#SBATCH --time=72:00:00

#SBATCH --job-name=workshop             # Job name: Helps to identify the job when using squeue.

#SBATCH --nodes=1                       # Options will typically have defaults.
#SBATCH --ntasks-per-node=1             # Request resources in accordance with how you want
#SBATCH --cpus-per-task=1               # to parallelize your job, the type of hardware partition
#SBATCH --partition=teton-gpu           # and whether you require a GPU.
#SBATCH --gres=gpu:1

#SBATCH --mem=100G                      # Request specific memory needs.
#SBATCH --mem-per-cpu=10G

#SBATCH --mail-type=ALL                 # Get email notifications of the state of the job.
#SBATCH --mail-user=<email-address>

#SBATCH --output=<prefix>_%A.out        # Define a named output file postfixed with the job id.

If you don’t ask, you don’t get: GPU Example:

#!/bin/bash
#SBATCH --account=arccanetrain
#SBATCH --time=1:00
#SBATCH --reservation=HPC_workshop
#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:1

echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_GPUS_ON_NODE:" $SLURM_GPUS_ON_NODE
echo "SLURM_JOB_GPUS:" $SLURM_JOB_GPUS
echo "CUDA_VISIBLE_DEVICES:" $CUDA_VISIBLE_DEVICES

nvidia-smi -L

# Output:
SLURM_JOB_ID: 13517905
SLURM_GPUS_ON_NODE: 1
SLURM_JOB_GPUS: 0
CUDA_VISIBLE_DEVICES: 0
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-c1859587-9722-77f3-1b3a-63e9d4fe9d4f)

If you don’t ask, you don’t get: No GPU device requested:

# Comment out the gres option.
##SBATCH --gres=gpu:1


# Output:
SLURM_JOB_ID: 13517906
SLURM_GPUS_ON_NODE:
SLURM_JOB_GPUS:
CUDA_VISIBLE_DEVICES:
No devices found.

Just because a partition/compute node has a resource does not mean your job automatically gets it: you still need to explicitly request it.


Common Questions:

  • How do I know what number of nodes, cores, memory etc to ask for my jobs?

  • How do I find out whether a cluster/partition supports these resources?

  • How do I find out whether these resources are available on the cluster?

  • How long will I have to wait in the queue before my job starts? How busy is the cluster?

  • How do I monitor the progress of my job?


Common Questions: Suggestions:

  • How do I know what number of nodes, cores, memory etc to ask for my jobs?

    • Understand your software and application. 

      • Read the docs – look at the help for commands/options.

      • Can it run multiple threads, i.e. use multiple cores (OpenMP) or multiple nodes (MPI)?

      • Can it use a GPU (Nvidia CUDA)?

      • Are there suggestions on data and memory requirements?

  • How do I find out whether a cluster/partition supports these resources?

  • How do I find out whether these resources are available on the cluster?

  • How long will I have to wait in the queue before my job starts? 

    • How busy is the cluster? 

    • Current Cluster utilization: Commands sinfo / arccjobs and SouthPass status page.

  • How do I monitor the progress of my job?

    • Slurm commands: squeue (see the examples below).
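A few illustrative commands for checking what a partition offers and how busy it is (the partition name is a placeholder; arccjobs is an ARCC-specific tool):

# Summarize partitions and node availability.
[]$ sinfo --summarize

# Show node count, cores, memory and GPUs (gres) per partition.
[]$ sinfo -p <partition-name> -o "%P %D %c %m %G"

# Monitor your own pending and running jobs.
[]$ squeue -u <username>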


Common Issues:

  • Not defining the account and time options.

  • The account is the name of the project you are associated with. It is not your username.

  • Requesting combinations of resources that cannot be satisfied: Beartooth Hardware Summary Table

    • For example, you cannot request 40 cores on a teton node (max of 32).

    • Requesting too much memory, or too many GPU devices, with respect to a partition.

  • My job is pending? Why? (See the example below.)

    • Because the resources you requested are currently not available.

    • Have you unnecessarily defined a specific partition (restricting yourself to it) that is busy?

    • We only have a small number of GPUs.

    • This is a shared resource - sometimes you just have to be patient…

    • Check current cluster utilization.

  • Preemption: Users of an investment get priority on their own hardware.

    • We have the non-investor partition.
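If your job is pending, squeue shows the reason in the NODELIST(REASON) column. An illustrative (not captured) example:

[]$ squeue -u <username>
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          <job-id>     moran   run.sh <username> PD       0:00      1 (Resources)
# ST = PD means the job is pending.
# Common reasons: (Resources) - the requested resources are currently in use;
# (Priority) - other queued jobs are ahead of yours.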


02: Workflows and Best Practices

Topics:

  • What does a general workflow look like?

  • Best practices in using HPC.

  • How to be a good cluster citizen?


What does a general workflow look like?

Getting Started:

  • Understand your application / programming language.

  • What are its capabilities / functionality.

  • Read the documentation, find examples, online forums – community.

Develop/Try/Test:

  • Typically use an interactive session (salloc) where you’re typing/trying/testing (see the sketch after this list).

  • Are modules available? If not, submit a New Software Request to get the software installed.

  • Develop code/scripts.

  • Understand how the command-line works – what commands/scripts to call with options.

  • Understand if parallelization is available – can you optimize your code/application?

  • Test against a subset of data. Something that runs quickly – maybe a couple of minutes/hours.

  • Do the results look correct?
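A minimal sketch of this develop/test loop, assuming the module versions used earlier in this workshop (the project, script name and option are placeholders):

[]$ salloc -A <project-name> -t 2:00:00
[]$ module load gcc/12.2.0 python/3.10.6
[]$ python my_analysis.py --input small_subset.csv
[]$ exit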


What does a general workflow look like?

Production:

  • Put it all together within a bash Slurm script: 

    • Request appropriate resources using #SBATCH

    • Request appropriate wall time – hours, days…

    • Load modules: module load …

    • Run scripts/command-line.

  • Finally, submit your job to the cluster (sbatch) using a complete set of data.

    • Use: sbatch <script-name.sh>

    • Monitor job(s) progress.


What does it mean for an application to be parallel? 

Read the documentation and look at the command’s help: Does it mention:

  • Threads - multiple CPUs/cores: single node, single task, multiple cores.

  • OpenMP: Single task, multiple cores. Set the OMP_NUM_THREADS environment variable (see the sketch after this list).

    • An application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.

    • Example: ImageMagick

  • MPI: Message Passing Interface: Multiple nodes, multiple tasks

  • Hybrid: MPI / OpenMP and/or threads.
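A hedged sbatch sketch for a threaded/OpenMP application (the account, wall time and program name are placeholders):

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Tell the OpenMP runtime to use the cores Slurm allocated to this task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program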


What does it mean for an application to be GPU enabled? 

Read the documentation and look at the command’s help: Does it mention:

  • GPU / Nvidia / Cuda?

  • Examples:

    • Applications: AlphaFold and GPU Blast

      • Via conda-based environments built with GPU libraries and converted to Jupyter kernels:

      • Examples: TensorFlow and PyTorch 

      • Jupyter Kernels: PyTorch 1.13.1
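A hedged way to confirm the GPU is actually visible from inside a job (assumes a Python environment with PyTorch has been loaded; how you load it is not shown):

[]$ nvidia-smi -L
[]$ python -c "import torch; print(torch.cuda.is_available())"
# Prints True when a GPU was requested and is visible to the job, False otherwise.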


How can I be a good cluster citizen?

  • Policies

  • Don’t run intensive applications on the login nodes.

  • Understand your software/application.

  • Shared resource - multi-tenancy.

    • Other jobs may be running on the same node; your job should not adversely affect them.

  • Don’t ask for everything. Don’t use:

    • mem=0

    • exclusive tag.

    • Only ask for a GPU if you know it’ll be used.

  • Use /lscratch for I/O intensive tasks rather than accessing /gscratch over the network (see the sketch after this list).

    • You will need to copy files back before the job ends.

  • Track usage and job performance: seff <jobid>
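A hedged sketch of staging data through node-local scratch inside a job script (the /lscratch layout, project name and file names are illustrative and may differ on your cluster):

SCRATCH_DIR=/lscratch/$SLURM_JOB_ID
mkdir -p "$SCRATCH_DIR"
cp /gscratch/<project-name>/input_data.tar "$SCRATCH_DIR"/
cd "$SCRATCH_DIR"
# ... run the I/O-intensive work here ...
cp results.tar /gscratch/<project-name>/    # Copy results back before the job ends.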


Being a good Cluster Citizen: Requesting Resources: 

Good Cluster Citizen:

  • Only request what you need.

  • Unless you know your application: 

    • can utilize multiple nodes/tasks/cores, request a single node/task/core (default).

    • can utilize multiple nodes/tasks/cores, requesting them will not make your code magically run faster.

    • is GPU enabled, having a GPU will not make your code magically run faster. 

  • Within your application/code, check that resources are actually being detected and utilized.

    • Look at the job efficiency/performance: seff <jobid>

    • This is emailed out if you have Slurm email notifications turned on.

  • Slurm cheatsheet


Job Efficiency:

[]$ seff 13515489

Job ID: 13515489
Cluster: beartooth
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node)

Note:

  • Only accurate if the job is successful.

  • If the job fails with, say, an OOM (Out-Of-Memory) error, the details will be inaccurate.

  • This is emailed out if you have Slurm email notifications turned on.


03: Wrapping up the Workshop


Next Steps to look at: 

Future Workshops:

  • Using SouthPass

  • Data Access and Transfers

Look at:

  • Slurm: Requesting multiple cores/nodes, memory and GPUs.

  • Software Installation.

  • Conda: Creating and using environments.

  • Convert a conda environment to a Jupyter kernel.

  • Getting data on/off the cluster.


Summary: 

Covered:

  • Slurm: Interactive sessions, job submission, resource selection and monitoring.

  • What does a general workflow look like?

  • Best practices in using HPC.

  • How to be a good cluster citizen?
