Using Your Investment Partition

How Slurm allocates nodes to your jobs can be a little confusing if your project has an investment and/or you explicitly define a partition.

The following provides a high-level overview.

Explicitly Define a Partition

If you explicitly define a partition within your script, then THAT is the partition (and its associated nodes) that Slurm will try to allocate your job across.

If you request a combination of nodes/cores/memory/GPUs that this partition cannot provide, then Slurm will not accept the job and will return an appropriate error message.

This explicit definition takes precedence even if you have an investment.
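
For example, a minimal batch script that explicitly requests the teton partition might look like the following sketch (the job name, resource requests and the program being run are illustrative):

    #!/bin/bash
    #SBATCH --job-name=example      # illustrative job name
    #SBATCH --partition=teton       # explicit partition: only teton's nodes are considered
    #SBATCH --nodes=1
    #SBATCH --ntasks=16
    #SBATCH --mem=64G
    #SBATCH --time=01:00:00

    srun ./my_program               # hypothetical executable

If the requested resources cannot be satisfied by any node in teton (e.g. more memory per node than the partition offers), the submission will fail with an error such as "Requested node configuration is not available".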

If You Have An Investment

If your project does have an investment, and you do NOT define a partition, then behind the scenes Slurm will build a list of partitions against which to try running your job.

Slurm will automatically detect that you have an investment and will try running against it first.

  1. If there are jobs running on your investment that belong to users who are not part of your project, then Slurm will pre-empt these jobs (i.e. stop them and return them to the queue) and immediately start your job.

  2. But if your investment is 'full' with jobs from users who are members of your project, then Slurm will try to allocate across the other partitions, if resources are available. The list of other partitions, tried in order, is: moran, teton, teton-cascade, teton-gpu and teton-hugemem.

  3. If no resources are available to fit your job (i.e. cluster usage is very high), then your job will sit in a pending state (i.e. waiting in the queue). Slurm monitors the queue at regular intervals and will run the job when appropriate resources become available.
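
As a sketch of this behaviour, a submission with no partition line, just your project's account, lets Slurm walk the list above for you (the account name and program are placeholders):

    #!/bin/bash
    #SBATCH --account=myproject    # placeholder: your project's investment account
    #SBATCH --nodes=1
    #SBATCH --ntasks=8
    #SBATCH --time=02:00:00
    # Note: no --partition line, so Slurm tries the investment first, then
    # falls back through moran, teton, teton-cascade, teton-gpu and teton-hugemem.

    srun ./my_program              # hypothetical executable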

A job can be pending for a number of reasons, and Slurm will only show one of the many potential reasons at a time, which can be a bit confusing. For example, it might state BadConstraint (the job's constraints cannot be satisfied) or Resources (the job is waiting for resources to become available). A full list can be found here: https://slurm.schedmd.com/squeue.html#SECTION_JOB-REASON-CODES

To confuse matters further, the stated reason can change depending on the overall state of the cluster.
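
You can check the state and currently reported reason for your own jobs with squeue; for example (the format string simply selects the job ID, partition, state and reason columns):

    # List your queued/running jobs with the scheduler's current reason code
    squeue -u $USER -o "%.10i %.12P %.8T %.20r"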

But the fact that your job has been accepted into the queue (i.e. you have a job number) means it will eventually run once appropriate resources become available.

The above is only a brief summary of the hidden complexity of how Slurm allocates and prioritizes jobs, and there are further caveats too subtle to cover in this short explanation.

Summary

  • If you explicitly define a partition, then that is what will be used.

  • If you have an investment and do not specify a partition, Slurm will try allocating against the investment first, then fall back to a list of other partitions if it can't, and finally queue the job if nothing is available at that moment.