...
Request your desired resources without defining a partition
Here, we assume:
your project is associated with an investment
and you did not define a partition in your request
Slurm will identify a list of partitions to try to run your job against.
Slurm will automatically prioritize your investment partition and hardware to try to run your job there.
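As a minimal sketch of such a request, the snippet below renders a batch script header that names an account but no partition. The helper `render_header`, the account name `myproject`, and the resource values are illustrative assumptions, not values from this site. `--account`, `--time`, `--nodes`, `--cpus-per-task`, `--mem`, and `--partition` are standard `sbatch` options.

```python
def render_header(account, time="01:00:00", nodes=1, cpus_per_task=4,
                  mem="8G", partition=None):
    """Return the #SBATCH header lines for a batch script (hypothetical helper).

    When partition is None, no --partition line is written, so Slurm is
    left to choose the partition itself.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",      # placeholder project/account name
        f"#SBATCH --time={time}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --mem={mem}",
    ]
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")
    return "\n".join(lines)


# No partition is given, so the header contains no --partition directive.
header = render_header("myproject")
print(header)
```

Leaving out the partition line is what lets Slurm consider the investment partition first and then the other partitions.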
Slurm checks for jobs running on your investment.
Slurm checks if:
there are no jobs running on your investment
and the hardware requested in your job fits within the hardware confines of your partition.
If so, Slurm immediately starts your job.
Slurm checks if:
there are jobs running on your investment
and they were requested by HPC users who are not members of any project associated with your investment.
If so, Slurm will pre-empt these jobs (i.e. stop them and add them back to the queue).
Slurm will immediately start your job.
Slurm checks if:
your investment hardware is 'full' with jobs from users who are members of project(s) associated with your investment.
If so, Slurm will try to allocate your requested job across the other partitions.
On MedicineBow, the other partitions are tried in this order: mb, mb-a30, mb-l40s, mb-h100.
On Beartooth, the other partitions are tried in this order: moran, teton, teton-cascade, teton-gpu, teton-hugemem.
Slurm checks if:
resources are available.
If so, Slurm will start your job.
Slurm checks if:
there are no resources available to fit your job (i.e. cluster usage is very high).
If so, Slurm will place your job in the queue with a state of pending (i.e. waiting in the queue).
Slurm will monitor the queue at regular intervals and run the job when appropriate resources become available.
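The checks above can be sketched as a small decision function. This is a simplified model of the behaviour described on this page, not Slurm's actual scheduling code. The function `decide`, the three state labels, and the `partitions_with_room` argument are hypothetical names introduced for illustration. The fallback orders are copied from the lists above.

```python
# Fallback order per cluster, as listed in the text above.
FALLBACK_PARTITIONS = {
    "MedicineBow": ["mb", "mb-a30", "mb-l40s", "mb-h100"],
    "Beartooth": ["moran", "teton", "teton-cascade", "teton-gpu", "teton-hugemem"],
}


def decide(investment_state, cluster, partitions_with_room):
    """Return (action, partition) for a job submitted without a partition.

    investment_state is one of (hypothetical labels):
      "idle"          - no jobs on the investment and the request fits its hardware
      "busy_external" - jobs on the investment belong to users outside its projects
      "full_members"  - investment is full with jobs from its own project members
    partitions_with_room is the set of other partitions that can fit the job.
    """
    if investment_state == "idle":
        return ("start", "investment")
    if investment_state == "busy_external":
        # External jobs are stopped and requeued, then the job starts.
        return ("preempt_then_start", "investment")
    if investment_state == "full_members":
        # Try the other partitions in the documented order.
        for partition in FALLBACK_PARTITIONS[cluster]:
            if partition in partitions_with_room:
                return ("start", partition)
        # Nothing fits right now: the job waits in the queue.
        return ("pending", None)
    raise ValueError(f"unknown investment state: {investment_state}")


print(decide("idle", "MedicineBow", set()))
print(decide("full_members", "MedicineBow", {"mb-h100", "mb-a30"}))
print(decide("full_members", "Beartooth", set()))
```

In the second call both `mb-a30` and `mb-h100` have room, and the model picks `mb-a30` because it comes earlier in the MedicineBow order.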
...
Explicitly Define a Partition
Here, we assume:
your project is associated with an investment
and you explicitly define a partition in your request
Slurm will use only that partition (and its nodes) to allocate your job.
Slurm checks if:
you define a combination of nodes/cores/memory/GPUs that is not available within the explicitly defined partition.
If so, Slurm will not accept the job and will provide an appropriate message.
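One way to find out whether a request would be accepted, without actually submitting it, is `sbatch --test-only`, which validates the script and options and submits no job. The sketch below only assembles such a command. The helper `build_test_command`, the script name `job.sh`, the account `myproject`, and the resource values are illustrative assumptions. That `mb-l40s` offers GPUs through `--gres=gpu:N` is also an assumption based on its name.

```python
def build_test_command(script, account, partition, gpus=0,
                       cpus_per_task=1, mem="4G"):
    """Assemble an `sbatch --test-only` command as an argument list.

    Hypothetical helper: it builds the command but does not run it.
    """
    cmd = [
        "sbatch",
        "--test-only",                 # validate only; no job is submitted
        f"--account={account}",        # placeholder project/account name
        f"--partition={partition}",    # the explicitly defined partition
        f"--cpus-per-task={cpus_per_task}",
        f"--mem={mem}",
    ]
    if gpus > 0:
        cmd.append(f"--gres=gpu:{gpus}")
    cmd.append(script)
    return cmd


cmd = build_test_command("job.sh", "myproject", "mb-l40s", gpus=1)
print(" ".join(cmd))
# On a login node, the list could be passed to subprocess.run(cmd).
```

If the requested combination is not available in the named partition, Slurm's own message reports the reason.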
...