Those with limited experience using HPC schedulers may be unsure how Slurm allocates hardware. This can seem even more complicated if their project is associated with an investment.
The following provides a high-level overview to clarify the process on our clusters.
Which projects are associated with an investment?
In short, Investment-Project association is defined and approved by the Investor.
After an investment is purchased:
Investor(s) should provide ARCC with a list of projects on the HPC they want associated with the investment.
The investment will be configured so that it applies to all of the projects in that list.
If new projects are created on the cluster, and you (the investor) would like to associate them with your investment, you should provide this information in the new project creation request.
If the new project request comes from another PI, you may provide that PI with your investment name, so they may provide this information in their new project request.
ARCC will reach out to the investment owner to confirm project association, and approve the project’s association with the investment.
Once configuration is initialized or updated, your investment is available to the associated projects on the cluster.
Jobs run under projects associated with an investment
Any job run on the HPC should specify an account in the submission/request. Jobs from projects associated with an investment are processed by Slurm in one of two ways, depending on whether you define a partition in your job request:
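For example (a sketch, where <project-name> and myscript.sh are placeholders for your own project/account name and batch script), the account can be given on the command line:
salloc --account=<project-name> --time=01:00:00 --ntasks=1
sbatch --account=<project-name> myscript.sh
or as an #SBATCH --account=<project-name> line inside the batch script itself.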
Request your desired resources without defining a partition
Assuming:
your project is associated with an investment
and you do not define a partition in your request
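For reference, such a no-partition request might look like the following sketch, where the account name and resource values are placeholders you would replace with your own; the steps below describe how Slurm handles it:
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=16G
Note there is no --partition option, so Slurm chooses where to place the job.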
Slurm will identify a list of partitions to try to run your job against.
Slurm will automatically prioritize your investment partition and hardware to try to run your job there.
Slurm checks for jobs running on your investment.
If:
there are no jobs running on your investment
and the hardware requested in your job fits within the hardware confines of your investment partition
Slurm immediately starts your job.
If:
there are jobs running on your investment
and they were requested by HPC users who are not members of any projects associated with your investment
Slurm will pre-empt these jobs (i.e. stop them and add them back to the queue)
Slurm will immediately start your job.
If:
your investment hardware is 'full' with jobs from users who are members of project(s) associated with your investment
Slurm will try to allocate your requested job across the other partitions
On MedicineBow, the other partitions are tried in this order: mb, mb-a30, mb-l40s, mb-h100.
On Beartooth, the other partitions are tried in this order: moran, teton, teton-cascade, teton-gpu, teton-hugemem.
If:
resources are available
Slurm will start your job
If:
there are no resources available to fit your job (i.e. cluster usage is very high)
Slurm will place your job in the queue and your job will have a state of pending (i.e. waiting in the queue).
Slurm will monitor the queue at regular intervals and run the job when appropriate resources become available.
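If you want to see what is currently allocated or queued on your investment hardware, the following commands are one way to check (replace <name> with your investment name):
sinfo -p inv-<name>
squeue -p inv-<name>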
A job can be pending for a number of reasons, and Slurm will only show one of the many potential reasons, which can be a bit confusing. For example, it might state BadConstraint (the job's constraints cannot be satisfied) or Resources (the job is waiting for resources to become available). A full list can be found here: https://slurm.schedmd.com/squeue.html#SECTION_JOB-REASON-CODES
This may become more confusing because the stated reason can change depending on the overall state and use of the cluster.
Remember: The fact that your job has been accepted into the queue (i.e. you have a job number) means it will eventually run when appropriate resources become available.
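To see the reason Slurm is currently reporting for your pending job, one option is (replace <jobid> with your job number):
squeue -j <jobid> -o "%.10i %.12P %.10T %r"
where the last column (%r) is the reported reason.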
The above is only a brief summary of the hidden complexity of how Slurm allocates and prioritizes jobs, and there are caveats too subtle to go into in this short explanation.
Explicitly Define a Partition
Assuming:
your project is associated with an investment
and you explicitly define a partition in your request
Slurm will only use THAT partition (and its nodes) to allocate your job.
If:
you define a combination of nodes/cores/memory/GPUs that is not available within the explicitly defined partition
Slurm will not accept the job and will provide an appropriate message.
Explicitly defining a partition overrides the automatic placement described above, even if you have an investment.
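As a sketch, explicitly requesting a partition might look like the following, using mb (one of the MedicineBow partitions listed above) as an example; the account name and resource values are placeholders:
#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --partition=mb
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
With --partition set, Slurm will only consider nodes in the mb partition for this job.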
Recommendation
ARCC recommends the first approach, in which you do not specify a partition and only specify the hardware resources your job requires. This allows our scheduler (Slurm) to place your jobs appropriately and make the best use of the cluster. Users in projects associated with your investment will always be placed on your investment as long as their jobs fit within its hardware constraints and no other investment user is using the hardware. If other users associated with your investment are already using the hardware, the newest job request associated with your investment will be placed ahead of any non-investor jobs scheduled on the investment partition.
Summary
If you explicitly define a partition, then that is what will be used.
If you have an investment and specify no partition, Slurm will try allocating against your investment first, then use a list of other partitions if it can't, and then queue the job if nothing is available at that moment.
You may supply no partition or qos in your job submission.
If the resources you are requesting fit within your investment, then our submission script will place your job on your investment.
You may manually specify the partition of your investment.
This investment will be named inv-<name>. You can see the nodes within this partition by running:
sinfo -p inv-<name>
replacing <name> in the command with your specific investment name.
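For example, to submit directly to your investment partition (a sketch; myscript.sh and the bracketed names are placeholders for your own batch script, project name, and investment name):
sbatch --account=<project-name> --partition=inv-<name> myscript.sh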