Slurm: More Features
Goal: Introduce some further features, such as job efficiency and cluster utilization.
Job Efficiency
You can view the CPU and memory efficiency of a job by running the seff command with a <job-id>.
[]$ seff 13515489
Job ID: 13515489
Cluster: <cluster-name>
User/Group: <username>/<username>
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node)
Note:
The figures are only accurate if the job completes successfully.
If the job fails, for example with an OOM (Out-Of-Memory) error, the details will be inaccurate.
The seff report is also emailed to you if you have Slurm email notifications turned on.
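The CPU Efficiency figure in the seff report can be checked by hand: it is the CPU time used divided by the core-walltime (cores multiplied by wall-clock time). The following is a minimal sketch, assuming durations in plain HH:MM:SS form (a leading day count, which longer jobs may show, is not handled). The helper names to_seconds and cpu_efficiency are hypothetical and not part of Slurm.

```shell
# Convert an HH:MM:SS duration (as printed by seff) into seconds.
# Assumption: no leading day count such as 1-02:03:04.
to_seconds() {
  echo "$1" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'
}

# CPU efficiency = CPU time used / (cores * wall-clock time), as a percentage.
# Arguments: <cpu-utilized> <wall-clock-time> <cores>
cpu_efficiency() {
  cpu_used=$(to_seconds "$1")
  walltime=$(to_seconds "$2")
  cores=$3
  awk -v u="$cpu_used" -v w="$walltime" -v c="$cores" \
    'BEGIN { printf "%.2f%%\n", 100 * u / (w * c) }'
}

# Values taken from the seff example: 5 s of CPU over 18 s of wall-clock on 1 core.
cpu_efficiency 00:00:05 00:00:18 1   # prints 27.78%
```

A low percentage like this usually means the job spent most of its wall-clock time idle or waiting, or requested more cores than it used.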
What’s the Current Cluster Utilization?
There are a number of ways to see the current status of the cluster:
arccjobs: Prints a table showing active projects and jobs.
pestat: Prints a node list with allocated jobs - can query individual nodes.
sinfo: View the status of the Slurm partitions or nodes. Status of nodes that are drained can be seen using the -R flag.
OnDemand’s MedicineBow System Status page.
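As a rough illustration of the sinfo entry above, the sketch below wraps two sinfo calls in a small helper. The function name cluster_status is hypothetical. The guard is only there so the snippet fails cleanly on a machine without Slurm installed. The site-specific tools (arccjobs, pestat) are left out because their options are not described here.

```shell
# Print a partition summary followed by the reasons nodes are unavailable.
# Only runs sinfo when it is available (i.e. on a Slurm login or compute node).
cluster_status() {
  if ! command -v sinfo >/dev/null 2>&1; then
    echo "sinfo not found: run this on a cluster login node" >&2
    return 1
  fi
  sinfo        # summary of partitions and node states
  sinfo -R     # reasons nodes are down, drained, or failing
}
```

Calling cluster_status on a login node prints both views one after the other; running sinfo -R directly gives just the drained-node reasons.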
ARCC Related Usage Scripts
ARCC is also developing a number of usage-related scripts based on Slurm:
Core hour usage: chu_user, chu_account.