Slurm: More Features
Goal: Introduce some further features, such as job efficiency and cluster utilization.
Job Efficiency
You can view the CPU and memory efficiency of a job by running the seff command with a <job-id>.
[]$ seff 13515489
Job ID: 13515489
Cluster: <cluster-name>
User/Group: <username>/<username>
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node)
Note:
The seff output is only accurate if the job completes successfully.
If the job fails, for example with an OOM (Out-Of-Memory) error, the details will be inaccurate.
The seff report is also emailed to you if you have Slurm email notifications turned on.
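To make the CPU Efficiency figure in the output above easier to interpret, here is a minimal sketch of the arithmetic as it appears in that output: CPU Utilized divided by core-walltime, as a percentage. The cpu_efficiency helper is hypothetical (it is not part of Slurm or seff) and assumes times in the [D-]HH:MM:SS form that seff prints.

```shell
# Hypothetical helper, not part of Slurm: reproduces the CPU Efficiency
# percentage from two seff time strings in [D-]HH:MM:SS form.
#   $1 = CPU Utilized, $2 = core-walltime
cpu_efficiency() {
    awk -v used="$1" -v total="$2" '
        # Convert [D-]HH:MM:SS to a number of seconds.
        function secs(t,    d, p, dash) {
            d = 0
            dash = index(t, "-")
            if (dash) { d = substr(t, 1, dash - 1); t = substr(t, dash + 1) }
            split(t, p, ":")
            return d * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN {
            t = secs(total)
            if (t == 0) { print "0.00"; exit }   # avoid dividing by zero
            printf "%.2f\n", 100 * secs(used) / t
        }'
}

# Values taken from the seff output above.
cpu_efficiency 00:00:05 00:00:18   # prints 27.78
```

A low percentage such as this one suggests the job spent most of its allocated core time idle, so requesting fewer cores or less time may be reasonable.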
What’s the Current Cluster Utilization?
There are a number of ways to see the current status of the cluster:
arccjobs: Prints a table showing active projects and jobs.
pestat: Prints a node list with allocated jobs; can query individual nodes.
sinfo: View the status of the Slurm partitions or nodes. The status of drained nodes can be seen using the -R flag.
OnDemand’s MedicineBow System Status page.
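As a rough illustration of the sinfo option in the list above, the sketch below counts nodes per state to give a quick view of utilization. The count_node_states helper is hypothetical and only summarizes text; the sinfo options used (-h for no header, -N for node-oriented output, -o for an output format of node name and state, -R for drain/down reasons) are standard Slurm flags. The state names shown depend on your cluster.

```shell
# Hypothetical helper: reads "<node> <state>" lines on stdin and prints
# one "<state> <count>" line per state, sorted by state name.
count_node_states() {
    # sort -u drops duplicates (a node is listed once per partition it belongs to)
    sort -u | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}

# Only run the Slurm commands where sinfo is available (e.g. a login node).
if command -v sinfo >/dev/null 2>&1; then
    # One line per node: name and state, no header.
    sinfo -h -N -o "%N %T" | count_node_states

    # Drained or down nodes, with the recorded reason.
    sinfo -R
fi
```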
ARCC Related Usage Scripts