...

Job Efficiency

Info

You can view the CPU and memory efficiency of a job by running the seff command with a <job-id>.

Code Block
[]$ seff 13515489
Job ID: 13515489
Cluster: <cluster-name>
User/Group: <username>/<username>
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node)
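
The CPU Efficiency figure in the output above can be reproduced by hand: it is CPU Utilized divided by the core-walltime (Cores multiplied by Job Wall-clock time), here 5 s / (1 × 18 s) = 27.78%. The following is a minimal sketch of that arithmetic. The helper names `to_seconds` and `cpu_efficiency` are hypothetical and are not part of seff, and the `D-HH:MM:SS` handling assumes the usual Slurm time format for jobs longer than a day.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of seff): recompute the CPU efficiency
# from the values that seff prints.

# Convert HH:MM:SS, or D-HH:MM:SS (assumed format for multi-day jobs), to seconds.
to_seconds() {
  local t=$1 days=0 h m s
  if [[ $t == *-* ]]; then
    days=${t%%-*}   # part before the dash is the day count
    t=${t#*-}       # remainder is HH:MM:SS
  fi
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so values such as "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Usage: cpu_efficiency <cpu-utilized> <cores> <wall-clock-time>
cpu_efficiency() {
  local cpu_s wall_s cores=$2
  cpu_s=$(to_seconds "$1")
  wall_s=$(to_seconds "$3")
  awk -v c="$cpu_s" -v n="$cores" -v w="$wall_s" 'BEGIN {
    if (n * w == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.2f%%\n", 100 * c / (n * w)
  }'
}

# Values taken from the seff output above.
cpu_efficiency 00:00:05 1 00:00:18   # prints 27.78%
```

A low percentage on a multi-core job often means fewer cores were used than requested, which is worth checking before resubmitting with the same resources.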

...

Info

There are a number of ways to see the current status of the cluster:

  • arccjobs: Prints a table showing active projects and jobs.

  • pestat: Prints a node list with the jobs allocated to each node; individual nodes can be queried.

  • sinfo: Shows the status of Slurm partitions and nodes. Use the -R flag to list drained nodes along with the reason.

  • OnDemand’s MedicineBow System Status page.

...

sinfo examples:
Code Block
# View overall cluster:
[]$ sinfo -eO "CPUs:8,Memory:9,Gres:14,NodeAIOT:16,NodeList:50"
CPUS    MEMORY   GRES          NODES(A/I/O/T)  NODELIST
96      1023575  (null)        6/19/0/25       mbcpu-[001-025]
96      765525   gpu:a30:8     0/8/0/8         mba30-[001-008]
96      765525   gpu:l40s:8    1/4/0/5         mbl40s-[001-005]
96      765525   gpu:l40s:4    0/1/0/1         mbl40s-007
64      1023575  gpu:a6000:4   0/1/0/1         mba6000-001
48      506997   (null)        0/4/0/4         wi[001-004]
56      1031000  gpu:a30:2     0/1/0/1         wi005
96      1281554  gpu:h100:8    1/3/2/6         mbh100-[001-006]

# View a particular (investment) partition:
[]$ sinfo -p inv-wildiris
PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
inv-wildiris    up   infinite      5   idle wi[001-005]

# View compute nodes currently drained:
[]$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
HW Status: Unknown - slurm     2024-07-19T12:02:04 mbh100-001
Not responding       slurm     2024-07-30T13:49:06 mbh100-006
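
In the overall cluster view, the NODES(A/I/O/T) column reports node counts as allocated/idle/other/total. As a rough sketch, those counts can be summed across all rows to get cluster-wide totals. The function name `sum_node_states` is hypothetical, and the sketch assumes whitespace-separated columns with a header row, as in the sample output above. It reads text on stdin, so it can be tried without access to Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the NODES(A/I/O/T) column of sinfo-style output
# read from stdin. Assumes a header row and whitespace-separated columns.
sum_node_states() {
  awk '
    NR == 1 {
      # Find which column holds the allocated/idle/other/total counts.
      for (i = 1; i <= NF; i++) if ($i == "NODES(A/I/O/T)") col = i
      next
    }
    # Only count rows whose value splits into exactly four parts.
    col && split($col, n, "/") == 4 {
      alloc += n[1]; idle += n[2]; other += n[3]; total += n[4]
    }
    END {
      printf "allocated=%d idle=%d other=%d total=%d\n", alloc, idle, other, total
    }
  '
}

# Sample rows copied from the output above.
sum_node_states <<'EOF'
CPUS    MEMORY   GRES          NODES(A/I/O/T)  NODELIST
96      1023575  (null)        6/19/0/25       mbcpu-[001-025]
96      765525   gpu:a30:8     0/8/0/8         mba30-[001-008]
96      765525   gpu:l40s:8    1/4/0/5         mbl40s-[001-005]
96      765525   gpu:l40s:4    0/1/0/1         mbl40s-007
64      1023575  gpu:a6000:4   0/1/0/1         mba6000-001
48      506997   (null)        0/4/0/4         wi[001-004]
56      1031000  gpu:a30:2     0/1/0/1         wi005
96      1281554  gpu:h100:8    1/3/2/6         mbh100-[001-006]
EOF
# prints: allocated=8 idle=41 other=2 total=51
```

On the cluster, the same function could be fed live data by piping the first sinfo command above into it, for example `sinfo -eO "CPUs:8,Memory:9,Gres:14,NodeAIOT:16,NodeList:50" | sum_node_states`.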

...

ARCC Related Usage Scripts

Info

ARCC is also developing a number of usage-related scripts built on Slurm:

...