Goal: Introduce further tools for checking how efficiently a job used its resources and how busy the cluster currently is.
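Job Efficiency
After a job finishes, seff <job-id> reports how much of the requested CPU time and memory the job actually used: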
[]$ seff 13515489
Job ID: 13515489
Cluster: <cluster-name>
User/Group: <username>/<username>
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node)
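The CPU Efficiency figure in the seff output above is consistent with CPU time used divided by core-walltime (wall-clock time multiplied by the number of cores). The following is a minimal sketch of that arithmetic, using the values from the report; the helper names `hms_to_seconds` and `cpu_efficiency` are hypothetical and are not part of seff.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert HH:MM:SS, or D-HH:MM:SS for multi-day jobs, to seconds."""
    days = 0
    if "-" in hms:
        day_part, hms = hms.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_utilized: str, wall_clock: str, cores: int) -> float:
    """Percentage of the available core-time that the job actually used."""
    core_walltime = hms_to_seconds(wall_clock) * cores
    if core_walltime == 0:
        return 0.0
    return 100.0 * hms_to_seconds(cpu_utilized) / core_walltime


# Values copied from the seff report above (assumed to be typed in by hand).
efficiency = cpu_efficiency("00:00:05", "00:00:18", cores=1)
print(f"CPU Efficiency: {efficiency:.2f}%")  # prints: CPU Efficiency: 27.78%
```

A low percentage on a multi-core job often suggests that fewer cores could have been requested, although a job as short as this one is too brief to draw conclusions from.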
What’s the Current Cluster Utilization?
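arccjobs example (running and pending jobs, CPUs, and CPU hours per account and user, followed by cluster-wide totals):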
[]$ arccjobs
===============================================================================
Account                   Running                    Pending
  User               jobs     cpus     cpuh     jobs     cpus     cpuh
===============================================================================
eap-amadson           500      500    30.42        3        3     2.00
  amadson             500      500    30.42        3        3     2.00
eap-larsko              1       32  2262.31        0        0     0.00
  fghorban              1       32  2262.31        0        0     0.00
pcg-llps                2       64  1794.41        0        0     0.00
  hbalantr              1       32   587.68        0        0     0.00
  vvarenth              1       32  1206.73        0        0     0.00
===============================================================================
TOTALS:               503      596  4087.14        3        3     2.00
===============================================================================
Nodes               9/51 (17.65%)
Cores           596/4632 (12.87%)
Memory (GB)   2626/46952 ( 5.59%)
CPU Load          803.43 (17.35%)
===============================================================================
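The percentages in the arccjobs summary can be reproduced from the used/total pairs. The sketch below parses the four summary lines as they appear above (pasted into a string by hand) and recomputes each figure. The function name `parse_used_total` is hypothetical, and treating the CPU Load percentage as load divided by total cores is an assumption that happens to match the numbers shown.

```python
import re

# Summary lines copied from the arccjobs output above.
SUMMARY = """\
Nodes 9/51 (17.65%)
Cores 596/4632 (12.87%)
Memory (GB) 2626/46952 ( 5.59%)
CPU Load 803.43 (17.35%)
"""


def parse_used_total(summary: str) -> dict:
    """Map each 'label used/total (...)' line to a (used, total) tuple."""
    usage = {}
    for line in summary.splitlines():
        match = re.match(r"(.+?)\s+(\d+)/(\d+)\s", line)
        if match:
            label, used, total = match.groups()
            usage[label] = (int(used), int(total))
    return usage


usage = parse_used_total(SUMMARY)
for label, (used, total) in usage.items():
    print(f"{label}: {100 * used / total:.2f}% in use, {total - used} free")

# Assumption: the CPU Load percentage is the load relative to the total core count.
cpu_load_pct = 100 * 803.43 / usage["Cores"][1]
print(f"CPU Load: {cpu_load_pct:.2f}%")
```

Comparing the Cores percentage with the CPU Load percentage gives a rough sense of whether allocated cores are actually busy.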
pestat example (per-node state, CPU use, load, and memory):
[]$ pestat
Hostname    Partition       Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                           State Use/Tot  (15min)     (MB)     (MB)  JobID(JobArrayID) User ...
mba30-001   mb-a30          idle   0  96     0.00   765525   749441
mba30-002   mb-a30          idle   0  96     0.00   765525   761311
mba30-003   mb-a30          idle   0  96     0.00   765525   761189
...
mbl40s-004  mb-l40s         idle   0  96     0.00   765525   761030
mbl40s-005  mb-l40s         idle   0  96     0.00   765525   760728
mbl40s-007  mb-l40s         idle   0  96     0.00   765525   761452
wi001       inv-wildiris    idle   0  48     0.00   506997   505745
wi002       inv-wildiris    idle   0  48     0.00   506997   505726
wi003       inv-wildiris    idle   0  48     0.00   506997   505746
wi004       inv-wildiris    idle   0  48     0.00   506997   505729
wi005       inv-wildiris    idle   0  56     0.00  1031000  1020610
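The pestat rows above can be summarized per partition to see where free cores are available. The sketch below works on a few rows copied from that output and assumes the column order shown there (hostname, partition, state, used CPUs, total CPUs, load, memory size, free memory). The function name `free_cores_by_partition` is hypothetical, and rows with extra markers (for example a `*` after the state) are not specially handled.

```python
from collections import defaultdict

# A few data rows copied from the pestat output above.
PESTAT_ROWS = """\
mba30-001 mb-a30 idle 0 96 0.00 765525 749441
mbl40s-004 mb-l40s idle 0 96 0.00 765525 761030
...
wi001 inv-wildiris idle 0 48 0.00 506997 505745
wi005 inv-wildiris idle 0 56 0.00 1031000 1020610
"""


def free_cores_by_partition(rows: str) -> dict:
    """Sum (total CPUs - used CPUs) for every node, grouped by partition."""
    free = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skips blank lines and the "..." elision
        try:
            used, total = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # skips header lines, whose CPU columns are not numbers
        free[fields[1]] += total - used
    return dict(free)


for partition, cores in free_cores_by_partition(PESTAT_ROWS).items():
    print(f"{partition}: {cores} free cores")
```

Grouping by the partition column is a design choice for this sketch; grouping by node state would work the same way using `fields[2]`.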
sinfo examples:
# View overall cluster:
[]$ sinfo -eO "CPUs:8,Memory:9,Gres:14,NodeAIOT:16,NodeList:50"
CPUS    MEMORY   GRES          NODES(A/I/O/T)  NODELIST
96      1023575  (null)        6/19/0/25       mbcpu-[001-025]
96      765525   gpu:a30:8     0/8/0/8         mba30-[001-008]
96      765525   gpu:l40s:8    1/4/0/5         mbl40s-[001-005]
96      765525   gpu:l40s:4    0/1/0/1         mbl40s-007
64      1023575  gpu:a6000:4   0/1/0/1         mba6000-001
48      506997   (null)        0/4/0/4         wi[001-004]
56      1031000  gpu:a30:2     0/1/0/1         wi005
96      1281554  gpu:h100:8    1/3/2/6         mbh100-[001-006]
# View a particular (investment) partition:
[]$ sinfo -p inv-wildiris
PARTITION     AVAIL  TIMELIMIT  NODES  STATE  NODELIST
inv-wildiris  up     infinite   5      idle   wi[001-005]
# View compute nodes that are down, drained, or failing, with the recorded reason:
[]$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
HW Status: Unknown - slurm     2024-07-19T12:02:04 mbh100-001
Not responding       slurm     2024-07-30T13:49:06 mbh100-006
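In the first sinfo example above, the NODES(A/I/O/T) column gives the number of allocated, idle, other, and total nodes for each group. Summing that column gives a cluster-wide picture. The sketch below does this for the rows shown, pasted in as a string; the function name `node_totals` is hypothetical.

```python
# Rows copied from the "View overall cluster" sinfo output above.
SINFO_OUTPUT = """\
CPUS MEMORY GRES NODES(A/I/O/T) NODELIST
96 1023575 (null) 6/19/0/25 mbcpu-[001-025]
96 765525 gpu:a30:8 0/8/0/8 mba30-[001-008]
96 765525 gpu:l40s:8 1/4/0/5 mbl40s-[001-005]
96 765525 gpu:l40s:4 0/1/0/1 mbl40s-007
64 1023575 gpu:a6000:4 0/1/0/1 mba6000-001
48 506997 (null) 0/4/0/4 wi[001-004]
56 1031000 gpu:a30:2 0/1/0/1 wi005
96 1281554 gpu:h100:8 1/3/2/6 mbh100-[001-006]
"""


def node_totals(text: str) -> dict:
    """Sum the allocated/idle/other/total counts over all rows."""
    totals = [0, 0, 0, 0]
    for line in text.splitlines():
        for field in line.split():
            parts = field.split("/")
            # Only a field of four slash-separated integers is an A/I/O/T value;
            # the header "NODES(A/I/O/T)" fails the digit check and is skipped.
            if len(parts) == 4 and all(part.isdigit() for part in parts):
                totals = [t + int(p) for t, p in zip(totals, parts)]
                break
    return dict(zip(("allocated", "idle", "other", "total"), totals))


print(node_totals(SINFO_OUTPUT))
# prints: {'allocated': 8, 'idle': 41, 'other': 2, 'total': 51}
```

The two nodes counted as "other" line up with the two mbh100 nodes listed by sinfo -R above.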