Goal: Introduce some further features, such as job efficiency and cluster utilization.
Job Efficiency
Info |
---|
You can view the CPU and memory efficiency of a job using the seff command and providing a <job-id>. |
Code Block |
---|
[]$ seff 13515489
Job ID: 13515489
Cluster: <cluster-name>
User/Group: <username>/<username>
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:05
CPU Efficiency: 27.78% of 00:00:18 core-walltime
Job Wall-clock time: 00:00:18
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 8.00 GB (8.00 GB/node) |
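The CPU Efficiency figure in the output above can be reproduced by hand: it is the CPU time used divided by the core-walltime (cores multiplied by wall-clock time). The following is a minimal sketch, assuming times in plain HH:MM:SS form (no day prefix). The helper names to_seconds and cpu_efficiency are hypothetical and are not part of seff.

```shell
# Convert HH:MM:SS to seconds (hypothetical helper, not part of seff).
to_seconds() {
  echo "$1" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'
}

# CPU efficiency (%) = CPU time used / (cores * wall-clock time) * 100
# Arguments: <cpu-utilized HH:MM:SS> <cores> <wall-clock HH:MM:SS>
cpu_efficiency() {
  cpu_used=$(to_seconds "$1")
  cores=$2
  wall=$(to_seconds "$3")
  awk -v u="$cpu_used" -v c="$cores" -v w="$wall" \
    'BEGIN { printf "%.2f\n", (u / (c * w)) * 100 }'
}

# Values from the seff output above: 5s of CPU time, 1 core, 18s wall-clock.
cpu_efficiency 00:00:05 1 00:00:18   # prints 27.78
```

A low percentage like this usually means the job spent most of its wall-clock time waiting rather than computing, or that it requested more cores than it used.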
Info |
---|
Note: The seff report is only accurate if the job completes successfully. If the job fails, for example with an Out-Of-Memory (OOM) error, the details will be inaccurate. The same report is emailed to you if you have Slurm email notifications turned on.
|
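Email notifications are requested per job through standard sbatch options. The following is a minimal sketch of a batch script header. The address is a placeholder, the echo line stands in for a real workload, and whether the efficiency report is attached to the email depends on the cluster configuration described in the note above.

```shell
#!/bin/bash
# Send an email when the job ends or fails (standard sbatch options).
#SBATCH --mail-type=END,FAIL
# Placeholder address: replace with your own.
#SBATCH --mail-user=<your-email>

# Placeholder workload.
echo "Running on $(hostname)"
```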
What’s the Current Cluster Utilization?
Info |
---|
There are a number of ways to see the current status of the cluster:
arccjobs: Prints a table showing active projects and jobs.
pestat: Prints a node list with allocated jobs; can query individual nodes.
sinfo: View the status of the Slurm partitions or nodes. The status of drained nodes can be seen using the -R flag.
OnDemand’s MedicineBow System Status page.
|
Expand |
---|
|
Code Block |
---|
[]$ arccjobs
===============================================================================
Account Running Pending
User jobs cpus cpuh jobs cpus cpuh
===============================================================================
eap-amadson 500 500 30.42 3 3 2.00
amadson 500 500 30.42 3 3 2.00
eap-larsko 1 32 2262.31 0 0 0.00
fghorban 1 32 2262.31 0 0 0.00
pcg-llps 2 64 1794.41 0 0 0.00
hbalantr 1 32 587.68 0 0 0.00
vvarenth 1 32 1206.73 0 0 0.00
===============================================================================
TOTALS: 503 596 4087.14 3 3 2.00
===============================================================================
Nodes 9/51 (17.65%)
Cores 596/4632 (12.87%)
Memory (GB) 2626/46952 ( 5.59%)
CPU Load (17.35%)
=============================================================================== |
|
Expand |
---|
|
Code Block |
---|
[]$ pestat
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (15min) (MB) (MB) JobID(JobArrayID) User ...
mba30-001 mb-a30 idle 0 96 0.00 765525 749441
mba30-002 mb-a30 idle 0 96 0.00 765525 761311
mba30-003 mb-a30 idle 0 96 0.00 765525 761189
...
mbl40s-004 mb-l40s idle 0 96 0.00 765525 761030
mbl40s-005 mb-l40s idle 0 96 0.00 765525 760728
mbl40s-007 mb-l40s idle 0 96 0.00 765525 761452
wi001 inv-wildiris idle 0 48 0.00 506997 505745
wi002 inv-wildiris idle 0 48 0.00 506997 505726
wi003 inv-wildiris idle 0 48 0.00 506997 505746
wi004 inv-wildiris idle 0 48 0.00 506997 505729
wi005 inv-wildiris idle 0 56 0.00 1031000 1020610 |
|
Expand |
---|
|
Code Block |
---|
# View overall cluster:
[]$ sinfo -eO "CPUs:8,Memory:9,Gres:14,NodeAIOT:16,NodeList:50"
CPUS MEMORY GRES NODES(A/I/O/T) NODELIST
96 1023575 (null) 6/19/0/25 mbcpu-[001-025]
96 765525 gpu:a30:8 0/8/0/8 mba30-[001-008]
96 765525 gpu:l40s:8 1/4/0/5 mbl40s-[001-005]
96 765525 gpu:l40s:4 0/1/0/1 mbl40s-007
64 1023575 gpu:a6000:4 0/1/0/1 mba6000-001
48 506997 (null) 0/4/0/4 wi[001-004]
56 1031000 gpu:a30:2 0/1/0/1 wi005
96 1281554 gpu:h100:8 1/3/2/6 mbh100-[001-006]
# View a particular (investment) partition:
[]$ sinfo -p inv-wildiris
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
inv-wildiris up infinite 5 idle wi[001-005]
# View compute nodes currently drained:
[]$ sinfo -R
REASON USER TIMESTAMP NODELIST
HW Status: Unknown - slurm 2024-07-19T12:02:04 mbh100-001
Not responding slurm 2024-07-30T13:49:06 mbh100-006 |
|
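The NODES(A/I/O/T) column in the sinfo output above reports allocated/idle/other/total node counts, so a utilization percentage follows directly from it. The following is a minimal sketch; node_utilization is a hypothetical helper, not a Slurm command.

```shell
# Print the percentage of allocated nodes from an A/I/O/T string
# (allocated/idle/other/total), as reported by sinfo's NodeAIOT field.
# Prints nothing if the total is zero.
node_utilization() {
  echo "$1" | awk -F/ '$4 > 0 { printf "%.2f\n", ($1 / $4) * 100 }'
}

# mbcpu-[001-025] row from the output above: 6 of 25 nodes allocated.
node_utilization 6/19/0/25   # prints 24.00
```

The same ratio is what the arccjobs summary reports for the whole cluster, for example Nodes 9/51 (17.65%).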