Overview: What is Performance

In its simplest form, there are four performance metrics any user can consider:

  1. How much were the core(s) utilized?

  2. How much memory was used?

  3. How long did the job take?

  4. How much read/write was performed? (This isn’t currently covered on this page.)

Check Overall Job Efficiency

To get a general sense of how a job has performed, you can use the following two methods:

Command: seff

From the command line use the seff command to display basic job performance:

[]$ seff 12347496
Job ID: 12347496
Cluster: teton
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 40
CPU Utilized: 3-23:17:34
CPU Efficiency: 99.25% of 4-00:00:40 core-walltime
Job Wall-clock time: 02:24:01
Memory Utilized: 102.99 MB
Memory Efficiency: 0.13% of 80.00 GB

The -d option also displays the raw data:

[]$ seff -d 12347496
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 12347496  salexan5 salexan5 COMPLETED teton 40 1 1 83886080 1 343054 8641 105460 0

Job ID: 12347496
Cluster: teton
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 40
CPU Utilized: 3-23:17:34
CPU Efficiency: 99.25% of 4-00:00:40 core-walltime
Job Wall-clock time: 02:24:01
Memory Utilized: 102.99 MB
Memory Efficiency: 0.13% of 80.00 GB

Note that these details are only accurate if the job completed successfully.

Email

Depending on how many jobs you have running, you can use the Slurm email options. By adding the following two lines to your batch script, a mini report will be emailed to you whenever the state of a job changes; this can include started, pre-empted, and finished. When a job has finished, the same results you can retrieve using the seff command are emailed to you. Obviously, if you are submitting hundreds or thousands of jobs, this can be impractical.

#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-address>

Command: sacct

The sacct command displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database.

If you want more detail than that provided by the seff command, sacct provides many more fields.

Calling sacct -e will list all available fields. Please refer to the Slurm sacct documentation for descriptions.

[]$ sacct -e
Account             AdminComment        AllocCPUS           AllocGRES
...

Example

[]$ sacct --format="JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AveRSS,AveCPU" -j 12347496
               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     MaxRSS     AveRSS     AveCPU
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- ---------- ----------
            12347496    vsearch  salexan5 teton-cas+            t465   02:24:01  COMPLETED      0:0
      12347496.batch      batch                                 t465   02:24:01  COMPLETED      0:0    105460K    105460K 3-23:17:32

Starting to Interpret Jobs

Computation

Here is a contrived example, but it starts to demonstrate some of the things to look out for. I know the application I’m running only invokes four threads, so in my Slurm script I only need to request cpus-per-task=4. If I submit the job and then look at seff and the equivalent sacct fields:

[]$ seff 12348706
Job ID: 12348706
Cluster: teton
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:01:36
CPU Efficiency: 88.89% of 00:01:48 core-walltime
Job Wall-clock time: 00:00:27
Memory Utilized: 2.63 MB
Memory Efficiency: 0.07% of 3.91 GB

[]$ sacct --format="JobID%20,NCPUS,Timelimit,Elapsed,CPUTime,SystemCPU,UserCPU,TotalCPU" -j 12348706
               JobID      NCPUS  Timelimit    Elapsed    CPUTime  SystemCPU    UserCPU   TotalCPU
-------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
            12348706          4   00:05:00   00:00:27   00:01:48  00:00.005  01:35.691  01:35.697
      12348706.batch          4              00:00:27   00:01:48  00:00.005  01:35.691  01:35.697

Using the definitions from the sacct documentation, we can read this as follows: the job ran on 4 CPUs for an Elapsed (wall-clock) time of 27 seconds, so its CPUTime (core-walltime) was 4 × 27 s = 00:01:48, of which the TotalCPU of roughly 1:36 was actually spent computing, giving the 88.89% CPU efficiency reported by seff.
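The efficiency figure can be reproduced by hand from these sacct fields. Below is a minimal sketch, with the values from job 12348706 hand-converted to seconds (CPUTime = NCPUS × Elapsed; efficiency = TotalCPU / CPUTime):

```shell
#!/bin/bash
# Values copied from job 12348706 above, converted to seconds:
#   NCPUS = 4, Elapsed = 00:00:27, TotalCPU ~ 01:36
ncpus=4
elapsed=27
totalcpu=96

# CPUTime is the core-walltime made available to the job.
core_walltime=$(( ncpus * elapsed ))    # 4 * 27 = 108s, i.e. 00:01:48

# Efficiency is the fraction of that core-walltime actually consumed.
efficiency=$(awk -v u="$totalcpu" -v c="$core_walltime" \
    'BEGIN { printf "%.2f", 100 * u / c }')

echo "Core-walltime: ${core_walltime}s"
echo "CPU efficiency: ${efficiency}%"    # 88.89%, matching seff
```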

Now, if I rerun the job, but this time set cpus-per-task=16, I get the following results:

[]$ seff 12348707
Job ID: 12348707
Cluster: teton
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:01:27
CPU Efficiency: 24.72% of 00:05:52 core-walltime
Job Wall-clock time: 00:00:22
Memory Utilized: 2.63 MB
Memory Efficiency: 0.02% of 15.62 GB

[]$ sacct --format="JobID%20,NCPUS,Timelimit,Elapsed,CPUTime,SystemCPU,UserCPU,TotalCPU" -j 12348707
               JobID      NCPUS  Timelimit    Elapsed    CPUTime  SystemCPU    UserCPU   TotalCPU
-------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
            12348707         16   00:05:00   00:00:22   00:05:52  00:00.008  01:26.810  01:26.819
      12348707.batch         16              00:00:22   00:05:52  00:00.008  01:26.810  01:26.819

The main thing to notice here is that the CPU efficiency has drastically dropped to 24.72%! Why?

Remember, I know that only four cores are actually being used, but I requested 16, so in this case 12 of the cores are not being used at all.

But what if you didn’t know? This should prompt you to start questioning, and thus exploring, how the application uses multiple cores. Why is this worth considering? Cores that are allocated but idle still count against your usage and are unavailable to other jobs.

Memory

Let’s look at the memory utilization of the two jobs:

[]$ seff 12348706
Memory Utilized: 2.63 MB
Memory Efficiency: 0.07% of 3.91 GB

[]$ sacct --format="JobID%20,NCPUS,ReqMEM,MaxRSS,AveRSS,AllocTRES%40" -j 12348706
               JobID      NCPUS     ReqMem     MaxRSS     AveRSS                                AllocTRES
-------------------- ---------- ---------- ---------- ---------- ----------------------------------------
            12348706          4     1000Mc                               billing=4,cpu=4,mem=4000M,node=1
      12348706.batch          4     1000Mc      2696K      2696K                   cpu=4,mem=4000M,node=1
[]$ seff 12348707
Memory Utilized: 2.63 MB
Memory Efficiency: 0.02% of 15.62 GB

[]$ sacct --format="JobID%20,NCPUS,ReqMEM,MaxRSS,AveRSS,AllocTRES%40" -j 12348707
               JobID      NCPUS     ReqMem     MaxRSS     AveRSS                                AllocTRES
-------------------- ---------- ---------- ---------- ---------- ----------------------------------------
            12348707         16     1000Mc                            billing=16,cpu=16,mem=16000M,node=1
      12348707.batch         16     1000Mc      2692K      2692K                 cpu=16,mem=16000M,node=1

Using the definitions from the sacct documentation: in our Slurm script we didn’t define any memory requirements, so we were allocated the default of 1000M per CPU (the ReqMem value of 1000Mc, where the c denotes per core). So we can calculate the total memory allocated to the jobs as 4 * 1000M = 3.91GB and 16 * 1000M = 15.62GB. The MaxRSS / AveRSS values provide an indication of what was actually used.
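The memory figures seff reports can be reproduced the same way. A sketch using job 12348706’s numbers (ReqMem of 1000M per core across 4 cores, MaxRSS of 2696K):

```shell
#!/bin/bash
# Values copied from job 12348706 above:
#   NCPUS = 4, ReqMem = 1000Mc (1000 MB per core), MaxRSS = 2696K
ncpus=4
mem_per_cpu_mb=1000
maxrss_kb=2696

alloc_kb=$(( ncpus * mem_per_cpu_mb * 1024 ))   # 4,096,000 KB allocated

awk -v used="$maxrss_kb" -v alloc="$alloc_kb" 'BEGIN {
    printf "Allocated:  %.2f GB\n", alloc / 1024 / 1024   # 3.91 GB
    printf "Utilized:   %.2f MB\n", used / 1024           # 2.63 MB
    printf "Efficiency: %.2f%%\n",  100 * used / alloc    # 0.07%
}'
```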

I know that for this example no large amounts of memory are allocated. So, in both cases, very little of the 3.91GB and 15.62GB that was allocated was actually used. Why is this worth considering? Memory that is allocated but unused is blocked from other jobs, and over-requesting can also leave your own job queued for longer.

These are two basic examples that introduce you to some of the reasons why it’s important and useful to track and analyze your job efficiency.

Time

Let’s look at the time requested, and the actual time taken, for the two jobs:

[]$ seff 12348706
Job Wall-clock time: 00:00:27

[]$ sacct --format="JobID%20,NCPUS,Timelimit,Elapsed" -j 12348706
               JobID      NCPUS  Timelimit    Elapsed
-------------------- ---------- ---------- ----------
            12348706          4   00:05:00   00:00:27
      12348706.batch          4              00:00:27
[]$ seff 12348707
Job Wall-clock time: 00:00:22

[]$ sacct --format="JobID%20,NCPUS,Timelimit,Elapsed" -j 12348707
               JobID      NCPUS  Timelimit    Elapsed
-------------------- ---------- ---------- ----------
            12348707         16   00:05:00   00:00:22
      12348707.batch         16              00:00:22

Although 5 minutes was requested for both jobs, they took about the same time: 27 and 22 seconds respectively. Why is this worth considering?

Consider the following case: I’m going to submit a job on the 24th at noon with --time=7-00:00:00 (7 days), but I’m aware that the cluster is under maintenance from midnight on the 28th. Since the cluster will not be available in three and a half days, our seven-day job cannot fit within this window, and thus will be added to the queue in a pending state, as demonstrated in the squeue results below.

[]$ squeue | grep salexan5
          12348708 moran,tet test     salexan5 PD       0:00      1 (ReqNodeNotAvail, Reserved for maintenance)
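You can check whether a requested time limit fits before a known maintenance window with a little date arithmetic. A sketch using the dates from the scenario above (the specific timestamps are assumptions for illustration, and the GNU date syntax matches the cluster’s Linux nodes):

```shell
#!/bin/bash
# Submitting on the 24th at noon; maintenance starts at midnight on the 28th.
submit_epoch=$(date -d "2021-01-24 12:00" '+%s')
maint_epoch=$(date -d "2021-01-28 00:00" '+%s')

window_hours=$(( (maint_epoch - submit_epoch) / 3600 ))   # 84 h = 3.5 days
requested_hours=$(( 7 * 24 ))                             # --time=7-00:00:00

echo "Window before maintenance: ${window_hours}h, requested: ${requested_hours}h"
if [ "$requested_hours" -gt "$window_hours" ]; then
    echo "Job cannot complete before maintenance; it will pend until afterwards."
fi
```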

Alternative: Command: time

There are alternatives to using the Slurm commands, such as adding the Linux time command to your script, and/or using bash commands, as demonstrated in the following examples:

Bash: time

Here is a bash time example:

#!/bin/bash
...
start=$(date +'%D %T')
echo "Start:" $start

time <run your application>

end=$(date +'%D %T')
echo "End:" $end

start_secs=$(date --date="$start" '+%s')
end_secs=$(date --date="$end"   '+%s')
duration=$((end_secs - start_secs))
echo "Duration:" $duration"sec"
echo "Done."

Using the above, within your output you will see something of the form:

Start: 01/27/21 21:10:39
...
real    0m21.951s
user    1m26.882s
sys     0m0.001s
End: 01/27/21 21:11:01
Duration: 22sec
Done.
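If you only need the elapsed duration, bash’s built-in SECONDS variable avoids the date parsing in the script above:

```shell
#!/bin/bash
# SECONDS counts the seconds elapsed since the shell started,
# or since it was last assigned to.
SECONDS=0

sleep 2    # stand-in for <run your application>

echo "Duration: ${SECONDS}sec"
```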

System: time

The alternative is to use the system’s time command. Have a read of the man time page for more details, and look at the -v option depending on how much detail you want when using the command:

[]$ man time
TIME(1)                       Linux User’s Manual                      TIME(1)

NAME
       time - time a simple command or give resource usage

SYNOPSIS
       time [options] command [arguments...]

DESCRIPTION
       The  time  command  runs the specified program command with the given arguments.  When command finishes, time writes a message to standard error giving
       timing statistics about this program run.  These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user  CPU
       time  (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime
       and tms_cstime values in a struct tms as returned by times(2)).

       Note: some shells (e.g., bash(1)) have a built-in time command that provides less functionality than the command described here.  To  access  the  real
       command, you may need to specify its pathname (something like /usr/bin/time).
...

The only change in the above example is the following:

/usr/bin/time ./a.out
or
/usr/bin/time -v ./a.out

Resulting in:

95.21user 0.00system 0:23.83elapsed 399%CPU (0avgtext+0avgdata 728maxresident)k
0inputs+0outputs (0major+276minor)pagefaults 0swaps
or
        Command being timed: "./a.out"
        User time (seconds): 95.80
        System time (seconds): 0.00
        Percent of CPU this job got: 399%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 740
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 279
        Voluntary context switches: 6
        Involuntary context switches: 116
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Check Running Jobs

Command: sstat

Similar to sacct, the sstat command displays various status information for a job/step while it is actually running (whereas sacct is primarily for completed jobs).

It is a command-line tool, but scripts can be developed that call the command at regular intervals while a job is running, letting you monitor key metrics such as CPU frequency, maximum and average RSS, and disk reads and writes.
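As a sketch, such a monitoring script might poll sstat while the job remains in the queue. The field names below are standard sstat format options, but the function name, polling interval, and job ID are placeholders:

```shell
#!/bin/bash
# Hypothetical helper: poll sstat for a running job until it leaves the queue.
# Usage: monitor_job <jobid> [interval-seconds]
monitor_job() {
    local jobid=$1 interval=${2:-60}
    # squeue prints a line while the job is still pending or running.
    while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
        sstat -j "${jobid}.batch" \
              --format=JobID,AveCPU,AveRSS,MaxRSS,AveCPUFreq,MaxDiskRead,MaxDiskWrite
        sleep "$interval"
    done
}

# Example call (job ID is a placeholder):
# monitor_job 12348706 30
```

The loop terminates on its own once the job finishes, so it can be launched in the background alongside the job, or from a login node.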

Note: