The Performance Application Programming Interface (PAPI) gives tool designers and application engineers a consistent interface and methodology for using the performance counter hardware found in most major microprocessors. PAPI lets software engineers see, in near real time, the relationship between software performance and processor events.
In addition, PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.
Each PAPI installation is built with a specific set of components. If a component you need is not installed, please contact ARCC to request it.
Using
Use the module name papi to list the available versions and to load the library.
To list the components installed with a given version of papi, run the papi_component_avail command. For example:
[@blog2 ~]$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version : 6.0.0.1
Operating system : Linux 4.18.0-372.32.1.el8_6.x86_64
Vendor string and code : GenuineIntel (1, 0x1)
Model string and code : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz (106, 0x6a)
CPU revision : 6.000000
CPUID : Family/Model/Stepping 6/106/6, 0x06/0x6a/0x06
CPU Max MHz : 3500
CPU Min MHz : 800
Total cores : 96
SMT threads per core : 2
Cores per socket : 24
Sockets : 2
Cores per NUMA region : 48
NUMA regions : 2
Running in a VM : no
Number Hardware Counters : 19
Max Multiplex Counters : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------
Compiled-in components:
Name: perf_event Linux perf_event CPU counters
Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
\-> Disabled: No uncore PMUs or events found
Name: example A simple example component
Name: infiniband Linux Infiniband statistics using the sysfs interface
Active components:
Name: perf_event Linux perf_event CPU counters
Native: 154, Preset: 0, Counters: 19
PMUs supported: ix86arch, perf, perf_raw, icl
Name: example A simple example component
Native: 4, Preset: 0, Counters: 3
Name: infiniband Linux Infiniband statistics using the sysfs interface
Native: 96, Preset: 0, Counters: 96
--------------------------------------------------------------------------------
Multicore
The papi library can be used to develop serial, multithreaded, and multinode applications. Please refer to the PAPI MPI documentation for more information about using this functionality.
MPI Example
Below is an example of code using both papi and openmpi:
/*
  Code based on: https://bitbucket.org/icl/papi/wiki/PAPI-Parallel-Programs
*/
#include <papi.h>
#include "mpi.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

void handle_error(int retval);

int main(int argc, char *argv[])
{
    int n, myid, numprocs, i, retval, EventSet = PAPI_NULL;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    long_long values[1] = {(long_long) 0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) { handle_error(retval); }

    /* Create an EventSet */
    retval = PAPI_create_eventset(&EventSet);
    if (retval != PAPI_OK) { handle_error(retval); }

    /* Add Total Instructions Executed to our EventSet */
    retval = PAPI_add_event(EventSet, PAPI_TOT_INS);
    if (retval != PAPI_OK) { handle_error(retval); }

    /* Start counting */
    retval = PAPI_start(EventSet);
    if (retval != PAPI_OK) { handle_error(retval); }

    /* Approximate pi with n = 50, 100, and 150 intervals */
    for (n = 50; n <= 150; n += 50)
    {
        if (myid == 0) {
            printf("n = %i\n", n);
        }
        /* Redundant here because every rank runs the same loop, but harmless */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        h = 1.0 / (double) n;
        sum = 0.0;
        /* Each rank sums every numprocs-th interval, starting at myid + 1 */
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0) {
            printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT));
        }
    }

    /* Read the counters */
    retval = PAPI_read(EventSet, values);
    if (retval != PAPI_OK) { handle_error(retval); }
    // printf("After reading counters: %lld\n", values[0]);

    /* Stop the counters */
    retval = PAPI_stop(EventSet, values);
    if (retval != PAPI_OK) { handle_error(retval); }
    // printf("After stopping counters: %lld\n", values[0]);

    MPI_Finalize();
    exit(0);
}

/* Print the PAPI error and exit */
void handle_error(int retval)
{
    printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
    exit(1);
}
An example of compiling and running the code is as follows:
[@blog2 test]$ module load gcc/12.2.0 papi/6.0.0.1 openmpi/4.1.4
[@blog2 test]$ salloc --account=<account> --time=1:00:00 --nodes=2 --ntasks-per-node=2
[@t514 test]$ mpicc mpi_test.c -o mpi_test -lpapi -lm
[@t514 test]$ srun mpi_test
n = 50
pi is approximately 3.1416259869230037, Error is 0.0000333333332105
n = 100
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
n = 150
pi is approximately 3.1415963572934968, Error is 0.0000037037037037
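The way the MPI example divides the midpoint-rule sum among ranks can be checked without MPI or PAPI. The sketch below is illustrative only: partial_pi and reduced_pi are hypothetical names, and the loop in reduced_pi is a serial stand-in for MPI_Reduce with MPI_SUM rather than a real reduction.

```c
#include <assert.h>
#include <stdio.h>

/* Share of the midpoint-rule sum owned by one rank: rank r handles
   intervals i = r + 1, r + 1 + nprocs, ... exactly as in the MPI code. */
double partial_pi(int n, int rank, int nprocs)
{
    assert(n > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    double h = 1.0 / (double) n;
    double sum = 0.0;
    for (int i = rank + 1; i <= n; i += nprocs) {
        double x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for MPI_Reduce(..., MPI_SUM, ...): add up the
   contribution of every simulated rank. */
double reduced_pi(int n, int nprocs)
{
    double pi = 0.0;
    for (int rank = 0; rank < nprocs; rank++) {
        pi += partial_pi(n, rank, nprocs);
    }
    return pi;
}
```

With four simulated ranks, as in the two-node, two-tasks-per-node allocation above, reduced_pi(50, 4) should agree with the n = 50 line of the output to within floating-point rounding, and the result should not depend on the number of ranks.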