PAPI

Overview

The Performance Application Programming Interface (PAPI) gives tool designers and application engineers a consistent interface and methodology for using the performance counter hardware found in most major microprocessors. With PAPI, software engineers can see, in near real time, the relation between software performance and processor events.

In addition, PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.

Using

PAPI is provided through the module system. Use the module name papi to list the available versions (for example, module avail papi) and to load the library (module load papi).

To list the components installed with a given version of PAPI, run the papi_component_avail command. For example:

    [@blog2 ~]$ papi_component_avail
    Available components and hardware information.
    --------------------------------------------------------------------------------
    PAPI version             : 6.0.0.1
    Operating system         : Linux 4.18.0-372.32.1.el8_6.x86_64
    Vendor string and code   : GenuineIntel (1, 0x1)
    Model string and code    : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz (106, 0x6a)
    CPU revision             : 6.000000
    CPUID                    : Family/Model/Stepping 6/106/6, 0x06/0x6a/0x06
    CPU Max MHz              : 3500
    CPU Min MHz              : 800
    Total cores              : 96
    SMT threads per core     : 2
    Cores per socket         : 24
    Sockets                  : 2
    Cores per NUMA region    : 48
    NUMA regions             : 2
    Running in a VM          : no
    Number Hardware Counters : 19
    Max Multiplex Counters   : 384
    Fast counter read (rdpmc): yes
    --------------------------------------------------------------------------------

    Compiled-in components:
    Name:   perf_event          Linux perf_event CPU counters
    Name:   perf_event_uncore   Linux perf_event CPU uncore and northbridge
       \-> Disabled: No uncore PMUs or events found
    Name:   example             A simple example component
    Name:   infiniband          Linux Infiniband statistics using the sysfs interface

    Active components:
    Name:   perf_event          Linux perf_event CPU counters
                                Native: 154, Preset: 0, Counters: 19
                                PMUs supported: ix86arch, perf, perf_raw, icl

    Name:   example             A simple example component
                                Native: 4, Preset: 0, Counters: 3

    Name:   infiniband          Linux Infiniband statistics using the sysfs interface
                                Native: 96, Preset: 0, Counters: 96

    --------------------------------------------------------------------------------

Multicore

The papi library can be used to develop serial, multithreaded, and multinode applications. Please refer to the PAPI MPI documentation for more information about using this functionality.

MPI Example

The following example uses both PAPI and Open MPI. Each MPI task counts the total instructions it executes while the tasks cooperatively approximate pi:

    /* Code based on: https://bitbucket.org/icl/papi/wiki/PAPI-Parallel-Programs */
    #include <papi.h>
    #include "mpi.h"
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    void handle_error(int retval);

    int main(int argc, char *argv[])
    {
        int n, myid, numprocs, i, retval, EventSet = PAPI_NULL;
        double PI25DT = 3.141592653589793238462643;
        double mypi, pi, h, sum, x;
        long_long values[1] = {(long_long) 0};

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT) { handle_error(retval); }

        /* Create an EventSet */
        retval = PAPI_create_eventset(&EventSet);
        if (retval != PAPI_OK) { handle_error(retval); }

        /* Add Total Instructions Executed to our EventSet */
        retval = PAPI_add_event(EventSet, PAPI_TOT_INS);
        if (retval != PAPI_OK) { handle_error(retval); }

        /* Start counting */
        retval = PAPI_start(EventSet);
        if (retval != PAPI_OK) { handle_error(retval); }

        /* Approximate pi with the midpoint rule for n = 50, 100, 150 */
        for (n = 50; n <= 150; n += 50) {
            if (myid == 0) {
                printf("n = %i\n", n);
            }
            MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

            h = 1.0 / (double) n;
            sum = 0.0;
            /* Each task handles every numprocs-th interval */
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double) i - 0.5);
                sum += 4.0 / (1.0 + x * x);
            }
            mypi = h * sum;

            /* Sum the partial results on task 0 */
            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (myid == 0) {
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
            }
        }

        /* Read the counters */
        retval = PAPI_read(EventSet, values);
        if (retval != PAPI_OK) { handle_error(retval); }
        // printf("After reading counters: %lld\n", values[0]);

        /* Stop the counters */
        retval = PAPI_stop(EventSet, values);
        if (retval != PAPI_OK) { handle_error(retval); }
        // printf("After stopping counters: %lld\n", values[0]);

        MPI_Finalize();
        return 0;
    }

    void handle_error(int retval)
    {
        printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
        exit(1);
    }
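The inner loop of the example hands intervals to the MPI tasks in round-robin order, and MPI_Reduce then adds the partial results together. As a minimal sketch of that decomposition, the code below replays it serially without MPI or PAPI. The function names partial_pi and reduce_pi are hypothetical helpers introduced only for this illustration, and the loop over ranks is a stand-in for MPI_Reduce with MPI_SUM, not a replacement for it.

```c
/* Partial midpoint-rule sum for one simulated rank (hypothetical helper).
   The rank handles intervals rank+1, rank+1+numprocs, ... up to n,
   which is the same cyclic distribution used in the MPI example. */
double partial_pi(int n, int rank, int numprocs)
{
    double h = 1.0 / (double) n;
    double sum = 0.0;
    for (int i = rank + 1; i <= n; i += numprocs) {
        double x = h * ((double) i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for MPI_Reduce with MPI_SUM (hypothetical helper):
   add up the partial result of every simulated rank. */
double reduce_pi(int n, int numprocs)
{
    double pi = 0.0;
    for (int rank = 0; rank < numprocs; rank++) {
        pi += partial_pi(n, rank, numprocs);
    }
    return pi;
}
```

Calling reduce_pi(50, 4) mirrors a four-task run with n = 50. Because every interval is assigned to exactly one rank, the result should match a single-rank run up to floating-point rounding, whatever the number of tasks.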

An example of compiling and running the code is as follows:

    [@blog2 test]$ module load gcc/12.2.0 papi/6.0.0.1 openmpi/4.1.4
    [@blog2 test]$ salloc --account=<account> --time=1:00:00 --nodes=2 --ntasks-per-node=2
    [@t514 test]$ mpicc -lpapi mpi_test.c -o mpi_test
    [@t514 test]$ srun mpi_test
    n = 50
    pi is approximately 3.1416259869230037, Error is 0.0000333333332105
    n = 100
    pi is approximately 3.1416009869231249, Error is 0.0000083333333318
    n = 150
    pi is approximately 3.1415963572934968, Error is 0.0000037037037037
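The printed errors shrink by roughly a factor of four each time n doubles, which is the behavior expected from the midpoint rule that the code implements. The derivation below is a sketch added for illustration and is not part of the original example; M_n denotes the value computed with n intervals and only the leading error term is kept.

```latex
\[
M_n - \int_0^1 f(x)\,dx \;\approx\; -\frac{h^2}{24}\,\bigl[f'(1) - f'(0)\bigr],
\qquad f(x) = \frac{4}{1+x^2}, \quad h = \frac{1}{n}
\]
\[
f'(x) = -\frac{8x}{(1+x^2)^2}
\;\Longrightarrow\; f'(1) - f'(0) = -2
\;\Longrightarrow\; M_n - \pi \;\approx\; \frac{1}{12\,n^2}
\]
```

For n = 50, 100, and 150 this estimate gives about 3.33e-5, 8.33e-6, and 3.70e-6, which agrees with the errors in the output above.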