The Intel oneAPI ecosystem provides a number of compilers and libraries. This page describes the compilers provided on Beartooth, including some basic examples and command-line instructions on how to compile.
Compilers Available
Below is a summary of which compilers are provided by which modules:
icc
Description: Intel® C++ Compiler Classic (icc) is deprecated and will be removed in a oneAPI release after the second half of 2023. Intel recommends that customers transition now to the LLVM-based Intel® oneAPI DPC++ (dpcpp) / C++ Compiler (icpx) / C Compiler (icx) for continued Windows* and Linux* support, new language support, new language features, and optimizations.
Language: C/C++
Module: icc

dpcpp
Description: The Intel® oneAPI DPC++ (dpcpp) / C++ Compiler (icpx) / C Compiler (icx) provides optimizations that help your applications run faster on Intel® 64 architectures on Windows* and Linux*, with support for the latest C, C++, and SYCL language standards. This compiler produces optimized code that can run significantly faster by taking advantage of the ever-increasing core count and vector register width in Intel® Xeon® processors and compatible processors. The Intel® Compiler will help you boost application performance through superior optimizations and Single Instruction Multiple Data (SIMD) vectorization, integration with Intel® Performance Libraries, and by leveraging the OpenMP* 5.0/5.1 parallel programming model.
The dpcpp, icx, and icpx compilers are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases.
When dpcpp is used with C source code, the -fsycl flag is applied automatically and the C source code is converted to C++ using SYCL. The conversion process is not perfect and can lead to errors.
The primary use case is data-parallel applications.
Language: C/C++
Module: compiler

icx
Description: The dpcpp, icx, and icpx compilers are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases.
The primary use case is standard C applications.
Language: C
Module: compiler

icpx
Description: The dpcpp, icx, and icpx compilers are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases.
The primary use case is standard C++ applications.
Language: C++
Module: compiler

ifx
Description: The Intel® Fortran Compiler (ifx) enables developers needing OpenMP* offload to Intel GPUs. The OpenMP 5.0/5.1 GPU offload features in ifx are not available in ifort.
Language: Fortran
Module: compiler

ifort
Description: Intel® Fortran Compiler Classic (ifort) provides best-in-class Fortran language features and performance for CPU. For calendar year 2022, ifort continues to be Intel's best-in-class Fortran compiler for customers not needing GPU offload support.
Language: Fortran
Module: compiler

mpicc
Description: MPI C compiler that uses generic wrappers for the gcc compiler.
Language: C
Module: mpi

mpicxx
Description: MPI C++ compiler that uses generic wrappers for the g++ compiler.
Language: C/C++
Module: mpi

mpifc
Description: MPI Fortran compiler that uses generic wrappers for the gfortran compiler.
Language: Fortran
Module: mpi

mpif90
Description: MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.
Language: Fortran
Module: mpi

mpif77
Description: MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.
Language: Fortran
Module: mpi

mpigcc
Description: MPI C compiler that uses GNU wrappers for the gcc compiler.
Language: C
Module: mpi

mpigxx
Description: MPI C++ compiler that uses GNU wrappers for the g++ compiler.
Language: C/C++
Module: mpi

mpiicc
Description: MPI C compiler that uses Intel wrappers for the icc compiler. The icc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.
Language: C
Module: mpi

mpiicpc
Description: MPI C++ compiler that uses Intel wrappers for the icpc compiler. The icpc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.
Language: C++
Module: mpi

mpiifort
Description: MPI Fortran compiler that uses Intel wrappers for the ifort compiler.
Language: Fortran
Module: mpi
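As shown in the examples later on this page, these compilers become available after loading the corresponding modules. A minimal sketch (the module versions are the ones referenced elsewhere on this page; run module spider to see what is currently installed):
[@blog2 ~]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0 mpi/2021.7.0
[@blog2 ~]$ icx --version
[@blog2 ~]$ ifx --version
[@blog2 ~]$ mpiicc -show    # prints the underlying compile/link line used by the wrapper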
Acronyms:
MKL: Math Kernel Library: a computing math library of highly optimized and extensively parallelized routines for applications that require maximum performance.
TBB: Threading Building Blocks: a widely used C++ library for task-based, shared memory parallel programming on the host.
MPI: Message Passing Interface: a multifabric message-passing library that implements the open source MPICH specification.
Best Practice: Because compiling code can be computationally intensive or may have long compilation times, we ask that large compilations be done inside either a salloc session or an sbatch job.
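For example, a large build could be wrapped in a simple sbatch script along the following lines. This is only a sketch: the account and time mirror the salloc commands used on this page, and the final compile line is a placeholder (my_program.c) to replace with your own build command.
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=30:00
#SBATCH --cpus-per-task=8

module load oneapi/2022.3 compiler/2022.2.0

# Placeholder build step: replace with your actual compile line or make invocation,
# e.g. make -j $SLURM_CPUS_PER_TASK
icx -O2 my_program.c -o my_program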
Examples:
MKL: C
/*
Example from: https://www.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-c/top/multiplying-matrices-using-dgemm.html#multiplying-matrices-using-dgemm
*/
#define min(x,y) (((x) < (y)) ? (x) : (y))
#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"
int main()
{
double *A, *B, *C;
int m, n, k, i, j;
double alpha, beta;
printf ("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
" Intel(R) MKL function dgemm, where A, B, and C are matrices and \n"
" alpha and beta are double precision scalars\n\n");
m = 2000, k = 200, n = 1000;
printf (" Initializing data for matrix multiplication C=A*B for matrix \n"
" A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
alpha = 1.0; beta = 0.0;
printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
" performance \n\n");
A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
if (A == NULL || B == NULL || C == NULL) {
printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
mkl_free(A);
mkl_free(B);
mkl_free(C);
return 1;
}
printf (" Intializing matrix data \n\n");
for (i = 0; i < (m*k); i++) {
A[i] = (double)(i+1);
}
for (i = 0; i < (k*n); i++) {
B[i] = (double)(-i-1);
}
for (i = 0; i < (m*n); i++) {
C[i] = 0.0;
}
printf (" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
m, n, k, alpha, A, k, B, n, beta, C, n);
printf ("\n Computations completed.\n\n");
printf (" Top left corner of matrix A: \n");
for (i=0; i<min(m,6); i++) {
for (j=0; j<min(k,6); j++) {
printf ("%12.0f", A[j+i*k]);
}
printf ("\n");
}
printf ("\n Top left corner of matrix B: \n");
for (i=0; i<min(k,6); i++) {
for (j=0; j<min(n,6); j++) {
printf ("%12.0f", B[j+i*n]);
}
printf ("\n");
}
printf ("\n Top left corner of matrix C: \n");
for (i=0; i<min(m,6); i++) {
for (j=0; j<min(n,6); j++) {
printf ("%12.5G", C[j+i*n]);
}
printf ("\n");
}
printf ("\n Deallocating memory \n\n");
mkl_free(A);
mkl_free(B);
mkl_free(C);
printf (" Example completed. \n\n");
return 0;
}
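One possible way to build this example (a sketch only: the source file name mkl_test.c is an assumption, the module versions are those referenced elsewhere on this page, and -qmkl asks the Intel compilers to link against MKL):
[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 data]$ icx -qmkl mkl_test.c -o mkl_test_icx
[@m001 data]$ dpcpp -qmkl mkl_test.c -o mkl_test_dpcpp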
Since all of the C MKL examples are built from the same source file, the results of the executables will be the same:
[@blog2 data]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8
[@m001 data]$ ./mkl_test_dpcpp
This example computes real matrix C=alpha*A*B+beta*C using
Intel(R) MKL function dgemm, where A, B, and C are matrices and
alpha and beta are double precision scalars
Initializing data for matrix multiplication C=A*B for matrix
A(2000x200) and matrix B(200x1000)
Allocating memory for matrices aligned on 64-byte boundary for better
performance
Intializing matrix data
Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface
Computations completed.
Top left corner of matrix A:
1 2 3 4 5 6
201 202 203 204 205 206
401 402 403 404 405 406
601 602 603 604 605 606
801 802 803 804 805 806
1001 1002 1003 1004 1005 1006
Top left corner of matrix B:
-1 -2 -3 -4 -5 -6
-1001 -1002 -1003 -1004 -1005 -1006
-2001 -2002 -2003 -2004 -2005 -2006
-3001 -3002 -3003 -3004 -3005 -3006
-4001 -4002 -4003 -4004 -4005 -4006
-5001 -5002 -5003 -5004 -5005 -5006
Top left corner of matrix C:
-2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
-6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09 -6.647E+09
-1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
-1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
-1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
-2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10
Deallocating memory
Example completed.
MKL: C++
/*
Example from: https://school.scientificprogramming.io/course/lesson/c-scientific-programming/13/327
*/
#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"
#define min(x,y) (((x) < (y)) ? (x) : (y))
int main()
{
double *A, *B, *C;
int m, n, k, i, j;
double alpha, beta;
printf ("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
" Intel(R) MKL function dgemm, where A, B, and C are matrices and \n"
" alpha and beta are double precision scalars\n\n");
m = 2000, k = 200, n = 1000;
printf (" Initializing data for matrix multiplication C=A*B for matrix \n"
" A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
alpha = 1.0; beta = 0.0;
printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
" performance \n\n");
A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
if (A == NULL || B == NULL || C == NULL) {
printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
mkl_free(A);
mkl_free(B);
mkl_free(C);
return 1;
}
printf (" Intializing matrix data \n\n");
for (i = 0; i < (m*k); i++) {
A[i] = (double)(i+1);
}
for (i = 0; i < (k*n); i++) {
B[i] = (double)(-i-1);
}
for (i = 0; i < (m*n); i++) {
C[i] = 0.0;
}
printf (" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
m, n, k, alpha, A, k, B, n, beta, C, n);
printf ("\n Computations completed.\n\n");
printf (" Top left corner of matrix A: \n");
for (i=0; i<min(m,6); i++) {
for (j=0; j<min(k,6); j++) {
printf ("%12.0f", A[j+i*k]);
}
printf ("\n");
}
printf ("\n Top left corner of matrix B: \n");
for (i=0; i<min(k,6); i++) {
for (j=0; j<min(n,6); j++) {
printf ("%12.0f", B[j+i*n]);
}
printf ("\n");
}
printf ("\n Top left corner of matrix C: \n");
for (i=0; i<min(m,6); i++) {
for (j=0; j<min(n,6); j++) {
printf ("%12.5G", C[j+i*n]);
}
printf ("\n");
}
printf ("\n Deallocating memory \n\n");
mkl_free(A);
mkl_free(B);
mkl_free(C);
printf (" Example completed. \n\n");
return 0;
}
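A possible build for the C++ version (again a sketch: the source file name mkl_test.cpp is an assumption, and -qmkl handles the MKL include and link flags):
[@m003 test]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m003 test]$ icpx -qmkl mkl_test.cpp -o mkl_test_cpp_icpx
[@m003 test]$ dpcpp -qmkl mkl_test.cpp -o mkl_test_cpp_dpcpp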
Since all of the C++ MKL examples are built from the same source file, the results of the executables will be the same:
[@blog2 test]$ salloc --account=arcc --time=30:00
[@m003 test]$ ./mkl_test_cpp_dpcpp
This example computes real matrix C=alpha*A*B+beta*C using
Intel(R) MKL function dgemm, where A, B, and C are matrices and
alpha and beta are double precision scalars
Initializing data for matrix multiplication C=A*B for matrix
A(2000x200) and matrix B(200x1000)
Allocating memory for matrices aligned on 64-byte boundary for better
performance
Intializing matrix data
Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface
Computations completed.
Top left corner of matrix A:
1 2 3 4 5 6
201 202 203 204 205 206
401 402 403 404 405 406
601 602 603 604 605 606
801 802 803 804 805 806
1001 1002 1003 1004 1005 1006
Top left corner of matrix B:
-1 -2 -3 -4 -5 -6
-1001 -1002 -1003 -1004 -1005 -1006
-2001 -2002 -2003 -2004 -2005 -2006
-3001 -3002 -3003 -3004 -3005 -3006
-4001 -4002 -4003 -4004 -4005 -4006
-5001 -5002 -5003 -5004 -5005 -5006
Top left corner of matrix C:
-2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
-6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09 -6.647E+09
-1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
-1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
-1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
-2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10
Deallocating memory
Example completed.
[@m003 test]$
MKL: Fortran
* Fortran source code is from: https://www.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-fortran/top/multiplying-matrices-using-dgemm.html#multiplying-matrices-using-dgemm
* Fortran source code is found in mkl_test.f
PROGRAM MAIN
IMPLICIT NONE
DOUBLE PRECISION ALPHA, BETA
INTEGER M, K, N, I, J
PARAMETER (M=2000, K=200, N=1000)
DOUBLE PRECISION A(M,K), B(K,N), C(M,N)
PRINT *, "This example computes real matrix C=alpha*A*B+beta*C"
PRINT *, "using Intel(R) MKL function dgemm, where A, B, and C"
PRINT *, "are matrices and alpha and beta are double precision "
PRINT *, "scalars"
PRINT *, ""
PRINT *, "Initializing data for matrix multiplication C=A*B for "
PRINT 10, " matrix A(",M," x",K, ") and matrix B(", K," x", N, ")"
10 FORMAT(a,I5,a,I5,a,I5,a,I5,a)
PRINT *, ""
ALPHA = 1.0
BETA = 0.0
PRINT *, "Intializing matrix data"
PRINT *, ""
DO I = 1, M
DO J = 1, K
A(I,J) = (I-1) * K + J
END DO
END DO
DO I = 1, K
DO J = 1, N
B(I,J) = -((I-1) * N + J)
END DO
END DO
DO I = 1, M
DO J = 1, N
C(I,J) = 0.0
END DO
END DO
PRINT *, "Computing matrix product using Intel(R) MKL DGEMM "
PRINT *, "subroutine"
CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)
PRINT *, "Computations completed."
PRINT *, ""
PRINT *, "Top left corner of matrix A:"
PRINT 20, ((A(I,J), J = 1,MIN(K,6)), I = 1,MIN(M,6))
PRINT *, ""
PRINT *, "Top left corner of matrix B:"
PRINT 20, ((B(I,J),J = 1,MIN(N,6)), I = 1,MIN(K,6))
PRINT *, ""
20 FORMAT(6(F12.0,1x))
PRINT *, "Top left corner of matrix C:"
PRINT 30, ((C(I,J), J = 1,MIN(N,6)), I = 1,MIN(M,6))
PRINT *, ""
30 FORMAT(6(ES12.4,1x))
PRINT *, "Example completed."
STOP
END
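A possible build for mkl_test.f (a sketch; the module versions are those referenced on this page, and -qmkl links against MKL):
[@m001 test]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ ifx -qmkl mkl_test.f -o mkl_test_ifx
[@m001 test]$ ifort -qmkl mkl_test.f -o mkl_test_ifort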
Since all of the Fortran MKL examples are built from the same source file, the results of the executables will be the same:
[@blog2 data]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8
[@m001 test]$ ./mkl_test_ifx
This example computes real matrix C=alpha*A*B+beta*C
using Intel(R) MKL function dgemm, where A, B, and C
are matrices and alpha and beta are double precision
scalars
Initializing data for matrix multiplication C=A*B for
matrix A( 2000 x 200) and matrix B( 200 x 1000)
Intializing matrix data
Computing matrix product using Intel(R) MKL DGEMM
subroutine
Computations completed.
Top left corner of matrix A:
1. 2. 3. 4. 5. 6.
201. 202. 203. 204. 205. 206.
401. 402. 403. 404. 405. 406.
601. 602. 603. 604. 605. 606.
801. 802. 803. 804. 805. 806.
1001. 1002. 1003. 1004. 1005. 1006.
Top left corner of matrix B:
-1. -2. -3. -4. -5. -6.
-1001. -1002. -1003. -1004. -1005. -1006.
-2001. -2002. -2003. -2004. -2005. -2006.
-3001. -3002. -3003. -3004. -3005. -3006.
-4001. -4002. -4003. -4004. -4005. -4006.
-5001. -5002. -5003. -5004. -5005. -5006.
Top left corner of matrix C:
-2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
-6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09 -6.6470E+09
-1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
-1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
-1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
-2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10
Example completed.
TBB: C++
The module tbb/2021.7.0 is automatically loaded when loading compiler/2022.2.0 or mkl/2022.2.0; for this reason, it is omitted from the module loads used in the examples.
/*
Based on: https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-onetbb/top.html
Added sleep in code to add enough of a delay to verify that it is using multiple cores. Solution came from: https://stackoverflow.com/questions/158585/how-do-you-add-a-timed-delay-to-a-c-program
*/
#include <iostream>
#include "tbb.h"
#include <unistd.h>
int main()
{
int sum = oneapi::tbb::parallel_reduce(oneapi::tbb::blocked_range<int>(1,101), 0,
[](oneapi::tbb::blocked_range<int> const& r, int init) -> int {
for (int v = r.begin(); v != r.end(); v++ ) {
init += v;
sleep(3);
}
return init;
},
[](int lhs, int rhs) -> int {
return lhs + rhs;
}
);
std::cout << "Sum of all ints from 1 to 100: " << sum << std::endl;
return 0;
}
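A possible build (a sketch: the source file name tbb_test.cpp is an assumption; -ltbb links against the oneTBB runtime provided by the automatically loaded tbb module):
[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0
[@m001 data]$ icpx tbb_test.cpp -ltbb -o tbb_test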
Since all of the C++ TBB examples are built from the same source file, the results of the executables will be the same. The executable will also make use of multiple CPUs, so experiment with the number requested in the salloc:
[@blog2 data]$ salloc --account=arcc --time=30:00 --cpus-per-task=8
[@m001 data]$ ./tbb_test
Sum of all ints from 1 to 100: 5050
MPI in oneAPI Ecosystem
Intel's oneAPI MPI compilers use their own copy of mpirun, which uses a built-in process manager that pulls the relevant information from SLURM to run code compiled with the oneAPI compilers.
Messages Appearing During Code Execution
Because the process manager used by the oneAPI MPI compilers automatically pulls information from SLURM, it will warn the user that it is ignoring certain environment variables:
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM_LIST environment variable is not supported.
[0] MPI startup(): Similar variables:
I_MPI_EXTRA_FILESYSTEM
I_MPI_EXTRA_FILESYSTEM_FORCE
To suppress these messages, enter the following in either the sbatch script or the salloc session:
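unset I_MPI_EXTRA_FILESYSTEM_LIST
unset I_MPI_PMI_LIBRARY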
Sample Code: The sample code used on this page is available on Beartooth at: /apps/u/opt/compilers/oneapi/2022.3/mpi/2021.7.0/test
MPI: C
/*
Copyright Intel Corporation.
This software and the related documents are Intel copyrighted materials, and
your use of them is governed by the express license under which they were
provided to you (License). Unless the License provides otherwise, you may
not use, modify, copy, publish, distribute, disclose or transmit this
software or the related documents without Intel's prior written permission.
This software and the related documents are provided as is, with no express
or implied warranties, other than those that are expressly stated in the
License.
*/
#include "mpi.h"
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
int i, rank, size, namelen;
char name[MPI_MAX_PROCESSOR_NAME];
MPI_Status stat;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(name, &namelen);
if (rank == 0) {
printf("Hello world: rank %d of %d running on %s\n", rank, size, name);
for (i = 1; i < size; i++) {
MPI_Recv(&rank, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
MPI_Recv(&size, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
MPI_Recv(&namelen, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
MPI_Recv(name, namelen + 1, MPI_CHAR, i, 1, MPI_COMM_WORLD, &stat);
printf("Hello world: rank %d of %d running on %s\n", rank, size, name);
}
} else {
MPI_Send(&rank, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Send(&size, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Send(&namelen, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Send(name, namelen + 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
}
MPI_Finalize();
return (0);
}
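A possible build using the wrappers from the table above (a sketch: the source file name test.c is an assumption; mpiicc wraps the Intel icc compiler, while mpicc would wrap gcc instead):
[@m001 mpi_test]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpiicc test.c -o mpicc_test_c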
Since all of the C MPI examples are built from the same source file, the results of the executables will be the same:
[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpicc_test_c
Hello world: rank 0 of 16 running on m001
Hello world: rank 1 of 16 running on m001
Hello world: rank 2 of 16 running on m001
Hello world: rank 3 of 16 running on m001
Hello world: rank 4 of 16 running on m001
Hello world: rank 5 of 16 running on m001
Hello world: rank 6 of 16 running on m001
Hello world: rank 7 of 16 running on m001
Hello world: rank 8 of 16 running on m002
Hello world: rank 9 of 16 running on m002
Hello world: rank 10 of 16 running on m002
Hello world: rank 11 of 16 running on m002
Hello world: rank 12 of 16 running on m002
Hello world: rank 13 of 16 running on m002
Hello world: rank 14 of 16 running on m002
Hello world: rank 15 of 16 running on m002
MPI: C++
/*
Copyright Intel Corporation.
This software and the related documents are Intel copyrighted materials, and
your use of them is governed by the express license under which they were
provided to you (License). Unless the License provides otherwise, you may
not use, modify, copy, publish, distribute, disclose or transmit this
software or the related documents without Intel's prior written permission.
This software and the related documents are provided as is, with no express
or implied warranties, other than those that are expressly stated in the
License.
*/
#include "mpi.h"
#include <iostream>
int main(int argc, char *argv[])
{
int i, rank, size, namelen;
char name[MPI_MAX_PROCESSOR_NAME];
MPI::Status stat;
MPI::Init(argc, argv);
size = MPI::COMM_WORLD.Get_size();
rank = MPI::COMM_WORLD.Get_rank();
MPI::Get_processor_name(name, namelen);
if (rank == 0) {
std::cout << "Hello world: rank " << rank << " of " << size << " running on " << name <<
"\n";
for (i = 1; i < size; i++) {
MPI::COMM_WORLD.Recv(&rank, 1, MPI_INT, i, 1, stat);
MPI::COMM_WORLD.Recv(&size, 1, MPI_INT, i, 1, stat);
MPI::COMM_WORLD.Recv(&namelen, 1, MPI_INT, i, 1, stat);
MPI::COMM_WORLD.Recv(name, namelen + 1, MPI_CHAR, i, 1, stat);
std::cout << "Hello world: rank " << rank << " of " << size << " running on " << name <<
"\n";
}
} else {
MPI::COMM_WORLD.Send(&rank, 1, MPI_INT, 0, 1);
MPI::COMM_WORLD.Send(&size, 1, MPI_INT, 0, 1);
MPI::COMM_WORLD.Send(&namelen, 1, MPI_INT, 0, 1);
MPI::COMM_WORLD.Send(name, namelen + 1, MPI_CHAR, 0, 1);
}
MPI::Finalize();
return (0);
}
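A possible build (a sketch: the source file name test.cpp is an assumption; mpicxx wraps g++, and mpiicpc could be used instead for the Intel icpc compiler):
[@m001 mpi_test]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpicxx test.cpp -o mpicxx_test_cpp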
Since all of the C++ MPI examples are built from the same source file, the results of the executables will be the same:
[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpicxx_test_cpp
Hello world: rank 0 of 16 running on m001
Hello world: rank 1 of 16 running on m001
Hello world: rank 2 of 16 running on m001
Hello world: rank 3 of 16 running on m001
Hello world: rank 4 of 16 running on m001
Hello world: rank 5 of 16 running on m001
Hello world: rank 6 of 16 running on m001
Hello world: rank 7 of 16 running on m001
Hello world: rank 8 of 16 running on m002
Hello world: rank 9 of 16 running on m002
Hello world: rank 10 of 16 running on m002
Hello world: rank 11 of 16 running on m002
Hello world: rank 12 of 16 running on m002
Hello world: rank 13 of 16 running on m002
Hello world: rank 14 of 16 running on m002
Hello world: rank 15 of 16 running on m002
MPI: Fortran
!
! Copyright Intel Corporation.
!
! This software and the related documents are Intel copyrighted materials, and
! your use of them is governed by the express license under which they were
! provided to you (License). Unless the License provides otherwise, you may
! not use, modify, copy, publish, distribute, disclose or transmit this
! software or the related documents without Intel's prior written permission.
!
! This software and the related documents are provided as is, with no express
! or implied warranties, other than those that are expressly stated in the
! License.
!
program main
implicit none
include 'mpif.h'
integer i, size, rank, namelen, ierr
character*(MPI_MAX_PROCESSOR_NAME) name
integer stat(MPI_STATUS_SIZE)
call MPI_INIT (ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
call MPI_GET_PROCESSOR_NAME (name, namelen, ierr)
if (rank.eq.0) then
print *, 'Hello world: rank ', rank, ' of ', size,
x ' running on ', name
do i = 1, size - 1
call MPI_RECV (rank, 1, MPI_INTEGER, i, 1,
x MPI_COMM_WORLD, stat, ierr)
call MPI_RECV (size, 1, MPI_INTEGER, i, 1,
x MPI_COMM_WORLD, stat, ierr)
call MPI_RECV (namelen, 1, MPI_INTEGER, i, 1,
x MPI_COMM_WORLD, stat, ierr)
name = ''
call MPI_RECV (name, namelen, MPI_CHARACTER, i, 1,
x MPI_COMM_WORLD, stat, ierr)
print *, 'Hello world: rank ', rank, ' of ', size,
x ' running on ', name
enddo
else
call MPI_SEND (rank, 1, MPI_INTEGER, 0, 1,
x MPI_COMM_WORLD, ierr)
call MPI_SEND (size, 1, MPI_INTEGER, 0, 1,
x MPI_COMM_WORLD, ierr)
call MPI_SEND (namelen, 1, MPI_INTEGER, 0, 1,
x MPI_COMM_WORLD, ierr)
call MPI_SEND (name, namelen, MPI_CHARACTER, 0, 1,
x MPI_COMM_WORLD, ierr)
endif
call MPI_FINALIZE (ierr)
end
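A possible build and run for the Fortran example, mirroring the C and C++ sessions above (a sketch: the source file name test.f is an assumption):
[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 mpi_test]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpiifort test.f -o mpiifort_test_f
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpiifort_test_f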