oneAPI: Compiling [beartooth]

Overview

The Intel oneAPI ecosystem provides a number of compilers and libraries. This page describes the compilers provided on Beartooth, along with some basic examples and command-line instructions on how to compile.

Compilers Available

Below is a summary of which compilers are provided by which modules:

Compiler: icc
Language: C/C++
Module: icc
Description: Intel® C++ Compiler Classic (icc) is deprecated and will be removed in a oneAPI release after the second half of 2023. Intel recommends that customers transition now to using the LLVM-based Intel® oneAPI DPC++ (dpcpp) / C++ (icpx) / C (icx) compilers for continued Windows* and Linux* support, new language support, new language features, and optimizations.

Compiler: dpcpp
Language: C/C++
Module: compiler
Description: The Intel® oneAPI DPC++ (dpcpp) / C++ (icpx) / C (icx) compilers provide optimizations that help your applications run faster on Intel® 64 architectures on Windows* and Linux*, with support for the latest C, C++, and SYCL language standards. They produce optimized code that can run significantly faster by taking advantage of the ever-increasing core count and vector register width in Intel® Xeon® processors and compatible processors, and they boost application performance through superior optimizations and Single Instruction Multiple Data (SIMD) vectorization, integration with Intel® Performance Libraries, and the OpenMP* 5.0/5.1 parallel programming model. dpcpp, icx, and icpx are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases. When dpcpp is used with C source code, the -fsycl flag is automatically applied and the C source code is automatically converted to C++ using SYCL; this conversion is not perfect and can lead to errors. The primary use case is data-parallel applications.

Compiler: icx
Language: C
Module: compiler
Description: dpcpp, icx, and icpx are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases. The primary use case is standard C applications.

Compiler: icpx
Language: C++
Module: compiler
Description: dpcpp, icx, and icpx are all built from the same compiler but have different drivers behind them that allow for better optimization for their respective primary use cases. The primary use case is standard C++ applications.

Compiler: ifx
Language: Fortran
Module: compiler
Description: The Intel® Fortran Compiler (ifx) enables developers needing OpenMP* offload to Intel GPUs. The OpenMP 5.0/5.1 GPU offload features in ifx are not available in ifort.

Compiler: ifort
Language: Fortran
Module: compiler
Description: Intel® Fortran Compiler Classic (ifort) provides best-in-class Fortran language features and performance for CPU. For calendar year 2022, ifort continues to be Intel's best-in-class Fortran compiler for customers not needing GPU offload support.

Compiler: mpicc
Language: C
Module: mpi
Description: MPI C compiler that uses generic wrappers for the gcc compiler.

Compiler: mpicxx
Language: C/C++
Module: mpi
Description: MPI C++ compiler that uses generic wrappers for the g++ compiler.

Compiler: mpifc
Language: Fortran
Module: mpi
Description: MPI Fortran compiler that uses generic wrappers for the gfortran compiler.

Compiler: mpif90
Language: Fortran
Module: mpi
Description: MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.

Compiler: mpif77
Language: Fortran
Module: mpi
Description: MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.

Compiler: mpigcc
Language: C
Module: mpi
Description: MPI C compiler that uses GNU wrappers for the gcc compiler.

Compiler: mpigxx
Language: C/C++
Module: mpi
Description: MPI C++ compiler that uses GNU wrappers for the g++ compiler.

Compiler: mpiicc
Language: C
Module: mpi
Description: MPI C compiler that uses Intel wrappers for the icc compiler. The icc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.

Compiler: mpiicpc
Language: C++
Module: mpi
Description: MPI C++ compiler that uses Intel wrappers for the icpc compiler. The icpc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.

Compiler: mpiifort
Language: Fortran
Module: mpi
Description: MPI Fortran compiler that uses Intel wrappers for the ifort compiler.

Acronyms:

  • MKL: Math Kernel Library: a computing math library of highly optimized and extensively parallelized routines for applications that require maximum performance.

  • TBB: Threading Building Blocks: a widely used C++ library for task-based, shared memory parallel programming on the host.

  • MPI: Message Passing Interface: a multifabric message-passing library that implements the open source MPICH specification.

Best Practice: Because compiling code can be computationally intensive and may have long compilation times, we ask that large compilations be done inside either a salloc session or an sbatch job.
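For example, a compilation can be wrapped in an sbatch job. The script below is a minimal sketch rather than a tested recipe from this page: the script name (compile_job.sh), the source file name (my_program.c), and the resource requests are placeholder assumptions, and the compile line simply mirrors the icx/MKL example further down.

#!/bin/bash
#SBATCH --account=arcc            # replace with your own project account
#SBATCH --time=30:00              # wall time reserved for the build
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # extra cores only help parallel builds (e.g. make -j)

# Load the oneAPI toolchain used throughout the examples on this page.
module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0

# Placeholder compile step; substitute your own compiler invocation or make command.
icx -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o my_program my_program.c

Submit the script with sbatch compile_job.sh, or run the same commands interactively inside a salloc session.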

Examples:

MKL: C

/* Example from: https://www.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-c/top/multiplying-matrices-using-dgemm.html#multiplying-matrices-using-dgemm */

#define min(x,y) (((x) < (y)) ? (x) : (y))

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main()
{
    double *A, *B, *C;
    int m, n, k, i, j;
    double alpha, beta;

    printf ("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
            " Intel(R) MKL function dgemm, where A, B, and C are matrices and \n"
            " alpha and beta are double precision scalars\n\n");

    m = 2000, k = 200, n = 1000;
    printf (" Initializing data for matrix multiplication C=A*B for matrix \n"
            " A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
    alpha = 1.0; beta = 0.0;

    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
            " performance \n\n");
    A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
    B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
    C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
    if (A == NULL || B == NULL || C == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    printf (" Intializing matrix data \n\n");
    for (i = 0; i < (m*k); i++) {
        A[i] = (double)(i+1);
    }
    for (i = 0; i < (k*n); i++) {
        B[i] = (double)(-i-1);
    }
    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    printf (" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, k, B, n, beta, C, n);
    printf ("\n Computations completed.\n\n");

    printf (" Top left corner of matrix A: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(k,6); j++) {
            printf ("%12.0f", A[j+i*k]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix B: \n");
    for (i=0; i<min(k,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.0f", B[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix C: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.5G", C[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    printf (" Example completed. \n\n");
    return 0;
}

Using icc compiler

[@m001 test]$ module load oneapi/2022.3 icc/2022.2.0 mkl/2022.2.0
[@m001 test]$ icc -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_icc mkl_test.c

Using icx compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ icx -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_icx mkl_test.c

Output

Since all of the MKL examples are built from the same source file, the results of the executables will be the same:

[@blog2 data]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8
[@m001 data]$ ./mkl_test_dpcpp

 This example computes real matrix C=alpha*A*B+beta*C using
 Intel(R) MKL function dgemm, where A, B, and C are matrices and
 alpha and beta are double precision scalars

 Initializing data for matrix multiplication C=A*B for matrix
 A(2000x200) and matrix B(200x1000)

 Allocating memory for matrices aligned on 64-byte boundary for better
 performance

 Intializing matrix data

 Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface

 Computations completed.

 Top left corner of matrix A:
           1           2           3           4           5           6
         201         202         203         204         205         206
         401         402         403         404         405         406
         601         602         603         604         605         606
         801         802         803         804         805         806
        1001        1002        1003        1004        1005        1006

 Top left corner of matrix B:
          -1          -2          -3          -4          -5          -6
       -1001       -1002       -1003       -1004       -1005       -1006
       -2001       -2002       -2003       -2004       -2005       -2006
       -3001       -3002       -3003       -3004       -3005       -3006
       -4001       -4002       -4003       -4004       -4005       -4006
       -5001       -5002       -5003       -5004       -5005       -5006

 Top left corner of matrix C:
 -2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
 -6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09  -6.647E+09
 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10

 Deallocating memory

 Example completed.

MKL: C++

/* Example from: https://school.scientificprogramming.io/course/lesson/c-scientific-programming/13/327 */

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

#define min(x,y) (((x) < (y)) ? (x) : (y))

int main()
{
    double *A, *B, *C;
    int m, n, k, i, j;
    double alpha, beta;

    printf ("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
            " Intel(R) MKL function dgemm, where A, B, and C are matrices and \n"
            " alpha and beta are double precision scalars\n\n");

    m = 2000, k = 200, n = 1000;
    printf (" Initializing data for matrix multiplication C=A*B for matrix \n"
            " A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
    alpha = 1.0; beta = 0.0;

    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
            " performance \n\n");
    A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
    B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
    C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
    if (A == NULL || B == NULL || C == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    printf (" Intializing matrix data \n\n");
    for (i = 0; i < (m*k); i++) {
        A[i] = (double)(i+1);
    }
    for (i = 0; i < (k*n); i++) {
        B[i] = (double)(-i-1);
    }
    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    printf (" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, k, B, n, beta, C, n);
    printf ("\n Computations completed.\n\n");

    printf (" Top left corner of matrix A: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(k,6); j++) {
            printf ("%12.0f", A[j+i*k]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix B: \n");
    for (i=0; i<min(k,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.0f", B[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix C: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.5G", C[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    printf (" Example completed. \n\n");
    return 0;
}

Using dpcpp compiler

[@m003 test]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m003 test]$ dpcpp -DMKL_ILP64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_cpp_dpcpp mkl_test.cpp

Using icc compiler

[@m003 test]$ module load oneapi/2022.3 icc/2022.2.0 mkl/2022.2.0
[@m003 test]$ icc -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_cpp_icc mkl_test.cpp

Using icpx compiler

[@m003 test]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m003 test]$ icpx -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_cpp_icpx mkl_test.cpp

Output

Since all of the C++ MKL examples are built from the same source file, the results of the executables will be the same:

[@blog2 test]$ salloc --account=arcc --time=30:00
[@m003 test]$ ./mkl_test_cpp_dpcpp

 This example computes real matrix C=alpha*A*B+beta*C using
 Intel(R) MKL function dgemm, where A, B, and C are matrices and
 alpha and beta are double precision scalars

 Initializing data for matrix multiplication C=A*B for matrix
 A(2000x200) and matrix B(200x1000)

 Allocating memory for matrices aligned on 64-byte boundary for better
 performance

 Intializing matrix data

 Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface

 Computations completed.

 Top left corner of matrix A:
           1           2           3           4           5           6
         201         202         203         204         205         206
         401         402         403         404         405         406
         601         602         603         604         605         606
         801         802         803         804         805         806
        1001        1002        1003        1004        1005        1006

 Top left corner of matrix B:
          -1          -2          -3          -4          -5          -6
       -1001       -1002       -1003       -1004       -1005       -1006
       -2001       -2002       -2003       -2004       -2005       -2006
       -3001       -3002       -3003       -3004       -3005       -3006
       -4001       -4002       -4003       -4004       -4005       -4006
       -5001       -5002       -5003       -5004       -5005       -5006

 Top left corner of matrix C:
 -2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
 -6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09  -6.647E+09
 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10

 Deallocating memory

 Example completed.

[@m003 test]$

MKL: Fortran

* Fortran source code is from: https://www.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-fortran/top/multiplying-matrices-using-dgemm.html#multiplying-matrices-using-dgemm
* Fortran source code is found in mkl_test.f

      PROGRAM   MAIN

      IMPLICIT NONE

      DOUBLE PRECISION ALPHA, BETA
      INTEGER          M, K, N, I, J
      PARAMETER        (M=2000, K=200, N=1000)
      DOUBLE PRECISION A(M,K), B(K,N), C(M,N)

      PRINT *, "This example computes real matrix C=alpha*A*B+beta*C"
      PRINT *, "using Intel(R) MKL function dgemm, where A, B, and C"
      PRINT *, "are matrices and alpha and beta are double precision "
      PRINT *, "scalars"
      PRINT *, ""
      PRINT *, "Initializing data for matrix multiplication C=A*B for "
      PRINT 10, " matrix A(",M," x",K, ") and matrix B(", K," x", N, ")"
10    FORMAT(a,I5,a,I5,a,I5,a,I5,a)
      PRINT *, ""
      ALPHA = 1.0
      BETA = 0.0

      PRINT *, "Intializing matrix data"
      PRINT *, ""
      DO I = 1, M
        DO J = 1, K
          A(I,J) = (I-1) * K + J
        END DO
      END DO

      DO I = 1, K
        DO J = 1, N
          B(I,J) = -((I-1) * N + J)
        END DO
      END DO

      DO I = 1, M
        DO J = 1, N
          C(I,J) = 0.0
        END DO
      END DO

      PRINT *, "Computing matrix product using Intel(R) MKL DGEMM "
      PRINT *, "subroutine"
      CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)
      PRINT *, "Computations completed."
      PRINT *, ""

      PRINT *, "Top left corner of matrix A:"
      PRINT 20, ((A(I,J), J = 1,MIN(K,6)), I = 1,MIN(M,6))
      PRINT *, ""

      PRINT *, "Top left corner of matrix B:"
      PRINT 20, ((B(I,J),J = 1,MIN(N,6)), I = 1,MIN(K,6))
      PRINT *, ""

20    FORMAT(6(F12.0,1x))

      PRINT *, "Top left corner of matrix C:"
      PRINT 30, ((C(I,J), J = 1,MIN(N,6)), I = 1,MIN(M,6))
      PRINT *, ""

30    FORMAT(6(ES12.4,1x))
      PRINT *, "Example completed."
      STOP

      END

Using ifx compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ ifx -mkl mkl_test.f -o mkl_test_ifx

Using ifort compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ ifort -mkl mkl_test.f -o mkl_test_ifort

Output

Since all of the Fortran MKL examples are built from the same source file, the results of the executables will be the same:

[@blog2 data]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8
[@m001 test]$ ./mkl_test_ifx
 This example computes real matrix C=alpha*A*B+beta*C
 using Intel(R) MKL function dgemm, where A, B, and C
 are matrices and alpha and beta are double precision
 scalars

 Initializing data for matrix multiplication C=A*B for
  matrix A( 2000 x  200) and matrix B(  200 x 1000)

 Intializing matrix data

 Computing matrix product using Intel(R) MKL DGEMM
 subroutine
 Computations completed.

 Top left corner of matrix A:
          1.          2.          3.          4.          5.          6.
        201.        202.        203.        204.        205.        206.
        401.        402.        403.        404.        405.        406.
        601.        602.        603.        604.        605.        606.
        801.        802.        803.        804.        805.        806.
       1001.       1002.       1003.       1004.       1005.       1006.

 Top left corner of matrix B:
         -1.         -2.         -3.         -4.         -5.         -6.
      -1001.      -1002.      -1003.      -1004.      -1005.      -1006.
      -2001.      -2002.      -2003.      -2004.      -2005.      -2006.
      -3001.      -3002.      -3003.      -3004.      -3005.      -3006.
      -4001.      -4002.      -4003.      -4004.      -4005.      -4006.
      -5001.      -5002.      -5003.      -5004.      -5005.      -5006.

 Top left corner of matrix C:
 -2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
 -6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09 -6.6470E+09
 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10

 Example completed.

TBB: C++

The module tbb/2021.7.0 is automatically loaded when loading compiler/2022.2.0 or mkl/2022.2.0; for this reason it is omitted from the module loads used in the examples.

As per the Specifications section of the Intel oneAPI Threading Building Blocks documentation, this module only works with C++ compilers.

/*
 Based on: https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-onetbb/top.html
 Added sleep in code to add enough of a delay to verify that it is using multiple cores.
 Solution came from: https://stackoverflow.com/questions/158585/how-do-you-add-a-timed-delay-to-a-c-program
*/
#include <iostream>
#include "tbb.h"
#include <unistd.h>

int main() {
    int sum = oneapi::tbb::parallel_reduce(
        oneapi::tbb::blocked_range<int>(1,101), 0,
        [](oneapi::tbb::blocked_range<int> const& r, int init) -> int {
            for (int v = r.begin(); v != r.end(); v++ ) {
                init += v;
                sleep(3);
            }
            return init;
        },
        [](int lhs, int rhs) -> int {
            return lhs + rhs;
        }
    );

    std::cout << "Sum of all ints from 1 to 100: " << sum << std::endl;
    return 0;
}

Using dpcpp compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0
[@m001 data]$ dpcpp -DMKL_ILP64 -I"${TBBROOT}/include/tbb" -ltbb -o tbb_test_dpcpp tbb_test.cpp

Using icc compiler

[@m001 test]$ module load oneapi/2022.3 icc/2022.2.0 tbb/2021.7.0
[@m001 test]$ icc -I"${TBBROOT}/include/tbb" -ltbb -o tbb_test_icc tbb_test.cpp

Using icpx compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ icpx -I"${TBBROOT}/include/tbb" -ltbb -o tbb_test_icpx tbb_test.cpp

Output

Since all of the C++ examples are built from the same source file, the results of the executables will be the same. The executable will also run across multiple CPUs, so experiment with the number used in the salloc:

[@blog2 data]$ salloc --account=arcc --time=30:00 --cpus-per-task=8
[@m001 data]$ ./tbb_test
Sum of all ints from 1 to 100: 5050

MPI in oneAPI Ecosystem

Intel's oneAPI MPI compilers provide their own copy of mpirun, which uses a built-in process manager that pulls the relevant information from SLURM to run code compiled with the oneAPI compilers.

Messages Appearing During Code Execution

Because the process manager used by the oneAPI MPI compilers automatically pulls information from SLURM, it will warn the user that it is ignoring certain environment variables:

MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM_LIST environment variable is not supported.
[0] MPI startup(): Similar variables: I_MPI_EXTRA_FILESYSTEM I_MPI_EXTRA_FILESYSTEM_FORCE

To suppress these warnings, enter the following in either the sbatch script or the salloc session:

[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
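These steps can also be combined into a batch job. The script below is a minimal sketch rather than a tested recipe: the script name (mpi_job.sh) is a placeholder, and the resource requests and mpirun line simply mirror the MPI C example later on this page.

#!/bin/bash
#SBATCH --account=arcc            # replace with your own project account
#SBATCH --time=30:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

# Load the oneAPI MPI toolchain used by the MPI examples below.
module load oneapi/2022.3 mpi/2021.7.0

# Suppress the startup warnings described above.
unset I_MPI_EXTRA_FILESYSTEM_LIST
unset I_MPI_PMI_LIBRARY

# mpirun's built-in process manager picks the task layout up from SLURM.
mpirun ./mpicc_test_c

Submit with sbatch mpi_job.sh once the executable has been built as shown below.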

Sample Code: The sample code used in this webpage is available on Beartooth in this location: /apps/u/opt/compilers/oneapi/2022.3/mpi/2021.7.0/test

MPI: C

/*
 Copyright Intel Corporation.

 This software and the related documents are Intel copyrighted materials, and
 your use of them is governed by the express license under which they were
 provided to you (License). Unless the License provides otherwise, you may not
 use, modify, copy, publish, distribute, disclose or transmit this software or
 the related documents without Intel's prior written permission.

 This software and the related documents are provided as is, with no express
 or implied warranties, other than those that are expressly stated in the
 License.
*/
#include "mpi.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int i, rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status stat;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &namelen);

    if (rank == 0) {
        printf("Hello world: rank %d of %d running on %s\n", rank, size, name);

        for (i = 1; i < size; i++) {
            MPI_Recv(&rank, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
            MPI_Recv(&size, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
            MPI_Recv(&namelen, 1, MPI_INT, i, 1, MPI_COMM_WORLD, &stat);
            MPI_Recv(name, namelen + 1, MPI_CHAR, i, 1, MPI_COMM_WORLD, &stat);
            printf("Hello world: rank %d of %d running on %s\n", rank, size, name);
        }
    } else {
        MPI_Send(&rank, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        MPI_Send(&size, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        MPI_Send(&namelen, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        MPI_Send(name, namelen + 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();

    return (0);
}

Using mpicc

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpicc -o mpicc_test_c mpi_test.c

Using mpicxx

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpicxx -o mpicxx_test_c mpi_test.c

Using mpigcc

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpigcc -o mpigcc_test_c mpi_test.c

Using mpigxx

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpigxx -o mpigxx_test_c mpi_test.c

Using mpiicc

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0 icc/2022.2.0
[@m001 mpi_test]$ mpiicc -o mpiicc_test_c mpi_test.c

Output

Since all of the C examples are built from the same source file, the results of the executables will be the same:

[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpicc_test_c
Hello world: rank 0 of 16 running on m001
Hello world: rank 1 of 16 running on m001
Hello world: rank 2 of 16 running on m001
Hello world: rank 3 of 16 running on m001
Hello world: rank 4 of 16 running on m001
Hello world: rank 5 of 16 running on m001
Hello world: rank 6 of 16 running on m001
Hello world: rank 7 of 16 running on m001
Hello world: rank 8 of 16 running on m002
Hello world: rank 9 of 16 running on m002
Hello world: rank 10 of 16 running on m002
Hello world: rank 11 of 16 running on m002
Hello world: rank 12 of 16 running on m002
Hello world: rank 13 of 16 running on m002
Hello world: rank 14 of 16 running on m002
Hello world: rank 15 of 16 running on m002

MPI: C++

/*
 Copyright Intel Corporation.

 This software and the related documents are Intel copyrighted materials, and
 your use of them is governed by the express license under which they were
 provided to you (License). Unless the License provides otherwise, you may not
 use, modify, copy, publish, distribute, disclose or transmit this software or
 the related documents without Intel's prior written permission.

 This software and the related documents are provided as is, with no express
 or implied warranties, other than those that are expressly stated in the
 License.
*/
#include "mpi.h"
#include <iostream>

int main(int argc, char *argv[])
{
    int i, rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI::Status stat;

    MPI::Init(argc, argv);

    size = MPI::COMM_WORLD.Get_size();
    rank = MPI::COMM_WORLD.Get_rank();
    MPI::Get_processor_name(name, namelen);

    if (rank == 0) {
        std::cout << "Hello world: rank " << rank << " of " << size
                  << " running on " << name << "\n";

        for (i = 1; i < size; i++) {
            MPI::COMM_WORLD.Recv(&rank, 1, MPI_INT, i, 1, stat);
            MPI::COMM_WORLD.Recv(&size, 1, MPI_INT, i, 1, stat);
            MPI::COMM_WORLD.Recv(&namelen, 1, MPI_INT, i, 1, stat);
            MPI::COMM_WORLD.Recv(name, namelen + 1, MPI_CHAR, i, 1, stat);
            std::cout << "Hello world: rank " << rank << " of " << size
                      << " running on " << name << "\n";
        }
    } else {
        MPI::COMM_WORLD.Send(&rank, 1, MPI_INT, 0, 1);
        MPI::COMM_WORLD.Send(&size, 1, MPI_INT, 0, 1);
        MPI::COMM_WORLD.Send(&namelen, 1, MPI_INT, 0, 1);
        MPI::COMM_WORLD.Send(name, namelen + 1, MPI_CHAR, 0, 1);
    }

    MPI::Finalize();

    return (0);
}

Using mpicxx

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpicxx -o mpicc_test_cpp mpi_test.cpp

Using mpigxx

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpigxx -o mpigxx_test_cpp mpi_test.cpp

Using mpiicpc

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0 icc/2022.2.0
[@m001 mpi_test]$ mpiicpc -o mpiicpc_test_cpp mpi_test.cpp

Output

Since all of the C++ examples are built from the same source file, the results of the executables will be the same:

[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpicxx_test_cpp
Hello world: rank 0 of 16 running on m001
Hello world: rank 1 of 16 running on m001
Hello world: rank 2 of 16 running on m001
Hello world: rank 3 of 16 running on m001
Hello world: rank 4 of 16 running on m001
Hello world: rank 5 of 16 running on m001
Hello world: rank 6 of 16 running on m001
Hello world: rank 7 of 16 running on m001
Hello world: rank 8 of 16 running on m002
Hello world: rank 9 of 16 running on m002
Hello world: rank 10 of 16 running on m002
Hello world: rank 11 of 16 running on m002
Hello world: rank 12 of 16 running on m002
Hello world: rank 13 of 16 running on m002
Hello world: rank 14 of 16 running on m002
Hello world: rank 15 of 16 running on m002

MPI: Fortran

!
! Copyright Intel Corporation.
!
! This software and the related documents are Intel copyrighted materials, and
! your use of them is governed by the express license under which they were
! provided to you (License). Unless the License provides otherwise, you may
! not use, modify, copy, publish, distribute, disclose or transmit this
! software or the related documents without Intel's prior written permission.
!
! This software and the related documents are provided as is, with no express
! or implied warranties, other than those that are expressly stated in the
! License.
!
      program main
      implicit none
      include 'mpif.h'

      integer i, size, rank, namelen, ierr
      character*(MPI_MAX_PROCESSOR_NAME) name
      integer stat(MPI_STATUS_SIZE)

      call MPI_INIT (ierr)

      call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
      call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
      call MPI_GET_PROCESSOR_NAME (name, namelen, ierr)

      if (rank.eq.0) then

          print *, 'Hello world: rank ', rank, ' of ', size,
     x             ' running on ', name

          do i = 1, size - 1
              call MPI_RECV (rank, 1, MPI_INTEGER, i, 1,
     x                       MPI_COMM_WORLD, stat, ierr)
              call MPI_RECV (size, 1, MPI_INTEGER, i, 1,
     x                       MPI_COMM_WORLD, stat, ierr)
              call MPI_RECV (namelen, 1, MPI_INTEGER, i, 1,
     x                       MPI_COMM_WORLD, stat, ierr)
              name = ''
              call MPI_RECV (name, namelen, MPI_CHARACTER, i, 1,
     x                       MPI_COMM_WORLD, stat, ierr)
              print *, 'Hello world: rank ', rank, ' of ', size,
     x                 ' running on ', name
          enddo

      else

          call MPI_SEND (rank, 1, MPI_INTEGER, 0, 1,
     x                   MPI_COMM_WORLD, ierr)
          call MPI_SEND (size, 1, MPI_INTEGER, 0, 1,
     x                   MPI_COMM_WORLD, ierr)
          call MPI_SEND (namelen, 1, MPI_INTEGER, 0, 1,
     x                   MPI_COMM_WORLD, ierr)
          call MPI_SEND (name, namelen, MPI_CHARACTER, 0, 1,
     x                   MPI_COMM_WORLD, ierr)

      endif

      call MPI_FINALIZE (ierr)

      end

Using mpif77

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpif77 -o mpif77_test_f mpi_test.f

Using mpif90

[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ mpif90 -o mpif90_test_f mpi_test.f

Using mpifc compiler

[@m001 data]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 test]$ mpifc -o mpifc_test_f mpi_test.f

Using mpiifort compiler

[@m001 data]$ module load oneapi/2022.3 mpi/2021.7.0 icc/2022.2.0
[@m001 test]$ mpiifort -o mpiifor_test_f mpi_test.f

Output

Since all of the Fortran examples are built from the same source file, the results of the executables will be the same:

[@blog2 mpi_test]$ salloc --account=arcc --time=30:00 --ntasks-per-node=8 --nodes=2
[@m001 ~]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 mpi_test]$ unset I_MPI_EXTRA_FILESYSTEM_LIST
[@m001 mpi_test]$ unset I_MPI_PMI_LIBRARY
[@m001 mpi_test]$ mpirun ./mpif77_test_f
Hello world: rank 0 of 16 running on m001
Hello world: rank 1 of 16 running on m001
Hello world: rank 2 of 16 running on m001
Hello world: rank 3 of 16 running on m001
Hello world: rank 4 of 16 running on m001
Hello world: rank 5 of 16 running on m001
Hello world: rank 6 of 16 running on m001
Hello world: rank 7 of 16 running on m001
Hello world: rank 8 of 16 running on m002
Hello world: rank 9 of 16 running on m002
Hello world: rank 10 of 16 running on m002
Hello world: rank 11 of 16 running on m002
Hello world: rank 12 of 16 running on m002
Hello world: rank 13 of 16 running on m002
Hello world: rank 14 of 16 running on m002
Hello world: rank 15 of 16 running on m002

 
