oneAPI: Compiling

Overview

The Intel oneAPI ecosystem provides a number of compilers and libraries. This page describes the compilers provided on Beartooth, along with some basic examples and command-line instructions for compiling with them.

Compilers Available

Below is a summary of which compilers are provided by which modules; each entry gives the compiler, the language(s) it targets, and the module that provides it:

  • icc (C/C++; module: icc): Intel® C++ Compiler Classic (icc) is deprecated and will be removed in a oneAPI release after the second half of 2023. Intel recommends that customers transition now to the LLVM-based Intel® oneAPI DPC++ (dpcpp) / C++ (icpx) / C (icx) compilers for continued Windows* and Linux* support, new language support, new language features, and optimizations.

  • dpcpp (C/C++; module: compiler): The Intel® oneAPI DPC++ (dpcpp) / C++ (icpx) / C (icx) compiler provides optimizations that help your applications run faster on Intel® 64 architectures on Windows* and Linux*, with support for the latest C, C++, and SYCL language standards. It produces optimized code that can run significantly faster by taking advantage of the ever-increasing core count and vector register width in Intel® Xeon® processors and compatible processors, and it helps boost application performance through superior optimizations, Single Instruction Multiple Data (SIMD) vectorization, integration with Intel® Performance Libraries, and the OpenMP* 5.0/5.1 parallel programming model. dpcpp, icx, and icpx are built from the same compiler but use different drivers that allow better optimization for their respective primary use cases. When dpcpp is given C source code, the -fsycl flag is applied automatically and the code is converted to C++ using SYCL; this conversion is not perfect and can lead to errors. The primary use case is data-parallel applications (see the short SYCL sketch after this list).

  • icx (C; module: compiler): Built from the same compiler as dpcpp and icpx, but with a driver tuned for its primary use case: standard C applications.

  • icpx (C++; module: compiler): Built from the same compiler as dpcpp and icx, but with a driver tuned for its primary use case: standard C++ applications.

  • ifx (Fortran; module: compiler): The Intel® Fortran Compiler (ifx) enables developers needing OpenMP* offload to Intel GPUs. The OpenMP 5.0/5.1 GPU offload features in ifx are not available in ifort.

  • ifort (Fortran; module: compiler): Intel® Fortran Compiler Classic (ifort) provides best-in-class Fortran language features and performance for CPU. For calendar year 2022, ifort continues to be Intel's best-in-class Fortran compiler for customers not needing GPU offload support.

  • mpicc (C; module: mpi): MPI C compiler that uses generic wrappers for the gcc compiler.

  • mpicxx (C/C++; module: mpi): MPI C++ compiler that uses generic wrappers for the g++ compiler.

  • mpifc (Fortran; module: mpi): MPI Fortran compiler that uses generic wrappers for the gfortran compiler.

  • mpif90 (Fortran; module: mpi): MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.

  • mpif77 (Fortran; module: mpi): MPI Fortran compiler that uses GNU wrappers for the gfortran compiler.

  • mpigcc (C; module: mpi): MPI C compiler that uses GNU wrappers for the gcc compiler.

  • mpigxx (C/C++; module: mpi): MPI C++ compiler that uses GNU wrappers for the g++ compiler.

  • mpiicc (C; module: mpi): MPI C compiler that uses Intel wrappers for the icc compiler. The icc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.

  • mpiicpc (C++; module: mpi): MPI C++ compiler that uses Intel wrappers for the icpc compiler. The icpc compiler is deprecated and will be removed in a oneAPI release after the second half of 2023.

  • mpiifort (Fortran; module: mpi): MPI Fortran compiler that uses Intel wrappers for the ifort compiler.
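
To illustrate the dpcpp use case, the following is a minimal SYCL program and a possible compile line. This is an illustrative sketch rather than part of the Beartooth sample code; the file name hello_sycl.cpp is arbitrary, and the module versions mirror the examples later on this page.

// hello_sycl.cpp: minimal SYCL program that reports which device the
// default queue selected.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::queue q;  // default device selection
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>()
              << std::endl;
    return 0;
}

[@m001 test]$ module load oneapi/2022.3 compiler/2022.2.0
[@m001 test]$ dpcpp -o hello_sycl hello_sycl.cpp

Because dpcpp applies -fsycl automatically, no extra flag is needed; with icpx the same source would be compiled as icpx -fsycl -o hello_sycl hello_sycl.cpp.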

Acronyms:

  • MKL: Math Kernel Library: a computing math library of highly optimized and extensively parallelized routines for applications that require maximum performance.

  • TBB: Threading Building Blocks: a widely used C++ library for task-based, shared memory parallel programming on the host.

  • MPI: Message Passing Interface: a multifabric message-passing library that implements the open source MPICH specification.

Best Practice: Because compiling code can be computationally intensive and may take a long time, we ask that large compilations be done inside either a salloc session or an sbatch job.
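
For example, an interactive session for compiling can be requested with salloc before loading modules and running the compile commands shown below (the account name, wall time, and core count here are placeholders; substitute values appropriate for your project):

salloc --account=<project-name> --time=01:00:00 --cpus-per-task=4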

Examples:

MKL: C

/* Example from: https://www.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-c/top/multiplying-matrices-using-dgemm.html#multiplying-matrices-using-dgemm */
#define min(x,y) (((x) < (y)) ? (x) : (y))

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main()
{
    double *A, *B, *C;
    int m, n, k, i, j;
    double alpha, beta;

    printf ("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
            " Intel(R) MKL function dgemm, where A, B, and C are matrices and \n"
            " alpha and beta are double precision scalars\n\n");

    m = 2000, k = 200, n = 1000;
    printf (" Initializing data for matrix multiplication C=A*B for matrix \n"
            " A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
    alpha = 1.0; beta = 0.0;

    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
            " performance \n\n");
    A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
    B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
    C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
    if (A == NULL || B == NULL || C == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    printf (" Intializing matrix data \n\n");
    for (i = 0; i < (m*k); i++) {
        A[i] = (double)(i+1);
    }

    for (i = 0; i < (k*n); i++) {
        B[i] = (double)(-i-1);
    }

    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    printf (" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, k, B, n, beta, C, n);
    printf ("\n Computations completed.\n\n");

    printf (" Top left corner of matrix A: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(k,6); j++) {
            printf ("%12.0f", A[j+i*k]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix B: \n");
    for (i=0; i<min(k,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.0f", B[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Top left corner of matrix C: \n");
    for (i=0; i<min(m,6); i++) {
        for (j=0; j<min(n,6); j++) {
            printf ("%12.5G", C[j+i*n]);
        }
        printf ("\n");
    }

    printf ("\n Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    printf (" Example completed. \n\n");
    return 0;
}

Using icc compiler

[@m001 test]$ module load oneapi/2022.3 icc/2022.2.0 mkl/2022.2.0
[@m001 test]$ icc -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_icc mkl_test.c

Using icx compiler

[@m001 data]$ module load oneapi/2022.3 compiler/2022.2.0 mkl/2022.2.0
[@m001 test]$ icx -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -o mkl_test_icx mkl_test.c

Output

Since all of the MKL examples are built from the same source files, the resulting executables produce the same output:

MKL: C++

Using dpcpp compiler

Using icc compiler

Using icpx compiler

Output

Since all of the C++ MKL examples are built from the same source files, the resulting executables produce the same output:

MKL: Fortran

Using ifx compiler

Using ifort compiler

Output

Since all of the Fortran MKL examples are built from the same source files, the resulting executables produce the same output:

TBB: C++

The module tbb/2021.7.0 is automatically loaded when loading compiler/2022.2.0 or mkl/2022.2.0; for this reason, it is omitted from the module loads used in the examples.

As per the Specifications section of the Intel oneAPI Threading Building Blocks documentation, this module only works with C++ compilers.
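
The TBB sample source is not reproduced here, but a minimal oneTBB program of the kind compiled in these examples could look like the sketch below. This is illustrative only; the file name tbb_test.cpp and the parallel reduction it performs are placeholders rather than the cluster's sample code.

// tbb_test.cpp: sum 1..n in parallel using oneTBB's parallel_reduce
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_reduce.h>
#include <functional>
#include <iostream>

int main() {
    const std::size_t n = 100000000;

    // Split the index range across worker threads and combine partial sums.
    double total = oneapi::tbb::parallel_reduce(
        oneapi::tbb::blocked_range<std::size_t>(1, n + 1),
        0.0,
        [](const oneapi::tbb::blocked_range<std::size_t>& r, double local) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                local += static_cast<double>(i);
            return local;
        },
        std::plus<double>());

    std::cout << "Sum of 1.." << n << " = " << total << std::endl;
    return 0;
}

A possible compile line, linking against the TBB runtime explicitly:

[@m001 test]$ module load oneapi/2022.3 compiler/2022.2.0
[@m001 test]$ icpx -o tbb_test tbb_test.cpp -ltbb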

Using dpcpp compiler

Using icc compiler

Using icpx compiler

Output

Since all of the C++ examples are built from the same source files, the resulting executables produce the same output. The executable will also run across multiple CPU cores, so experiment with the number of cores requested in the salloc.

MPI in oneAPI Ecosystem

Intel’s oneAPI MPI compilers use their own copy of mpirun, which uses a built-in process manager that pulls the relevant information from SLURM to run code compiled with the oneAPI compilers.
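
As a rough sketch of how this fits together, a compiled MPI program can be launched from an sbatch script without passing host lists or rank counts to mpirun, since those are taken from the SLURM allocation. The account, resource requests, and executable name below are placeholders rather than values from the Beartooth sample code:

#!/bin/bash
#SBATCH --account=<project-name>
#SBATCH --time=00:10:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

module load oneapi/2022.3 mpi/2021.7.0
mpirun ./mpi_test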

Messages Appearing During Code Execution

Because the process manager used by the oneAPI MPI compilers automatically pulls information from SLURM, it will warn the user that it is ignoring certain environment variables:

To suppress these warnings, enter the following in either the sbatch script or the salloc session:

Sample Code: The sample code used on this page is available on Beartooth at: /apps/u/opt/compilers/oneapi/2022.3/mpi/2021.7.0/test
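
The MPI examples that follow generally use the same pattern as the MKL examples above: load the oneapi and mpi modules, then invoke the wrapper for the language in question. For instance, a C source file could be compiled with mpicc roughly as follows (the file name mpi_test.c is a placeholder and is not necessarily the name used in the sample directory):

[@m001 test]$ module load oneapi/2022.3 mpi/2021.7.0
[@m001 test]$ mpicc -o mpi_test_mpicc mpi_test.c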

MPI: C

Using mpicc

Using mpicxx

Using mpigcc

Using mpigxx

Using mpiicc

Output

Since all of the C examples are built from the same source files, the resulting executables produce the same output:

MPI: C++

Using mpicxx

Using mpigxx

Using mpiicpc

Output

Since all of the C++ examples are built from the same source files, the resulting executables produce the same output:

MPI: Fortran

Using mpif77

Using mpif90

Using mpifc compiler

Using mpiifort compiler

Output

Since all of the Fortran examples are built from the same source files, the resulting executables produce the same output: