NVIDIA HPC SDK
Overview
The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries and tools for HPC, providing the proven compilers, libraries and software tools essential to maximizing developer productivity and the performance and portability of HPC applications.
Documentation: Please refer to the NVIDIA HPC SDK documentation for details on:
Compilers: nvc, nvc++, nvfortran and nvcc.
Programming models: C++ parallel algorithms, MPI, OpenACC, OpenMP and CUDA.
Math libraries: cuBLAS, cuTENSOR, cuSPARSE, cuSOLVER, cuFFT…
Tools: CUDA-GDB, Nsight
Using
Beartooth: Use the module name nvhpc to discover versions available and to load the application.
Teton: Use the module name hpc-sdk to discover versions available and to load the application.
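For example, on Beartooth (a minimal sketch assuming an Lmod-based module system; the 22.11 version matches the install referenced under Getting Started below):
[]$ module spider nvhpc        # list the versions available
[]$ module load nvhpc/22.11    # load the version you need
[]$ nvc --version              # confirm the compilers are on your PATH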
Compilers: Quick Overview
The SDK provides a number of compilers. Choose depending on your language and programming model:
Compiler | Description
---|---
nvc | C11 compiler for NVIDIA GPUs and AMD, Intel, OpenPOWER, and Arm CPUs. It invokes the C compiler, assembler, and linker for the target processors with options derived from its command line arguments.
nvc++ | C++17 compiler for NVIDIA GPUs and Intel CPUs. It invokes the C++ compiler, assembler, and linker for the target processors with options derived from its command line arguments.
nvfortran | Fortran compiler for NVIDIA GPUs and Intel CPUs. It invokes the Fortran compiler, assembler, and linker for the target processors with options derived from its command line arguments.
nvcc | CUDA C and CUDA C++ compiler driver for NVIDIA GPUs. nvcc accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. nvcc produces optimized code for NVIDIA GPUs and drives a supported host compiler for AMD, Intel, OpenPOWER, and Arm CPUs.
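Typical invocations look like the following (a hedged sketch: the source file names are placeholders, and -acc, -stdpar=gpu and -mp=gpu are the SDK's flags for OpenACC, C++ parallel algorithms and OpenMP target offload respectively):
[]$ nvc -O2 -acc -o app_c app.c              # C with OpenACC offload
[]$ nvc++ -O2 -stdpar=gpu -o app_cpp app.cpp # C++17 parallel algorithms on the GPU
[]$ nvfortran -O2 -mp=gpu -o app_f app.f90   # Fortran with OpenMP target offload
[]$ nvcc -O2 -o app_cu app.cu                # CUDA C/C++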
Getting Started:
The installed SDKs come with a folder of examples that you can take a copy of. For example, on Beartooth:
[]$ cp -R /apps/u/opt/compilers/nvhpc/22.11/Linux_x86_64/22.11/examples .
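Each example directory typically provides its own Makefile. A minimal sketch (the OpenACC subdirectory name is an assumption; list the folder to see what your SDK version includes):
[]$ ls examples                # see which example sets are included
[]$ cd examples/OpenACC        # directory name is illustrative; pick one from the listing above
[]$ make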
Notes:
MPI: The SDK provides its own build of Open MPI (i.e. you do not have to separately load an openmpi module). Once built, run your code with srun to pick up your requested node/task/cpu configuration.
Building the examples: Running make on the various examples typically builds and runs the code. So, in some cases, unless you have created an interactive session with multiple cores (for MPI) or requested a GPU, the code will build but it will error when run. See the MPI sketch below for a suitable interactive session.
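A minimal sketch of building and running an MPI code in an interactive session (the hello_mpi source file is a placeholder, and you may need to add your account/partition to the allocation request; mpicc is the wrapper shipped with the SDK's Open MPI):
[]$ salloc --ntasks=4 --time=00:10:00   # interactive allocation with 4 tasks
[]$ module load nvhpc
[]$ mpicc -o hello_mpi hello_mpi.c      # hello_mpi.c is a placeholder MPI source file
[]$ srun ./hello_mpi                    # srun picks up the requested node/task/cpu configuration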