Parallel R: Introduction

Goal: Introduce some high-level aspects of using R in parallel on the cluster.

Just as this is not a course on learning the R language, this is not a section on developing parallelized code with any of the dozens of parallel-related packages.

Instead, it details some aspects to consider when using our cluster.



Parallel Programming with R

There are dozens of packages that could be used. As a starting point, we’d direct you to the CRAN Task View: High-Performance and Parallel Computing with R.

When choosing a package to explore, consider whether it provides multi-node functionality (such as Rmpi), only multicore functionality on a single compute node (parallel), and/or cluster features.

Remember: Just asking for multiple nodes (and GPUs) won’t make your code run faster unless the underlying package can actually utilize them.
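The point above can be illustrated with a minimal sketch. It assumes a Unix-like compute node (mclapply relies on forking) and uses a hypothetical function, slow_square, as a stand-in for an expensive, independent unit of work. The value of n_cores is an assumed placeholder; on the cluster it should match the number of cores requested from Slurm.

```r
library(parallel)

# Hypothetical stand-in for an expensive, independent unit of work.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

inputs <- 1:8

# Serial: lapply uses a single core, no matter how many were allocated.
serial_result <- lapply(inputs, slow_square)

# Parallel: mclapply forks up to n_cores workers on the same node.
# n_cores is an assumed value; match it to the cores requested from Slurm.
n_cores <- 2L
parallel_result <- mclapply(inputs, slow_square, mc.cores = n_cores)

# Both approaches compute the same values; only the elapsed time should differ.
stopifnot(identical(unlist(serial_result), unlist(parallel_result)))
```

The serial call would take the same time on an 8-core allocation as on a 1-core allocation, because nothing in it distributes the work. Only the mclapply call can benefit from the extra cores.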


R parallel Package: Overview


Building Rmpi from Source


Multicore: Detecting Cores


Detect Cores Example

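# r_multicore.R: compare the cores R detects with the cores requested.
args <- commandArgs(trailingOnly = TRUE)
num_of_cores <- 1L  # default when no argument is supplied
if (!is.na(args[1])) {
  # Command-line arguments arrive as strings, so convert to an integer.
  num_of_cores <- as.integer(args[1])
  print(paste0("Num of Cores: ", num_of_cores))
}
# detectCores() reports every core on the node, not only those allocated to the job.
print(paste0("detectCores: ", parallel::detectCores()))
options(mc.cores = num_of_cores)
print(paste0("mc.cores: ", getOption("mc.cores", 1L)))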
# Create an interactive session that uses 8 cores:
[]$ salloc -A arcc -t 10:00 -c 8
salloc: Granted job allocation 861904
salloc: Nodes mbcpu-001 are ready for job
[@mbcpu-001 ~]$ module load gcc/13.2.0 r/4.4.0
# Check the slurm environment variable: SLURM_JOB_CPUS_PER_NODE
[@mbcpu-001 ~]$ echo $SLURM_JOB_CPUS_PER_NODE
8
# What does R detect?
[@mbcpu-001 ~]$ Rscript r_multicore.R $SLURM_JOB_CPUS_PER_NODE
[1] "Num of Cores: 8"
[1] "detectCores: 96"
[1] "mc.cores: 8"
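In the session above, detectCores reports all 96 cores on the node even though only 8 were allocated, so the core count should come from Slurm rather than from detectCores. As an alternative to passing the value on the command line, the script could read it from the environment itself. The sketch below assumes the job was requested with -c (--cpus-per-task), in which case Slurm sets SLURM_CPUS_PER_TASK. The helper name get_allocated_cores is hypothetical and not part of any package.

```r
# Resolve the number of usable cores from the Slurm environment.
# Assumption: the job was requested with -c/--cpus-per-task, so Slurm has
# set SLURM_CPUS_PER_TASK. Outside a job (or if the value is not a positive
# integer) fall back to a conservative default of 1 core.
get_allocated_cores <- function(default = 1L) {
  value <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA)
  cores <- suppressWarnings(as.integer(value))
  if (is.na(cores) || cores < 1L) default else cores
}

n_cores <- get_allocated_cores()

# Functions such as parallel::mclapply use this option as their default.
options(mc.cores = n_cores)
print(paste0("mc.cores: ", getOption("mc.cores")))
```

Falling back to a single core is a deliberately cautious choice: running serially outside an allocation is slower, but it avoids oversubscribing a shared node.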