
Goal: Introduce some high-level aspects of using R in parallel on the cluster.

Note

In the same spirit as this not being a course on the R language, this is not a section on developing parallelized code with any of the dozens of parallel-related packages.

Instead, it details some aspects to consider when using our cluster.

...

Table of Contents

...

Parallel Programming with R

There are tens of potential packages that could be used; as a starting point, we'd direct you to the CRAN Task View: High-Performance and Parallel Computing with R.

One thing to consider when deciding which package to explore is whether it provides multi-node functionality (such as Rmpi), multicore functionality on a single compute node (such as parallel), and/or cluster features.

Note

Remember: Just asking for multiple nodes (and GPUs) won’t actually make your code run faster unless the underlying package can actually utilize them.
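To illustrate the single-node vs. multi-node distinction, here is a minimal sketch using only the standard parallel package API. Note that fork-based mclapply() and a locally created socket cluster both stay on one node; spanning nodes requires an MPI-aware package (or passing remote hostnames the workers can reach).

Code Block
library(parallel)

# Single-node multicore: forks worker processes on the node running R.
# (Forking is not available on Windows.)
res <- mclapply(1:4, function(x) x^2, mc.cores = 2)

# A socket cluster created this way is also confined to the local node
# unless you supply a list of remote hostnames instead of a core count.
cl <- makeCluster(2)
res2 <- parLapply(cl, 1:4, function(x) x^2)
stopCluster(cl)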

...

R parallel Package: Overview

Info

The parallel package is part of the core R installation: it is a base package, so it does not need to be installed.
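You can confirm this for yourself on a fresh R installation; no install.packages() call is required before loading it:

Code Block
# parallel ships with R itself, so library() succeeds out of the box.
library(parallel)

# It is distributed with priority "base"; this should return TRUE.
"parallel" %in% rownames(installed.packages(priority = "base"))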

...

Building Rmpi from Source

Info

If you wish to try to install Rmpi, you should build it against the latest implementation of OpenMPI on the cluster.
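A sketch of what such an install might look like, using Rmpi's documented configure arguments. The module names and paths below are placeholders, not our cluster's actual values; check the available modules and your OpenMPI install location first.

Code Block
# First load a compiler and OpenMPI module, e.g.:
#   module load gcc openmpi      # hypothetical module names
# Then, in R, point Rmpi's configure script at that OpenMPI install:
install.packages(
  "Rmpi",
  configure.args = c(
    "--with-Rmpi-type=OPENMPI",
    "--with-Rmpi-include=/path/to/openmpi/include",  # placeholder path
    "--with-Rmpi-libpath=/path/to/openmpi/lib"       # placeholder path
  )
)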

...

Multicore: Detecting Cores

Note

Typically, using parallel::detectCores() to detect the number of available cores on a cluster node is a red herring: it returns the total number of cores on the node your job is allocated to, not the number of cores you actually requested.

For example, if your sbatch script defines the following,

Code Block
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8

and you're allocated a standard compute node that has 32 cores, parallel::detectCores() will return 32, not the 8 you requested!
This will likely lead to unexpected results or failures when you run a function expecting 32 cores while only 8 are actually available.
To avoid this problem, pass the value of the $SLURM_JOB_CPUS_PER_NODE Slurm environment variable into your R script.
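An alternative to passing the value as a script argument is to read the environment variable directly inside R with Sys.getenv():

Code Block
# Read the Slurm-allocated core count directly; fall back to 1 if unset
# (e.g. when running outside a Slurm job).
# Caveat: on multi-node allocations this variable can hold values such
# as "8(x2)", so this simple as.integer() parse assumes a single node.
num_of_cores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", unset = "1"))
options(mc.cores = num_of_cores)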

...

Detect Cores Example

r_multicore.R
Code Block
args <- commandArgs(trailingOnly = TRUE)

# Default to a single core unless a core count is passed in.
num_of_cores <- 1L
if (length(args) >= 1 && !is.na(args[1])) {
  num_of_cores <- as.integer(args[1])
  print(paste0("Num of Cores: ", num_of_cores))
}

print(paste0("detectCores: ", parallel::detectCores()))

# mc.cores is the default core count used by mclapply() and friends.
options(mc.cores = num_of_cores)
print(paste0("mc.cores: ", getOption("mc.cores", 1L)))
Code Block
# Create an interactive session that uses 8 cores:
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 -c 8
salloc: Granted job allocation 861904
salloc: Nodes mbcpu-001 are ready for job
[salexan5@mbcpu-001 ~]$ module load gcc/13.2.0 r/4.4.0

# Check the slurm environment variable: SLURM_JOB_CPUS_PER_NODE
[salexan5@mbcpu-001 ~]$ echo $SLURM_JOB_CPUS_PER_NODE
8

# What does R detect?
[salexan5@mbcpu-001 ~]$ Rscript r_multicore.R $SLURM_JOB_CPUS_PER_NODE
[1] "Num of Cores: 8"
[1] "detectCores: 96"
[1] "mc.cores: 8"

...

...