Contents

- R - Statistical Computing
- Packages/Libraries
- R Stan
- Futures/Parallel etc
- ...
Warning

From the ARCC announcement that went out on the 23rd of May 2024 regarding the vulnerability:
We encourage all of our R users to migrate to R version 4.4.0, and off of prior versions of R (4.3.x or earlier) at your earliest convenience. To assist you with this migration we have installed modules for R version 4.4.0 on the Beartooth HPC Environment:
These modules are available via the “module …” commands as well as in OnDemand. The R/4.4.0 module is now the default R module on the Beartooth, Loren and Wildiris HPC clusters and will be the only R module available on MedicineBow. These modules include the R packages we typically included in our earlier R modules.

If you have installed any libraries yourself you will need to re-install those libraries in R version 4.4.0, as those installations are version-specific. We intend to disable ARCC’s older R modules on Beartooth by Friday June 28th, 2024.

If you have installed your own copy of R, via conda or some other method, you are welcome to use ARCC’s R modules. We encourage you to upgrade your personally installed version of R to 4.4.0.
Note

Regarding the announcement (Executive Summary: updating the compiler on MedicineBow): loading … may produce a warning. This warning can be ignored, but we recommend that you use the …
Overview
R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and performing data analysis, and it compiles and runs on a wide variety of UNIX platforms, Windows and macOS. Below are links to pages that are related to R.
Using
Use the module name r to discover the versions available and to load the application.
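For example, on a cluster using an Lmod-based module system the commands typically look like the following (the version string r/4.4.0 is taken from the announcement above; the versions actually available depend on the cluster you are on):

```bash
# List the R versions installed as modules.
module spider r

# Load a specific version, for example the current default.
module load r/4.4.0
```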
Pre-Installed Libraries:
Some versions of r have had common libraries pre-installed. To check, you can either try loading the library, or you can list all of the installed libraries using:
```r
packinfo <- installed.packages(fields = c("Package", "Version"))
packinfo[, "Version", drop = FALSE]
```
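To check for a single library you can instead try loading it; the sketch below uses ggplot2 purely as an example package name:

```r
# requireNamespace() returns TRUE if the package is installed and
# FALSE otherwise, without stopping the script when it is missing.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print("ggplot2 is installed")
} else {
  print("ggplot2 is NOT installed")
}
```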
Multicore
Typically, using parallel::detectCores() to detect the number of available cores on a cluster node is a slight red herring. It returns the total number of cores on the node your job is allocated to, not the number of cores you actually requested. For example, if your sbatch script defines the following,
```bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
```
and you're allocated a standard Teton node that has 32 cores, parallel::detectCores() will return a value of 32, not the 8 you requested! This will probably lead to unexpected results or failures when you try to run a function expecting 32 cores when only 8 are actually available.
To avoid this problem, use the value of the $SLURM_JOB_CPUS_PER_NODE Slurm environment variable and pass it into your R script.
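As an alternative to passing the value in as a command-line argument (as in the example below), it can also be read directly inside R. A minimal sketch, assuming a single-node job (for multi-node allocations the variable can contain a more complex per-node pattern):

```r
# Read the Slurm-provided core count; fall back to 1 if the variable
# is not set, e.g. when testing the script outside of a Slurm job.
slurm_cpus <- Sys.getenv("SLURM_JOB_CPUS_PER_NODE", unset = "1")
num_of_cores <- as.integer(slurm_cpus)
options(mc.cores = num_of_cores)
```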
Example
Batch Script (fragments of what your script might look like):
```bash
#!/bin/bash
...
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
...
echo "SLURM_JOB_CPUS_PER_NODE:" $SLURM_JOB_CPUS_PER_NODE
...
module load swset/2018.05 gcc/7.3.0 r/3.6.1
...
Rscript multiple_cpu_test.R $SLURM_JOB_CPUS_PER_NODE
...
```
R Script: multiple_cpu_test.R
```r
# Read the core count passed in from the batch script.
args <- commandArgs(trailingOnly = TRUE)

# Default to a single core if no value was passed in.
num_of_cores <- 1L
if (!is.na(args[1])) {
  num_of_cores <- as.integer(args[1])
  print(paste0("Num of Cores: ", num_of_cores))
}

print(paste0("detectCores: ", parallel::detectCores()))

# Make the requested core count available to the parallel functions.
options(mc.cores = num_of_cores)
print(paste0("mc.cores: ", getOption("mc.cores", 1L)))
```
Slurm Output:
```
SLURM_JOB_CPUS_PER_NODE: 8
...
[1] "Num of Cores: 8"
[1] "detectCores: 32"
[1] "mc.cores: 8"
```