...

  • R: a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis, and it compiles and runs on a wide variety of UNIX platforms, Windows and macOS. Below are links to pages that are related to R.

Using

Use the module name r to discover versions available and to load the application.

Once the modules have been loaded:

Code Block
[]$ R
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Storage

Matrix products: default
BLAS:   /pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/r/3.6.1-3rtwrmw/rlib/R/lib/libRblas.so
LAPACK: /pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/r/3.6.1-3rtwrmw/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1


> quit()
Save workspace image? [y/n/c]: n
[@tlog2 ~]$ 

Note:

  • This software is dependent on the following modules:

    • gcc/7.3.0

    • Due to the install process, you currently have to load gcc explicitly before loading r. If you try loading r before gcc you will see the following message:

...

Multicore

Using parallel::detectCores() to detect the number of available cores on a cluster node is misleading. It returns the total number of cores on the node your job is allocated, not the number of cores you requested. For example, if your sbatch script defines the following,

Code Block
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8

and you are allocated a standard Teton node that has 32 cores, parallel::detectCores() will return 32, not the 8 you requested.
This will probably lead to unexpected results or failures when a function runs expecting 32 cores while only 8 are actually available.
To avoid this problem, pass the value of the $SLURM_JOB_CPUS_PER_NODE Slurm environment variable into your R script and use it in place of detectCores().

Example

Batch script (fragments of what your script might look like):

Code Block
#!/bin/bash
...
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
...
echo "SLURM_JOB_CPUS_PER_NODE:" $SLURM_JOB_CPUS_PER_NODE
...
module load swset/2018.05 gcc/7.3.0 r/3.6.1
...
Rscript multiple_cpu_test.R $SLURM_JOB_CPUS_PER_NODE
...

R Script: multiple_cpu_test.R

Code Block
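# Read the core count passed on the command line; default to 1 if absent.
args <- commandArgs(trailingOnly = TRUE)
num_of_cores <- 1L
if (length(args) >= 1 && !is.na(suppressWarnings(as.integer(args[1])))) {
  num_of_cores <- as.integer(args[1])
}
print(paste0("Num of Cores: ", num_of_cores))

# detectCores() reports every core on the node, not just those allocated.
print(paste0("detectCores: ", parallel::detectCores()))

# Tell parallel-aware functions how many cores they may use.
options(mc.cores = num_of_cores)
print(paste0("mc.cores: ", getOption("mc.cores", 1L)))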

Slurm Output:

Code Block
SLURM_JOB_CPUS_PER_NODE: 8
...
[1] "Num of Cores: 8"
[1] "detectCores: 32"
[1] "mc.cores: 8"
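
The batch fragment above assumes a single-node job in which $SLURM_JOB_CPUS_PER_NODE holds a plain number. Slurm can also report this variable in a compressed form such as 8(x2) for multi-node allocations, and it is unset outside a job. The following is a minimal sketch, not part of the original example, of a defensive way to derive the value to hand to Rscript; the function name cores_from_slurm and the fallback of 1 core are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original page): extract a plain
# integer core count from a SLURM_JOB_CPUS_PER_NODE-style value.
cores_from_slurm() {
  local raw="${1:-}"
  # Keep only the leading run of digits, e.g. "8(x2)" -> "8".
  local count="${raw%%[!0-9]*}"
  # Fall back to 1 core when the value is empty or not numeric.
  echo "${count:-1}"
}

num_cores="$(cores_from_slurm "${SLURM_JOB_CPUS_PER_NODE:-}")"
echo "Cores to pass to R: ${num_cores}"

# In a real batch script this would be followed by something like:
# Rscript multiple_cpu_test.R "${num_cores}"
```

Quoting the argument and defaulting to 1 means the R script still receives a usable value when the batch script is run interactively for testing.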

R Packages

Below we will give some guidelines on how to install and use various R packages specifically on Teton.

...

  • Packages installed/built with one major.minor version will typically not work under another.

R Package: RStan

Installing Packages: Potential Problems

Running install.packages("labdsv") resulted in the following error:

Code Block
/apps/u/gcc/4.8.5/intel/18.0.1-7cbw2rp/include/complex(77): error #308: member "std::complex<float>::_M_value" (declared at line 1187 of "/usr/include/c++/4.8.5/complex") is inaccessible
          _M_value = __z._M_value;
...
compilation aborted for sptree.cpp (code 2)
make: *** [sptree.o] Error 2
ERROR: compilation failed for package ‘Rtsne’
* removing ‘/pfs/tsfs1/home/salexan5/R/intel/3.6/Rtsne’
ERROR: dependency ‘Rtsne’ is not available for package ‘labdsv’
* removing ‘/pfs/tsfs1/home/salexan5/R/intel/3.6/labdsv’

This appears to be a reasonably common problem. It is essentially the result of a conflict between compilers when handling complex data types, and the workaround is to disable the diagnostic error.
To resolve the issue, create and/or update the ~/.R/Makevars file by adding the following lines:

...

R and Intel/MKL

On Teton we have versions of r (3.6.1/4.0.2) built with the Intel compiler and the related MKL (Math Kernel Library). These follow a request relating to Improving R Performance by installing optimized BLAS/LAPACK libraries.
To use:

...

Installing Packages to Use with Intel Version

...

  • The packages that you have installed for the standard versions of R will not work for the Intel version since they are built with different compilers. This means you will need to re-install the packages that you use.

  • If you want to use both versions, you will need to create a second folder to install the Intel builds of your packages into.

  • This has been tested with the R 3.6.1 Intel version; a similar approach should apply for 4.0.

On Teton, R packages are typically installed into:

...

  • Install packages as normal e.g. install.packages("<the package's name>")

  • When running your R scripts, call .libPaths(c("~/R/intel/3.6/")) before loading any libraries so that R knows where to find the appropriate packages.

  • Note: Currently R Package: RStan cannot be installed using the Intel version.
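
As an alternative to calling .libPaths() inside every script, the library location can be set from the shell through R's R_LIBS_USER environment variable before starting R. This is a different technique from the one described above, shown only as a sketch: the helper name r_lib_dir is hypothetical, and the directory layout simply mirrors the ~/R/intel/3.6/ path used on this page.

```shell
#!/bin/bash
# Hypothetical helper: build the library directory for a given R build
# and major.minor version, following the ~/R/<build>/<version> layout.
r_lib_dir() {
  local build="$1" version="$2"   # e.g. "intel" and "3.6"
  echo "${HOME}/R/${build}/${version}"
}

lib_dir="$(r_lib_dir intel 3.6)"

# R ignores R_LIBS_USER entries that do not exist, so create it first.
mkdir -p "${lib_dir}"

# R sessions started from this shell will look for packages here.
export R_LIBS_USER="${lib_dir}"
echo "R_LIBS_USER=${R_LIBS_USER}"
```

Keeping the build name in the path makes it harder to mix packages compiled with different compilers in one directory.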


...