Overview

  • Rmpi: An interface (wrapper) to MPI. It also provides an interactive R manager and worker environment.

Using

The following describes how to use this library on Beartooth.

We currently only have version 0.7-1 available.

Note: Loading this module also loads r/4.2.2, so you get an environment with both R 4.2.2 and the Rmpi library.

You do NOT need to load r/4.2.2 as a module separately.

Multicore

This R library is designed to run across multiple nodes, with multiple tasks on each node.

ONLY using the Rmpi library

If you are only using the Rmpi library, and no other related parallel libraries such as snow, you first need to copy the following file into your home folder as ~/.Rprofile so that Rmpi can run on the cluster. (The command below overwrites any existing ~/.Rprofile; if you already have one, you'll need to merge Rmpi's settings into it instead.)

[]$ cp /apps/u/spack/gcc/12.2.0/r-rmpi/0.7-1-3eiutsq/rlib/R/library/Rmpi/Rprofile ~/.Rprofile
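Because plain cp replaces an existing ~/.Rprofile, a cautious alternative is to keep a backup first. The following is a minimal shell sketch of that idea; the function name install_rmpi_rprofile and the .bak.<timestamp> naming are hypothetical choices for illustration, not something the r-rmpi module provides.

```shell
# Hypothetical helper (not part of the r-rmpi module): copy Rmpi's Rprofile
# into place, first backing up any existing file at the destination.
# Usage: install_rmpi_rprofile SOURCE_RPROFILE [DEST]   (DEST defaults to ~/.Rprofile)
install_rmpi_rprofile() {
  local src="$1"
  local dest="${2:-$HOME/.Rprofile}"

  # Stop if the Rmpi Rprofile cannot be found at the given path.
  if [ ! -f "$src" ]; then
    echo "Rmpi Rprofile not found: $src" >&2
    return 1
  fi

  # Keep a timestamped copy of an existing .Rprofile so it can be merged back by hand.
  if [ -f "$dest" ]; then
    local backup
    backup="$dest.bak.$(date +%Y%m%d%H%M%S)"
    cp "$dest" "$backup" || return 1
    echo "Backed up existing $dest to $backup"
  fi

  # Install Rmpi's Rprofile as the new .Rprofile.
  cp "$src" "$dest" || return 1
  echo "Copied $src to $dest"
}
```

After pasting this into your shell (or sourcing a file that contains it), you could run it as: install_rmpi_rprofile /apps/u/spack/gcc/12.2.0/r-rmpi/0.7-1-3eiutsq/rlib/R/library/Rmpi/Rprofile ~/.Rprofile. Merging your old settings back in from the backup is still a manual step.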

Then, to use Rmpi, load the following modules:

module load gcc/12.2.0 openmpi/4.1.4 r-rmpi/0.7-1-ompi

Example

This example was based on an example here:

 rmpi_test.R
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}

ns <- mpi.universe.size() - 1

# With Rmpi's Rprofile in place, the workers are already running on comm 1,
# so this call can report "It seems there are some slaves running on comm 1"
# (as in the sample output below); the rest of the script still runs.
mpi.spawn.Rslaves(nslaves=ns)

# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}

# Tell all slaves to return a message identifying themselves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

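# Test computations: ask every worker to run rnorm(5). With equal-length
# results, mpi.remote.exec returns one column per worker, so length(x)
# is the number of workers.
x <- 5
x <- mpi.remote.exec(rnorm, x)
length(x)
x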

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves(dellog = FALSE)
mpi.quit()
 run.sh
#!/bin/bash
#SBATCH --job-name=rmpi-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=10:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your-email-addr>
#SBATCH --account=<your-project>

module load gcc/12.2.0 openmpi/4.1.4 r-rmpi/0.7-1-ompi

srun Rscript rmpi_test.R
 Example Run and Output
[]$ sbatch run.sh
Submitted batch job 13116893

[]$ cat slurm-13116893.out
master (rank 0, comm 1) of size 8 is running on: ttest01
slave1 (rank 1, comm 1) of size 8 is running on: ttest01
slave2 (rank 2, comm 1) of size 8 is running on: ttest01
slave3 (rank 3, comm 1) of size 8 is running on: ttest01
slave4 (rank 4, comm 1) of size 8 is running on: ttest02
slave5 (rank 5, comm 1) of size 8 is running on: ttest02
slave6 (rank 6, comm 1) of size 8 is running on: ttest02
slave7 (rank 7, comm 1) of size 8 is running on: ttest02
Error in mpi.spawn.Rslaves(nslaves = ns) :
  It seems there are some slaves running on comm  1
$slave1
[1] "I am 1 of 8"

$slave2
[1] "I am 2 of 8"

$slave3
[1] "I am 3 of 8"

$slave4
[1] "I am 4 of 8"

$slave5
[1] "I am 5 of 8"

$slave6
[1] "I am 6 of 8"

$slave7
[1] "I am 7 of 8"

[1] 7
           X1          X2         X3         X4         X5         X6
1  0.25231568 -0.70670787 -0.8623333 -1.1538241 -1.3747273 -0.9696954
2 -0.91498764  1.09819580  0.5737269 -0.6856323  0.8941616 -1.7339326
3  1.51865169  1.63120359 -0.9954300  0.2413086  0.2627482  1.6690493
4  1.09594877  0.08905511 -0.1490578  1.2190246 -0.1724257 -1.3822756
5  0.09966169 -1.92527468  0.9805431  1.8346315  0.2773092 -0.7084154
           X7
1 -0.26325262
2 -1.20082024
3 -0.04534522
4 -0.14685414
5  0.34071411
[1] 1
 Error if no .Rprofile
# If you do not copy the .Rprofile file into your home folder, you'll see an error of the form:
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"),  :
  MPI_ERR_SPAWN: could not spawn processes
Calls: mpi.spawn.Rslaves -> mpi.comm.spawn
Execution halted
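One way to tell whether a spawn failure like this comes from a missing or different ~/.Rprofile is to compare it with Rmpi's own copy. The following is a hypothetical helper, assuming the Rprofile path used in the cp command earlier; cmp -s only detects byte-for-byte differences, so a .Rprofile you merged by hand is reported as different even if it works.

```shell
# Hypothetical helper: check whether ~/.Rprofile is a byte-for-byte copy of Rmpi's Rprofile.
# Returns 0 if identical, 1 if the .Rprofile is missing, 2 if it differs.
# Usage: check_rmpi_rprofile RMPI_RPROFILE [RPROFILE]   (RPROFILE defaults to ~/.Rprofile)
check_rmpi_rprofile() {
  local rmpi_rprofile="$1"
  local rprofile="${2:-$HOME/.Rprofile}"

  # No .Rprofile at all: the most likely cause of MPI_ERR_SPAWN in this setup.
  if [ ! -f "$rprofile" ]; then
    echo "MISSING: $rprofile (copy $rmpi_rprofile there first)" >&2
    return 1
  fi

  # Identical files: the Rprofile is in place as described above.
  if cmp -s "$rmpi_rprofile" "$rprofile"; then
    echo "OK: $rprofile matches $rmpi_rprofile"
    return 0
  fi

  echo "DIFFERS: $rprofile is not identical to $rmpi_rprofile (expected if you merged it by hand)"
  return 2
}
```

For example: check_rmpi_rprofile /apps/u/spack/gcc/12.2.0/r-rmpi/0.7-1-3eiutsq/rlib/R/library/Rmpi/Rprofile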

Using the Rmpi library with Snow

Before you start, this example assumes that:

  • You have installed the snow library yourself into your home folder.

  • You know the location of this library, for example: ~/R/x86_64-pc-linux-gnu-library/4.2/snow/

  • You have NOT copied the .Rprofile file into your home folder. If you have, the test will appear to stall and will eventually time out.
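The conditions above can be checked before submitting a job. The following is a hypothetical pre-flight helper, assuming snow's RMPISNOW script sits at the top of the snow library folder (the folder the job script adds to PATH). Because it cannot tell your own ~/.Rprofile apart from Rmpi's, it conservatively flags any ~/.Rprofile.

```shell
# Hypothetical pre-flight check for the Rmpi + snow example.
# Returns 0 if RMPISNOW is found and no .Rprofile is present, 1 otherwise.
# Usage: check_snow_setup SNOW_DIR [RPROFILE]   (RPROFILE defaults to ~/.Rprofile)
check_snow_setup() {
  local snow_dir="$1"
  local rprofile="${2:-$HOME/.Rprofile}"
  local status=0

  # RMPISNOW is expected inside the installed snow package folder.
  if [ -f "$snow_dir/RMPISNOW" ]; then
    echo "OK: found $snow_dir/RMPISNOW"
  else
    echo "MISSING: $snow_dir/RMPISNOW (is snow installed there?)" >&2
    status=1
  fi

  # Any .Rprofile is flagged, since Rmpi's Rprofile makes this example stall.
  if [ -e "$rprofile" ]; then
    echo "FOUND: $rprofile (move it aside, e.g. mv $rprofile $rprofile.rmpi)" >&2
    status=1
  else
    echo "OK: no $rprofile"
  fi

  return "$status"
}
```

For example: check_snow_setup ~/R/x86_64-pc-linux-gnu-library/4.2/snow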

 snow_test.R
library(Rmpi)
library(snow)

simu <- function(rep_worker, n_used) {
  theta_simu <- c()
  for (i in 1 : rep_worker) {
    theta_simu[i] <- mean(rnorm(n_used))
  }
  theta_simu
}

num_of_processes <- mpi.universe.size()
sprintf("Num of Processes: %d", num_of_processes)

cluster <- makeCluster(num_of_processes - 1, type = "MPI")

n_used <- 1e4
rep_worker_list <- rep(1, 100)

theta <- clusterApply(cluster, rep_worker_list, simu, n_used)

theta_cbind <- do.call(cbind, theta)
write.csv(theta_cbind, file="values.csv")

stopCluster(cluster)
 run.sh
#!/bin/bash
#SBATCH --job-name=rmpi_snow_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --time=2:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your-email-addr>
#SBATCH --account=<your-project>

echo "SLURM_JOB_ID:" $SLURM_JOB_ID

# Modules to Load
module load gcc/12.2.0 openmpi/4.1.4 r-rmpi/0.7-1-ompi

# https://stat.ethz.ch/pipermail/r-sig-hpc/2019-November/002105.html
# Add the snow library folder to PATH so that the RMPISNOW command can be found.
export PATH=$PATH:~/R/x86_64-pc-linux-gnu-library/4.2/snow/

mpirun RMPISNOW CMD BATCH --no-restore --no-save --quiet snow_test.R snow_test_$SLURM_JOB_ID.log
 Example Run and Output
[]$ ls
run.sh  snow_test.R

[]$ sbatch run.sh
Submitted batch job 13433495
# job should take less than 10 seconds to complete.

[]$ ls
run.sh  slurm-13433495.out  snow_test_13433495.log  values.csv  snow_test.R

[]$ cat slurm-13433495.out
SLURM_JOB_ID: 13433495

[]$ cat snow_test_13433495.log
Loading required package: utils
> library(Rmpi)
> library(snow)
>
> simu <- function(rep_worker, n_used) {
+   theta_simu <- c()
+   for (i in 1 : rep_worker) {
+     theta_simu[i] <- mean(rnorm(n_used))
+   }
+   theta_simu
+ }
>
> num_of_processes <- mpi.universe.size()
> sprintf("Num of Processes: %d", num_of_processes)
[1] "Num of Processes: 32"
>
> cluster <- makeCluster(num_of_processes - 1, type = "MPI")
>
> n_used <- 1e4
> rep_worker_list <- rep(1, 100)
>
> theta <- clusterApply(cluster, rep_worker_list, simu, n_used)
>
> theta_cbind <- do.call(cbind, theta)
> write.csv(theta_cbind, file="values.csv")
>
> stopCluster(cluster)

# Output will be of the form:
[]$ cat values.csv
"","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","V29","V30","V31","V32","V33","V34","V35","V36","V37","V38","V39","V40","V41","V42","V43","V44","V45","V46","V47","V48","V49","V50","V51","V52","V53","V54","V55","V56","V57","V58","V59","V60","V61","V62","V63","V64","V65","V66","V67","V68","V69","V70","V71","V72","V73","V74","V75","V76","V77","V78","V79","V80","V81","V82","V83","V84","V85","V86","V87","V88","V89","V90","V91","V92","V93","V94","V95","V96","V97","V98","V99","V100"
"1",0.0110153488801525,0.00906305581302429,0.00108685858240707,0.00867668055186904,0.00578274342442965,-0.00530732478112944,0.0120954477596871,-0.00359434785869044,0.00835107071072111,2.64364921843532e-05,0.0152691100103968,-0.0135202565458591,0.00241255871463997,0.00137419862397849,0.00115252792794432,-0.0175922825490621,0.00110849307208272,-0.00937694151181359,-0.0131149112354201,0.00596388487426565,0.0219874640222224,-0.00547747285343229,0.00416837900267555,0.0139932426057719,-0.0234641772417162,0.00433451519201003,-0.00525816814860096,0.00414031282343361,0.00130366800166443,0.000413263824619443,0.0104087338213028,0.00149700313038921,0.011502836202072,-0.00715751527844509,0.0119589163613294,0.0220381656609346,-0.016000771903997,-0.00183947331801148,-0.00284276679070134,-0.00694346146022534,-0.0258218986912262,0.00473135994639517,0.00831409717862001,0.0182355174080247,-0.00931334883761317,0.00529566801009098,-0.00302557027855197,0.00346677904211363,0.00571337545701443,-0.00586232060572412,-0.00256997376593396,0.0165929336422106,0.00796493065507422,-0.00438474136670677,0.0062288833102191,-0.0175721248899911,0.00165933692067554,-0.00237930737409138,0.00121451126970138,0.00623046211970692,0.00559793460867063,-0.00412640828783677,-0.00764407338711362,-0.000460569630436792,-0.0107502392297747,-0.00421225031438457,-0.000926513045440252,0.00334739878419211,-0.00452111805160875,-0.0046740544706875,0.0155997050952078,-0.0234192042710321,0.00324579902597707,-0.0151148830758793,-0.000523464705140069,-0.00175640010460385,-0.010243166679217,-0.00668035373700306,-0.0119873621894053,-0.0141762507674786,-0.00783107010368886,0.0115902891884065,-0.00762658494377125,-0.0223384107212392,-0.00379425267947311,0.0138895890210734,0.00392029947365504,0.00248380077423007,-0.0064247327395136,0.00434147149528924,-0.00572841369840578,0.00966805999144852,-0.0122907653345613,0.00596172188548434,-0.0122757100311107,-0.000426327204500513,0.00108879897276763,0.00975469886227781,0.00675195747959386,-0.00288208828533988

This example is based on a script and discussion here.
