R Environments and Reproducibility

Goal: Introduce ideas and practices to assist in managing the reproducibility of R environments.


What Packages Do I Have Installed?

The first step is knowing which packages your environment uses and where they are installed:

Remember to use .libPaths() to list the library locations that R searches, in order.

    []$ module load gcc/13.2.0 r/4.4.0
    []$ R
    R version 4.4.0 (2024-04-24) -- "Puppy Cup"
    ...
    > .libPaths()
    [1] "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4"
    [2] "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library"
    > quit()

    []$ ls /apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library
    base compiler datasets graphics grDevices grid methods parallel splines stats stats4 tcltk tools translations utils

    []$ ls /cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4
    class cli DBI e1071 generics KernSmooth magrittr pillar proxy R6 rlang sf stringr tidyr units vctrs wk
    classInt cpp11 dplyr fansi glue lifecycle MASS pkgconfig purrr Rcpp s2 stringi tibble tidyselect utf8 withr XML

Anything ARCC has installed (the system library, the second path above) will not be updated in place. Instead, we will create a new version of the base R.
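To see which packages live in which library, one option is to query each path returned by .libPaths() separately. The following is a minimal sketch using base R only; the helper name packages_by_library is hypothetical and not part of any ARCC tooling.

```r
# Hypothetical helper: list the installed package names for each library path.
packages_by_library <- function(paths = .libPaths()) {
  result <- lapply(paths, function(p) {
    # Restrict installed.packages() to a single library location
    ip <- installed.packages(lib.loc = p)
    sort(unname(ip[, "Package"]))
  })
  names(result) <- paths
  result
}

pkgs <- packages_by_library()

# Number of packages found in each library, in search order
print(lengths(pkgs))
```

The first entry is normally your personal library and the last is the system library that ships with the R module.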


Track the R Packages and Versions you have Installed

How can I track the versions of R packages installed? Using plain R:

    [salexan5@mblog1 ~]$ R
    R version 4.4.0 (2024-04-24) -- "Puppy Cup"
    ...
    > write.table(installed.packages()[,c(1,2,3:4)])
    "Package" "LibPath" "Version" "Priority"
    "class" "class" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "7.3-22" "recommended"
    ...
    "sf" "sf" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.0-16" NA
    "stringi" "stringi" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.8.4" NA
    "stringr" "stringr" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.5.1" NA
    "tibble" "tibble" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "3.2.1" NA
    ...
    "tools" "tools" "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library" "4.4.0" "base"
    "utils" "utils" "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library" "4.4.0" "base"
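Printing the table to the screen is useful for a quick look, but for tracking over time it helps to save a snapshot to a file and compare against it later. The sketch below uses base R only; the helper name snapshot_packages and the file name are assumptions chosen for illustration, and tempdir() stands in for a location you would actually keep, such as your project directory.

```r
# Hypothetical helper: capture name, library path, and version of every installed package.
snapshot_packages <- function() {
  ip <- installed.packages()[, c("Package", "LibPath", "Version"), drop = FALSE]
  rownames(ip) <- NULL  # avoid duplicate row names when a package is in two libraries
  as.data.frame(ip, stringsAsFactors = FALSE)
}

# Assumed file location for the example; use a persistent path in practice.
snapshot_file <- file.path(tempdir(), "r-packages-snapshot.csv")
write.csv(snapshot_packages(), snapshot_file, row.names = FALSE)

# Later: read the snapshot back (as text, so "1.10" is not turned into 1.1)
# and report packages whose version differs from what is installed now.
old <- read.csv(snapshot_file, colClasses = "character")
new <- snapshot_packages()
both <- merge(old, new, by = c("Package", "LibPath"), suffixes = c(".old", ".new"))
changed <- both[both$Version.old != both$Version.new, ]
print(changed)
```

Run immediately after writing the snapshot, the comparison reports no changed packages; after an update or reinstall, the changed rows show the old and new versions side by side.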

Conda Export and R Packages

    []$ module purge
    []$ module load miniconda3/24.3.0
    []$ conda activate /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env
    (/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) [salexan5@mblog2 ~]$
    (/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) [salexan5@mblog2 ~]$ conda list
    # packages in environment at /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env:
    #
    # Name          Version    Build            Channel
    ...
    r-base          4.3.3      he2d9a6e_3       conda-forge
    r-stringi       1.8.4      r43hbd1cc82_0    conda-forge
    ...

Track the Building of Your Environments


Install R Packages with a Specific Version

There are a number of R packages to assist you:

  • remotes: R Package Installation from Remote Repositories, Including 'GitHub'

  • versions: Query and Install Specific Versions of Packages on CRAN

  • devtools: Tools to Make Developing R Packages Easier
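As an illustration of pinning versions with one of these packages, the sketch below builds calls to remotes::install_version() from a named vector of pins. The helper name install_pinned, the dry_run argument, and the example pins are hypothetical; the version numbers are simply those shown in the listing earlier on this page. By default nothing is installed and the calls are only returned as text, so the example can be run safely.

```r
# Hypothetical pins: package name -> exact version wanted.
pins <- c(stringr = "1.5.1", tibble = "3.2.1")

# Hypothetical helper: install each pinned version with remotes::install_version().
# With dry_run = TRUE (the default here) nothing is installed; the calls are
# only returned as text so they can be reviewed or logged.
install_pinned <- function(pins, dry_run = TRUE) {
  calls <- sprintf('remotes::install_version("%s", version = "%s", upgrade = "never")',
                   names(pins), pins)
  if (!dry_run) {
    if (!requireNamespace("remotes", quietly = TRUE)) {
      stop("Package 'remotes' is required; install it with install.packages(\"remotes\")")
    }
    for (pkg in names(pins)) {
      # upgrade = "never" leaves already-installed dependencies untouched
      remotes::install_version(pkg, version = pins[[pkg]], upgrade = "never")
    }
  }
  calls
}

print(install_pinned(pins))
```

Calling install_pinned(pins, dry_run = FALSE) performs the installs and needs network access to a CRAN mirror. The versions package offers install.versions() for the same purpose, and devtools re-exports install_version() from remotes.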


Suggested Best Practices

  • For specific projects/research focuses, create dedicated libraries and/or conda environments (with everything installed within that conda environment) to keep the packages and versions you use local to that project.

  • Regularly track/update what packages you’re using (install.packages / conda install r-<package-name>) and their versions.

  • Be mindful of the dependencies that a package additionally installs.

  • Be mindful when prompted whether you want to update dependencies or not.

  • Avoid trying to maintain a single behemoth environment - consider having a number of independent environments/libraries that can be more easily managed and shared along a workflow/pipeline.
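As one way to follow the first suggestion in plain R, a project-specific library can be created and placed at the front of the library search path. This is a minimal sketch: the directory name is hypothetical, and tempdir() is used only so the example runs anywhere; on the cluster you would point it at a location under your project space.

```r
# Hypothetical project library location; replace tempdir() with a persistent
# directory under your project space.
project_lib <- file.path(tempdir(), "my-project", "R-library")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first so install.packages() and library() prefer it.
.libPaths(c(project_lib, .libPaths()))
print(.libPaths()[1])

# Packages installed from now on go into the project library by default, e.g.:
# install.packages("stringr")
```

The .libPaths() change lasts only for the current R session, so the call needs to be repeated at the top of each script (or placed in a project-level .Rprofile) for the project library to be used consistently.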