R Environments and Reproducibility
Goal: Introduce ideas and practices to assist in managing the reproducibility of R environments.
What Packages Do I Have Installed?
The first step is knowing which packages your environment uses and where they are installed.
Use .libPaths() to list the library directories that R searches:
[]$ module load gcc/13.2.0 r/4.4.0
[]$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
...
> .libPaths()
[1] "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4"
[2] "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library"
> quit()
[]$ ls /apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library
base compiler datasets graphics grDevices grid methods parallel splines stats stats4 tcltk tools translations utils
[]$ ls /cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4
class cli DBI e1071 generics KernSmooth magrittr pillar proxy R6 rlang sf stringr tidyr units vctrs wk
classInt cpp11 dplyr fansi glue lifecycle MASS pkgconfig purrr Rcpp s2 stringi tibble tidyselect utf8 withr XML
Anything ARCC has installed will not be updated in place. Instead, ARCC will provide a new version of base R.
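The same inventory can be taken from inside an R session instead of using ls. The following is a minimal sketch using only base R; the helper name packages_by_library is hypothetical and is not part of R itself.

```r
# List the packages found in each library on R's search path.
# packages_by_library() is a hypothetical helper name, not a base R function.
packages_by_library <- function(paths = .libPaths()) {
  pkgs <- lapply(paths, function(p) {
    # lib.loc restricts the scan to a single library directory.
    unname(utils::installed.packages(lib.loc = p)[, "Package"])
  })
  names(pkgs) <- paths
  pkgs
}

by_lib <- packages_by_library()

# Number of packages found in each library directory.
print(lengths(by_lib))
```

Each element of by_lib is named after a library path, so by_lib[[1]] holds the packages in the first directory reported by .libPaths().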
Track the R Packages and Versions you have Installed
How can I track the versions of R packages installed? Using plain R:
[]$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
...
> write.table(installed.packages()[, c("Package", "LibPath", "Version", "Priority")])
"Package" "LibPath" "Version" "Priority"
"class" "class" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "7.3-22" "recommended"
...
"sf" "sf" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.0-16" NA
"stringi" "stringi" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.8.4" NA
"stringr" "stringr" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.5.1" NA
"tibble" "tibble" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "3.2.1" NA
...
"tools" "tools" "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library" "4.4.0" "base"
"utils" "utils" "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library" "4.4.0" "base"
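To keep a record alongside a project, the same table can be written to a file and read back later. This is a sketch under assumptions: the helper name snapshot_packages and the file name r-packages.csv are hypothetical, and tempdir() is used only so the example runs anywhere.

```r
# Save a snapshot of the installed packages and their versions to a CSV file.
# snapshot_packages() and the file name are hypothetical choices.
snapshot_packages <- function(file) {
  cols <- c("Package", "LibPath", "Version", "Priority")
  m <- utils::installed.packages()[, cols, drop = FALSE]
  rownames(m) <- NULL  # a package may appear in more than one library
  pkgs <- as.data.frame(m, stringsAsFactors = FALSE)
  utils::write.csv(pkgs, file, row.names = FALSE)
  invisible(pkgs)
}

snap_file <- file.path(tempdir(), "r-packages.csv")
snap <- snapshot_packages(snap_file)

# Read versions back as text so that, e.g., "1.10" is not parsed as 1.1.
old <- utils::read.csv(snap_file, colClasses = "character")
print(head(old[, c("Package", "Version")]))
```

Saving a dated snapshot whenever packages are installed or updated makes it possible to see later which versions produced a given result.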
Conda Export and R Packages
[]$ module purge
[]$ module load miniconda3/24.3.0
[]$ conda activate /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) []$
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) []$ conda list
# packages in environment at /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env:
#
# Name Version Build Channel
...
r-base 4.3.3 he2d9a6e_3 conda-forge
r-stringi 1.8.4 r43hbd1cc82_0 conda-forge
...
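The R packages in a conda environment can also be picked out of an exported package list from R. The sketch below assumes the list was produced with conda list --export, which writes one name=version=build entry per line. The helper name parse_conda_export is hypothetical, and the sample lines reuse the two packages shown in the listing above.

```r
# Parse lines in the "name=version=build" format written by `conda list --export`.
# parse_conda_export() is a hypothetical helper name.
parse_conda_export <- function(lines) {
  # Drop blank lines and the "#" comment header.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  parts <- strsplit(lines, "=", fixed = TRUE)
  pkgs <- data.frame(
    name    = vapply(parts, `[`, character(1), 1),
    version = vapply(parts, `[`, character(1), 2),
    build   = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
  # conda-forge names R packages with an "r-" prefix (r-base is R itself).
  pkgs[startsWith(pkgs$name, "r-"), , drop = FALSE]
}

# Sample input for illustration; real input would come from readLines() on the
# exported file.
sample_lines <- c(
  "# platform: linux-64",
  "r-base=4.3.3=he2d9a6e_3",
  "r-stringi=1.8.4=r43hbd1cc82_0"
)
r_pkgs <- parse_conda_export(sample_lines)
print(r_pkgs)
```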
Track the Building of Your Environments
Install R Packages with a Specific Version
There are a number of R packages to assist you:
remotes: R Package Installation from Remote Repositories, Including 'GitHub'
versions: Query and Install Specific Versions of Packages on CRAN
devtools: Tools to Make Developing R Packages Easier
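As an illustration of pinning a version, the sketch below uses remotes::install_version(), which installs a named version of a CRAN package. The wrapper ensure_version is a hypothetical helper, not part of any of the packages listed above. It only installs when the requested version is not already present, and it assumes the remotes package is available when an install is needed.

```r
# Install a specific package version only if it is not already installed.
# ensure_version() is a hypothetical helper name.
ensure_version <- function(pkg, version) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  # package_version() parses forms such as "1.5.1" and "7.3-22" alike.
  if (!is.null(installed) && installed == package_version(version)) {
    return(invisible(TRUE))  # already at the requested version
  }
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("the 'remotes' package is required: install.packages(\"remotes\")")
  }
  # upgrade = "never" leaves already-installed dependencies untouched.
  remotes::install_version(pkg, version = version, upgrade = "never")
  invisible(TRUE)
}

# Base packages carry the version of R itself, so this call installs nothing.
ensure_version("stats", as.character(getRversion()))

# Example for a CRAN package (not run here because it needs network access):
# ensure_version("stringr", "1.5.1")
```

The versions package offers install.versions() for the same purpose, and devtools re-exports install_version() from remotes.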
Suggested Best Practices
For specific projects or research areas, create dedicated libraries and/or conda environments (with everything installed within that conda environment) so that the packages and versions used stay local to the project.
Regularly track and update the packages you are using (install.packages / conda install r-<package-name>) and their versions.
Be mindful of the dependencies that a package additionally installs.
Be mindful when prompted whether you want to update dependencies or not.
Avoid building a single behemoth environment. Consider having a number of independent environments/libraries that can be more easily managed and shared along a workflow/pipeline.
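A project-specific library, as suggested in the first practice above, can be set up from within R. This is a minimal sketch: the directory name my-project-rlib is hypothetical, and tempdir() is used only so the example runs anywhere. In practice the library would usually live in a project folder.

```r
# Create a project-specific library and put it first on R's search path.
# The directory name is a hypothetical choice for illustration.
project_lib <- file.path(tempdir(), "my-project-rlib")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Passing the current paths as well keeps the existing libraries available.
.libPaths(c(project_lib, .libPaths()))
print(.libPaths()[1])

# install.packages() installs into the first library by default, so new
# packages now land in the project library, e.g. (not run here):
# install.packages("stringr", lib = project_lib)
```

The change lasts for the current session only, so a script that relies on the project library needs to set .libPaths() again at its start.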