R Environments and Reproducibility
Goal: Introduce ideas and practices to assist in managing the reproducibility of R environments.
What Packages Do I Have Installed?
The first step is knowing what your environment is using and where its packages are installed.
Use .libPaths() to see the library directories R searches, in order:
[]$ module load gcc/14.2.0 r/4.4.0
[]$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
...
> .libPaths()
[1] "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4"
[2] "/apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library"
> quit()
[]$ ls /apps/u/spack/gcc/13.2.0/r/4.4.0-pvzi4gp/rlib/R/library
base compiler datasets graphics grDevices grid methods parallel splines stats stats4 tcltk tools translations utils
[]$ ls /cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4
class cli DBI e1071 generics KernSmooth magrittr pillar proxy R6 rlang sf stringr tidyr units vctrs wk
classInt cpp11 dplyr fansi glue lifecycle MASS pkgconfig purrr Rcpp s2 stringi tibble tidyselect utf8 withr XML
Packages ARCC has installed in the system library will not be updated; instead, we will create a new version of base R.
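The two ls listings above can also be produced from inside R. The sketch below uses only base R's .libPaths() and installed.packages() to group installed package names by library directory. packages_by_library() is a hypothetical helper name, not an R or ARCC function:

```r
# Hypothetical helper: group installed package names by library directory,
# in the same order R searches them (.libPaths()).
packages_by_library <- function(libs = .libPaths()) {
  pkgs <- lapply(libs, function(lib) {
    ip <- installed.packages(lib.loc = lib)
    sort(unname(ip[, "Package"]))
  })
  names(pkgs) <- libs
  pkgs
}

by_lib <- packages_by_library()
# Print each library followed by the packages it holds
for (lib in names(by_lib)) {
  cat(lib, "\n  ", paste(by_lib[[lib]], collapse = " "), "\n")
}
```

Unlike ls, installed.packages() only reports installed packages, so non-package directories such as translations should not appear in its output.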
Track the R Packages and Versions You Have Installed
How can I track the versions of R packages installed? Using plain R:
[salexan5@mblog1 ~]$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
...
> write.table(installed.packages()[, c("Package", "LibPath", "Version", "Priority")])
"Package" "LibPath" "Version" "Priority"
"class" "class" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "7.3-22" "recommended"
...
"sf" "sf" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.0-16" NA
"stringi" "stringi" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.8.4" NA
"stringr" "stringr" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "1.5.1" NA
"tibble" "tibble" "/cluster/medbow/home/<username>/R/x86_64-pc-linux-gnu-library/4.4" "3.2.1" NA
...
"tools" "tools" "/apps/u/spack/gcc/14.2.0/r/4.4.0-w7xoohc/rlib/R/library" "4.4.0" "base"
"utils" "utils" "/apps/u/spack/gcc/14.2.0/r/4.4.0-w7xoohc/rlib/R/library" "4.4.0" "base"
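Instead of reading the table on screen, you can write the same columns to a dated file and keep it with your project, so that a later snapshot can be compared against it. This is a minimal sketch using base R only. The name snapshot_packages() and the r-packages-<date>.csv file name are illustrative choices:

```r
# Hypothetical helper: save the Package/LibPath/Version/Priority columns of
# installed.packages() to a CSV file, one row per installed package.
snapshot_packages <- function(file = sprintf("r-packages-%s.csv", Sys.Date())) {
  cols <- c("Package", "LibPath", "Version", "Priority")
  ip <- installed.packages()[, cols, drop = FALSE]
  write.csv(as.data.frame(ip), file, row.names = FALSE)
  invisible(file)
}

# Write to a temporary file here; in practice keep it with your project.
snap_file <- snapshot_packages(file.path(tempdir(), "r-packages-demo.csv"))
snap <- read.csv(snap_file)
head(snap)
```

To capture the R version as well, you can save R.version.string (or the output of sessionInfo()) next to the snapshot.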
Conda Export and R Packages
The conda list command (within an activated Conda environment) will only list the packages you've installed using conda install.
It does not track or list anything you've installed from within R using install.packages().
As a result, using conda env export / conda env create will create an incomplete environment.
[]$ module purge
[]$ module load miniconda3/24.3.0
[]$ conda activate /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) [salexan5@mblog2 ~]$
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) [salexan5@mblog2 ~]$ conda list
# packages in environment at /cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env:
#
# Name Version Build Channel
...
r-base 4.3.3 he2d9a6e_3 conda-forge
r-stringi 1.8.4 r43hbd1cc82_0 conda-forge
...
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) []$ R
R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
...
> .libPaths()
[1] "/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env/lib/R/library"
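> write.table(installed.packages()[, c("Package", "LibPath", "Version", "Priority")])
...
> quit()
(/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env) []$ ls /project/<project-name>/software/conda-envs/r_4.3.3_env/lib/R/library/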
base compiler digest fastmap grDevices IRdisplay lifecycle pbdZMQ rlang stats4 tools utils
base64enc crayon evaluate glue grid IRkernel methods pillar splines stringi translations uuid
cli datasets fansi graphics htmltools jsonlite parallel repr stats tcltk utf8 vctrs
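Because conda list cannot see packages added with install.packages(), it can help to check which R packages in the environment's library conda has no record of. The sketch below relies on two assumptions that you should verify on your own environment. First, conda keeps one <name>-<version>-<build>.json record per installed package in the environment's conda-meta/ directory. Second, conda-forge names R packages r-<lowercase R name>, as with r-stringi above. The helper names are hypothetical, and the name mapping is a heuristic that may not hold for every package:

```r
# Hypothetical helper: turn conda-meta file names such as
# "r-stringi-1.8.4-r43hbd1cc82_0.json" into R package names ("stringi").
# Heuristic: assumes conda names R packages "r-<lowercase R name>".
conda_r_packages <- function(meta_files) {
  r_files <- grep("^r-.*\\.json$", basename(meta_files), value = TRUE)
  sub("^r-(.*)-[^-]+-[^-]+\\.json$", "\\1", r_files)
}

# R packages present in the library that conda has no record of,
# i.e. most likely installed from within R with install.packages().
untracked_r_packages <- function(r_pkgs, meta_files) {
  known <- conda_r_packages(meta_files)
  sort(unname(r_pkgs[!tolower(r_pkgs) %in% known]))
}

# Illustrative input: the two conda records listed above, plus a pretend
# library in which "digest" was added with install.packages().
meta <- c("r-base-4.3.3-he2d9a6e_3.json", "r-stringi-1.8.4-r43hbd1cc82_0.json")
untracked_r_packages(c("stringi", "digest"), meta)  # returns "digest"

# Against a real environment (path as used above):
# env  <- "/cluster/medbow/project/<project-name>/software/conda-envs/r_4.3.3_env"
# ip   <- installed.packages(lib.loc = file.path(env, "lib", "R", "library"))
# pkgs <- ip[is.na(ip[, "Priority"]) | ip[, "Priority"] != "base", "Package"]
# untracked_r_packages(pkgs, list.files(file.path(env, "conda-meta")))
```

Packages with priority "base" are excluded in the commented usage because they ship with r-base rather than as separate r-* conda packages.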
Track the Building of Your Environments
You will need to use a combination of the following to record and track how your environment is built:
System: module load r/<version>
R: .libPaths() and install.packages()
Conda: conda list / conda env export / conda env create
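Be aware that updating a package might also update all of its dependencies.
The order in which you install packages can also make a difference, since a later install may upgrade dependencies shared with packages installed earlier.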
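Because an update can silently change dependencies, one way to see exactly what changed is to compare a package snapshot taken before an install or update with one taken after. This is a minimal base-R sketch. compare_snapshots() is a hypothetical helper that expects data frames with Package and Version columns (for example, CSV snapshots read back with read.csv()). It assumes each package appears only once per snapshot; a package installed in two libraries would need LibPath as part of the key:

```r
# Hypothetical helper: report packages added, removed or changed in version
# between two snapshots (data frames with Package and Version columns).
compare_snapshots <- function(before, after) {
  m <- merge(before[, c("Package", "Version")], after[, c("Package", "Version")],
             by = "Package", all = TRUE, suffixes = c(".before", ".after"))
  m$status <- ifelse(is.na(m$Version.before), "added",
              ifelse(is.na(m$Version.after), "removed",
              ifelse(m$Version.before != m$Version.after, "changed", "unchanged")))
  m <- m[m$status != "unchanged", ]
  rownames(m) <- NULL
  m
}

# Hypothetical before/after snapshots around an install
before <- data.frame(Package = c("stringi", "stringr"),
                     Version = c("1.8.3", "1.5.1"))
after  <- data.frame(Package = c("stringi", "stringr", "tibble"),
                     Version = c("1.8.4", "1.5.1", "3.2.1"))
compare_snapshots(before, after)
```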
Install R Packages with a Specific Version
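R's base install.packages() installs the current version available from a repository; to install a specific (e.g., older) version, you need to point it at that version's source package, for example a tarball downloaded from the CRAN archive, using repos = NULL and type = "source".
The conda install command does allow you to specify a version directly, e.g., conda install r-stringi=1.8.4.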
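For CRAN packages, older source releases are kept in the CRAN archive, so you can pass the source package for a specific version directly to install.packages(). This sketch assumes the package comes from CRAN and follows the archive layout src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz; the current release is not in the archive. cran_archive_url() is a hypothetical helper:

```r
# Hypothetical helper: build the CRAN archive URL for an older source release.
# Assumes CRAN's archive layout; the current release lives in src/contrib/.
cran_archive_url <- function(pkg, version, repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

url <- cran_archive_url("stringr", "1.5.0")
url

# With repos = NULL, install.packages() takes a path or URL to a source package:
# install.packages(url, repos = NULL, type = "source")
```

With repos = NULL, install.packages() does not resolve dependencies, so any packages the target needs must be installed first.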
Suggested Best Practices
For specific projects/research focuses, create specific libraries and/or conda environments (with everything installed within that conda environment) to localize used packages/versions.
Regularly track/update what packages you're using (install.packages() / conda install r-<package-name>) and their versions. Be mindful of the additional dependencies that a package installs.
Be mindful when prompted whether you want to update dependencies or not.
Avoid building a single behemoth environment; consider having a number of independent environments/libraries that can be more easily managed and shared along a workflow/pipeline.
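A project-specific library can be set up in plain R by creating a directory and putting it first on the search path. install.packages() then installs there by default, and library() looks there first. This is a minimal sketch. use_project_library() is a hypothetical helper, and the temporary directory stands in for a real location under your project space:

```r
# Hypothetical helper: create a project library directory and put it first
# on the library search path, keeping the existing libraries after it.
use_project_library <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths())
}

proj_lib <- file.path(tempdir(), "my-project", "R-library")
use_project_library(proj_lib)
.libPaths()[1]

# Install into the project library explicitly (or rely on .libPaths()[1]):
# install.packages("stringr", lib = proj_lib)
```

The change only lasts for the current R session, so it needs to be repeated in each new session, for example at the top of the project's scripts.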