Overview

  • RoseTTAFold: This package contains deep learning models and related scripts to run RoseTTAFold. This repository is the official implementation of RoseTTAFold: Accurate prediction of protein structures and interactions using a 3-track network.

    • GitHub: https://github.com/RosettaCommons/RoseTTAFold

    • PyRosetta: PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enables users to design their own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.

Using

The RoseTTAFold environment is a combination of conda environments, Python packages, scripts, commands and data.

Although the RoseTTAFold environment can be used across a cluster, it is not currently designed to run on a cluster straightforwardly out of the box.

This page suggests how to use it locally and describes what ARCC infrastructure provides to make the provided pipelines more convenient to use.

Getting Started

Within your home or project folder, you will need to clone the main git repository and install the csblast and lddt dependencies:

# Clone python related scripts.
[@blog2 testing]$ git clone https://github.com/RosettaCommons/RoseTTAFold.git
[@blog2 testing]$ cd RoseTTAFold/
# Install csblast and lddt applications.
[@blog2 RoseTTAFold]$ ./install_dependencies.sh

Why locally? During testing, we noticed that some of the provided scripts appear to read from and write to internal child folders of the repository. This could affect multiple users running concurrently from the same RoseTTAFold folder, so each user should work from their own clone.

Module and sequence/structure database data

As detailed on the main RoseTTAFold GitHub page, you can download the sequence and structure database data. For convenience, we have already downloaded this data, which is currently over 2.2 TB.

To expose this central location, and to allow for future updates, use the module name rosettafold to discover the available versions. Loading the module sets the ROSETTA_DATA environment variable, which you can use to access the pre-downloaded data. For example:

 data files available:
[salexan5@ttest01 rosettafold]$ module load rosettafold/1.1.0
[salexan5@ttest01 rosettafold]$ ls -R $ROSETTA_DATA
/pfs/tc1/udata/rosettafold/:
bfd  pdb100_2021Mar03  UniRef30_2020_06  weights

/pfs/tc1/udata/rosettafold/bfd:
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata    bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex   bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata  bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex

/pfs/tc1/udata/rosettafold/pdb100_2021Mar03:
LICENSE                      pdb100_2021Mar03_a3m.ffindex   pdb100_2021Mar03_cs219.ffindex  pdb100_2021Mar03_hhm.ffindex  pdb100_2021Mar03_pdb.ffindex
pdb100_2021Mar03_a3m.ffdata  pdb100_2021Mar03_cs219.ffdata  pdb100_2021Mar03_hhm.ffdata     pdb100_2021Mar03_pdb.ffdata

/pfs/tc1/udata/rosettafold/UniRef30_2020_06:
UniRef30_2020_06_a3m.ffdata   UniRef30_2020_06_cs219.ffdata   UniRef30_2020_06_hhm.ffdata   UniRef30_2020_06.md5sums
UniRef30_2020_06_a3m.ffindex  UniRef30_2020_06_cs219.ffindex  UniRef30_2020_06_hhm.ffindex

/pfs/tc1/udata/rosettafold/weights:
RF2t.pt  Rosetta-DL_LICENSE.txt  RoseTTAFold_e2e.pt  RoseTTAFold_pyrosetta.pt
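
Before starting a long run, it can help to confirm that the module has set ROSETTA_DATA and that the top-level folders from the listing above are reachable. The following is a minimal sketch only: check_rosetta_data is a hypothetical helper name, not something the rosettafold module provides, and the folder names are simply those shown in the listing.

```shell
# Sketch: confirm ROSETTA_DATA is set and contains the expected folders.
# check_rosetta_data is a hypothetical helper, not part of the module.
check_rosetta_data() {
    crd_missing=0
    if [ -z "${ROSETTA_DATA:-}" ]; then
        echo "ROSETTA_DATA is not set; try: module load rosettafold/1.1.0" >&2
        return 1
    fi
    # Folder names are taken from the listing on this page.
    for crd_sub in bfd pdb100_2021Mar03 UniRef30_2020_06 weights; do
        if [ ! -d "${ROSETTA_DATA}/${crd_sub}" ]; then
            echo "missing: ${ROSETTA_DATA}/${crd_sub}" >&2
            crd_missing=1
        fi
    done
    return "$crd_missing"
}
```

Calling check_rosetta_data after module load rosettafold/1.1.0 returns a non-zero status, with a message on standard error, if the variable is unset or a folder is missing.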

Conda Environments

The RoseTTAFold pipeline appears to be based on two conda environments, which ARCC infrastructure has pre-built.

To use them, first module load miniconda3/23.1.0 and then activate whichever environment you need:

conda activate /apps/u/opt/conda-envs/rosettafold/1.1.0/RoseTTAFold
 Available RoseTTAFold Related Commands
2to3                  chkparse                       ffindex_unpack    infocmp           nettle-hash         psicc             tabs        webpmux
2to3-3.8              cjpeg                          ffmpeg            infotocap         nettle-lfib-stream  psipass2          tclsh       wheel
a3m_database_extract  clear                          ffprobe           instmodsh         nettle-pbkdf2       psipred           tclsh8.6    wish
a3m_database_filter   community                      formatdb          jpegtran          ninja               psktool           tic         wish8.6
a3m_database_reduce   convert-caffe2-to-onnx         formatrpsdb       jpgicc            ocsptool            ptar              tiff2bw     wrjpgcom
a3m_extract           convert-onnx-to-caffe2         freetype-config   json_pp           openssl             ptardiff          tiff2pdf    x86_64-conda_cos6-linux-gnu-ld
a3m_reduce            copymat                        gnutls-cli        lame              pal2rgb             ptargrep          tiff2ps     x86_64-conda-linux-gnu-ld
asn1Coding            corelist                       gnutls-cli-debug  libnetcfg         perl                pydoc             tiff2rgba   xsubpp
asn1Decoding          cpan                           gnutls-serv       libpng16-config   perl5.26.2          pydoc3            tiffcmp     xz
asn1Parser            c_rehash                       h264dec           libpng-config     perlbug             pydoc3.8          tiffcp      xzcat
bl2seq                cstranslate                    h264enc           linkicc           perldoc             python            tiffcrop    xzcmp
blastall              djpeg                          h2ph              lz4               perlivp             python3           tiffdither  xzdec
blastclust            enc2xs                         h2xs              lz4c              perlthanks          python3.8         tiffdump    xzdiff
blastpgp              encguess                       hhalign           lz4cat            piconv              python3.8-config  tiffinfo    xzegrep
bunzip2               f2py                           hhalign_omp       lzcat             pip                 python3-config    tiffmedian  xzfgrep
bzcat                 f2py3                          hhblits           lzcmp             pip3                raw2tiff          tiffset     xzgrep
bzcmp                 f2py3.8                        hhblits_ca3m      lzdiff            pkcs1-conv          rdjpgcom          tiffsplit   xzless
bzdiff                fastacmd                       hhblits_omp       lzegrep           pl2pm               reset             tificc      xzmore
bzegrep               fax2ps                         hhconsensus       lzfgrep           pngfix              rpsblast          toe         zipdetails
bzfgrep               fax2tiff                       hhfilter          lzgrep            png-fix-itxt        run_psipred.pl    tput        zstd
bzgrep                ffindex_apply                  hhmake            lzless            pod2html            seedtop           tqdm        zstdcat
bzip2                 ffindex_build                  hhsearch          lzma              pod2man             seq2mtx           transicc    zstdgrep
bzip2recover          ffindex_from_fasta             hhsearch_omp      lzmadec           pod2text            sexp-conv         tset        zstdless
bzless                ffindex_from_fasta_with_split  iconv             lzmainfo          pod2usage           shasum            unlz4       zstdmt
bzmore                ffindex_get                    idle3             lzmore            podchecker          splain            unlzma
captoinfo             ffindex_modify                 idle3.8           makemat           podselect           sqlite3           unxz
certtool              ffindex_order                  idn2              megablast         ppm2tiff            sqlite3_analyzer  unzstd
chardetect            ffindex_reduce                 impala            ncursesw6-config  prove               srptool           webpinfo

conda activate /apps/u/opt/conda-envs/rosettafold/1.1.0/Folding
 Available Folding Related Commands
2to3               env_parallel.ksh     h2ph              h5repack   lzless            perlivp     pydoc3.7           streamzip         x86_64-conda_cos7-linux-gnu-ld
2to3-3.7           env_parallel.mksh    h2xs              h5repart   lzma              perlthanks  python             tabs              x86_64-conda-linux-gnu-ld
acountry           env_parallel.pdksh   h52gif            h5stat     lzmadec           piconv      python3            tclsh             xsubpp
adig               env_parallel.sh      h5c++             h5unjam    lzmainfo          pip         python3.7          tclsh8.6          xz
ahost              env_parallel.tcsh    h5cc              h5watch    lzmore            pip3        python3.7-config   tensorboard       xzcat
captoinfo          env_parallel.zsh     h5clear           idle3      markdown_py       pl2pm       python3.7m         tflite_convert    xzcmp
clear              f2py                 h5copy            idle3.7    matplotlib        pod2html    python3.7m-config  tf_upgrade_v2     xzdec
corelist           f2py3                h5debug           infocmp    ncursesw6-config  pod2man     python3-config     tic               xzdiff
cpan               f2py3.7              h5diff            infotocap  niceload          pod2text    pyvenv             toco              xzegrep
c_rehash           fftwf-wisdom         h5dump            instmodsh  openssl           pod2usage   pyvenv-3.7         toco_from_protos  xzfgrep
enc2xs             fftwl-wisdom         h5fc              json_pp    parallel          podchecker  reset              toe               xzgrep
encguess           fftw-wisdom          h5format_convert  libnetcfg  parcat            protoc      saved_model_cli    tput              xzless
env_parallel       fftw-wisdom-to-conf  h5import          lzcat      parset            prove       sem                tset              xzmore
env_parallel.ash   freeze_graph         h5jam             lzcmp      parsort           ptar        shasum             unlzma            zipdetails
env_parallel.bash  gdbm_dump            h5ls              lzdiff     perl              ptardiff    splain             unxz
env_parallel.csh   gdbm_load            h5mkgrp           lzegrep    perl5.34.0        ptargrep    sql                wheel
env_parallel.dash  gdbmtool             h5perf_serial     lzfgrep    perlbug           pydoc       sqlite3            wish
env_parallel.fish  gif2h5               h5redeploy        lzgrep     perldoc           pydoc3      sqlite3_analyzer   wish8.6
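
After activating one of the environments, a quick way to confirm that the tools your pipeline calls are on PATH is to probe for them with command -v. This is a sketch under assumptions: missing_commands is a hypothetical helper, and which tools you pass to it depends on the steps you actually run.

```shell
# Sketch: report which of the named tools cannot be found on PATH.
# missing_commands is a hypothetical helper; pass the tools you rely on.
missing_commands() {
    mc_status=0
    for mc_cmd in "$@"; do
        if ! command -v "$mc_cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $mc_cmd" >&2
            mc_status=1
        fi
    done
    return "$mc_status"
}
```

For example, with the RoseTTAFold environment active you might run missing_commands hhblits hhsearch hhfilter psipred python, using names taken from the command listing above.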

Provided pipeline script updates

As detailed on the main RoseTTAFold page: “The modeling pipeline provided here (run_pyrosetta_ver.sh/run_e2e_ver.sh) is a kind of guidelines to show how RoseTTAFold works.”

If you wish to use the provided pipelines on the Beartooth cluster with the pre-built conda environments and centralized data, you will need to modify the scripts above and the following scripts:

 input_prep/make_msa.sh
Line 14:
From:
# sequence databases
declare -a DATABASES=( \
    "$PIPEDIR/UniRef30_2020_06/UniRef30_2020_06" \
    "$PIPEDIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt")

To:
# sequence databases
declare -a DATABASES=( \
    "$ROSETTA_DATA/UniRef30_2020_06/UniRef30_2020_06" \
    "$ROSETTA_DATA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt")
 run_e2e_ver_arcc.sh
Line 30
From: conda activate RoseTTAFold
To  : conda activate /apps/u/opt/conda-envs/rosettafold/1.1.0/RoseTTAFold

Line 54
From: DB="$PIPEDIR/pdb100_2021Mar03/pdb100_2021Mar03"
To  : DB="$ROSETTA_DATA/pdb100_2021Mar03/pdb100_2021Mar03"

Line 71:
From: -m $PIPEDIR/weights
To  : -m $ROSETTA_DATA/weights
 run_pyrosetta_ver_arcc.sh
Line 30
From: conda activate RoseTTAFold
To  : conda activate /apps/u/opt/conda-envs/rosettafold/1.1.0/RoseTTAFold

Line 54
From: DB="$PIPEDIR/pdb100_2021Mar03/pdb100_2021Mar03"
To  : DB="$ROSETTA_DATA/pdb100_2021Mar03/pdb100_2021Mar03"

Line 71:
From: -m $PIPEDIR/weights
To  : -m $ROSETTA_DATA/weights

Line 85
From: conda activate folding
To  : conda activate /apps/u/opt/conda-envs/rosettafold/1.1.0/Folding
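
The substitutions listed above could also be applied with sed rather than by hand. The following is a minimal sketch, not an ARCC-provided tool: patch_pipeline_script and ENV_ROOT are hypothetical names, and it assumes the upstream scripts contain the "From" lines exactly as quoted on this page. It writes a patched copy and leaves the original file untouched.

```shell
# Sketch: write a patched copy of a pipeline script.
# patch_pipeline_script and ENV_ROOT are hypothetical names for illustration.
ENV_ROOT="/apps/u/opt/conda-envs/rosettafold/1.1.0"

patch_pipeline_script() {
    # $1 = original script, $2 = patched copy to write
    sed -E \
        -e 's#\$PIPEDIR/(UniRef30_2020_06|bfd|pdb100_2021Mar03|weights)#$ROSETTA_DATA/\1#g' \
        -e "s#^([[:space:]]*)conda activate RoseTTAFold[[:space:]]*\$#\\1conda activate ${ENV_ROOT}/RoseTTAFold#" \
        -e "s#^([[:space:]]*)conda activate folding[[:space:]]*\$#\\1conda activate ${ENV_ROOT}/Folding#" \
        "$1" > "$2"
}
```

A possible use is patch_pipeline_script run_e2e_ver.sh run_e2e_ver_arcc.sh. For input_prep/make_msa.sh, which is modified in place on this page, you could write to a temporary file and then move it over the original. Compare the result against the original with diff before relying on it.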

If you are creating your own pipelines, you will not be able to run the provided Python scripts successfully without first activating the associated conda environment.
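
In a custom pipeline, one way to catch a forgotten activation early is to check the CONDA_PREFIX variable that conda activate sets. This is only a sketch: require_conda_env is a hypothetical helper name, and it assumes the environment is identified by the last component of its path, as with the two pre-built environments on this page.

```shell
# Sketch: fail early if the expected conda environment is not active.
# require_conda_env is a hypothetical helper; CONDA_PREFIX is set by `conda activate`.
require_conda_env() {
    case "${CONDA_PREFIX:-}" in
        */"$1")
            return 0 ;;
        *)
            echo "conda environment '$1' is not active (CONDA_PREFIX=${CONDA_PREFIX:-unset})" >&2
            return 1 ;;
    esac
}
```

For instance, placing require_conda_env RoseTTAFold || exit 1 before a network prediction step, or require_conda_env Folding || exit 1 before a folding step, would stop the script before any Python is invoked.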

Multicore

Since the modeling pipeline is made up of a series of commands, you’ll need to inspect each of these commands to understand their multicore capabilities.

As a starting point, take a look through the run_[e2e/pyrosetta]_ver_arcc.sh scripts, within which you'll see that the following variables are defined at the top:

CPU="8"  # number of CPUs to use
MEM="64" # max memory (in GB)

These are then passed as arguments to the commands called within the script.

The values you use within your own scripts must match what you request via your salloc/sbatch calls.
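
One way to keep the two in step is to derive CPU and MEM from the variables Slurm exports inside a job, rather than hard-coding them twice. The sketch below makes several assumptions: set_pipeline_resources is a hypothetical helper, the wall time is illustrative, the account is the one used elsewhere on this page, and SLURM_MEM_PER_NODE is taken to be reported in megabytes. Outside a job it falls back to the defaults of 8 CPUs and 64 GB.

```shell
#!/bin/bash
# Account as used elsewhere on this page; substitute your own project.
#SBATCH --account=arcc
# Illustrative wall time (an assumption; size it to your input).
#SBATCH --time=01:00:00
# These two requests are what CPU and MEM below should agree with.
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

# Sketch: set CPU and MEM (in GB) from the Slurm allocation.
# set_pipeline_resources is a hypothetical helper name.
set_pipeline_resources() {
    CPU="${SLURM_CPUS_PER_TASK:-8}"
    # Assumes SLURM_MEM_PER_NODE is in megabytes; convert to whole gigabytes.
    MEM="$(( ${SLURM_MEM_PER_NODE:-65536} / 1024 ))"
}

set_pipeline_resources
echo "CPU=${CPU} MEM=${MEM}"
```

To use this with the provided pipelines, the hard-coded CPU and MEM lines in your own copy of the run script would need to be replaced by these derived values; the upstream scripts do not read them from the environment.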

Known Issue within miniconda3

There is a known issue loading miniconda3 and then creating an interactive salloc session:

 Known: undefined symbol: EVP_KDF_ctrl
[]$ module load miniconda3/23.1.0
[]$ salloc -A arcc -t 5:00
salloc: Granted job allocation 7174905
salloc: Waiting for resource configuration
salloc: Nodes ttest01 are ready for job
flatpak: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b

To resolve this, call salloc first, and then perform the module load miniconda3:

[]$ salloc -A arcc -t 5:00
[]$ module load miniconda3/23.1.0
