PathoFact

Overview

  • PathoFact: PathoFact is an easy-to-use modular pipeline for the metagenomic analyses of toxins, virulence factors and antimicrobial resistance. Additionally, PathoFact combines the prediction of these pathogenic factors with the identification of mobile genetic elements. This provides further depth to the analysis by considering the localization of the genes on mobile genetic elements (MGEs), as well as on the chromosome. Furthermore, each module (toxins, virulence factors, and antimicrobial resistance) of PathoFact is also a standalone component, making it a flexible and versatile tool.

Using

PathoFact is unusual in that it requires a core conda environment that provides a version of snakemake. The activated environment, and its snakemake, are then used to run a workflow, which itself creates a series of additional (cached) conda environments. Because we cannot know how a user might use, update, or create these workflows, packaging PathoFact as a module is not straightforward. Instead, we outline a general process that users can follow to set up and run PathoFact themselves.

Some of the steps within the workflows require SignalP. Since several versions are available, it is up to the user to select the version they wish to use and to set up the workflow configuration accordingly.

The conda environments created during the running of the workflow can be found in:

    # Location PathoFact cloned into.
    .snakemake/conda
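To see which environments a workflow run has cached, the directory can be listed directly. The following is a minimal sketch, not part of PathoFact: the function name `list_cached_envs` is hypothetical, and it assumes each cached environment is a subdirectory of `.snakemake/conda` inside the clone.

```shell
# Hypothetical helper: list the conda environments cached by snakemake.
# Argument: the directory PathoFact was cloned into (defaults to the current directory).
list_cached_envs() {
    env_root="${1:-.}/.snakemake/conda"
    if [ ! -d "$env_root" ]; then
        echo "no cached environments under $env_root" >&2
        return 1
    fi
    # Assumption: one subdirectory per cached environment.
    find "$env_root" -mindepth 1 -maxdepth 1 -type d | sort
}
```

For example, `list_cached_envs /apps/u/opt/conda-envs/pathofact/1.0/PathoFact` would print one path per cached environment once a workflow has been run.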

Setup PathoFact

    # Choose a folder location to install into, and run from:
    []$ pwd
    /apps/u/opt/conda-envs/pathofact/1.0

    []$ git clone -b master --recursive https://git-r3lab.uni.lu/laura.denies/PathoFact.git
    Cloning into 'PathoFact'...
    remote: Enumerating objects: 1869, done.
    remote: Counting objects: 100% (9/9), done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 1869 (delta 1), reused 9 (delta 1), pack-reused 1860
    Receiving objects: 100% (1869/1869), 5.49 GiB | 22.25 MiB/s, done.
    Resolving deltas: 100% (992/992), done.
    Updating files: 100% (193/193), done.
    Submodule 'submodules/DeepVirFinder' (https://github.com/jessieren/DeepVirFinder.git) registered for path 'submodules/DeepVirFinder'
    Cloning into '/apps/u/opt/conda-envs/pathofact/1.0/PathoFact/submodules/DeepVirFinder'...
    remote: Enumerating objects: 158, done.
    remote: Counting objects: 100% (47/47), done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 158 (delta 42), reused 39 (delta 39), pack-reused 111
    Receiving objects: 100% (158/158), 51.83 MiB | 26.99 MiB/s, done.
    Resolving deltas: 100% (50/50), done.
    Submodule path 'submodules/DeepVirFinder': checked out 'ddb4a9433132febe5cda39548cb9332669e11427'

    [powersw@blog1 1.0]$ ls
    PathoFact
    []$ mkdir conda-env
    []$ cd conda-env/

    # NOTE: Using miniforge (not miniconda)
    [conda-env]$ module load miniforge/23.11.0

    # NOTE:
    # The '-p' option creates the conda environment within your current working directory.
    # Make a note of the path to use to activate the environment.
    [conda-env]$ conda env create -p 1.0.0 --file=../PathoFact/envs/PathoFact.yaml
    Retrieving notices: ...working... done
    ...
    Collecting package metadata (repodata.json): done
    Solving environment: done
    Downloading and Extracting Packages:
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate /apps/u/opt/conda-envs/pathofact/1.0/conda-env/1.0.0
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
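Before moving on, it can help to confirm that the recursive clone produced the files the later steps rely on. The sketch below is an illustration only: `check_pathofact_clone` is a hypothetical name, and the file list is simply the set of paths referenced on this page, not an official manifest.

```shell
# Hypothetical helper: confirm the clone contains the files used on this page.
# Argument: path to the PathoFact clone (defaults to ./PathoFact).
check_pathofact_clone() {
    repo_dir="${1:-PathoFact}"
    missing=0
    for rel in envs/PathoFact.yaml \
               submodules/DeepVirFinder/dvf.py \
               test/Snakefile \
               test/test_config.yaml; do
        if [ ! -e "$repo_dir/$rel" ]; then
            echo "missing: $repo_dir/$rel" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

A missing `submodules/DeepVirFinder/dvf.py` would suggest the clone was made without `--recursive`.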

Run a Workflow

This outlines the steps taken to run the provided test.

    # This example creates an interactive session - do NOT run on the login nodes.
    # Navigate to working folder
    [PathoFact]$ salloc -A arcc -t 2:00:00 -c 32 --mem=64G
    salloc: Granted job allocation 12990352
    salloc: Nodes t375 are ready for job

    # Note:
    # Reload the module after the interactive session has been allocated.
    # Using miniforge NOT miniconda.
    [@t375 PathoFact]$ module load miniforge/23.11.0

    # Choose a version of SignalP
    [powersw@t375 PathoFact]$ module spider signalp
    -------------------------------------------
      signalp:
    -------------------------------------------
         Versions:
            signalp/6.0g-cpu
            signalp/6.0g-gpu

    [powersw@t375 PathoFact]$ module load signalp/6.0g-cpu
    [powersw@t375 PathoFact]$ which signalp
    /apps/u/opt/conda-envs/signalp6/6.0g-cpu/bin/signalp

    # Update the PathoFact test configuration:
    # Set the path for the version of SignalP to be used.
    # Also note that there are various 'mem' options.
    [powersw@t375 PathoFact]$ vim test/test_config.yaml
    ...
    signalp: "/apps/u/opt/conda-envs/signalp6/6.0g-cpu/bin/" # User input
    deepvirfinder: "submodules/DeepVirFinder/dvf.py"
    ...
    mem:
      normal_mem_per_core_gb: "4G"
      big_mem_cores: 4
      big_mem_per_core_gb: "30G"

    # Activate the conda environment.
    # Note how the command line prompt has changed.
    [powersw@t375 PathoFact]$ conda activate /apps/u/opt/conda-envs/pathofact/1.0/conda-env/1.0.0

    # Simple check that snakemake is available via the conda environment.
    (/apps/u/opt/conda-envs/pathofact/1.0/conda-env/1.0.0) [powersw@t375 PathoFact]$ snakemake --version
    5.5.4

    # Run the test/workflow.
    (/apps/u/opt/conda-envs/pathofact/1.0/conda-env/1.0.0) [powersw@t375 PathoFact]$ snakemake -s test/Snakefile --use-conda --reason --cores 32 -p
    Building DAG of jobs...
    Executing subworkflow pathofact.
    Building DAG of jobs...
    Creating conda environment envs/R.yaml...
    Downloading remote packages.
    Environment for envs/R.yaml created (location: .snakemake/conda/7b98ddeb)
    Creating conda environment envs/VirSorter.yaml...
    Downloading remote packages.
    Environment for envs/VirSorter.yaml created (location: .snakemake/conda/a1061a41)
    Creating conda environment envs/Prodigal.yaml...
    Downloading remote packages.
    Environment for envs/Prodigal.yaml created (location: .snakemake/conda/72cd608e)
    Creating conda environment envs/Biopython.yaml...
    Downloading remote packages.
    Environment for envs/Biopython.yaml created (location: .snakemake/conda/7aae7c53)
    Using shell: /usr/bin/bash
    Provided cores: 32
    Rules claiming more threads will be scaled down.
    ...
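Since the SignalP path in the configuration is edited by hand, a quick check before launching the workflow may catch a stale or mistyped entry. This is a rough sketch under stated assumptions: `check_signalp_config` is a hypothetical helper, it assumes the entry is written on one line as `signalp: "<directory>"` as in the example configuration on this page, and it does only a simple text match rather than parsing the YAML.

```shell
# Hypothetical helper: read the 'signalp:' directory from a PathoFact config
# file and confirm that it contains an executable named 'signalp'.
# Argument: path to the config file, e.g. test/test_config.yaml
check_signalp_config() {
    config_file="$1"
    # Assumption: the entry has the form   signalp: "<directory>"   on one line.
    signalp_dir="$(sed -n 's/^[[:space:]]*signalp:[[:space:]]*"\([^"]*\)".*$/\1/p' "$config_file" | head -n 1)"
    if [ -z "$signalp_dir" ]; then
        echo "no signalp entry found in $config_file" >&2
        return 1
    fi
    # Strip any trailing slash before appending the binary name.
    if [ ! -x "${signalp_dir%/}/signalp" ]; then
        echo "no executable signalp in $signalp_dir" >&2
        return 1
    fi
    echo "$signalp_dir"
}
```

Run from the PathoFact folder as `check_signalp_config test/test_config.yaml`; on success it prints the configured directory, which should match the directory part of the `which signalp` output for the loaded module.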

Multicore

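PathoFact can use multiple cores. The number of cores is set with snakemake's --cores option, which in the example above (--cores 32) matches the number of cores requested from salloc (-c 32).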
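To keep the snakemake core count in step with the Slurm allocation, the value can be taken from the environment instead of being typed twice. A minimal sketch, assuming Slurm exports `SLURM_CPUS_PER_TASK` inside the allocation when cores are requested with `-c`; the function name `pathofact_cores` is hypothetical.

```shell
# Hypothetical helper: choose the value for snakemake's --cores option.
# Assumption: Slurm sets SLURM_CPUS_PER_TASK when cores are requested with -c.
# Falls back to 1 when the variable is unset or empty (e.g. outside an allocation).
pathofact_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Usage, inside the allocation with the conda environment activated:
#   snakemake -s test/Snakefile --use-conda --reason --cores "$(pathofact_cores)" -p
```

Falling back to a single core is a conservative choice: it avoids oversubscribing a node if the command is run somewhere the Slurm variable is not defined.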