Qiime2

Overview

Quantitative Insights Into Microbial Ecology (QIIME) is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on Illumina or other platforms through to publication-quality graphics and statistics. This includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.

QIIME 2 Features:

- Automatically track your analyses with decentralized data provenance: no more guesswork on what commands were run!
- Interactively explore your data with beautiful visualizations that provide new perspectives.
- Easily share results with your team, even those members without QIIME 2 installed.
- Plugin-based system: your favorite microbiome methods all in one place.

Notes:

- ARCC does not currently track new releases of the software. If you need a newer version, please submit a request to ARCC and list any plugins you require.
- QIIME 2 comes with a collection of plugins. To see which are available, load the module and type qiime on the command line, as shown below. If you need a plugin that is not listed, please submit a request to ARCC and we can explore how best to make it available.
- We are still learning to what extent QIIME 2 is parallelized. At present we believe it can only run on a single node. Some plugins can use multiple cores on that node; the documentation for each plugin describes how. Because there is no consistent syntax across plugins for this, please contact ARCC if you cannot work it out yourself and we'll be happy to help.
- If usage of the software grows and demand warrants ARCC managing a central reference database, ARCC is happy to discuss and explore this.

Note on Plugins and Versions

You will notice that not every plugin is available in every qiime2 version. Plugins are open source and developed by third parties, so they are not necessarily updated whenever QIIME 2 is. When we update QIIME 2, ARCC will check whether each plugin installs, but in many cases we run into dependency issues because a plugin was developed against older versions of libraries than those used by the latest QIIME 2 release. Until the plugin developer releases a version compatible with the latest QIIME 2, there is nothing we can do.

Using

Use the module name qiime2 to discover the available versions and to load the application, for example module load qiime2/2023.5.

Multicore

Out of the box, qiime2 does not run in parallel automatically, but some plugins/commands can be configured to use multiple cores.
One example is the feature-classifier plugin's classify-sklearn command, which assigns taxonomy using a pre-fitted sklearn-based classifier. Its --p-n-jobs option allows multiple cores to be used.
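
As a rough sketch of how this might look in a Slurm batch script, the example below passes the number of cores Slurm allocated to --p-n-jobs, so the two values cannot drift apart. The resource figures, the account placeholder, the file names classifier.qza, rep-seqs.qza and taxonomy.qza, and the wrapper name classify_reads are all assumptions made for illustration, not ARCC recommendations.

```shell
#!/bin/bash
# Illustrative resource requests; tune them for your own data.
# <your-project> is a placeholder for your own project/account name.
#SBATCH --account=<your-project>
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Hypothetical wrapper: classify reads with one job per allocated core.
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to a single core if it is not set.
classify_reads() {
  local classifier="$1" reads="$2" output="$3"
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --p-n-jobs "${SLURM_CPUS_PER_TASK:-1}" \
    --o-classification "$output"
}

# Only start the analysis inside a Slurm job, never on a login node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  module load qiime2/2023.5
  classify_reads classifier.qza rep-seqs.qza taxonomy.qza
fi
```

Reading the core count from the environment means that changing --cpus-per-task is the only edit needed when you experiment with different core counts.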

Getting Help

Several commands provide help from the command line:

General Help

[@blog1 ~]$ module load qiime2/2023.5
[@blog1 ~]$ qiime
Usage: qiime [OPTIONS] COMMAND [ARGS]...

  QIIME 2 command-line interface (q2cli)
  --------------------------------------

  To get help with QIIME 2, visit https://qiime2.org.

  To enable tab completion in Bash, run the following command or add it to
  your .bashrc/.bash_profile:

      source tab-qiime

  To enable tab completion in ZSH, run the following commands or add them to
  your .zshrc:

      autoload -Uz compinit && compinit
      autoload bashcompinit && bashcompinit
      source tab-qiime

Options:
  --version   Show the version and exit.
  --help      Show this message and exit.

Commands:
  info                Display information about current deployment.
  tools               Tools for working with QIIME 2 files.
  dev                 Utilities for developers and advanced users.
  alignment           Plugin for generating and manipulating alignments.
  composition         Plugin for compositional data analysis.
  cutadapt            Plugin for removing adapter sequences, primers, and
                      other unwanted sequence from sequence data.
  dada2               Plugin for sequence quality control with DADA2.
  deblur              Plugin for sequence quality control with Deblur.
  demux               Plugin for demultiplexing & viewing sequence quality.
  diversity           Plugin for exploring community diversity.
  diversity-lib       Plugin for computing community diversity.
  emperor             Plugin for ordination plotting with Emperor.
  feature-classifier  Plugin for taxonomic classification.
  feature-table       Plugin for working with sample by feature tables.
  fragment-insertion  Plugin for extending phylogenies.
  gneiss              Plugin for building compositional models.
  longitudinal        Plugin for paired sample and time series analyses.
  metadata            Plugin for working with Metadata.
  phylogeny           Plugin for generating and manipulating phylogenies.
  quality-control     Plugin for quality control of feature and sequence data.
  quality-filter      Plugin for PHRED-based filtering and trimming.
  sample-classifier   Plugin for machine learning prediction of sample
                      metadata.
  taxa                Plugin for working with feature taxonomy annotations.
  vsearch             Plugin for clustering and dereplicating with vsearch.

qiime info - includes installed plugins

[@blog1 ~]$ qiime info
System versions
Python version: 3.8.16
QIIME 2 release: 2023.5
QIIME 2 version: 2023.5.0
q2cli version: 2023.5.0

Installed plugins
alignment: 2023.5.0
composition: 2023.5.0
cutadapt: 2023.5.0
dada2: 2023.5.0
deblur: 2023.5.0
demux: 2023.5.0
diversity: 2023.5.0
diversity-lib: 2023.5.0
emperor: 2023.5.0
feature-classifier: 2023.5.0
feature-table: 2023.5.0
fragment-insertion: 2023.5.0
gneiss: 2023.5.0
longitudinal: 2023.5.0
metadata: 2023.5.0
phylogeny: 2023.5.0
quality-control: 2023.5.0
quality-filter: 2023.5.0
sample-classifier: 2023.5.0
taxa: 2023.5.0
types: 2023.5.0
vsearch: 2023.5.0

Application config directory
/home/salexan5/.config/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org

qiime <plugin> help

[@blog1 ~]$ qiime alignment --help
Usage: qiime alignment [OPTIONS] COMMAND [ARGS]...

  Description: This QIIME 2 plugin provides support for generating and
  manipulating sequence alignments.

  Plugin website: https://github.com/qiime2/q2-alignment

  Getting user support: Please post to the QIIME 2 forum for help with this
  plugin: https://forum.qiime2.org

Options:
  --version            Show the version and exit.
  --example-data PATH  Write example data and exit.
  --citations          Show citations and exit.
  --help               Show this message and exit.

Commands:
  mafft      De novo multiple sequence alignment with MAFFT
  mafft-add  Add sequences to multiple sequence alignment with MAFFT.
  mask       Positional conservation and gap filtering.

  
# Overview of 'fragment-insertion' plugin
[@blog1 ~]$ qiime fragment-insertion --help
Usage: qiime fragment-insertion [OPTIONS] COMMAND [ARGS]...

  Description: No description available. See plugin website:
  https://github.com/qiime2/q2-fragment-insertion

  Plugin website: https://github.com/qiime2/q2-fragment-insertion

  Getting user support: https://github.com/qiime2/q2-fragment-insertion/issues

Options:
  --version            Show the version and exit.
  --example-data PATH  Write example data and exit.
  --citations          Show citations and exit.
  --help               Show this message and exit.

Commands:
  classify-otus-experimental  Experimental: Obtain taxonomic lineages, by
                              finding closest OTU in reference phylogeny.
  filter-features             Filter fragments in tree from table.
  sepp                        Insert fragment sequences using SEPP into
                              reference phylogenies.


# Overview of the 'fragment-insertion' specific 'sepp' command.
# Notice the '--p-threads' option for multiple thread/core usage.
[@blog1 ~]$ qiime fragment-insertion sepp --help
Usage: qiime fragment-insertion sepp [OPTIONS]

  Perform fragment insertion of sequences using the SEPP algorithm.

Inputs:
  --i-representative-sequences ARTIFACT FeatureData[Sequence]
                       The sequences to insert into the reference tree.
                                                                    [required]
  --i-reference-database ARTIFACT SeppReferenceDatabase
                       The reference database to insert the representative
                       sequences into.                              [required]
Parameters:
  --p-alignment-subset-size INTEGER
                       Each placement subset is further broken into subsets
                       of at most these many sequences and a separate HMM is
                       trained on each subset.                 [default: 1000]
  --p-placement-subset-size INTEGER
                       The tree is divided into subsets such that each subset
                       includes at most these many subsets. The placement step
                       places the fragment on only one subset, determined
                       based on alignment scores. Further reading:
                       https://github.com/smirarab/sepp/blob/master/tutorial/s
                       epp-tutorial.md#sample-datasets-default-parameters.
                                                               [default: 5000]
  --p-threads INTEGER  The number of threads to use.              [default: 1]
  --p-debug / --p-no-debug
                       Collect additional run information to STDOUT for
                       debugging. Temporary directories will not be removed if
                       run fails.                             [default: False]
Outputs:
  --o-tree ARTIFACT    The tree with inserted feature data.
    Phylogeny[Rooted]                                               [required]
  --o-placements ARTIFACT
    Placements         Information about the feature placements within the
                       reference tree.                              [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --example-data PATH  Write example data and exit.
  --citations          Show citations and exit.
  --help               Show this message and exit.
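
Rather than reading every help page in full, the help output can be filtered for options whose names suggest parallelism. The helper below is a minimal sketch: parallel_options is a hypothetical name, and the pattern only catches options named like --p-threads, --p-n-threads or --p-n-jobs, so a command that names its option differently would be missed.

```shell
# Hypothetical helper: print the help lines of one plugin command whose
# option names mention threads, jobs or cores.
# Usage: parallel_options <plugin> <command>
parallel_options() {
  local plugin="$1" action="$2"
  # '--' stops grep from treating the pattern itself as an option.
  qiime "$plugin" "$action" --help | grep -E -- '--p-[a-z-]*(thread|job|core)'
}
```

For example, after loading the module, parallel_options fragment-insertion sepp should print the --p-threads line from the help output above. No output means no option matched the pattern, not necessarily that the command is single-core.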

How Many Cores and/or Memory Should I Request?

There are no hard and fast rules for configuring your batch files; in most cases the right settings depend on the size of your data and the extent of your analysis.
You will need to read and understand how to use each plugin/command, as they vary.
Memory is still likely to be a major factor in how many cpus-per-task you choose, since each additional core typically needs memory of its own.
In the example above we were only able to use 32 cores because we ran the job on one of the teton-hugemem partition nodes. On a standard Teton node we were only able to use 2 cores. That was still an improvement: the job ran for 9 hours and 45 minutes, compared to 17 hours on a single core. With 32 cores on a hugemem node, however, the job ran in 30 minutes!
- Remember, hugemem nodes can be popular, so you might end up queuing for days to run a half-hour job when you could have started on a Teton node immediately and already have the longer-running job finished.
- Depending on the size of your data/analysis, you might be able to use more cores on a Teton node.

You will need to run and track your own analyses to understand what works for your data. Do not just default to a hugemem node!
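
One simple way to track this is to turn run times into a speedup and a parallel efficiency (speedup divided by cores). The sketch below uses only the three timings quoted above; the helper name scaling is hypothetical, and the timings are rounded, so treat the results as rough.

```shell
# Wall-clock time of the single-core run quoted above, in minutes (17 h).
baseline_min=1020

# Hypothetical helper: print "<speedup> <efficiency %>" for a run that
# took $1 minutes on $2 cores, relative to the single-core baseline.
scaling() {
  awk -v base="$baseline_min" -v mins="$1" -v cores="$2" \
    'BEGIN { s = base / mins; printf "%.2f %.0f\n", s, 100 * s / cores }'
}

scaling 585 2    # 9 h 45 min on 2 cores of a standard Teton node
scaling 30 32    # 30 min on 32 cores of a hugemem node
```

This prints 1.74 87 for the 2-core run and 34.00 106 for the 32-core run. An efficiency above 100% cannot come from extra cores alone, which suggests the larger memory on the hugemem node may also have helped. Recording the same figures for your own jobs shows when adding cores stops paying off.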

Data Resources

On the WildIris cluster we have downloaded the data resources from: https://docs.qiime2.org/2022.8/data-resources/# and https://docs.qiime2.org/2023.2/tutorials/feature-classifier/

When you load the qiime2/<version> module, it sets a QIIME_DATA_RESOURCES environment variable, which you can use to view and access the data resources.

Example use of QIIME_DATA_RESOURCES environment variable:

[salexan5@wilog01 ~]$ module load qiime2/2023.5
[salexan5@wilog01 ~]$ echo $QIIME_DATA_RESOURCES
/apps/u/opt/qiime2/data

[salexan5@wilog01 ~]$ ls $QIIME_DATA_RESOURCES
85_otus.fasta                                  gg-13-8-99-nb-weighted-classifier.qza  silva-138-99-515-806-nb-classifier.qza
85_otus.qza                                    gg_13_8_otus.tar.gz                    silva-138-99-nb-classifier.qza
85_otu_taxonomy.txt                            gg_otus_4feb2011.tgz                   silva-138-99-nb-weighted-classifier.qza
classifier.qza                                 ref-seqs.qza                           silva-138-99-seqs-515-806.qza
gg_12_10_otus.tar.gz                           ref-taxonomy.qza                       silva-138-99-seqs.qza
gg_13_5_otus.tar.gz                            rep-seqs.qza                           silva-138-99-tax-515-806.qza
gg-13-8-99-515-806-nb-classifier.qza           rep-seqs.qza.1                         silva-138-99-tax.qza
gg-13-8-99-515-806-nb-weighted-classifier.qza  sepp-refs-gg-13-8.qza                  taxonomy.qza
gg-13-8-99-nb-classifier.qza                   sepp-refs-silva-128.qza                taxonomy.qzv

[salexan5@wilog01 ~]$ cp $QIIME_DATA_RESOURCES/85_otus.fasta .
[salexan5@wilog01 ~]$ ls
85_otus.fasta
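
Copying is not required: a batch script can also refer to a resource in place. The helper below is a minimal sketch of that idea. The name qiime_resource is hypothetical, and it fails with a clear message if the module has not been loaded or the file is absent, rather than letting qiime fail later.

```shell
# Hypothetical helper: print the full path of a file under
# $QIIME_DATA_RESOURCES, or return non-zero with a message on stderr.
qiime_resource() {
  if [ -z "${QIIME_DATA_RESOURCES:-}" ]; then
    echo "QIIME_DATA_RESOURCES is not set; load the qiime2 module first" >&2
    return 1
  fi
  local file="$QIIME_DATA_RESOURCES/$1"
  if [ ! -r "$file" ]; then
    echo "No readable data resource at $file" >&2
    return 1
  fi
  printf '%s\n' "$file"
}
```

In a script one might then write classifier="$(qiime_resource silva-138-99-nb-classifier.qza)" || exit 1 and pass "$classifier" to an option such as --i-classifier.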

Issues

Bus Error

If a step within a job fails with a bus error, our first suggestion is to check how much memory you have allocated for your job. From working with users, the majority of the time the error is caused by not requesting enough memory for the size of data you’re trying to analyze.

Our wiki page Introduction to Job Submission: 02: Memory and GPUs introduces how to define memory resources within a Slurm job.
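
Once a job has finished, Slurm's accounting records show how much memory it actually used, which helps when deciding how much to request next time. This is a minimal sketch, assuming job accounting is enabled on the cluster so that sacct can report on the job; job_memory_report is a hypothetical helper name.

```shell
# Hypothetical helper: show requested memory (ReqMem) next to peak usage
# (MaxRSS) for a finished job. Usage: job_memory_report <job-id>
job_memory_report() {
  local job_id="$1"
  sacct -j "$job_id" --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
}
```

If MaxRSS is close to ReqMem, or State reports OUT_OF_MEMORY, a larger --mem request in the batch script is a reasonable next step.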