...
[@blog1 ~]$ qiime alignment --help
Usage: qiime alignment [OPTIONS] COMMAND [ARGS]...
Description: This QIIME 2 plugin provides support for generating and
manipulating sequence alignments.
Plugin website: https://github.com/qiime2/q2-alignment
Getting user support: Please post to the QIIME 2 forum for help with this
plugin: https://forum.qiime2.org
Options:
--version Show the version and exit.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
Commands:
mafft De novo multiple sequence alignment with MAFFT
mafft-add Add sequences to multiple sequence alignment with MAFFT.
mask Positional conservation and gap filtering.
# Overview of 'fragment-insertion' plugin
[@blog1 ~]$ qiime fragment-insertion --help
Usage: qiime fragment-insertion [OPTIONS] COMMAND [ARGS]...
Description: No description available. See plugin website:
https://github.com/qiime2/q2-fragment-insertion
Plugin website: https://github.com/qiime2/q2-fragment-insertion
Getting user support: https://github.com/qiime2/q2-fragment-insertion/issues
Options:
--version Show the version and exit.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
Commands:
classify-otus-experimental Experimental: Obtain taxonomic lineages, by
finding closest OTU in reference phylogeny.
filter-features Filter fragments in tree from table.
sepp Insert fragment sequences using SEPP into
reference phylogenies.
# Overview of the 'fragment-insertion' specific 'sepp' command.
# Notice the '--p-threads' option for multiple thread/core usage.
[@blog1 ~]$ qiime fragment-insertion sepp --help
Usage: qiime fragment-insertion sepp [OPTIONS]
Perform fragment insertion of sequences using the SEPP algorithm.
Inputs:
--i-representative-sequences ARTIFACT FeatureData[Sequence]
The sequences to insert into the reference tree.
[required]
--i-reference-database ARTIFACT SeppReferenceDatabase
The reference database to insert the representative
sequences into. [required]
Parameters:
--p-alignment-subset-size INTEGER
Each placement subset is further broken into subsets
of at most these many sequences and a separate HMM is
trained on each subset. [default: 1000]
--p-placement-subset-size INTEGER
The tree is divided into subsets such that each subset
includes at most these many subsets. The placement step
places the fragment on only one subset, determined
based on alignment scores. Further reading:
https://github.com/smirarab/sepp/blob/master/tutorial/s
epp-tutorial.md#sample-datasets-default-parameters.
[default: 5000]
--p-threads INTEGER The number of threads to use. [default: 1]
--p-debug / --p-no-debug
Collect additional run information to STDOUT for
debugging. Temporary directories will not be removed if
run fails. [default: False]
Outputs:
--o-tree ARTIFACT The tree with inserted feature data.
Phylogeny[Rooted] [required]
--o-placements ARTIFACT
Placements Information about the feature placements within the
reference tree. [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr during
execution of this action. Or silence output if
execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
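For reference, a single run of this command with multiple threads might look like the sketch below. The artifact filenames (rep-seqs.qza, sepp-refs-gg-13-8.qza) and the thread count are placeholders, so substitute your own representative sequences and whichever reference database you are using.

# Hypothetical example: insert representative sequences into a reference
# phylogeny using 8 threads. All filenames below are placeholders.
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --p-threads 8 \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza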
How Many Cores and/or Memory Should I Request?
There are no hard-and-fast rules on how to configure your batch files: in most cases it will depend on the size of your data and the extent of your analysis.
You will need to read and understand how to use each plugin/command, as they can vary.
Memory is still likely to be a major factor in how many cpus-per-task you choose.
In the example above we were only able to use 32 cores because we ran the job on one of the teton-hugemem partition nodes. Using a standard Teton node we were only able to use 2 cores. That still gave us an improvement: the job ran in 9 hours and 45 minutes, compared to 17 hours with only a single core. But using 32 cores on a hugemem node, the job ran in 30 minutes!
Remember, hugemem nodes can be popular, so you might actually end up queuing for days to run a job in half an hour, when you could have jumped onto a standard Teton node immediately and already have the longer-running job finished.
Depending on the size of your data/analysis you might be able to use more cores on a Teton node.
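To tie the qiime options back to your batch file, a SLURM script for the 32-core hugemem run described above might look roughly like the following. The partition name matches the teton-hugemem example, but the memory request, wall time, module name, and filenames are assumptions you will need to adjust for your cluster and data; the key point is keeping --cpus-per-task in step with --p-threads.

#!/bin/bash
#SBATCH --job-name=sepp-insertion
#SBATCH --partition=teton-hugemem    # hugemem partition used for the 32-core run above
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32           # keep this in step with --p-threads below
#SBATCH --mem=500G                   # illustrative; size this to your data
#SBATCH --time=02:00:00

# Module name/version is an assumption -- check what is installed on your cluster.
module load qiime2

qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --p-threads ${SLURM_CPUS_PER_TASK} \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza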
Note: You will need to run and track your own analyses to understand what works for your data. Do not just default to a hugemem node!
Data Resources
On the WildIris cluster we have downloaded the data resources from https://docs.qiime2.org/2022.8/data-resources/# and https://docs.qiime2.org/2023.2/tutorials/feature-classifier/.
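If you need your own copy of one of these reference artifacts rather than the cluster-wide download, wget works in the usual way. The URL below is illustrative only; copy the current link for the artifact you need from the data-resources page.

# Illustrative download of a SEPP reference database artifact.
# Verify the exact URL on the QIIME 2 data-resources page before running.
wget https://data.qiime2.org/2022.8/common/sepp-refs-gg-13-8.qza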
...