OrthoFinder is a fast, accurate and comprehensive analysis tool for comparative genomics. It finds orthologues and orthogroups, infers rooted gene trees for all orthogroups, and infers a rooted species tree for the species being analysed. OrthoFinder also provides comprehensive statistics for comparative genomic analyses. OrthoFinder is simple to use: all you need to run it is a set of protein sequence files (one per species) in FASTA format.
Using
Use the module name orthofinder to discover which versions are available and to load the application.
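As a minimal sketch, assuming the cluster provides the usual module command (Lmod or Environment Modules), discovering and loading the application might look like this:

```shell
# List the orthofinder versions installed on the cluster
module avail orthofinder

# Load the default version (append /<version> to select a specific one, if listed)
module load orthofinder

# Confirm that the executable is now on your PATH
command -v orthofinder
```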
Example
[]$ orthofinder --help
OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms
SIMPLE USAGE:
Run full OrthoFinder analysis on FASTA format proteomes in <dir>
orthofinder [options] -f <dir>
Add new species in <dir1> to previous run in <dir2> and run new analysis
orthofinder [options] -f <dir1> -b <dir2>
OPTIONS:
-t <int> Number of parallel sequence search threads [Default = 4]
-a <int> Number of parallel analysis threads
-d Input is DNA sequences
-M <txt> Method for gene tree inference. Options 'dendroblast' & 'msa'
[Default = dendroblast]
-S <txt> Sequence search program [Default = diamond]
Options: blast, diamond, diamond_ultra_sens, blast_gz, mmseqs, blast_nucl
-A <txt> MSA program, requires '-M msa' [Default = mafft]
Options: mafft, muscle
-T <txt> Tree inference method, requires '-M msa' [Default = fasttree]
Options: fasttree, raxml, raxml-ng, iqtree
-s <file> User-specified rooted species tree
-I <int> MCL inflation parameter [Default = 1.5]
-x <file> Info for outputting results in OrthoXML format
-p <dir> Write the temporary pickle files to <dir>
-1 Only perform one-way sequence search
-X Don't add species names to sequence IDs
-y Split paralogous clades below root of a HOG into separate HOGs
-z Don't trim MSAs (columns>=90% gap, min. alignment length 500)
-n <txt> Name to append to the results directory
-o <txt> Non-default results directory
-h Print this help text
WORKFLOW STOPPING OPTIONS:
-op Stop after preparing input files for BLAST
-og Stop after inferring orthogroups
-os Stop after writing sequence files for orthogroups
(requires '-M msa')
-oa Stop after inferring alignments for orthogroups
(requires '-M msa')
-ot Stop after inferring gene trees for orthogroups
WORKFLOW RESTART COMMANDS:
-b <dir> Start OrthoFinder from pre-computed BLAST results in <dir>
-fg <dir> Start OrthoFinder from pre-computed orthogroups in <dir>
-ft <dir> Start OrthoFinder from pre-computed gene trees in <dir>
LICENSE:
Distributed under the GNU General Public License (GPLv3). See License.md
CITATION:
When publishing work that uses OrthoFinder please cite:
Emms D.M. & Kelly S. (2019), Genome Biology 20:238
If you use the species tree in your work then please also cite:
Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
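To illustrate how the options in the help text combine, here is a short sketch. The directory names proteomes/ and new_species/, the run name msa_run, and the PREVIOUS_RUN path are hypothetical placeholders; only the flags are taken from the help output.

```shell
# Full analysis on a directory of FASTA proteomes (one file per species),
# using the defaults: diamond for the search, dendroblast for gene trees
orthofinder -f proteomes/

# Same input, but infer gene trees from multiple sequence alignments
# (mafft + fasttree), with 8 search threads and 2 analysis threads
orthofinder -f proteomes/ -M msa -A mafft -T fasttree -t 8 -a 2 -n msa_run

# Add new species to an earlier run; PREVIOUS_RUN is a hypothetical path to
# the directory holding that run's pre-computed BLAST results
PREVIOUS_RUN=/path/to/previous/run
orthofinder -f new_species/ -b "$PREVIOUS_RUN"
```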
Using Multiple Cores/Threads
The orthofinder application can use multiple cores/threads. Use the -t option to set the number of parallel sequence search threads (the default is 4) and the -a option to set the number of parallel analysis threads.
After downloading the example files, a batch script would take the following form, which requests and uses 8 cores on a single node.
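A minimal sketch of such a batch script is shown here, assuming the cluster uses the Slurm scheduler. The account name, time limit, job name, and the ExampleData input directory are illustrative assumptions and should be replaced with values that match your project and the location of the downloaded example files.

```shell
#!/bin/bash
# Request 8 cores on a single node for one task.
# The account name and time limit below are placeholders (assumptions).
#SBATCH --job-name=orthofinder
#SBATCH --account=your-project-name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

# Make the orthofinder executable available
module load orthofinder

# Hypothetical location of the downloaded example files
INPUT_DIR="ExampleData"

# Use the core count granted by Slurm, falling back to 8 if it is not set
THREADS="${SLURM_CPUS_PER_TASK:-8}"

# Run the full analysis with one sequence search thread per allocated core
orthofinder -f "$INPUT_DIR" -t "$THREADS"
```

If the script is saved as, say, orthofinder.sh (a hypothetical file name), it could be submitted with sbatch orthofinder.sh. Reading the thread count from SLURM_CPUS_PER_TASK keeps the -t value in step with the --cpus-per-task request, so only one line needs changing to use a different number of cores.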