Overview

Maker: MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.

Using

Use the module name maker to discover versions available and to load the application.

From what we have observed and learned from existing users, maker is typically used alongside augustus and exonerate.

Beartooth Module Setup:

As part of the installation of version 3.01.04 on Beartooth, running module load maker/3.01.04 automatically loads the following modules:

snap-korf/2021-11-04
repeatmasker/4.1.3-ompi
exonerate/2.4.0
perl-bioperl/1.7.6
blast-plus/2.13.0

Teton Modules

On Teton, the current latest versions of maker, augustus and exonerate can all be loaded using:

module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27

Please use the module spider command to look for alternatives.


Control Files

When setting up a new maker environment, a user will typically run maker -CTL to generate the three core control files:

[]$ maker -CTL
maker_bopts.ctl  maker_exe.ctl  maker_opts.ctl

maker_exe.ctl

The maker_exe.ctl file requires you to update a number of paths so that they point explicitly to the required applications.

Versions of the blastn, blastx, tblastx and RepeatMasker applications come packaged with maker, and their paths are already defined. You will need to enter the paths for exonerate and augustus explicitly. If you are using the module versions listed above, the corresponding paths are shown in the examples below.

Beartooth Example:

Running maker -CTL will generate a file with the following paths:

[]$ cat maker_exe.ctl
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/blastn #location of NCBI+ blastn executable
blastx=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/blastx #location of NCBI+ blastx executable
tblastx=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
prerapsearch= #location of prerapsearch executable
rapsearch= #location of rapsearch executable
RepeatMasker=/apps/u/spack/gcc/12.2.0/repeatmasker/4.1.2-p1-vwpy6c4/bin/RepeatMasker #location of RepeatMasker executable
exonerate=/apps/u/spack/gcc/12.2.0/exonerate/2.4.0-4d3tjyb/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/apps/u/spack/gcc/12.2.0/snap-korf/2021-11-04-3azfcio/bin/snap #location of snap executable
gmhmme3= #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus= #location of augustus executable
fgenesh= #location of fgenesh executable
evm= #location of EvidenceModeler executable
tRNAscan-SE= #location of trnascan executable
snoscan= #location of snoscan executable

#-----Other Algorithms
probuild= #location of probuild executable (required for genemark)

If you require augustus, load it as a module and set the augustus= entry to the path of its executable.
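Filling in an empty entry such as augustus= can be scripted rather than edited by hand. The following is a minimal sketch, not part of MAKER: set_ctl_value is a hypothetical helper name, and it assumes each control-file line has the form key=value #comment, as in the listing above. The usage line also assumes that loading the augustus module puts the executable on your PATH.

```shell
#!/bin/bash
# Hypothetical helper (not part of MAKER): set "key=value" on the matching
# line of a control file, keeping the trailing "#comment" on that line.
set_ctl_value() {
    local file="$1" key="$2" value="$3" tmp status
    tmp="$(mktemp)" || return 1
    awk -v key="$key" -v value="$value" '
        index($0, key "=") == 1 {          # line starts with "key="
            comment = ""
            pos = index($0, "#")
            if (pos > 0) comment = " " substr($0, pos)
            print key "=" value comment
            next
        }
        { print }                          # all other lines pass through
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example usage (assumes augustus is on PATH after loading its module):
#   set_ctl_value maker_exe.ctl augustus "$(command -v augustus)"
```

Writing through a temporary file and copying the result back keeps the original file's permissions and avoids relying on any particular sed in-place option.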

Teton Example:
[]$ cat maker_exe.ctl
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/blastn #location of NCBI+ blastn executable
blastx=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/blastx #location of NCBI+ blastx executable
tblastx=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/apps/u/gcc/7.3.0/exonerate/2.4.0-3bglywa/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap= #location of snap executable
gmhmme3= #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/apps/u/gcc/7.3.0/augustus/3.3.2-etubcuo/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE= #location of trnascan executable
snoscan= #location of snoscan executable

#-----Other Algorithms
probuild= #location of probuild executable (required for genemark)

maker_opts.ctl

On Beartooth

We have observed a problem with the model_org=all option, related to RepeatMasker, of the form:

running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_4xxbhq; /apps/u/spack/gcc/12.2.0/repeatmasker/4.1.2-p1-vwpy6c4/bin/RepeatMasker /pfs/tc1/project/arcc/software/maker/testing/maker_tutorial/example_01_basic/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/0/contig-dpp-500-500.0.all.rb -species all -dir /pfs/tc1/project/arcc/software/maker/testing/maker_tutorial/example_01_basic/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/0 -pa 1
#-------------------------------#

Species "all" is not known to RepeatMasker.  There may
not be any TE families defined in the libraries for this
species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database
and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be
obtained using the famdb.py script.

We have worked around this issue by setting model_org to simple or leaving it blank.
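A quick pre-flight check before submitting a job can catch this setting early. The sketch below is an assumption-laden illustration rather than a MAKER feature: check_model_org is a hypothetical function name, and it assumes the option appears on a single line of the form model_org=value #comment in maker_opts.ctl.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of MAKER).
# Prints the current model_org value from a maker_opts.ctl-style file and
# returns non-zero if it is set to "all".
check_model_org() {
    local file="${1:-maker_opts.ctl}" value
    # Take everything between "model_org=" and the first "#", then strip spaces.
    value="$(sed -n 's/^model_org=\([^#]*\).*/\1/p' "$file" | tr -d '[:space:]')"
    echo "model_org=${value}"
    if [ "$value" = "all" ]; then
        echo "warning: model_org=all is known to fail here; use simple or leave it blank" >&2
        return 1
    fi
    return 0
}

# Example usage in a job script, before starting maker:
#   check_model_org maker_opts.ctl || exit 1
```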

Under External Application Behaviour Options, you have the following multicore option:

cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

Teton Script Templates

What follows are two example scripts, one for a single node/single cpu, and another for multiple nodes using multiple tasks.

Single Node

If you are running a small data set, and only require a single node/core, then the following template provides an example:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
...
module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27
maker

Multiple Nodes

For much larger data sets, Maker can be run across multiple nodes and tasks using something like the following:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
...
# Modules to Load
module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27
mpiexec -n 32 maker

Notice that the line that starts maker has changed: the value after the -n option equals the number of nodes multiplied by the number of tasks per node (4 x 8 = 32).
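To avoid editing the -n value by hand whenever the #SBATCH lines change, the product can be computed from the environment variables that Slurm sets inside a job. This is a minimal sketch: mpi_ranks is a hypothetical helper name, and the fallback to 1 when a variable is unset is an assumption made for illustration. Note that Slurm only sets SLURM_NTASKS_PER_NODE when --ntasks-per-node is specified.

```shell
#!/bin/bash
# Hypothetical helper: number of MPI ranks = nodes x tasks per node.
# Falls back to 1 for either value when run outside a Slurm job.
mpi_ranks() {
    local nodes="${SLURM_JOB_NUM_NODES:-1}"
    local per_node="${SLURM_NTASKS_PER_NODE:-1}"
    echo $(( nodes * per_node ))
}

# In the job script, the launch line would then read:
#   mpiexec -n "$(mpi_ranks)" maker
```

With the multiple-node template above (4 nodes, 8 tasks per node) this evaluates to 32, matching the hard-coded value.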

Memory Issues

Memory is always a concern in any form of bioinformatics analysis, and there are no straightforward recommendations we can make. As a researcher, you will need to track the size of your data sets, the type of analysis, the resources you have requested, and how efficiently they have been used.

One indicator that you have not allocated enough memory is if you see an error of the following form:

ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:tig00000510
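On a long run these messages can be scattered through the job output, so it may help to collect the affected contig names in one place. The sketch below assumes the errors were captured in a log file and that each appears as FAILED CONTIG:name, as shown above. failed_contigs is a hypothetical helper name, and the log file name in the usage line is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: list the unique contig names that appear in
# "FAILED CONTIG:<name>" lines of a log file.
failed_contigs() {
    local log="$1"
    grep -o 'FAILED CONTIG:[^[:space:]]*' "$log" | cut -d: -f2 | sort -u
}

# Example usage (replace the placeholder with your job's output file):
#   failed_contigs my_maker_job.out
```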

Please refer to our Slurm page on requesting and using memory: Introduction to Job Submission: 02: Memory and GPUs
