Overview
Maker: MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.
This page details the use of maker and related applications available on the cluster. Please note that this page is organic: as we work with researchers to understand how best to functionally support maker on the cluster, it will likely change.
Tutorials:
Using
Use the module name maker to discover the versions available and to load the application.
From what we have observed and learnt from existing users, maker is typically used alongside augustus and exonerate.
Beartooth Module Setup:
As part of the installation of version 3.01.04 on Beartooth, running module load maker/3.01.04 will automatically load the following:
snap-korf/2021-11-04 repeatmasker/4.1.3-ompi exonerate/2.4.0 perl-bioperl/1.7.6 blast-plus/2.13.0
Teton Modules
The current latest versions of these can all be loaded using:
module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27
Please use the module spider command to look for alternatives.
Control Files
When setting up a new maker environment, a user will typically run maker -CTL to generate the three core control files:
[]$ maker -CTL
maker_bopts.ctl maker_exe.ctl maker_opts.ctl
maker_exe.ctl
The maker_exe.ctl file requires you to update a number of paths so that they point explicitly to the required applications.
Versions of the blastn, blastx, tblastx and RepeatMasker applications come packaged with maker and are already defined. You will need to explicitly enter the paths for exonerate and augustus; if you’re using the versions of the modules above, the paths to these versions are shown in the examples below.
Beartooth Example:
Running maker -CTL will generate a file with the following paths:
[]$ cat maker_exe.ctl
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/blastn #location of NCBI+ blastn executable
blastx=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/blastx #location of NCBI+ blastx executable
tblastx=/apps/u/spack/gcc/12.2.0/blast-plus/2.13.0-5zhb232/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
prerapsearch= #location of prerapsearch executable
rapsearch= #location of rapsearch executable
RepeatMasker=/apps/u/spack/gcc/12.2.0/repeatmasker/4.1.2-p1-vwpy6c4/bin/RepeatMasker #location of RepeatMasker executable
exonerate=/apps/u/spack/gcc/12.2.0/exonerate/2.4.0-4d3tjyb/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/apps/u/spack/gcc/12.2.0/snap-korf/2021-11-04-3azfcio/bin/snap #location of snap executable
gmhmme3= #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus= #location of augustus executable
fgenesh= #location of fgenesh executable
evm= #location of EvidenceModeler executable
tRNAscan-SE= #location of trnascan executable
snoscan= #location of snoscan executable

#-----Other Algorithms
probuild= #location of probuild executable (required for genemark)
If you require augustus, load it as a module.
Teton Example:
[]$ cat maker_exe.ctl
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/blastn #location of NCBI+ blastn executable
blastx=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/blastx #location of NCBI+ blastx executable
tblastx=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/blast/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/pfs/tc1/apps/el7-x86_64/u/opt/maker/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/apps/u/gcc/7.3.0/exonerate/2.4.0-3bglywa/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap= #location of snap executable
gmhmme3= #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/apps/u/gcc/7.3.0/augustus/3.3.2-etubcuo/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE= #location of trnascan executable
snoscan= #location of snoscan executable

#-----Other Algorithms
probuild= #location of probuild executable (required for genemark)
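As an alternative to editing maker_exe.ctl by hand, an empty entry such as exonerate= or augustus= could be filled in from whatever the loaded modules have placed on your PATH. The following is a minimal sketch only: set_ctl_path is a hypothetical helper (not part of maker), and it assumes each entry keeps the key=value #comment layout shown in the examples above.

```shell
#!/bin/bash
# Hypothetical helper (not part of MAKER): fill in one executable path
# in a maker_exe.ctl style file, using the executable found on PATH.
# Usage: set_ctl_path <ctl_file> <key>
set_ctl_path() {
    local ctl_file="$1" key="$2" exe_path
    # Resolve the executable from PATH, e.g. after loading its module.
    exe_path="$(command -v "$key")" || {
        echo "warning: $key not found on PATH; $ctl_file left unchanged" >&2
        return 1
    }
    # Replace only the value; the trailing '#location of ...' comment is kept.
    # A backup of the original file is written to <ctl_file>.bak.
    sed -i.bak "s|^${key}=[^ ]*|${key}=${exe_path}|" "$ctl_file"
}

# Example calls, to be run in the directory holding the control files:
#   set_ctl_path maker_exe.ctl exonerate
#   set_ctl_path maker_exe.ctl augustus
```

It is worth checking the result with cat maker_exe.ctl afterwards, since the sketch simply trusts the first match on PATH.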
maker_opts.ctl
On Beartooth
We have observed a problem with the model_org=all option, related to RepeatMasker, of the form:
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_4xxbhq; /apps/u/spack/gcc/12.2.0/repeatmasker/4.1.2-p1-vwpy6c4/bin/RepeatMasker /pfs/tc1/project/arcc/software/maker/testing/maker_tutorial/example_01_basic/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/0/contig-dpp-500-500.0.all.rb -species all -dir /pfs/tc1/project/arcc/software/maker/testing/maker_tutorial/example_01_basic/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/0 -pa 1
#-------------------------------#
Species "all" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be obtained using the famdb.py script.
We have worked around this issue by setting the model_org option to simple, or by leaving it blank.
Under External Application Behaviour Options, you have the following multicore option:
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
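Both of the settings above (model_org and cpus) are plain key=value lines in maker_opts.ctl, so they can be changed from the command line or from a job script instead of a text editor. A minimal sketch, assuming that layout: set_ctl_option is a hypothetical helper, not a maker command, and the use of SLURM_CPUS_PER_TASK to choose the cpus value is our assumption for non-MPI jobs.

```shell
#!/bin/bash
# Hypothetical helper (not part of MAKER): set one key=value option in a
# control file, keeping any trailing '#comment' on the line.
# Usage: set_ctl_option <ctl_file> <key> <value>
set_ctl_option() {
    local ctl_file="$1" key="$2" value="$3"
    # A backup of the original file is written to <ctl_file>.bak.
    sed -i.bak "s|^${key}=[^ ]*|${key}=${value}|" "$ctl_file"
}

# Example calls, to be run in the directory holding the control files:
#   set_ctl_option maker_opts.ctl model_org simple
#   # Non-MPI runs only; leave cpus=1 when starting maker with mpiexec.
#   set_ctl_option maker_opts.ctl cpus "${SLURM_CPUS_PER_TASK:-1}"
```

Passing an empty string as the value leaves the option blank, which matches the second workaround for model_org.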
Teton Script Templates
What follows are two example scripts, one for a single node/single cpu, and another for multiple nodes using multiple tasks.
Single Node
If you are running a small data set, and only require a single node/core, then the following template provides an example:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
...
module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27
maker
Multiple Nodes
For much larger data sets, maker can be run across multiple nodes and tasks using something like the following:
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
...
# Modules to Load
module load gcc/7.3.0 maker/2.31.10 augustus/3.3.2-py27 exonerate/2.4.0-py27
mpiexec -n 32 maker
Notice that the line that starts maker has changed: the value after the -n option equals the number of nodes multiplied by the number of tasks per node (4 x 8 = 32).
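To avoid keeping the -n value in step with the #SBATCH lines by hand, the product can be computed inside the job script. This is a sketch rather than a tested template: mpi_task_count is a hypothetical helper, and it relies on Slurm exporting SLURM_NNODES and SLURM_NTASKS_PER_NODE (the latter is only set when --ntasks-per-node is requested). The fallback values 4 and 8 mirror the example above so the snippet also runs outside a job.

```shell
#!/bin/bash
# Hypothetical helper: total MPI tasks = nodes x tasks per node.
mpi_task_count() {
    echo $(( $1 * $2 ))
}

# Inside a Slurm job these variables come from the #SBATCH requests;
# outside a job, fall back to the 4 x 8 layout used in the template.
total_tasks="$(mpi_task_count "${SLURM_NNODES:-4}" "${SLURM_NTASKS_PER_NODE:-8}")"

# Print the launch line for inspection; in a real job script, drop the
# echo so that the command is executed instead.
echo "mpiexec -n ${total_tasks} maker"
```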
Memory Issues
Memory is always an issue with any form of bioinformatics analysis, and there are no straightforward recommendations we can make. As a researcher you’ll need to track the size of your data sets, the type of analysis, the resources you’ve requested, and how efficiently they’ve been used.
One indicator that you have not allocated enough memory is if you see an error of the following form:
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:tig00000510
Please refer to our Slurm page on requesting and using memory: Introduction to Job Submission: 02: Memory and GPUs
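On a long run it can help to collect which contigs reported the failure shown above before deciding how much more memory to request. A minimal sketch, assuming the FAILED CONTIG:<name> text appears in the file that captured the job's output exactly as in the example; failed_contigs is a hypothetical helper, not part of maker.

```shell
#!/bin/bash
# Hypothetical helper: list the unique contig names that follow
# 'FAILED CONTIG:' in a captured job output file.
# Usage: failed_contigs <output_file>
failed_contigs() {
    grep -o 'FAILED CONTIG:[^[:space:]]*' "$1" | cut -d: -f2 | sort -u
}

# Example call; the file name depends on how the job's output was
# captured (Slurm's default is slurm-<jobid>.out):
#   failed_contigs slurm-<jobid>.out
```

The requested and peak memory for the same job can then be compared with sacct, for example using the ReqMem and MaxRSS fields. A failed chunk is only an indicator of a memory problem and can have other causes.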