GeneMark-ES Suite

Overview

GeneMark: A family of gene prediction programs developed at Georgia Institute of Technology, Atlanta, Georgia, USA.

  • GeneMark-ES: Unsupervised training is an important feature of the GeneMark-ES algorithm that identifies protein coding genes in eukaryotic genomes. This is the only eukaryotic gene finder that can perform gene prediction without curated training sets.

Using

Use the module name genemark to discover versions available and to load the application.
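As a minimal sketch of the step above, the helper below lists and then loads the genemark module. It assumes the cluster provides an Lmod or Environment Modules style module command; the function name load_genemark is hypothetical and only used for illustration.

```shell
# Hypothetical helper (not provided by ARCC or GeneMark).
# Assumption: the cluster exposes a "module" command (Lmod / Environment Modules).
load_genemark() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    module avail genemark   # show the versions visible in the current environment
    module load genemark    # load the default version
}
```

If no version is listed, the module may sit behind a compiler module that has to be loaded first; that depends on how the cluster organizes its modules.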

Setup License:

Note: To use this software, you need to copy a license file into your home folder and rename it. License files expire, and ARCC does not monitor the expiry date, so if you find that the key has expired, please inform us and we will look at updating it.

Beartooth
# v4.71
[]$ cp /apps/u/opt/gcc/12.2.0/genemark/4.71/gm_key_64 ~/.gm_key
# Teton:
[]$ cp /apps/u/opt/genemark/gm_key_64 ~/.gm_key
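Before starting a run, it can be useful to confirm that the key file is actually in place. The sketch below only checks that ~/.gm_key exists and is not empty; it cannot tell whether the key has expired. The function name check_gm_key is hypothetical and not part of GeneMark.

```shell
# Hypothetical helper (not part of GeneMark): check that the license key
# file exists and is non-empty. It does NOT detect an expired key.
check_gm_key() {
    key="${1:-$HOME/.gm_key}"   # default location used on this page
    if [ -s "$key" ]; then
        echo "found: $key"
    else
        echo "missing or empty: $key" >&2
        return 1
    fi
}
```

Calling check_gm_key with no argument checks the default location; a non-zero return value means the cp step above still needs to be done.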

Below is an example of the output and error you’ll receive when trying to use the software if the license file wasn’t installed correctly:

[]$ gmes_petap.pl --seq input/genome.fasta --EP --dbep input/proteins.fasta --verbose --cores=8 --max_intergenic 10000 --mask_penalty 0
# check before the run
...
License key ".gm_key" not found.
This file is neccessary in order to use GeneMark.hmm eukaryotic 3.
...

Multicore

Please look at the various tests, which provide examples of running on multiple cores with the --cores option. The value passed to --cores needs to match the --cpus-per-task value you define in your salloc commands/sbatch scripts.
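One way to keep the two values in step is to read the core count from Slurm instead of typing it twice. The sketch below assumes the job was requested with --cpus-per-task, in which case Slurm sets the SLURM_CPUS_PER_TASK environment variable. The function names gm_cores and gm_command are hypothetical, and the gmes_petap.pl arguments are taken from the example command earlier on this page.

```shell
# Hypothetical helpers (not part of GeneMark or Slurm).

# Print the number of cores to pass to --cores.
# Assumption: SLURM_CPUS_PER_TASK is set because --cpus-per-task was requested;
# fall back to 1 when it is unset (for example outside a job).
gm_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print (not run) the command line so it can be inspected before use.
# $1 = genome FASTA, $2 = protein FASTA
gm_command() {
    echo "gmes_petap.pl --seq $1 --EP --dbep $2 --verbose --cores=$(gm_cores)"
}
```

Inside an sbatch script that requests --cpus-per-task=8, gm_command input/genome.fasta input/proteins.fasta would print a command ending in --cores=8, so changing the Slurm request is enough to change both values.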

Test Code:

The test code can be copied from the system’s installation folder into your account using: cp -R /apps/u/opt/genemark/gmes_linux_64/GeneMark-E-tests/ .

Please read the README.md files in the folders to understand how to run the various tests.

Beartooth
# v4.71
[]$ cp -R /apps/u/opt/gcc/12.2.0/genemark/4.71/GeneMark-E-tests .
# Teton:
[]$ cp -R /apps/u/opt/genemark/gmes_linux_64/GeneMark-E-tests/ .

How to Run:

If you personally download and unpack the source, then you should read: