Ensembl-VEP

Overview

Ensembl Variant Effect Predictor (VEP) is a powerful toolset for the analysis, annotation, and prioritization of genomic variants, including those in non-coding regions. VEP accurately predicts the effects of sequence variants on transcripts, protein products, regulatory regions, and binding motifs by drawing on the high-quality, broad-scope, and integrated Ensembl databases. It also lets you compare your variants with a large collection of publicly available variation data within Ensembl, providing insights into population and ancestral genetics, phenotypes, and disease.

Ensembl VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs, or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Input the coordinates of your variants and the nucleotide changes to find out:

  • Genes and transcripts affected by the variants

  • Location of the variants (e.g. upstream of a transcript, in the coding sequence, in non-coding RNA, in regulatory regions)

  • Consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)

  • Known variants that match yours and their minor allele frequencies from the 1000 Genomes Project

  • SIFT and PolyPhen-2 scores for changes to protein sequence
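VCF is one of the input formats VEP accepts. The sketch below, which is illustrative rather than an ARCC-provided tool, writes a minimal VCF containing only the eight fixed columns. The Variant class, the to_vcf helper, and the example coordinates are hypothetical and do not describe real variants.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One variant: chromosome, 1-based position, reference and alternate alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str
    var_id: str = "."  # "." means no identifier, as in the VCF specification


def to_vcf(variants):
    """Return minimal VCF text with only the eight fixed columns (no samples)."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Variants are written in the order given; QUAL, FILTER and INFO are left empty (".").
    for v in variants:
        lines.append(f"{v.chrom}\t{v.pos}\t{v.var_id}\t{v.ref}\t{v.alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical, made-up variants for illustration only.
    example = [Variant("1", 1000, "A", "G"), Variant("2", 5000, "C", "T")]
    print(to_vcf(example), end="")
```

You could save the output to a file (for example a hypothetical my_variants.vcf) and pass it to vep with --input_file.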

Using

Use the module name ensembl-vep to discover the available versions and to load the application.

This application has been installed with:

  • No cache files.

  • No FASTA files.

  • No plugins.

This is because there are hundreds of cache and FASTA files and tens of plugins. They are updated regularly and new plugins are released often, so tracking and managing them centrally is too time-consuming. You will need to install the cache, FASTA, and plugin files you need locally.
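Once you have installed a cache locally, vep can run against it without contacting the Ensembl database. The sketch below only builds the argument list, using the real vep options --cache, --offline, --dir_cache, --species, --assembly, and --fasta. The helper name offline_vep_args and the default species and assembly (homo_sapiens, GRCh38) are assumptions; match them to the cache you actually installed.

```python
def offline_vep_args(input_vcf, output_file, cache_dir, fasta=None,
                     species="homo_sapiens", assembly="GRCh38"):
    """Build a vep argument list that reads annotations from a local cache.

    Hypothetical helper: species/assembly defaults are assumptions and must
    match the cache installed under cache_dir.
    """
    args = [
        "vep",
        "--input_file", str(input_vcf),
        "--output_file", str(output_file),
        "--cache", "--offline",         # use the local cache, no database connection
        "--dir_cache", str(cache_dir),  # directory where you installed the cache
        "--species", species,
        "--assembly", assembly,
    ]
    if fasta is not None:
        # A local FASTA file is needed in offline mode for some options, such as HGVS.
        args += ["--fasta", str(fasta)]
    return args
```

For example, from within a job that has loaded ensembl-vep, you could pass the returned list to subprocess.run(args, check=True).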

When vep starts up, it looks for the tabix command. tabix is part of htslib, and the htslib module is loaded automatically alongside ensembl-vep.
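If vep complains that it cannot find tabix, a quick check is whether both executables are on your PATH. This minimal sketch uses Python's shutil.which; the function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("vep", "tabix")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing), "- try: module load ensembl-vep")
    else:
        print("vep and tabix found")
```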

ARCC has tested the haplo command. Although the command runs, we have been unable to generate any results because we do not have a suitable data set. haplo appears to require specific input: “input data must be a VCF containing phased genotype data for at least one individual and file must be sorted by chromosome and genomic position; no other formats are currently supported.”
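Before running haplo, it may help to check a VCF against the requirements quoted above. The sketch below is one interpretation of those requirements, not an official validator: it assumes "sorted by chromosome" means each chromosome forms one contiguous block with non-decreasing positions, and that GT is the first FORMAT field (as the VCF specification requires when GT is present). The function name check_haplo_input is hypothetical.

```python
def check_haplo_input(lines):
    """Return a list of problems found in VCF lines (empty if none were found).

    Checks, as an interpretation of the haplo input requirements:
      * at least one sample column is present,
      * no sample genotype (first FORMAT field, GT) is unphased ('/'),
      * each chromosome forms one contiguous block with non-decreasing positions.
    """
    problems = []
    seen_chroms = set()
    current_chrom, last_pos = None, 0
    n_samples = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            n_samples = max(0, len(fields) - 9)  # columns after FORMAT are samples
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom != current_chrom:
            if chrom in seen_chroms:
                problems.append(f"line {lineno}: {chrom} is not contiguous")
            seen_chroms.add(chrom)
            current_chrom, last_pos = chrom, 0
        if pos < last_pos:
            problems.append(f"line {lineno}: position {pos} is out of order")
        last_pos = pos
        for sample in fields[9:]:
            gt = sample.split(":")[0]
            if "/" in gt:
                problems.append(f"line {lineno}: unphased genotype {gt}")
    if n_samples == 0:
        problems.append("no sample columns, so no genotype data")
    return problems
```

An open file object can be passed directly, for example check_haplo_input(open("phased.vcf")), where phased.vcf is a hypothetical file name.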

Multicore

The vep command can run across multiple cores using its --fork option; see the vep command's help text for more details.
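In a batch job, the number of forks would normally match the CPUs allocated to the job. This sketch assumes jobs run under Slurm with --cpus-per-task set, so that the SLURM_CPUS_PER_TASK environment variable is defined; the helper name vep_fork_args is hypothetical.

```python
import os


def vep_fork_args(default=1):
    """Return ['--fork', N] using the CPUs Slurm allocated to this task.

    Falls back to `default` when SLURM_CPUS_PER_TASK is unset or not an integer.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    try:
        n = int(value)
    except ValueError:
        n = default
    n = max(1, n)
    # With a single process there is nothing to fork, so the option is omitted.
    return ["--fork", str(n)] if n > 1 else []
```

The returned list can be appended to the rest of your vep arguments, so a job that requests 8 CPUs per task would run vep with --fork 8.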