Application / Package

Version

Notes:

blobtools

1.1.1

A modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets.

blobtools -h

bedtools

2.30.0

Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

bedtools --help

canu

1.4

A fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).

canu

filtlong

0.2.1

A tool for filtering long reads by quality. It can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the filter.

filtlong -h

LINKS

2.0.1

A genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g. ONT reads, PacBio reads, other draft genomes). It is also used to scaffold contig pairs linked by ARCS/ARKS.

LINKS

masurca

3.3.0

The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit consists of the MaSuRCA genome assembler, the QuORUM error corrector for Illumina data, the POLCA genome polishing software, a chromosome scaffolder, the jellyfish mer counter, and the MUMmer aligner.

masurca -h

medaka

1.6.1

A tool to create consensus sequences and variant calls from nanopore sequencing data.

medaka_consensus -h

Notes:

1: Because medaka uses TensorFlow, it requires a physical node. Use partition=wildiris-phys.

If you do try running it on one of the virtual nodes, you will see the following:

Code Block
[salexan5@wi001 ~]$ medaka_consensus
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
/apps/u/opt/gentools/1.0.0/bin/medaka_consensus: line 16: 17197 Aborted                 (core dumped) medaka tools list_models
...
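
The AVX error above can be anticipated before launching medaka. The following is a minimal sketch, assuming a Linux node that lists its CPU flags in /proc/cpuinfo. The function name has_avx is hypothetical and is not part of medaka or of this collection.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the CPU flags listed in
# the given cpuinfo file include AVX. Defaults to /proc/cpuinfo (Linux).
has_avx() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "avx" as a whole word, so "avx2" alone does not count.
    grep -qw 'avx' "$cpuinfo" 2>/dev/null
}

if has_avx; then
    echo "AVX flag found on this node."
else
    echo "AVX flag not found: request a physical node (partition=wildiris-phys)."
fi
```

Running this at the start of an interactive session or job script gives an early hint that the node is a virtual one, rather than waiting for the core dump.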

2: Although medaka (via TensorFlow) can use GPUs, the WildIris cluster does not have any. When you run medaka you will see the following warnings, which can be ignored:

Code Block
[salexan5@wi005 ~]$ medaka_consensus
2022-07-06 09:18:37.339581: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /apps/s/slurm/21.08/lib64:/apps/s/slurm/21.08/lib
2022-07-06 09:18:37.339601: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
medaka 1.6.1
------------
Assembly polishing via neural networks. Medaka is optimized
to work with the Flye assembler.
...

3: Medaka recommends samtools/bgzip/tabix version 1.14 and minimap2 version 2.17, as these were the versions used in its development. You’ll also need bcftools for various commands.

  • Version 1.15 of samtools, tabix, and bgzip is available via the samtools module.

  • Version 1.15 of bcftools is available via the bcftools module.

  • Version 2.17 of minimap2 is available within this collection.

For example, if you try running something like the following without the modules loaded, you’ll see:

Code Block
[salexan5@wi005 ~]$ medaka_haploid_variant -i <fastx_file> -r <fasta_file>
...
Checking program versions
This is medaka 1.6.1
[main] unrecognized command '--version'
Program    Version    Required   Pass
bcftools   Not found  1.11       False
bgzip      Not found  1.11       False
minimap2   2.17       2.11       True
samtools   Not found  1.11       False
tabix      Not found  1.11       False

Once loaded, you’ll see:

Code Block
[salexan5@wi005 ~]$ module load samtools/1.15 bcftools/1.15
[salexan5@wi005 ~]$ medaka_haploid_variant -i <fastx_file> -r <fasta_file>
...
Checking program versions
This is medaka 1.6.1
Program    Version    Required   Pass
bcftools   1.15       1.11       True
bgzip      1.15       1.11       True
minimap2   2.17       2.11       True
samtools   1.15       1.11       True
tabix      1.15       1.11       True
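
The same dependency check can be approximated before calling medaka. This is a minimal sketch, not medaka's own version check: it only tests whether each program is on PATH and does not compare versions. The function name missing_tools is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every given program that is
# not found on PATH, one per line. Prints nothing if all are found.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The five programs listed in medaka's version table above.
missing=$(missing_tools bcftools bgzip minimap2 samtools tabix)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
    echo "Try: module load samtools/1.15 bcftools/1.15"
fi
```

If the sketch prints nothing, all five programs were found and medaka's own check should then report their versions.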

miniasm

0.3-r179

A very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format.

miniasm -h

minimap2

2.17

A versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.

minimap2 -h

NanoPlot

1.40.0

Plotting tool for long read sequencing data and alignments.

NanoPlot -h

pilon

1.24

A software tool which can be used to:

  • Automatically improve draft assemblies

  • Find variation among strains, including large event detection

pilon --help

Note: According to the pilon requirements documentation, the tool requires a minimum of 8G of memory to run. To accommodate this, when submitting a job using sbatch or creating an interactive session with salloc, please use --mem=8G.

If your data requires more than 8G, then you’ll need to call the pilon jar directly with java and set the heap size yourself. The example below demonstrates the command to use with --mem=16G:

Code Block
java -Xmx16G -jar /apps/u/opt/gentools/1.0.0/share/pilon-1.24-0/pilon.jar ...

Note how the --mem=16G matches -Xmx16G.
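
One way to keep the two values in step is to build the java command from a single memory value. The sketch below only prints the command. The function name pilon_command and the input file names are hypothetical, while the jar path is the one given above.

```shell
#!/bin/bash
# Path to the pilon jar as given in this section.
PILON_JAR=/apps/u/opt/gentools/1.0.0/share/pilon-1.24-0/pilon.jar

# Hypothetical helper: print the java invocation for pilon with a heap
# size (-Xmx) equal to the first argument, followed by any pilon options.
pilon_command() {
    local mem="$1"
    shift
    echo java "-Xmx${mem}" -jar "$PILON_JAR" "$@"
}

# For a job requested with --mem=16G (draft.fasta and reads.bam are
# placeholder file names):
pilon_command 16G --genome draft.fasta --frags reads.bam
```

Pass the same value to --mem when requesting the job, and run the printed command inside it.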

porechop

0.2.4

A tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.

porechop -h

racon

1.4.20

A standalone consensus module intended to correct raw contigs generated by rapid assembly methods that do not include a consensus step.

racon -h

tabview

1.4.3

A curses command-line CSV and list (tabular data) viewer.

tabview -h
