Overview
ANGSD is software for analyzing next-generation sequencing data. It can handle a number of different input types, from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes, which is especially useful for low- and medium-depth data. The software is written in C++ and has been used on large sample sizes. It is not a program for manipulating BAM/CRAM files, but solely a tool for performing various kinds of analysis. For outputting and modifying BAM files, we recommend the excellent program SAMtools.
Using
Use the module name angsd to discover the available versions and to load the application.
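A minimal sketch of those two steps, wrapped in a hypothetical helper named load_angsd. It assumes an Lmod-based module system, where module spider searches every installed version (on Environment Modules, module avail angsd is the closest equivalent). The gcc/7.3.0 and angsd/0.935 versions are taken from the batch script example on this page; check what module spider reports on your system before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the installed angsd modules, then load one version.
# Assumes Lmod; "module avail angsd" lists versions on Environment Modules.
load_angsd() {
  local version="${1:-0.935}"   # default version taken from this page's example
  module spider angsd           # show all installed angsd versions
  # The batch example on this page loads gcc/7.3.0 together with angsd.
  module load gcc/7.3.0 "angsd/${version}"
}
```

Usage: run load_angsd to load the default version, or load_angsd followed by a version string reported by module spider.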
Example
[]$ angsd --help
    -> angsd version: 0.935 (htslib: 1.12) build(Jun 15 2021 09:02:30)
    -> angsd --help
    -> angsd version: 0.935 (htslib: 1.12) build(Jun 15 2021 09:02:31)
    -> Please use the website "http://www.popgen.dk/angsd" as reference
    -> Use -nThreads or -P for number of threads allocated to the program
Overview of methods:
    -GL             Estimate genotype likelihoods
    -doCounts       Calculate various counts statistics
    -doAsso         Perform association study
    -doMaf          Estimate allele frequencies
    -doError        Estimate the type specific error rates
    -doAncError     Estimate the errorrate based on perfect fastas
    -HWE_pval       Est inbreedning per site or use as filter
    -doGeno         Call genotypes
    -doFasta        Generate a fasta for a BAM file
    -doAbbababa     Perform an ABBA-BABA test
    -sites          Analyse specific sites (can force major/minor)
    -doSaf          Estimate the SFS and/or neutrality tests genotype calling
    -doHetPlas      Estimate hetplasmy by calculating a pooled haploid frequency

    Below are options that can be usefull
    -bam            Options relating to bam reading
    -doMajorMinor   Infer the major/minor using different approaches
    -ref/-anc       Read reference or ancestral genome
    -doSNPstat      Calculate various SNPstat
    -cigstat        Printout CIGAR stat across readlength
    many others

Output files:
    In general the specific analysis outputs specific files, but we support basic bcf output
    -doBcf          Wrapper around -dopost -domajorminor -dofreq -gl -dovcf docounts
For information of specific options type:
    ./angsd METHODNAME eg
        ./angsd -GL
        ./angsd -doMaf
        ./angsd -doAsso etc
        ./angsd sites for information about indexing -sites files
Examples:
    Estimate MAF for bam files in 'list'
        './angsd -bam list -GL 2 -doMaf 2 -out RES -doMajorMinor 1'
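The -bam (or -b) option takes a plain text file listing one BAM file path per line, like the 'list' file in the help example. A minimal sketch of building such a file, assuming all your BAM files sit in a single directory; the helper name make_bam_filelist and the file names are hypothetical. Absolute paths are written so the list still works if a batch job starts in a different working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one absolute BAM path per line into a file list,
# the format angsd reads with -bam / -b.
make_bam_filelist() {
  local bam_dir="$1" out="$2" abs_dir
  abs_dir="$(cd "$bam_dir" && pwd)" || return 1
  # Only *.bam files directly in bam_dir (index files like *.bam.bai are skipped);
  # sort keeps the sample order reproducible between runs.
  find "$abs_dir" -maxdepth 1 -type f -name '*.bam' | sort > "$out"
  # Warn, but do not fail, when the list is empty.
  if [ ! -s "$out" ]; then
    echo "warning: no .bam files found in $bam_dir" >&2
  fi
  return 0
}
```

Usage: make_bam_filelist /path/to/bams bam.filelist, then pass bam.filelist to angsd with -b or -bam.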
Using Multiple Cores/Threads
The angsd application can use multiple cores/threads within a single call through the -nThreads or -P option, which sets the number of threads allocated to the program.
For example, a batch script that requests and uses 8 cores on a single node would take the following form:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
...
module load gcc/7.3.0
module load angsd/0.935
...
angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 8
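Hard-coding -P 8 means the thread count must be edited in two places whenever --cpus-per-task changes. A minimal sketch of one way to keep them in sync: Slurm sets the SLURM_CPUS_PER_TASK environment variable when --cpus-per-task is requested, and the hypothetical helper run_angsd_maf below passes it to -P, falling back to 1 thread when it is unset. The -out prefix argument is an addition for illustration; the other options are copied from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the example angsd command with as many threads
# as Slurm allocated through --cpus-per-task (1 thread if unset).
run_angsd_maf() {
  local filelist="$1" out="$2"
  local nthreads="${SLURM_CPUS_PER_TASK:-1}"
  angsd -b "$filelist" -GL 1 -doMajorMinor 1 -doMaf 2 -out "$out" -P "$nthreads"
}
```

Usage: after the module load lines in the batch script, call run_angsd_maf bam.filelist RES; changing --cpus-per-task then changes -P automatically.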
The best number of cores/threads for a particular job depends on the type of sequencing data and is left to the researcher to explore.
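One way to explore this is to time the same command at several thread counts and look for the point where adding threads stops reducing wall time. A minimal sketch, assuming it runs inside a job that has been allocated at least as many CPUs as the largest count tried; the helper name scan_threads, the scan_P* output prefixes and the log file names are hypothetical. It prints the thread count and elapsed seconds, tab-separated, one line per run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time the example angsd command at each thread count given.
# Each run writes its own outputs (scan_P<n>.*) and log (scan_P<n>.log).
scan_threads() {
  local filelist="$1"; shift
  local n start end
  for n in "$@"; do
    start=$(date +%s)
    angsd -b "$filelist" -GL 1 -doMajorMinor 1 -doMaf 2 -out "scan_P${n}" -P "$n" \
      > "scan_P${n}.log" 2>&1 || { echo "run with -P $n failed" >&2; return 1; }
    end=$(date +%s)
    # thread count <TAB> wall-clock seconds
    printf '%s\t%s\n' "$n" "$((end - start))"
  done
}
```

Usage: scan_threads bam.filelist 1 2 4 8 inside a job requesting --cpus-per-task=8. Running it on a representative subset of your BAM files keeps the test short.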
Please feel free to report/share any interesting observations and we can update this page.