Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Use the module name cutadapt to discover versions available and to load the application.
The cutadapt command line call provides the -j CORES, --cores CORES option to allow you to run over multiple cores. Call cutadapt --help for more information.
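As an illustration of the -j option, here is a minimal sketch of a single-end call that uses four cores. The adapter sequence (AACCGGTT), the file names, and the helper function cutadapt_args are placeholders chosen for this example, not site defaults. The command is only printed so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble a single-end cutadapt call that uses several cores.
# The adapter sequence and file names are placeholders for illustration only.

cutadapt_args() {
    local cores="$1" input="$2" output="$3"
    # -j sets the core count, -a gives a 3' adapter, -o names the output file.
    echo "-j ${cores} -a AACCGGTT -o ${output} ${input}"
}

# Print the full command for inspection instead of running it.
echo "cutadapt $(cutadapt_args 4 reads.fastq.gz trimmed.fastq.gz)"
```

Once the printed command looks right for your data, it can be run directly on the command line or inside a job script.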
A number of researchers use cutadapt within DADA2-related R scripts, as detailed under DADA2 ITS Pipeline Workflow (1.8). Following that workflow, below is one example of how to call cutadapt from within your DADA2-related script.
Within your bash script you will need to first load the cutadapt module:
    ...
    module load cutadapt/2.10
    ...
    srun Rscript dada2_code.R
    ...
Behind the scenes, this adds the path to the cutadapt binary to your environment variables.
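If you want to confirm that the module load took effect, one option is to check where the command resolves. This is a minimal sketch; the helper name check_on_path is made up for this example.

```shell
#!/usr/bin/env bash
# Report where a command resolves on PATH, or say that it is missing.
# check_on_path is a hypothetical helper, not part of cutadapt or the module system.
check_on_path() {
    local cmd="$1"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd found at: $(command -v "$cmd")"
    else
        echo "$cmd not found on PATH"
        return 1
    fi
}

# After loading the cutadapt module this should print the binary's location.
check_on_path cutadapt || echo "Did you load the cutadapt module?"
```

Placing the check in the job script before the Rscript call makes a missing module show up early in the job output.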
The next step is to make a system call to cutadapt from within your R script.
So, within your dada2_code.R file (you can rename this to whatever you want), add the following:
    ...
    # Since you have loaded the cutadapt module, and its path is in your environment variables,
    # you do not have to state the full path.
    cutadapt_cmd <- "cutadapt"
    # Run shell commands from R to print the version of cutadapt being called.
    system2(cutadapt_cmd, args = "--version")
    ...
    # Run Cutadapt
    # Change options and variable names appropriately to match your code and call requirements.
    # Notice this example also uses the -j option.
    # The number of cores defined MUST match the number of cpus-per-task requested.
    for(i in seq_along(fnFs)) {
      system2(cutadapt_cmd, args = c(R1.flags, R2.flags,
                                     "-n", 2,                               # -n 2 required to remove FWD and REV from reads
                                     "-o", fnFs.cut[i], "-p", fnRs.cut[i],  # Output files.
                                     "-j", 16,                              # Number of CPU cores to use.
                                     fnFs.filtN[i], fnRs.filtN[i]))         # Input files.
    }
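Because the -j value must match the cpus-per-task request, it can help to keep both numbers visible in the job script. The following is a minimal sketch, assuming the job is submitted through Slurm with --cpus-per-task, in which case Slurm sets the SLURM_CPUS_PER_TASK environment variable. The helper name cores_for_cutadapt is hypothetical.

```shell
#!/usr/bin/env bash
#SBATCH --cpus-per-task=16   # must equal the value passed to cutadapt -j

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
# Fall back to 1 core if the variable is unset (for example, outside Slurm).
cores_for_cutadapt() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Log the core count so it can be compared with the -j value in the R script.
echo "cutadapt should be called with -j $(cores_for_cutadapt)"
```

Comparing the logged value with the -j setting in your R script is a quick way to catch a mismatch before a long run.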