Cutadapt is a tool for removing adapter sequences from DNA sequencing data. Although most adapters are located at the 3’ end of the sequencing read, Cutadapt can remove multiple adapters from both the 3’ and 5’ ends.
The basic usage of Cutadapt is:
$ cutadapt [-a|-b|-g] <adapter_sequence> input_reads.[fasta|fastq] > output_reads.[fasta|fastq]
The option -a removes adapters ligated to the 3’ end of the sequencing read. The option -b removes adapters that may be ligated to either the 5’ or the 3’ end. The option -g removes adapters ligated to the 5’ end. Each option can be given multiple times to remove several different adapters.
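To make the difference between -a and -g concrete, the toy bash functions below mimic what a regular 3’ adapter and a regular 5’ adapter do to a single read, using adapters from the example script in this section. This is a hypothetical illustration, not how Cutadapt works internally: trim_3prime and trim_5prime are invented names, they use exact matching only, and -b is not modeled. Cutadapt itself tolerates mismatches and also removes partial adapter occurrences at the read ends.

```shell
#!/bin/bash
# Toy sketch only (hypothetical helpers): exact matches, no errors, no partial adapters.

# Like a regular 3' adapter (-a): drop the first adapter occurrence and everything after it.
trim_3prime() {
    local seq=$1 adapter=$2
    printf '%s\n' "${seq%%"$adapter"*}"
}

# Like a regular 5' adapter (-g): drop the first adapter occurrence and everything before it.
trim_5prime() {
    local seq=$1 adapter=$2
    printf '%s\n' "${seq#*"$adapter"}"
}

# Example read: 5' adapter + insert + 3' adapter + trailing bases.
seq="AACCGGTTACGTACGTAGGCACACAGGGTTTT"
seq=$(trim_5prime "$seq" AACCGGTT)      # -> ACGTACGTAGGCACACAGGGTTTT
seq=$(trim_3prime "$seq" AGGCACACAGGG)  # -> ACGTACGT
echo "$seq"
```

A read that does not contain the adapter is returned unchanged, which matches Cutadapt's behavior of leaving reads without an adapter match untrimmed.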
More information about the Cutadapt options can be found by typing:
$ cutadapt --help
A simple Cutadapt script that trims the adapter sequences AGGCACACAGGG and TGAGACACGCA from the 3’ end and AACCGGTT from the 5’ end of a single-end FASTA input file is shown below:
cutadapt.submit
#!/bin/bash
#SBATCH --job-name=Cutadapt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Cutadapt.%J.out
#SBATCH --error=Cutadapt.%J.err
module load cutadapt/1.13
cutadapt -a AGGCACACAGGG -a TGAGACACGCA -g AACCGGTT input_reads.fasta > output_reads.fasta
The Cutadapt version loaded in this script (1.13) is a single-threaded program, so the script requests only one node and one task with #SBATCH --nodes=1 and #SBATCH --ntasks-per-node=1.
For paired-end data, each file of the pair can be trimmed separately with its own adapter sequence:
$ cutadapt -a ADAPTER_PAIR_1 input_reads_pair_1.fastq > output_reads_pair_1.fastq
$ cutadapt -a ADAPTER_PAIR_2 input_reads_pair_2.fastq > output_reads_pair_2.fastq
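Because the two files are trimmed in independent runs, nothing enforces that they still describe the same read pairs; if a filtering option such as -m (minimum length) is used, a read can be discarded from one file while its mate is kept in the other. A minimal sanity check, assuming standard four-line FASTQ records and using the hypothetical helper names count_fastq_records and check_pair_counts, compares the record counts of the two outputs. Equal counts are necessary but not sufficient; comparing read IDs would be stricter.

```shell
#!/bin/bash
# Hypothetical helpers; assume standard four-line FASTQ records.

# Print the number of FASTQ records in a file.
# awk's NR also counts a final line that lacks a trailing newline.
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed if both files contain the same number of records, fail otherwise.
check_pair_counts() {
    local n1 n2
    n1=$(count_fastq_records "$1")
    n2=$(count_fastq_records "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 records in each file"
    else
        echo "MISMATCH: $1 has $n1 records, $2 has $n2" >&2
        return 1
    fi
}
```

For example, check_pair_counts output_reads_pair_1.fastq output_reads_pair_2.fastq can be run after both trimming commands finish.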
Besides the FASTA/FASTQ file of reads with the adapter sequences removed, Cutadapt also reports useful statistics for each adapter sequence.