Cutadapt

Cutadapt is a tool for removing adapter sequences from DNA sequencing data. Although most adapters are located at the 3’ end of a sequencing read, Cutadapt can remove multiple adapters from both the 3’ and 5’ ends.

The basic usage of Cutadapt is:

$ cutadapt [-a|-b|-g] <adapter_sequence> input_reads.[fasta|fastq] > output_reads.[fasta|fastq]
where <adapter_sequence> is the nucleotide sequence of the adapter, input_reads.[fasta|fastq] is the input file of sequencing reads in fasta/fastq format, and output_reads.[fasta|fastq] is the trimmed output file in fasta/fastq format.

The option -a removes adapters from the 3’ end of the sequencing read. The option -b removes adapters that may be ligated to either the 5’ or the 3’ end. The option -g removes adapters from the 5’ end. Each option can be given multiple times, once per adapter.

More information about the Cutadapt options can be found by typing:

$ cutadapt --help

A simple Cutadapt script that trims the adapter sequences AGGCACACAGGG and TGAGACACGCA from the 3’ end and AACCGGTT from the 5’ end of a single-end fasta input file is shown below:

cutadapt.submit
#!/bin/sh
#SBATCH --job-name=Cutadapt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Cutadapt.%J.out
#SBATCH --error=Cutadapt.%J.err

module load cutadapt/1.13

cutadapt -a AGGCACACAGGG -a TGAGACACGCA -g AACCGGTT input_reads.fasta > output_reads.fasta

The Cutadapt version used here is a single-threaded program, so the script requests one node and one task with #SBATCH --nodes=1 and #SBATCH --ntasks-per-node=1.

Cutadapt also supports paired-end data. In the simplest approach, each file of the pair is trimmed separately, with one Cutadapt run per file:

$ cutadapt -a ADAPTER_PAIR_1 input_reads_pair_1.fastq > output_reads_pair_1.fastq
$ cutadapt -a ADAPTER_PAIR_2 input_reads_pair_2.fastq > output_reads_pair_2.fastq
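
Both files of a pair can also be given to one Cutadapt run, which writes the two trimmed files together and keeps their reads in matching order. The sketch below wraps such a call in a small shell function. The function name trim_pairs and its argument order are hypothetical and only serve the illustration; -A (3’ adapter of the second read), -o (output for the first reads) and -p (output for the second reads) are Cutadapt options available in the 1.13 module used above.

```shell
#!/bin/sh
# trim_pairs is a hypothetical helper for illustration, not part of Cutadapt.
# Usage: trim_pairs ADAPTER_1 ADAPTER_2 IN_1 IN_2 OUT_1 OUT_2
trim_pairs() {
    # -a: 3' adapter removed from the first read of each pair
    # -A: 3' adapter removed from the second read of each pair
    # -o: trimmed first reads, -p: trimmed second reads
    cutadapt -a "$1" -A "$2" -o "$5" -p "$6" "$3" "$4"
}
```

With the file names from the commands above, the call would look like: trim_pairs ADAPTER_PAIR_1 ADAPTER_PAIR_2 input_reads_pair_1.fastq input_reads_pair_2.fastq output_reads_pair_1.fastq output_reads_pair_2.fastq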

Cutadapt Output

Besides the fasta/fastq file of trimmed reads, Cutadapt also prints a report with useful statistics for each adapter sequence.
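
When the trimmed reads are redirected from standard output, as in the commands above, Cutadapt sends this report to standard error, so in the submit script it ends up in the Cutadapt.%J.err file. One way to keep the report in its own file is to write the reads with -o and redirect standard output. The following is a minimal sketch of that idea; the function name trim_with_report and the report file name are hypothetical.

```shell
#!/bin/sh
# trim_with_report is a hypothetical helper for illustration, not part of Cutadapt.
# Usage: trim_with_report ADAPTER INPUT OUTPUT REPORT
trim_with_report() {
    # With -o, the trimmed reads go to OUTPUT and the statistics report
    # is printed on standard output, which is redirected to REPORT.
    cutadapt -a "$1" -o "$3" "$2" > "$4"
}
```

For example: trim_with_report AGGCACACAGGG input_reads.fasta output_reads.fasta cutadapt_report.txt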