Scythe is a 3’-end adapter trimmer that uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. Adapter contaminants at the 3’ end often include poor-quality bases and need to be removed prior to quality-based trimming, mapping, assembly, and further analysis.
The basic usage of Scythe is:
$ scythe -a adapter_file.fasta input_reads.fastq -o output_reads.fastq
The file output_reads.fastq contains the sequencing reads with the adapters removed. If the adapter sequences are unknown, Scythe itself provides a file of Illumina adapter sequences, illumina_adapters.fa, that can be passed to the -a option.
More information about Scythe can be found by typing:
$ scythe --help
A simple Scythe script that uses the illumina_adapters.fa file and input_reads.fastq is shown below:
scythe.submit
#!/bin/bash
#SBATCH --job-name=Scythe
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Scythe.%J.out
#SBATCH --error=Scythe.%J.err
module load scythe/0.991
scythe -a ${SCYTHE_ADAPTERS}/illumina_adapters.fa input_reads.fastq -o output_reads.fastq
Scythe is a single-threaded program, and therefore both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
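Because each Scythe run uses a single CPU, several input files can be trimmed as independent jobs rather than one after another. One possible way to do this is a SLURM job array, where each task picks its own input file from `$SLURM_ARRAY_TASK_ID`. The following is a minimal sketch under that assumption; the helper names `pick_input` and `trimmed_name` and the `.trimmed.fastq` naming scheme are hypothetical and not part of Scythe or of the submit script above.

```bash
#!/bin/bash
# Hypothetical helper: print the input file that belongs to one array task.
# Usage: pick_input TASK_ID FILE1 FILE2 ...   (task IDs are 1-based)
pick_input() {
    task_id=$1
    # Dropping TASK_ID plus the first (TASK_ID - 1) files leaves the wanted file in $1
    shift "$task_id"
    printf '%s\n' "$1"
}

# Hypothetical helper: derive an output name, e.g. small_1.fastq -> small_1.trimmed.fastq
trimmed_name() {
    printf '%s\n' "${1%.fastq}.trimmed.fastq"
}

# In a submit script that also contains "#SBATCH --array=1-2",
# the trimming step could then look like this:
#   input=$(pick_input "$SLURM_ARRAY_TASK_ID" small_1.fastq small_2.fastq)
#   scythe -a ${SCYTHE_ADAPTERS}/illumina_adapters.fa "$input" -o "$(trimmed_name "$input")"
```

With this layout each array task still requests one node and one task per node, so the resource settings described above stay unchanged.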
The Illumina adapter sequences provided by Scythe are stored in the directory given by $SCYTHE_ADAPTERS. Hence, to access the Illumina adapter file, use $SCYTHE_ADAPTERS/illumina_adapters.fa.
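Before trimming, it can be useful to check which adapters a FASTA file actually contains. The sketch below lists the record names of an adapter file by printing its header lines without the leading `>`. The function name `list_adapters` is a hypothetical helper, and it assumes a standard FASTA layout in which every record starts with a `>` header line.

```bash
#!/bin/bash
# Hypothetical helper: print the name of every record in a FASTA file,
# i.e. each line that starts with ">" with that character stripped.
list_adapters() {
    sed -n 's/^>//p' "$1"
}

# Example call once the scythe module is loaded:
#   list_adapters ${SCYTHE_ADAPTERS}/illumina_adapters.fa
```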
Scythe returns a FASTQ file of reads with the adapter sequences removed.
To test the performance of Scythe (scythe/0.991), we used three pairs of paired-end input FASTQ files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Some statistics about the input files and the time and memory resources used by Scythe are shown in the table below:
input file | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
---|---|---|---|---|---|---|
small_1.fastq | 50,121 | 2,506,050 | 8.010 | ~ 0.04 minutes | ~ 0.014 GB | 1 |
small_2.fastq | 50,121 | 2,506,050 | 8.010 | ~ 0.04 minutes | ~ 0.014 GB | 1 |
medium_1.fastq | 786,742 | 59,792,392 | 152 | ~ 1 minute | ~ 0.2 GB | 1 |
medium_2.fastq | 786,742 | 59,792,392 | 152 | ~ 1 minute | ~ 0.2 GB | 1 |
large_1.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~ 13 minutes | ~ 3 GB | 1 |
large_2.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~ 17 minutes | ~ 6.5 GB | 1 |
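To compare your own data against the table, the number of sequences and bases in a FASTQ file can be counted with a short awk command. This is a minimal sketch, not the method used to produce the table; the function name `fastq_stats` is hypothetical, and it assumes every record spans exactly four lines (sequences are not wrapped over several lines).

```bash
#!/bin/bash
# Hypothetical helper: print "<number of sequences> <number of bases>" for a FASTQ file.
# Assumes 4 lines per record, so the sequence is the 2nd line of each record.
fastq_stats() {
    awk 'NR % 4 == 2 { seqs++; bases += length($0) }
         END { print seqs + 0, bases + 0 }' "$1"
}

# Example call:
#   fastq_stats input_reads.fastq
```

Running the same function on the input and on the trimmed output shows how many bases were removed by adapter trimming.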