Sickle

Sickle is a windowed adaptive trimming tool for fastq files. Besides the sliding window, Sickle uses quality and length thresholds to identify and trim low-quality bases at both the 3’ end and the 5’ end of the reads.

Information about the Sickle command-line options can be shown by typing:

$ sickle --help

Sickle is a single-threaded program.

Sickle for single-end reads

The basic usage of Sickle for single-end reads is:

$ sickle se -t [solexa|illumina|sanger] -f input_reads.fastq -o output_reads_trimmed.fastq
where input_reads.fastq is the input file of sequencing data in fastq format, and output_reads_trimmed.fastq is the trimmed output file. The -t option is also required in sickle se; depending on the input data, it accepts one of the following quality types: solexa, illumina, sanger.
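Choosing the -t value requires knowing how the quality scores in the input file are encoded. As a rough, non-authoritative sketch, the lowest quality character in the file can hint at the encoding. The function name guess_quality_type and the cutoffs (a character below ASCII 59 suggests sanger, below 64 suggests solexa, otherwise illumina) are assumptions based on the commonly documented Phred+33 and Phred+64 ranges; this is not something Sickle itself provides. The sketch also assumes four-line fastq records.

```shell
# Guess the Sickle -t value from the lowest quality character in a fastq file.
# Heuristic only: the cutoffs below are assumptions, not part of Sickle.
guess_quality_type() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup table for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        # Every fourth line of a four-line fastq record is the quality string.
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c != "" && c < min) min = c
            }
        }
        END {
            if (min < 59)      print "sanger"    # only Phred+33 goes this low
            else if (min < 64) print "solexa"    # Solexa+64 can start at ASCII 59
            else               print "illumina"  # consistent with Phred+64
        }
    ' "$1"
}
```

A file that contains only high-quality Phred+33 reads can be misreported as illumina, so treat the result as a hint and confirm it against the documentation from the sequencing facility.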

A simple SLURM script that runs Sickle on the Illumina single-end input file input_reads.fastq is shown below:

sickle_single.submit
#!/bin/sh
#SBATCH --job-name=Sickle
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Sickle_single.%J.out
#SBATCH --error=Sickle_single.%J.err

module load sickle/1.210

sickle se -t illumina -f input_reads.fastq -o output_reads_trimmed.fastq
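The quality and length thresholds that Sickle trims by can also be set explicitly. The sketch below wraps the single-end command in a hypothetical helper, trim_single_end, that passes -q (quality threshold) and -l (length threshold). Both options are described in the Sickle README with a default of 20, but it is worth confirming them with sickle se --help for the installed version. The helper name, its argument order, and the hard-coded illumina quality type are illustrative assumptions.

```shell
# trim_single_end INPUT OUTPUT [QUAL_THRESHOLD] [LENGTH_THRESHOLD]
# Hypothetical wrapper around "sickle se"; thresholds fall back to 20.
trim_single_end() {
    input=$1
    output=$2
    qual=${3:-20}
    min_len=${4:-20}

    # Fail early with a clear message instead of letting the job run on nothing.
    if [ ! -r "$input" ]; then
        echo "cannot read input file: $input" >&2
        return 1
    fi

    sickle se -t illumina -f "$input" -o "$output" -q "$qual" -l "$min_len"
}
```

In the submit script, the sickle se line could then be replaced by a call such as trim_single_end input_reads.fastq output_reads_trimmed.fastq 25 50, where 25 and 50 are example values only.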

Sickle for paired-end reads

The basic usage of Sickle for paired-end reads is:

$ sickle pe -t [solexa|illumina|sanger] -f input_reads_pair_1.fastq -r input_reads_pair_2.fastq -o output_reads_trimmed_pair_1.fastq -p output_reads_trimmed_pair_2.fastq -s output_reads_trimmed_single.fastq
where input_reads_pair_1.fastq and input_reads_pair_2.fastq are the input fastq files of the sequencing data, and output_reads_trimmed_pair_1.fastq and output_reads_trimmed_pair_2.fastq are the corresponding trimmed output files. sickle pe also writes the file output_reads_trimmed_single.fastq, which contains reads that passed the filter when their mate did not. Sickle supports three types of quality values (solexa, illumina, and sanger), and the type must be specified with the -t option.

A simple SLURM script that runs Sickle on the Sanger paired-end input files input_reads_pair_1.fastq and input_reads_pair_2.fastq is shown below:

sickle_paired.submit
#!/bin/sh
#SBATCH --job-name=Sickle
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Sickle_paired.%J.out
#SBATCH --error=Sickle_paired.%J.err

module load sickle/1.210

sickle pe -t sanger -f input_reads_pair_1.fastq -r input_reads_pair_2.fastq -o output_reads_trimmed_pair_1.fastq -p output_reads_trimmed_pair_2.fastq -s output_reads_trimmed_single.fastq
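Because reads whose mate was filtered out are written to the -s file, the two paired output files should end up with the same number of records. A minimal sketch of that sanity check follows. The function names count_reads and pairs_in_sync are hypothetical, and the check assumes standard four-line fastq records.

```shell
# Print the number of records in a four-line-per-record fastq file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed only if both paired files contain the same number of records.
pairs_in_sync() {
    n1=$(count_reads "$1") || return 2
    n2=$(count_reads "$2") || return 2
    echo "$1: $n1 reads, $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

For example, pairs_in_sync output_reads_trimmed_pair_1.fastq output_reads_trimmed_pair_2.fastq could be added as the last line of the submit script so that a mismatch shows up as a failed job.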

Sickle Output

Sickle returns a fastq file of reads with low-quality bases trimmed from both the 3’ and 5’ ends. Trimming shortens the reads, and reads that end up shorter than the length threshold are discarded, so the output file can contain fewer sequences than the input file.
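To see the effect of trimming, the read count and mean read length can be compared before and after running Sickle. The following is a small sketch that assumes four-line fastq records; the function name fastq_stats is hypothetical.

```shell
# Print the read count, total bases, and mean read length of a fastq file.
fastq_stats() {
    awk '
        # The second line of each four-line record is the sequence.
        NR % 4 == 2 { reads++; bases += length($0) }
        END {
            mean = (reads > 0) ? bases / reads : 0
            printf "%d reads, %d bases, mean length %.1f\n", reads, bases, mean
        }
    ' "$1"
}
```

Running fastq_stats on input_reads.fastq and then on output_reads_trimmed.fastq gives two lines that can be compared directly.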

Useful Information

To test the performance of Sickle (sickle/1.210) on Tusker, we used three pairs of paired-end input fastq files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Statistics about the input files and the time and memory used by Sickle on Tusker are shown in the table below:

input file      total # of sequences  total # of bases  total size in MB  used time       used memory  # of used CPUs
small_1.fastq   50,121                2,506,050         8.010             ~ 0.03 minutes  ~ 0.05 GB    1
small_2.fastq   50,121                2,506,050         8.010
medium_1.fastq  786,742               59,792,392        152               ~ 0.2 minutes   ~ 0.7 GB     1
medium_2.fastq  786,742               59,792,392        152
large_1.fastq   10,174,715            1,027,646,215     3,376             ~ 3 minutes     ~ 13 GB      1
large_2.fastq   10,174,715            1,027,646,215     3,376