Sickle is a windowed adaptive trimming tool for fastq files. In addition to the sliding window, Sickle uses quality and length thresholds to identify and trim low-quality bases at both the 3’ and 5’ ends of the reads.
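The sliding-window idea can be illustrated with a small, deliberately simplified sketch. It is not Sickle's actual implementation: the function name `window_trim`, the fixed window size, and the Phred+33 quality encoding are assumptions made for illustration, and Sickle additionally refines the cut position inside the window. The sketch keeps the region that starts at the first window whose mean quality reaches the threshold and ends just before the first later window whose mean drops below it.

```bash
#!/bin/bash
# Simplified illustration of windowed trimming (NOT Sickle's exact algorithm).
# Arguments: $1 quality string (Phred+33 assumed), $2 window size,
#            $3 quality threshold, $4 minimum read length.
# Prints "start stop" (1-based, inclusive) or "discard".
window_trim() {
    printf '%s\n' "$1" | awk -v w="$2" -v thr="$3" -v minlen="$4" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            n = length($0)
            # Convert each quality character to its Phred score.
            for (i = 1; i <= n; i++) q[i] = ord[substr($0, i, 1)] - 33
            start = 0; stop = n
            for (i = 1; i + w - 1 <= n; i++) {
                sum = 0
                for (j = i; j < i + w; j++) sum += q[j]
                mean = sum / w
                if (start == 0) {
                    # First window at or above the threshold: 5-prime cut.
                    if (mean >= thr) start = i
                } else if (mean < thr) {
                    # First later window below the threshold: 3-prime cut.
                    stop = i - 1
                    break
                }
            }
            if (start == 0 || stop - start + 1 < minlen) print "discard"
            else print start, stop
        }'
}
```

For example, `window_trim '##IIIIIIII##' 2 20 4` reports the 1-based coordinates of the region that would be kept, while a read consisting only of low-quality bases is reported as `discard`.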
Information about the Sickle command-line options can be shown by typing:
$ sickle --help
Sickle is a single-threaded program.
The basic usage of Sickle for single-end reads is:
$ sickle se -t [solexa|illumina|sanger] -f input_reads.fastq -o output_reads_trimmed.fastq
A required option of sickle se is -t, which, depending on the input data, accepts one of the following quality types: solexa, illumina, sanger.
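When the quality type of a file is not known, it can often be guessed from the range of characters in the quality lines. The sketch below is a heuristic, not part of Sickle: the function name `guess_qual_type` and the sampling limit are assumptions, and it expects an uncompressed fastq file with exactly four lines per record. It relies on the usual offsets: sanger (Phred+33) qualities start at ASCII 33, solexa qualities at ASCII 59, and illumina (Phred+64) qualities at ASCII 64. A high-quality sanger file with no low-scoring bases in the sample would be misclassified, so the result should be treated as a hint.

```bash
#!/bin/bash
# Heuristic guess of the value for the -t option from the lowest quality character.
# Arguments: $1 fastq file, $2 number of reads to sample (default 10000).
guess_qual_type() {
    local fastq="$1" max_reads="${2:-10000}"
    awk -v max="$max_reads" '
        BEGIN {
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127; reads = 0
        }
        NR % 4 == 0 {
            # Every fourth line is a quality line (four-line records assumed).
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
            if (++reads >= max) exit
        }
        END {
            if (reads == 0)    print "unknown"
            else if (min < 59) print "sanger"
            else if (min < 64) print "solexa"
            else               print "illumina"
        }' "$fastq"
}
```

A typical use would be `sickle se -t "$(guess_qual_type input_reads.fastq)" -f input_reads.fastq -o output_reads_trimmed.fastq`, after checking that the guess is not `unknown`.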
A simple SLURM script that runs Sickle on the Illumina single-end input file input_reads.fastq is shown below:
sickle_single.submit
#!/bin/bash
#SBATCH --job-name=Sickle
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Sickle_single.%J.out
#SBATCH --error=Sickle_single.%J.err
module load sickle/1.210
sickle se -t illumina -f input_reads.fastq -o output_reads_trimmed.fastq
The basic usage of Sickle for paired-end reads is:
$ sickle pe -t [solexa|illumina|sanger] -f input_reads_pair_1.fastq -r input_reads_pair_2.fastq -o output_reads_trimmed_pair_1.fastq -p output_reads_trimmed_pair_2.fastq -s output_reads_trimmed_single.fastq
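Because the paired-end command takes five file names, it can be convenient to derive them from a common sample prefix. The helper below is a hypothetical sketch: the function name `sickle_pe_cmd` and the `<prefix>_1.fastq` / `<prefix>_2.fastq` naming scheme are assumptions, and it only prints the command (using the options shown above) so that it can be inspected before being run.

```bash
#!/bin/bash
# Print a sickle pe command for one sample (dry run; nothing is executed).
# Arguments: $1 sample prefix (assumed naming: <prefix>_1.fastq, <prefix>_2.fastq),
#            $2 quality type (solexa, illumina, or sanger).
sickle_pe_cmd() {
    local prefix="$1" qual_type="$2"
    printf 'sickle pe -t %s -f %s -r %s -o %s -p %s -s %s\n' \
        "$qual_type" \
        "${prefix}_1.fastq" "${prefix}_2.fastq" \
        "${prefix}_trimmed_1.fastq" "${prefix}_trimmed_2.fastq" \
        "${prefix}_trimmed_single.fastq"
}
```

For instance, `sickle_pe_cmd input_reads sanger` prints a command that reads input_reads_1.fastq and input_reads_2.fastq and writes the two trimmed pair files plus a file for reads whose mate was dropped.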
A simple SLURM script that runs Sickle on the Sanger paired-end input files input_reads_pair_1.fastq and input_reads_pair_2.fastq is shown below:
sickle_paired.submit
#!/bin/bash
#SBATCH --job-name=Sickle
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Sickle_paired.%J.out
#SBATCH --error=Sickle_paired.%J.err
module load sickle/1.210
sickle pe -t sanger -f input_reads_pair_1.fastq -r input_reads_pair_2.fastq -o output_reads_trimmed_pair_1.fastq -p output_reads_trimmed_pair_2.fastq -s output_reads_trimmed_single.fastq
Sickle returns a fastq file of reads with low-quality bases trimmed from both the 3’ and 5’ ends. Trimming shortens the sequences, and reads that fall below the length threshold after trimming are discarded, so the output file can contain fewer sequences than the input.
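To see how trimming affected a data set, the number of reads in the input and output files can be compared. The following is a minimal sketch, assuming uncompressed fastq files with exactly four lines per record; the function name `count_reads` is hypothetical.

```bash
#!/bin/bash
# Count the reads in an uncompressed fastq file (four lines per record assumed).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}
```

Running `count_reads input_reads.fastq` and `count_reads output_reads_trimmed.fastq` after a Sickle job gives the two numbers to compare.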
To test the performance of Sickle (sickle/1.210), we used three pairs of paired-end input fastq files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Some statistics about the input files and the time and memory resources used by Sickle are shown in the table below:
input file | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
---|---|---|---|---|---|---|
small_1.fastq | 50,121 | 2,506,050 | 8.010 | ~ 0.03 minutes | ~ 0.05 GB | 1 |
small_2.fastq | 50,121 | 2,506,050 | 8.010 | | | |
medium_1.fastq | 786,742 | 59,792,392 | 152 | ~ 0.2 minutes | ~ 0.7 GB | 1 |
medium_2.fastq | 786,742 | 59,792,392 | 152 | | | |
large_1.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~ 3 minutes | ~ 13 GB | 1 |
large_2.fastq | 10,174,715 | 1,027,646,215 | 3,376 | | | |