PRINSEQ

PRINSEQ (PReprocessing and INformation of SEQuence data) is a tool for filtering, reformatting, or trimming genomic and metagenomic sequence data in fasta/fastq format. It also generates summary statistics of the sequence and quality data.

More information about the PRINSEQ program can be shown with:

$ prinseq-lite.pl --help

PRINSEQ for single-end fasta data

The basic usage of PRINSEQ for single-end data is:

$ prinseq-lite.pl [-fasta|-fastq] input_reads.[fasta|fastq] -out_format [1|2|3|4|5] [options]
where input_reads.[fasta|fastq] is an input file of sequence data in fasta/fastq format, and options are additional parameters that can be found in the PRINSEQ manual.
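
Because the input flag (-fasta or -fastq) has to match the file contents, it can help to check the file before submitting a job. The following is a minimal sketch, not part of PRINSEQ itself: the helper name prinseq_input_flag is hypothetical, and it only relies on the convention that fasta records start with '>' and fastq records start with '@'.

```shell
# Hypothetical helper: guess the PRINSEQ input flag from the first character
# of the file. FASTA records start with '>', FASTQ records start with '@'.
prinseq_input_flag() {
    file="$1"
    # Read only the first byte of the file.
    first_char=$(head -c 1 "$file")
    case "$first_char" in
        '>') echo "-fasta" ;;
        '@') echo "-fastq" ;;
        *)   echo "unknown format: $file" >&2; return 1 ;;
    esac
}

# Possible use inside a submit script (not executed here):
#   flag=$(prinseq_input_flag input_reads.fasta)
#   prinseq-lite.pl "$flag" input_reads.fasta -out_format 1
```

The check only looks at the first character, so it will not catch files that are malformed further down.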

The output format (-out_format) can be 1 (fasta only), 2 (fasta and qual), 3 (fastq), 4 (fastq and fasta), or 5 (fastq, fasta and qual).

A simple SLURM script that runs PRINSEQ on single-end fasta data and writes fasta output is shown below:

prinseq_single_end.submit
#!/bin/bash
#SBATCH --job-name=PRINSEQ
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=PRINSEQ_single.%J.out
#SBATCH --error=PRINSEQ_single.%J.err

module load prinseq-lite/0.20

prinseq-lite.pl -fasta input_reads.fasta -out_format 1

PRINSEQ is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
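
Since each PRINSEQ run uses a single core, several input files are usually processed as separate jobs rather than by requesting more cores. One possible approach is a SLURM job array (for example "#SBATCH --array=1-3"), where each task reads SLURM_ARRAY_TASK_ID and picks one file from a list. The sketch below assumes a hypothetical file samples.txt with one input path per line; the helper name pick_sample is also an assumption.

```shell
# Hypothetical helper: print the Nth line of a list of input files, so that
# each SLURM array task can select its own input file.
# The list file (e.g. samples.txt, one path per line) is an assumption.
pick_sample() {
    list="$1"
    task_id="$2"
    # sed -n "Np" prints only line N of the list.
    sed -n "${task_id}p" "$list"
}

# Possible use inside a submit script (not executed here):
#   input=$(pick_sample samples.txt "$SLURM_ARRAY_TASK_ID")
#   prinseq-lite.pl -fasta "$input" -out_format 1
```

If the task ID is larger than the number of lines in the list, the helper prints nothing, so a submit script should check that the result is not empty before calling PRINSEQ.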

PRINSEQ for paired-end fastq data

The basic usage of PRINSEQ for paired-end data is:

$ prinseq-lite.pl [-fasta|-fastq] input_reads_pair_1.[fasta|fastq] [-fasta2|-fastq2] input_reads_pair_2.[fasta|fastq] -out_format [1|2|3|4|5] [options]
where input_reads_pair_1.[fasta|fastq] and input_reads_pair_2.[fasta|fastq] are the first and second read files of the pair in fasta/fastq format, and options are additional parameters that can be found in the PRINSEQ manual.
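
Before a long paired-end run, a quick sanity check is to confirm that both files of the pair contain the same number of reads. The following is a minimal sketch under the assumption that the fastq files use the common four-lines-per-record layout (no wrapped sequence or quality lines); the function names are hypothetical and not part of PRINSEQ.

```shell
# Count FASTQ records, assuming exactly four lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed (and print the shared read count) only when both files of a pair
# hold the same number of reads; otherwise report the mismatch on stderr.
check_pair_counts() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1"
}

# Possible use inside a submit script (not executed here):
#   check_pair_counts input_reads_pair_1.fastq input_reads_pair_2.fastq || exit 1
```

Equal counts do not guarantee that the reads appear in matching order, so this is only a first-pass check.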

The output format (-out_format) can be 1 (fasta only), 2 (fasta and qual), 3 (fastq), 4 (fastq and fasta), or 5 (fastq, fasta and qual).

A simple SLURM script that runs PRINSEQ on paired-end fastq data and writes fastq output is shown below:

prinseq_paired_end.submit
#!/bin/bash
#SBATCH --job-name=PRINSEQ
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=PRINSEQ_paired.%J.out
#SBATCH --error=PRINSEQ_paired.%J.err

module load prinseq-lite/0.20

prinseq-lite.pl -fastq input_reads_pair_1.fastq -fastq2 input_reads_pair_2.fastq -out_format 3

PRINSEQ is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.

PRINSEQ Output

PRINSEQ reports statistics about the input and filtered sequences, and writes output files containing the single-end or paired-end sequences that remain after filtering with the specified parameters.
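
One way to check the effect of the filtering independently of PRINSEQ's own report is to compare the number of sequences in the input and in the filtered output. The sketch below does this for fasta files by counting header lines; the function names are hypothetical, and the name of the filtered file depends on how PRINSEQ was run (it can be set with the -out_good option), so it is passed in as an argument.

```shell
# Count FASTA records by counting header lines, which start with '>'.
count_fasta_seqs() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function from failing while grep still prints the count (0).
    grep -c '^>' "$1" || true
}

# Print the percentage of input sequences present in the filtered file,
# with one decimal place.
percent_retained() {
    total=$(count_fasta_seqs "$1")
    kept=$(count_fasta_seqs "$2")
    awk -v k="$kept" -v t="$total" \
        'BEGIN { if (t == 0) print "0.0"; else printf "%.1f\n", 100 * k / t }'
}

# Possible use (not executed here), with the filtered file given by the user:
#   percent_retained input_reads.fasta filtered_reads.fasta
```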