PRINSEQ (PReprocessing and INformation of SEQuence data) is a tool for filtering, formatting, or trimming genomic and metagenomic sequence data in fasta/fastq format. PRINSEQ also generates summary statistics of sequence and quality data.
More information about the PRINSEQ program can be shown with:
$ prinseq-lite.pl --help
The basic usage of PRINSEQ for single-end data is:
$ prinseq-lite.pl [-fasta|-fastq] input_reads.[fasta|fastq] -out_format [1|2|3|4|5] [options]
The output format (-out_format) can be 1 (fasta only), 2 (fasta and qual), 3 (fastq), 4 (fastq and fasta), or 5 (fastq, fasta and qual).
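Filtering options are given on the same command line as the output format. As a sketch, the command below quality-filters single-end fastq reads and writes the reads that pass as fastq (-out_format 3). The file names and the values passed to -min_len, -min_qual_mean and -ns_max_n are illustrative assumptions, not recommended settings; check prinseq-lite.pl --help for the options available in the installed version.

```bash
# Illustrative thresholds only; adjust them to your data.
#   -min_len 50        remove reads shorter than 50 bases
#   -min_qual_mean 20  remove reads whose mean quality score is below 20
#   -ns_max_n 0        remove reads containing any N
#   -out_good          base name for reads that pass (PRINSEQ adds the file extension)
#   -out_bad null      do not write a file for reads that fail the filters
prinseq-lite.pl -fastq input_reads.fastq -out_format 3 \
    -min_len 50 -min_qual_mean 20 -ns_max_n 0 \
    -out_good input_reads_filtered -out_bad null
```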
A simple PRINSEQ SLURM script for single-end fasta data with fasta output is shown below:
prinseq_single_end.submit
#!/bin/bash
#SBATCH --job-name=PRINSEQ
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=PRINSEQ_single.%J.out
#SBATCH --error=PRINSEQ_single.%J.err
module load prinseq-lite/0.20
prinseq-lite.pl -fasta input_reads.fasta -out_format 1
PRINSEQ is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
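Because each PRINSEQ run uses a single core, several samples can be processed at the same time by submitting one single-core task per sample. The script below is a sketch using a SLURM job array; the sample file names, the array range and the -out_good naming scheme are hypothetical and should be adapted, while the resource settings mirror the single-end script above.

```bash
#!/bin/bash
#SBATCH --job-name=PRINSEQ_array
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --array=0-2
#SBATCH --output=PRINSEQ_array.%A_%a.out
#SBATCH --error=PRINSEQ_array.%A_%a.err

module load prinseq-lite/0.20

# Hypothetical input files; indices 0..2 match the --array range above.
samples=(sample_A.fasta sample_B.fasta sample_C.fasta)
input=${samples[$SLURM_ARRAY_TASK_ID]}

# Each array task runs one single-threaded PRINSEQ process on its own file.
prinseq-lite.pl -fasta "$input" -out_format 1 -out_good "${input%.fasta}_good" -out_bad null
```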
The basic usage of PRINSEQ for paired-end data is:
$ prinseq-lite.pl [-fasta|-fastq] input_reads_pair_1.[fasta|fastq] [-fasta2|-fastq2] input_reads_pair_2.[fasta|fastq] -out_format [1|2|3|4|5] [options]
The output format (-out_format) can be 1 (fasta only), 2 (fasta and qual), 3 (fastq), 4 (fastq and fasta), or 5 (fastq, fasta and qual).
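Paired-end fastq files conventionally list mates in the same order, so both input files should contain the same number of reads. A quick sanity check before submitting a paired-end job is to compare the record counts. The shell functions below are a minimal sketch that assumes every fastq record occupies exactly four lines (no wrapped sequence or quality lines); count_fastq_records and check_pairs are hypothetical helper names, not part of PRINSEQ.

```bash
#!/bin/bash
# Number of fastq records in a file, assuming the standard 4-line layout.
# awk's NR also counts a final line that lacks a trailing newline.
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print the shared read count and succeed only if both mate files match.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_records "$1")
    n2=$(count_fastq_records "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "Mismatch: $1 has $n1 reads, $2 has $n2 reads" >&2
        return 1
    fi
    echo "$n1"
}
```

For example, the line check_pairs input_reads_pair_1.fastq input_reads_pair_2.fastq || exit 1 could be placed before the prinseq-lite.pl command of a paired-end script so that the job stops early on mismatched inputs.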
A simple PRINSEQ SLURM script for paired-end fastq data with fastq output is shown below:
prinseq_paired_end.submit
#!/bin/bash
#SBATCH --job-name=PRINSEQ
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=PRINSEQ_paired.%J.out
#SBATCH --error=PRINSEQ_paired.%J.err
module load prinseq-lite/0.20
prinseq-lite.pl -fastq input_reads_pair_1.fastq -fastq2 input_reads_pair_2.fastq -out_format 3
PRINSEQ is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
PRINSEQ reports statistics on the input and filtered sequences, and writes the single-end or paired-end sequences that pass the specified filters to output files.
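To inspect a data set before choosing filter thresholds, PRINSEQ's statistics options can be run on their own. The commands below are a sketch that assumes the report is printed to standard output, which is why the second command redirects it to a file; the output file name is illustrative.

```bash
# Basic information (number of reads and total bases) for the input file.
prinseq-lite.pl -fastq input_reads.fastq -stats_info

# All available summary statistics, saved to a text file (file name is illustrative).
prinseq-lite.pl -fastq input_reads.fastq -stats_all > input_reads_stats.txt
```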