TagCleaner

TagCleaner is a tool used to automatically detect and remove tag sequences from genomic and metagenomic sequence data. These additional tag sequences can contain deletions or insertions due to sequencing limitations.

The basic usage of TagCleaner is:

$ tagcleaner.pl [-fasta|-fastq] input_reads.[fasta|fastq] [-predict|-tag3|-tag5] [options]

where input_reads.[fasta|fastq] is an input file of sequence data in fasta/fastq format, and options are additional parameters that can be found in the TagCleaner manual.

The required parameter for TagCleaner is the tag sequence. If the tag sequence is unknown, the -predict option reports the predicted tag sequence. If the tag sequence is known and is found at the 3' end of the read, use the option -tag3 <tag_sequence>. If the tag sequence is known and is found at the 5' end of the read, use the option -tag5 <tag_sequence>.
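The choice between these modes can be sketched as a small wrapper. This is only an illustration: the function name tagcleaner_cmd and the extension-based format detection are assumptions and are not part of TagCleaner. Only the flags described above are used, and the function prints the command instead of running it so that it can be inspected first.

```shell
#!/bin/bash
# Hypothetical helper (not part of TagCleaner): prints a tagcleaner.pl
# command line for a given input file and mode.
# The input format flag is guessed from the file extension (an assumption).
tagcleaner_cmd() {
    local input="${1:-}" mode="${2:-}" tag="${3:-}" format
    case "$input" in
        *.fastq|*.fq)       format="-fastq" ;;
        *.fasta|*.fa|*.fna) format="-fasta" ;;
        *) echo "unrecognized extension: $input" >&2; return 1 ;;
    esac
    case "$mode" in
        predict)
            # Tag unknown: let TagCleaner predict it.
            echo "tagcleaner.pl $format $input -predict" ;;
        tag3|tag5)
            # Tag known: it must be supplied by the caller.
            if [ -z "$tag" ]; then
                echo "mode $mode needs a tag sequence" >&2
                return 1
            fi
            echo "tagcleaner.pl $format $input -$mode $tag" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Unknown tag in a fastq file.
tagcleaner_cmd input_reads.fastq predict
# Known tag at the 3' end of reads in a fasta file.
tagcleaner_cmd input_reads.fasta tag3 NNNCCAAACACACCCAACACA
```

Printing the command rather than executing it keeps the sketch safe to try on a login node; the printed line can then be pasted into a submit script.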

More information about the TagCleaner options can be found by using:

$ tagcleaner.pl --help

A simple TagCleaner script for removing known 3' and 5' tag sequences (NNNCCAAACACACCCAACACA and TGTGTTGGGTGTGTTTGGNNN, respectively) is shown below:

#!/bin/bash
#SBATCH --job-name=TagCleaner
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=TagCleaner.%J.out
#SBATCH --error=TagCleaner.%J.err

module load tagcleaner/0.16

tagcleaner.pl -fasta input_reads.fasta -tag3 NNNCCAAACACACCCAACACA -tag5 TGTGTTGGGTGTGTTTGGNNN

TagCleaner is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.

TagCleaner Output

TagCleaner returns a fasta or fastq file of reads with the tag sequences removed.
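A quick way to inspect such a file is to count the reads it contains. The sketch below is an illustration only: count_reads is a hypothetical helper, the demo file is generated on the fly, and the fastq branch assumes standard four-line records.

```shell
#!/bin/bash
# Hypothetical helper: count the reads in a fasta or fastq file.
count_reads() {
    local file="${1:-}"
    case "$file" in
        *.fastq|*.fq)
            # Assumes four lines per record (no wrapped sequence lines).
            echo $(( $(wc -l < "$file") / 4 )) ;;
        *)
            # fasta: every record starts with a '>' header line.
            grep -c '^>' "$file" ;;
    esac
}

# Tiny demonstration file so the sketch runs on its own.
demo_dir=$(mktemp -d)
printf '>read1\nACGT\n>read2\nGGCC\n' > "$demo_dir/demo.fasta"
count_reads "$demo_dir/demo.fasta"
rm -r "$demo_dir"
```

Running the same function on the input file and on the file written by TagCleaner gives the read counts before and after cleaning; replace the demo path with the actual file names.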