TagCleaner is a tool used to automatically detect and remove tag sequences from genomic and metagenomic sequence data. These additional tag sequences can contain deletions or insertions due to sequencing limitations.
The basic usage of TagCleaner is:
$ tagcleaner.pl [-fasta|-fastq] input_reads.[fasta|fastq] [-predict|-tag3|-tag5] [options]
TagCleaner requires the tag sequence as a parameter. If the tag sequence is unknown, the -predict option reports the predicted tag sequence to the user. If the tag sequence is known and is found at the 3’ end of the read, the option -tag3 <tag_sequence> is used. If the tag sequence is known and is found at the 5’ end of the read, the option -tag5 <tag_sequence> is used.
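The choice between -predict, -tag3 and -tag5 described above can be sketched as a small shell helper. This is only an illustration: the function name build_tagcleaner_cmd and its argument order are hypothetical, and the helper only prints the command line instead of running it. It uses no options beyond those named in the usage line.

```shell
# Hypothetical helper (not part of TagCleaner): prints the tagcleaner.pl
# command that matches what is known about the tags.
# Usage: build_tagcleaner_cmd <input_file> [tag3_sequence] [tag5_sequence]
build_tagcleaner_cmd() {
    input="$1"
    tag3="${2:-}"
    tag5="${3:-}"

    # Pick the input format flag from the file extension.
    case "$input" in
        *.fastq) format="-fastq" ;;
        *)       format="-fasta" ;;
    esac

    cmd="tagcleaner.pl $format $input"
    if [ -z "$tag3" ] && [ -z "$tag5" ]; then
        # No tag known: ask TagCleaner to predict it.
        cmd="$cmd -predict"
    else
        if [ -n "$tag3" ]; then cmd="$cmd -tag3 $tag3"; fi
        if [ -n "$tag5" ]; then cmd="$cmd -tag5 $tag5"; fi
    fi
    echo "$cmd"
}

# Unknown tags: prints a command that uses -predict.
build_tagcleaner_cmd input_reads.fasta
# Known 3' and 5' tags: prints a command that uses -tag3 and -tag5.
build_tagcleaner_cmd input_reads.fasta NNNCCAAACACACCCAACACA TGTGTTGGGTGTGTTTGGNNN
```

Passing an empty string as the second argument and a sequence as the third yields a command with only -tag5.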
More information about the TagCleaner options can be found by using:
$ tagcleaner.pl --help
A simple TagCleaner script for removing known 3’ and 5’ tag sequences (NNNCCAAACACACCCAACACA and TGTGTTGGGTGTGTTTGGNNN, respectively) is shown below:
tagcleaner.submit
#!/bin/bash
#SBATCH --job-name=TagCleaner
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=TagCleaner.%J.out
#SBATCH --error=TagCleaner.%J.err
module load tagcleaner/0.16
tagcleaner.pl -fasta input_reads.fasta -tag3 NNNCCAAACACACCCAACACA -tag5 TGTGTTGGGTGTGTTTGGNNN
TagCleaner is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
TagCleaner returns a FASTA or FASTQ file of reads with the tag sequences removed.
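A quick way to inspect a FASTA result is to count its records. The sketch below is a generic shell check, not a TagCleaner feature: the helper name count_fasta_reads is hypothetical, and the demo file only stands in for real output, whose name depends on how the job was run.

```shell
# Hypothetical helper: counts records in a FASTA file by counting
# header lines, which start with '>'.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Tiny demo file standing in for real TagCleaner output.
demo=$(mktemp)
printf '>read1\nACGT\n>read2\nGGCC\n' > "$demo"
count_fasta_reads "$demo"
rm -f "$demo"
```

Running the same count on the input and output files shows whether any reads were dropped along the way.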