TopHat is a fast splice junction mapper for RNA-Seq data. It first aligns RNA-Seq reads to reference genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TopHat and TopHat2 accept the same options and produce the same number of output files, but TopHat2 incorporates many significant improvements over TopHat. The TopHat package at HCC supports both tophat and tophat2.
The basic usage of TopHat2 is:
$ [tophat|tophat2] [options] index_prefix [input_reads_pair_1.[fasta|fastq] input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]]
TopHat2 accepts a single file or a comma-separated list of files containing paired-end and single-end reads in fasta or fastq format. Single-end reads must be provided after the paired-end reads.
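As an illustration of the comma-separated form, the sketch below assembles the two mate lists for a library sequenced on two lanes and prints the resulting command instead of running it. The file names and the join_by_comma helper are hypothetical and exist only for this example; the two lists must name the files in the same order.

```shell
#!/bin/bash
# Hypothetical file names: two lanes of the same paired-end library.
mates_1=(lane1_R1.fastq lane2_R1.fastq)
mates_2=(lane1_R2.fastq lane2_R2.fastq)

# Helper (not part of TopHat): join all arguments with commas.
join_by_comma() { local IFS=,; echo "$*"; }

# Both lists must keep the same file order so that mates stay paired.
list_1=$(join_by_comma "${mates_1[@]}")
list_2=$(join_by_comma "${mates_2[@]}")

# Dry run: print the command rather than executing it.
echo "tophat2 index_prefix $list_1 $list_2"
```

Removing the echo and the surrounding quotes would run the alignment itself, provided the tophat module is loaded and the files exist.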
More advanced TopHat2 options can be found in its manual, or by typing:
$ tophat2 -h
Prior to running TopHat/TopHat2, an index of the reference genome must be built using Bowtie/Bowtie2. Moreover, TopHat2 requires both the index files and the reference file to be in the same directory. If the reference file is not available, TopHat2 reconstructs it from the index in its initial step.
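A minimal sketch of this preparation step with Bowtie2 follows. It assumes the reference FASTA is named genome.fa (a placeholder) and that the bowtie module is already loaded. The check_reference helper is hypothetical and only reports whether a file named after the index prefix with a .fa extension sits next to the index, which is where TopHat2 looks for the reference.

```shell
#!/bin/bash
# Placeholder names: adjust to your own reference and index prefix.
reference=genome.fa
prefix=index_prefix

# Build the Bowtie2 index only when the tool and the FASTA file are available.
if command -v bowtie2-build >/dev/null 2>&1 && [ -f "$reference" ]; then
    bowtie2-build "$reference" "$prefix"
    # Place a copy of the reference next to the index as <prefix>.fa
    cp "$reference" "$prefix.fa"
fi

# Helper (not part of TopHat): report whether <prefix>.fa exists.
check_reference() {
    if [ -f "$1.fa" ]; then echo "found"; else echo "missing"; fi
}

check_reference "$prefix"
```

If the helper prints "missing", TopHat2 can still run, but it will spend extra time rebuilding the reference from the index.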
An example of how to run TopHat2 on Swan with paired-end fastq files input_reads_pair_1.fastq and input_reads_pair_2.fastq, reference index index_prefix, and 8 CPUs is shown below:
tophat2_alignment.submit
#!/bin/bash
#SBATCH --job-name=Tophat2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Tophat2.%J.out
#SBATCH --error=Tophat2.%J.err
module load samtools/1.3 bowtie/2.3 tophat/2.0
tophat2 -p $SLURM_NTASKS_PER_NODE index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq
TopHat2 writes its results to its own output directory, tophat_out/. Some of the generated files are: