Cufflinks is a transcript assembly program that includes a number of tools for analyzing RNA-Seq data. These tools assemble aligned RNA-Seq reads into transcripts, estimate their abundances, test for differential expression and regulation, and provide transcript quantification. Some of the tools in the Cufflinks package can be run individually, while others are part of a larger workflow.
The basic usage of Cufflinks is:
$ cufflinks [options] input_alignments.[sam|bam]
input_alignments.[sam|bam] is a sorted input file of RNA-Seq read alignments in SAM/BAM format. The RNA-Seq read mappers TopHat and TopHat2 produce output in this format and are recommended for use with Cufflinks, although SAM/BAM alignments produced by any aligner are accepted.
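Because cufflinks expects sorted input, an unsorted SAM file has to be ordered by reference name and position first. The following is a minimal sketch that uses only standard Unix tools. The function name sort_sam and the demo file contents are made up for illustration, and for BAM files a dedicated tool such as samtools would be the usual choice instead.

```shell
#!/bin/bash
# Sketch: sort SAM alignments by reference name (column 3), then by
# leftmost mapping position (column 4). Header lines (starting with "@")
# are copied to the top of the output unchanged.
# sort_sam is a hypothetical helper name, not part of Cufflinks.
sort_sam() {
    local in_sam="$1" out_sam="$2"
    {
        grep '^@' "$in_sam"
        grep -v '^@' "$in_sam" | LC_ALL=C sort -t $'\t' -k 3,3 -k 4,4n
    } > "$out_sam"
}

# Tiny made-up SAM file to demonstrate the function.
demo_dir=$(mktemp -d)
{
    printf '@HD\tVN:1.0\n'
    printf 'r1\t0\tchr2\t50\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t0\tchr1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
} > "$demo_dir/unsorted.sam"

sort_sam "$demo_dir/unsorted.sam" "$demo_dir/sorted.sam"

# Show read name, reference, and position of the sorted records.
cut -f 1,3,4 "$demo_dir/sorted.sam"
```

The sorted file can then be passed to cufflinks in place of input_alignments.sam.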
More advanced Cufflinks options can be found in the manual or by typing:
$ cufflinks -h
An example of how to run Cufflinks on Swan with an alignment file in SAM format, the output directory cufflinks_output, and 8 CPUs is shown below:
cufflinks.submit
#!/bin/bash
#SBATCH --job-name=Cufflinks
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Cufflinks.%J.out
#SBATCH --error=Cufflinks.%J.err
module load cufflinks/2.2
cufflinks input_alignments.sam -o cufflinks_output/ -p ${SLURM_NTASKS_PER_NODE}
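The cufflinks program writes a number of files to its output directory, cufflinks_output/. Some of the generated files are transcripts.gtf (the assembled transcripts), isoforms.fpkm_tracking (estimated isoform-level expression values), and genes.fpkm_tracking (estimated gene-level expression values).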
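The genes.fpkm_tracking file that cufflinks writes to this directory is a tab-delimited table with one row per gene and a header row naming its columns. As a rough sketch of how it can be inspected, the snippet below lists the genes with the highest FPKM values. The helper name top_fpkm and the demo values are made up for illustration, and the FPKM column is located by its header name rather than by a fixed position.

```shell
#!/bin/bash
# Sketch: print the N genes with the highest FPKM from a Cufflinks
# genes.fpkm_tracking file. top_fpkm is a hypothetical helper name.
top_fpkm() {
    local tracking_file="$1" n="${2:-10}"
    awk -F '\t' '
        # Find the column whose header is exactly "FPKM".
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i; next }
        # For every data row, print the tracking ID and its FPKM value.
        col     { print $1 "\t" $col }
    ' "$tracking_file" | sort -t $'\t' -k 2,2gr | head -n "$n"
}

# Made-up demo file that mimics the genes.fpkm_tracking layout.
demo_dir=$(mktemp -d)
{
    printf 'tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\ttss_id\tlocus\tlength\tcoverage\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\tFPKM_status\n'
    printf 'geneA\t-\t-\tgeneA\t-\t-\tchr1:1-100\t-\t-\t12.5\t10\t15\tOK\n'
    printf 'geneB\t-\t-\tgeneB\t-\t-\tchr1:200-300\t-\t-\t250.75\t200\t300\tOK\n'
    printf 'geneC\t-\t-\tgeneC\t-\t-\tchr2:1-100\t-\t-\t0\t0\t0\tOK\n'
} > "$demo_dir/genes.fpkm_tracking"

top_fpkm "$demo_dir/genes.fpkm_tracking" 2
```

On real output, the call would be along the lines of top_fpkm cufflinks_output/genes.fpkm_tracking 20.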
Besides cufflinks, the Cufflinks package includes the following programs:
cuffcompare takes Cufflinks' GTF output as input and compares the assembled transcripts to a reference annotation.
Example of comparing the already annotated genome known_annotation.gtf with the new annotation new_annotation.gtf:
$ cuffcompare -r known_annotation.gtf new_annotation.gtf
This tool reports various statistics about the transcripts and produces a GTF file containing all transfrags in each sample.
cuffmerge merges multiple Cufflinks GTF files.
Example of merging multiple GTF files, with their full paths listed in the file list_GTF.txt, using 8 CPUs:
$ cuffmerge -p 8 list_GTF.txt
The output of cuffmerge is a single unified transcript file.
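The list file given to cuffmerge contains one GTF path per line. A minimal sketch of how such a file could be generated from several Cufflinks output directories is shown below. The helper name make_gtf_list and the sample_* directory names are hypothetical, and the sketch assumes each directory holds a transcripts.gtf file.

```shell
#!/bin/bash
# Sketch: write the full path of every <dir>/transcripts.gtf to a list file,
# one path per line. make_gtf_list is a hypothetical helper name.
make_gtf_list() {
    local list_file="$1" dir gtf
    shift
    : > "$list_file"                      # create or truncate the list file
    for dir in "$@"; do
        gtf="$dir/transcripts.gtf"
        if [ -f "$gtf" ]; then
            # Resolve the directory to an absolute path before writing it.
            printf '%s/transcripts.gtf\n' "$(cd "$dir" && pwd)" >> "$list_file"
        else
            echo "warning: $gtf not found, skipping" >&2
        fi
    done
}

# Demo with made-up directories; sample_3 has no GTF and is skipped.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sample_1" "$demo_dir/sample_2" "$demo_dir/sample_3"
touch "$demo_dir/sample_1/transcripts.gtf" "$demo_dir/sample_2/transcripts.gtf"

make_gtf_list "$demo_dir/list_GTF.txt" "$demo_dir"/sample_*
cat "$demo_dir/list_GTF.txt"
```

Skipping missing files with a warning, rather than aborting, is a design choice of this sketch; a stricter script might exit on the first missing GTF.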
cuffdiff is used to identify differentially expressed transcripts.
Example of running cuffdiff with the annotated transcripts for the new genome, new_annotations.gtf, 3 SAM alignment files generated from TopHat (each treated as a separate condition), and 8 CPUs:
$ cuffdiff -p 8 new_annotations.gtf sample_1.sam sample_2.sam sample_3.sam
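On the cuffdiff command line, spaces separate conditions, while replicates of the same condition are joined with commas and no spaces. The sketch below builds such a command for two conditions with two replicates each. The helper join_replicates and all file and label names are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/bash
# join_replicates is a hypothetical helper: it joins its arguments with commas.
join_replicates() {
    local IFS=,
    echo "$*"
}

# Hypothetical file names: two conditions with two replicates each.
cond_a=$(join_replicates condA_rep1.sam condA_rep2.sam)
cond_b=$(join_replicates condB_rep1.sam condB_rep2.sam)

# -L assigns a label to each condition, in the same order as the inputs.
# Print the command instead of running it, so it can be checked first.
cuffdiff_cmd="cuffdiff -p 8 -L condA,condB new_annotations.gtf $cond_a $cond_b"
echo "$cuffdiff_cmd"
```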
cuffdiff prints multiple output files, such as FPKM tracking files, count tracking files, read group tracking files, differential expression tests, differential splicing tests, differential coding output, differential promoter use, read group info, and run info.
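The gene-level differential expression tests are written to a tab-delimited file named gene_exp.diff, whose last column, significant, holds yes or no for each test. As a hedged sketch, the snippet below keeps only the rows marked as significant. The helper name significant_genes and the demo values are made up for illustration, and the column is looked up by its header name.

```shell
#!/bin/bash
# Sketch: keep the header and every row whose "significant" column is "yes".
# significant_genes is a hypothetical helper name.
significant_genes() {
    awk -F '\t' '
        # Locate the "significant" column in the header and keep the header.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; print; next }
        # Print data rows flagged as significant.
        col && $col == "yes"
    ' "$1"
}

# Made-up demo file that mimics the gene_exp.diff layout.
demo_dir=$(mktemp -d)
{
    printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n'
    printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:1-100\tq1\tq2\tOK\t10\t11\t0.14\t0.3\t0.7\t0.9\tno\n'
    printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:200-300\tq1\tq2\tOK\t5\t80\t4\t6.1\t0.0001\t0.002\tyes\n'
} > "$demo_dir/gene_exp.diff"

significant_genes "$demo_dir/gene_exp.diff"
```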