Cufflinks

Cufflinks is a transcript assembly program that includes a number of tools for analyzing RNA-Seq data. These tools assemble aligned RNA-Seq reads into transcripts, estimate their abundances, test for differential expression and regulation, and provide transcript quantification. Some of the tools in the Cufflinks package can be run individually, while others are part of a larger workflow.

The basic usage of Cufflinks is:

$ cufflinks [options] input_alignments.[sam|bam]
where input_alignments.[sam|bam] is an input file of RNA-Seq read alignments in SAM/BAM format, sorted by reference position. The RNA-Seq read mapper TopHat/TopHat2 produces output in this format and is recommended for use with Cufflinks, although SAM/BAM alignments produced by any aligner are accepted.
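
If the alignments come from an aligner that does not sort its output, a headerless SAM file can be sorted by reference name and position with the standard sort utility before it is passed to cufflinks. The sketch below builds a tiny made-up SAM file only to show the sort keys; the read names, coordinates, and the file names hits.sam and hits.sorted.sam are hypothetical. A SAM file with header lines, or a BAM file, would need a dedicated tool such as samtools instead.

```shell
# Build a tiny, made-up, headerless SAM file (three reads on two references).
printf 'r1\t0\tchr2\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  > hits.sam
printf 'r2\t0\tchr1\t500\t255\t4M\t*\t0\t0\tACGT\tIIII\n' >> hits.sam
printf 'r3\t0\tchr1\t20\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  >> hits.sam

# Sort by reference name (field 3), then numerically by start position (field 4).
LC_ALL=C sort -k 3,3 -k 4,4n hits.sam > hits.sorted.sam

# Show the read names in their new order.
cut -f 1 hits.sorted.sam
```

The reads on chr1 come first, ordered by position, followed by the read on chr2.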

More advanced Cufflinks options can be found in the manual or by typing:

$ cufflinks -h

An example of how to run Cufflinks on Crane with an alignment file in SAM format, the output directory cufflinks_output, and 8 CPUs is shown below:

cufflinks.submit
#!/bin/sh
#SBATCH --job-name=Cufflinks
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Cufflinks.%J.out
#SBATCH --error=Cufflinks.%J.err

module load cufflinks/2.2

cufflinks -o cufflinks_output/ -p ${SLURM_NTASKS_PER_NODE} input_alignments.sam

The cufflinks program produces a number of files in the specified output directory, cufflinks_output/. Some of the generated files are:

  • transcripts.gtf: This GTF file contains the isoforms assembled by Cufflinks, with one GTF record per row; each record represents either a transcript or an exon within a transcript
  • isoforms.fpkm_tracking: This file contains the estimated isoform-level expression values in the generic FPKM Tracking Format
  • genes.fpkm_tracking: This file contains the estimated gene-level expression values in the generic FPKM Tracking Format
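
Because column 3 of a GTF record holds the feature type, the transcript and exon records in transcripts.gtf can be counted with awk. A minimal sketch, using a made-up two-exon transcript in place of real Cufflinks output; the record contents, the file name example_transcripts.gtf, and the helper name count_gtf_features are all hypothetical:

```shell
# Made-up stand-in for transcripts.gtf: one transcript record and its two exons.
{
  printf 'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'
  printf 'chr1\tCufflinks\texon\t100\t300\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1"; exon_number "1";\n'
  printf 'chr1\tCufflinks\texon\t600\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1"; exon_number "2";\n'
} > example_transcripts.gtf

# Hypothetical helper: count transcript and exon records by feature type (column 3).
count_gtf_features() {
  awk -F '\t' '
    $3 == "transcript" { transcripts++ }
    $3 == "exon"       { exons++ }
    END { printf "transcripts=%d exons=%d\n", transcripts, exons }
  ' "$1"
}

count_gtf_features example_transcripts.gtf
```

On a real run, the same helper could be pointed at cufflinks_output/transcripts.gtf.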

Available commands

Besides cufflinks, the Cufflinks package includes the following programs:

  • Cuffcompare

cuffcompare takes the GTF output of Cufflinks as its input and compares the assembled transcripts to a reference annotation.

An example of comparing the new annotation new_annotation.gtf against the existing reference annotation known_annotation.gtf:

$ cuffcompare -r known_annotation.gtf new_annotation.gtf

This tool reports various statistics about the transcripts and writes a GTF file containing all transfrags in each sample.

  • Cuffmerge

This program allows merging of multiple Cufflinks GTF files.

An example of merging multiple GTF files, whose full paths are listed in the file list_GTF.txt, using 8 CPUs:

$ cuffmerge -p 8 list_GTF.txt

The output of cuffmerge is a single unified transcript file.
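
The list file passed to cuffmerge is plain text with one GTF path per line. A minimal sketch of building it, assuming each sample's Cufflinks output was written to its own directory; the directory names sample_1_out and sample_2_out are hypothetical, and empty placeholder files stand in for real transcripts.gtf files:

```shell
# Hypothetical per-sample output directories with placeholder GTF files.
mkdir -p sample_1_out sample_2_out
: > sample_1_out/transcripts.gtf
: > sample_2_out/transcripts.gtf

# Write one absolute path per line.
for dir in sample_1_out sample_2_out; do
  printf '%s\n' "$PWD/$dir/transcripts.gtf"
done > list_GTF.txt

cat list_GTF.txt
```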

  • Cuffdiff

cuffdiff is used to identify differentially expressed transcripts.

An example of running cuffdiff with the annotated transcripts for the new genome, new_annotations.gtf, 3 SAM alignment files generated by TopHat, and 8 CPUs is shown below. The SAM files are separated by spaces, so each file is treated as a separate condition:

$ cuffdiff -p 8 new_annotations.gtf sample_1.sam sample_2.sam sample_3.sam

cuffdiff writes multiple output files, such as FPKM tracking files, count tracking files, read group tracking files, differential expression tests, differential splicing tests, differential coding output, differential promoter use, read group info, and run info.
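
The differential expression test files are tab-delimited tables that can be filtered with standard tools. The sketch below assumes a test file, such as the gene-level gene_exp.diff, with a header row that includes a column named significant holding yes or no. The mock file example_gene_exp.diff is made up and reduced to four columns, and the helper name list_significant is hypothetical:

```shell
# Made-up stand-in for a cuffdiff test file, reduced to four columns.
{
  printf 'gene_id\tsample_1\tsample_2\tsignificant\n'
  printf 'G1\tq1\tq2\tyes\n'
  printf 'G2\tq1\tq2\tno\n'
  printf 'G3\tq1\tq3\tyes\n'
} > example_gene_exp.diff

# Hypothetical helper: print the rows flagged "yes", locating the
# "significant" column by its header name rather than a fixed position.
list_significant() {
  awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; next }
    col && $col == "yes"
  ' "$1"
}

list_significant example_gene_exp.diff
```

Looking the column up by name keeps the filter working even if the real file has more columns than the mock one.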