FPKM, TPM, RSEM or RPKM!? QC metrics for RNA-seq quantification

Many methods have been proposed to quantify the expression abundance of genes, transcripts, exons or splice junctions. But we all want to use the best one, right?
Rafael A. Irizarry's team at the Dana-Farber Cancer Institute assessed seven competing pipelines to evaluate how well they quantify transcripts and to help us understand which methods are best!
How cool is that?

Here is the paper they published:

Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. (2016) A benchmark for RNA-seq quantification pipelines. Genome Biol 17(1):74.

To demonstrate the utility of their assessment metrics, they used them to compare the Cufflinks, eXpress, Flux Capacitor, kallisto, RSEM, Sailfish, and Salmon quantification methods. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest. A webtool is available that permits users to submit other competing methods.

To compare several competing methods, they developed a series of statistical summaries and data visualization techniques, and made them available as an R/Bioconductor package, rnaseqcomp.

Here are the QC metrics they have chosen:

  • Specificity on expressed features: This metric is evaluated by quantifying deviations between technical replicates. Lower deviations indicate higher specificity.
  • Specificity on non-expressed features: This metric summarises the proportions of non-expressed features. It compares two RNA-seq replicates and computes the average proportion of transcripts that are expressed in one replicate but not the other, or expressed in neither. A lower proportion of such transcripts indicates better specificity.
  • Consistency of isoform calls: This metric looks only at genes with exactly two annotated transcripts. For each such gene it takes the proportion of the gene's expression assigned to one of the two transcripts, and computes the difference in that proportion between replicates. Basically, lower values indicate better specificity for expression of genes that only have two transcripts.
  • ROC curves and pAUC: This analysis allows us to assess sensitivity and specificity simultaneously, by comparing the results for sets of features that are truly differentially expressed. It plots ROC curves comparing true-positive and false-positive differentially expressed features, summarised by the partial area under the curve (pAUC); it also evaluates the fold-changes of features that are truly differentially expressed (higher values are better).
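
To make the first three metrics more concrete, here is a minimal Python sketch of the ideas behind them. It is not the rnaseqcomp implementation and not necessarily the exact formulas from the paper: the function names, the expression cutoff of 0, and the way values are pooled across features are all my own assumptions, chosen only for illustration.

```python
import math

def replicate_sd(rep1, rep2):
    """Pooled SD of log2 expression between two replicates.

    Assumption: only features expressed (> 0) in both replicates are used,
    and the per-feature variance of a pair is (difference ** 2) / 2.
    """
    sq = [(math.log2(a) - math.log2(b)) ** 2 / 2
          for a, b in zip(rep1, rep2) if a > 0 and b > 0]
    return math.sqrt(sum(sq) / len(sq)) if sq else float("nan")

def discordant_proportion(rep1, rep2, cutoff=0.0):
    """Proportion of features expressed in exactly one of two replicates.

    Assumption: "expressed" means a value strictly above `cutoff`.
    """
    pairs = list(zip(rep1, rep2))
    if not pairs:
        return float("nan")
    n_discordant = sum((a > cutoff) != (b > cutoff) for a, b in pairs)
    return n_discordant / len(pairs)

def isoform_proportion_diff(genes_rep1, genes_rep2):
    """Mean absolute difference in the first isoform's proportion.

    Each input is a list of (transcript_1, transcript_2) abundances, one
    tuple per two-transcript gene, in the same gene order in both lists.
    Genes with zero total expression in either replicate are skipped.
    """
    diffs = []
    for (a1, b1), (a2, b2) in zip(genes_rep1, genes_rep2):
        if a1 + b1 > 0 and a2 + b2 > 0:
            diffs.append(abs(a1 / (a1 + b1) - a2 / (a2 + b2)))
    return sum(diffs) / len(diffs) if diffs else float("nan")

# Toy, made-up abundances for two technical replicates
rep1 = [8.0, 0.0, 4.0, 0.0]
rep2 = [8.0, 2.0, 0.0, 0.0]
print(replicate_sd(rep1, rep2))
print(discordant_proportion(rep1, rep2))
print(isoform_proportion_diff([(3.0, 1.0), (1.0, 1.0)],
                              [(1.0, 1.0), (1.0, 1.0)]))
```

In all three functions lower is better, which matches the direction of the metrics described above. For real comparisons, use the rnaseqcomp package itself.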
