Getting Genetics Done: Differential Isoform Expression With RNA-Seq: Are We Really There Yet?

Monday, December 17, 2012

Differential Isoform Expression With RNA-Seq: Are We Really There Yet?

In case you missed it, a new paper was published in Nature Biotechnology on a method for detecting isoform-level differential expression with RNA-seq Data:

Trapnell, Cole, et al. "Differential analysis of gene regulation at transcript resolution with RNA-seq." Nature Biotechnology (2012).

THE PROBLEM

RNA-seq enables transcript-level resolution of gene expression, but there is no proven methodology for simultaneously accounting for biological variability across replicates and uncertainty in mapping fragments to isoforms. One of the most commonly used workflows is to map reads with a tool like Tophat or STAR, use a tool like HTSeq to count the number of reads overlapping a gene, then use a negative-binomial count-based approach such as edgeR or DESeq to assess differential expression at the gene level.

Figure 1 in the paper illustrates the problem with existing approaches, which only count the number of fragments originating from either the entire gene or constitutive exons only.

Excerpt from figure 1 from the Cuffdiff 2 paper.

In the top row, a change in gene expression is undetectable by counting reads mapping to any exon, and is underestimated if counting only constitutive exons. In the middle row, an apparent change would be detected, but in the wrong direction if using a count-based method alone rather than accounting for which transcript a read comes from and how long that transcript is. How often situations like the middle row happen in reality, that's anyone's guess.

THE PROPOSED SOLUTION

The method presented in this paper, popularized by the cuffdiff method in the Cufflinks software package, claims to address both of these problems simultaneously by modeling variability in the number of fragments generated by each transcript across biological replicates using a beta negative binomial mixture distribution that accounts for both sources of variability in a transcript's measured expression level. This so-called transcript deconvolution is not computationally trivial, and incredibly difficult to explain, but failure to account for the uncertainty (measurement error) from which transcript a fragment originates from can result in a high false-positive rate, especially when there is significant differential regulation of isoforms. Compared to existing methods, the procedure described claims equivalent sensitivity with a much lower false-positive rate when there is substantial isoform-level variability in gene expression between conditions.

ALTERNATIVE WORKFLOWS

Importantly, the manuscript also addresses and points out weaknesses several undocumented "alternative" workflows that are discussed often on forums like SEQanswers and anecdotally at meetings. These alternative workflows are variations on a theme: combining transcript-level fragment count estimates (like estimates from Cufflinks, eXpress, or RSEM mapping to a transcriptome), with downstream count-based analysis tools like edgeR/DESeq (both R/Bioconductor packages). This paper points out that none of these tools were meant to be used this way, and doing so violates assumptions of underlying statistics used by both procedures. However, the authors concede that the variance modeling strategies of edgeR and DESeq are robust, and thus assessed the performance of these "alternative" workflows. The results of those experiments show that the algorithm presented in this paper, cuffdiff 2, outperforms other alternative hybrid Cufflinks/RSEM + edgeR/DESeq workflows [see supplementary figure 77 (yes, 77!]).

REPRODUCIBILITY ISSUES

In theory (and in the simulation studies presented here, see further comments below), the methodology presented here seems to outperform any other competing workflow. So why isn't everyone using it, and why is there so much grumbling about it on forums and at meetings? For many (myself included), the biggest issue is one of reproducibility. There are many discussions about cufflinks/cuffdiff providing drastically different results from one version to the next (see here, here, here, here, and here, for a start). The core I run operates in a production environment where everything I do must be absolutely transparent and reproducible. Reporting drastically different results to my collaborators whenever I update the tools I'm using is very alarming to a biologist, and reduces their confidence in the service I provide and the tools I use.

Furthermore, a recent methods paper recently compared their tool, DEXSeq, to several different versions of cuffdiff. Here, the authors performed two comparisons: a "proper" comparison, where replicates of treatments (T1-T3) were compared to replicates of controls (C1-C4), and a "mock" comparison, where controls (e.g. C1+C3) were compared to other controls (C2+C4). The most haunting result is shown below, where the "proper" comparison finds relatively few differentially expressed genes, while the "mock" comparison of controls versus other controls finds many, many more differentially expressed genes, and an increasing number with newer versions of cufflinks:

Table S1 from the DEXSeq paper.

This comparison predates the release of Cuffdiff 2, so perhaps this alarming trend ceases with the newer release of Cufflinks. However, it is worth noting that these data shown here are from a real dataset, where all the comparisons in the new Cuffdiff 2 paper were done with simulations. Having done some method development myself, I realize how easy it is to construct a simulation scenario to support nearly any claim you'd like to make.

FINAL THOUGHTS

Most RNA-seq folks would say that the field has a good handle on differential expression at the gene level, while differential expression at isoform-level resolution is still under development. I would tend to agree with this statement, but if cases as presented in Figure 1 of this paper are biologically important and widespread (they very well may be), then perhaps we have some re-thinking to do, even with what we thought were "simple" analyses at the gene level.

What's your workflow for RNA-seq analysis? Discuss.

24 comments:

AnonymousDecember 17, 2012 at 3:48 PM
Maybe I missed it, but I don't recall a comparison between cuffdiff 2 and the original cuffdiff in the paper. I would think that this is an "obvious" comparison. After all, if cuffdiff 2 warrants a Nature Biotech paper, then surely it is fundamentally different than the previously published cuffdiff?

As new as cuffdiff 2 is, there are no published papers that have used it, only the original cuffdiff. Which makes the new advances used in cuffdiff 2 and the issues they deal with bring into question a large body of papers based on the original cuffdiff.......

ReplyDelete
Replies
MikaelDecember 17, 2012 at 4:26 PM
Good post!

An aspect that was not mentioned in the paper was that edgeR and DESeq allow for complex factorial designs, which can be very useful for correcting for experimental batch effects, individual-specific differences etc., while Cuffdiff2 doesn't. This may in some cases be more important than the violations of assumptions that occur when doing e.g. DESeq on eXpress-derived isoform counts (by the way, they don't tell us what these violations are and how serious ... I am not knowledgeable enough to tell.) Therefore I would not throw out edgeR/DESeq from my toolbox even if Cuffdiff2 does work as advertised (which remains to be seen). I've started to run Cuffdiff2 and edgeR side by side on some projects and will try to get a feeling for how they work.
ReplyDelete
Replies
Stephen HendersonDecember 17, 2012 at 4:45 PM
These arguments over best method are a lot easier if you have a reference with some sort of well designed spike-ins. Can't we create a set of experiments with total human RNA and ectopic RNA spike-ins of differing isoforms in known conc?

Then use our methods to first fit the spike-in data back to an ectopic reference, and then the rest of the data with the usual ref and tests.

Hasn't this been done?
ReplyDelete
Replies
NicolasDecember 18, 2012 at 12:52 PM
As said on Twitter earlier this morning (https://twitter.com/notSoJunkDNA/status/281021341182861312), I did the comparison presented in the DEXSeq paper, for Cuffdiff2 (back in May, version 2.0.0). For C1C3 vs C2C4, I obtained 146 genes called significantly differently expressed. For C1C4 vs C2C3, I obtained 3 DE genes.
If I remember correctly, C1 and C3 are single-end reads and C2 and C4 are paired-end. Not sure if that plays a role in the difference here.
For T1-T3 vs C1-C4, I have 62 TE genes.

I don't have the values for the other comparison (T1T2 vs C2C3).

I feel that this big issue presented in the DEXSeq paper is better treated with Cuffdiff2, although not perfectly.

One improvement in Cuffdiff2 is also that it outputs some estimation of "read counts". Can that be used in DESeq/edgeR?
ReplyDelete
Replies
MicheleDecember 18, 2012 at 2:56 PM
This is a longer explanation:

If I understand these papers correctly my thoughts are that it is probably not a good idea to use counts from other sources in these programs.

EdgeR and DESeq use a model where they separate out the shot noise (aka counting noise, sampling noise, Poisson noise) that arises from the count nature of the data from the variance introduced by other types of noise (technical variance and biological variance). Then the extra variance is modeled either as uniform (as in the case of EdgeR - e.g. all biological/technical variances for all transcripts are set to the same over dispersion from Poisson) or as quasi-correlated with read depth for DESeq.

If you use counts with complex models from other sources you are introducing another, unmodelled source of variance into the count metric that arises from the estimate of where the reads that mapped to multiple isoforms really belong. This comes from the uncertainty of whether you are allocating the read to the right transcript.

I believe that this source of variance has to be correlated with the Poisson noise in the uniquely mapped reads no matter what approach you use because an assignment of a read that maps to multiple transcripts can really only be based on the information held in the reads that align uniquely to a single transcript. Thus, while counts go up with these reassigned reads the uncertainty in the measurements does not go down in the same way it would go down if you increased your counts through deeper sequencing. If that reallocation information is based on low numbers of uniquely aligned reads than the estimated final counts for the transcript including the reallocated multiply-mapped reads will have more uncertainty. Thus, the Poisson uncertainty in the uniquely aligned reads has to somehow propagate through to final measurements and the statistical approaches don't expect this.

So this may or may not screw up EdgeR and DESeq's variance calculations, and then their p-values.

My guess would be that because DESeq assumes that there is a correlation between read depth and extra-Poisson noise it might do a little better with these reads. EdgeR assumes no correlation (I think DESeq is a really great program but agree with EdgeR on this point based on the data that I've looked at). I would guess EdgeR would probably have a bias against calling transcripts with high read counts differentially expressed because it is observing more variance than it expects in the low count transcripts.

The p-values might still reflect the false positive rate correctly but that doesn't mean the results you get are the results you expect.

But you'd have to try it with simulations or better yet real data to know whether they break.

The results probably also depends on the individual experiment's true underlying distribution of variance, too, so there might not be a single "optimal" approach.

Anyway, realistically, you need an awful lot of reads to call differential transcript expression which is likely to be the limiting factor, and not the statistical approach.

That's my opinion anyway.
ReplyDelete
Replies
AnonymousDecember 18, 2012 at 4:04 PM
Nice post. I agree with your final thoughts on that we are not fully there to fully understand differential expression at isoform level yet. I have not read the paper yet, but it looks like one of the main points of the paper was to drive the point that cuffdiff 2 performs better than raw count based methods.

I am wondering whether the paper compared cuffdiff2 to EM based methods like RSEM & IsoEM, which uses the known isoforms to estimate isoform/gene expression levels.

I would think that for most RNA-Seq analysis interested in known isoforms the EM based approaches would perform equally well or better than cuffdiff 2. Here the assumption is that cuffdiff 2 gives more weight to data (as it infers isforms from data) rather than our prior knowledge about isoforms. And this assumption may not work well for low/average expressed/coverage RNA-Seq data.

nextgenseek.com
ReplyDelete
Replies
William NelsonDecember 20, 2012 at 10:22 AM
I'm not sure I would agree that the non-isoform DE problem has been usefully solved yet, given the drastically different numbers produced by the different programs, or the same program with different parameters. We've tested EdgeR, baySeq, and DegSeq, plus the normalizations in EDASeq, and quite often have seen one method give say 1E-50 while another method finds no DE. With EdgeR, the results depend sensitively on the dispersion parameter, which supposedly is being estimated by replicates, however the sensitive dependence raises a question in my mind as to how accurate the estimation really is (or indeed whether the negative binomial model fits the data well enough to enable such accuracy). Honestly, it's not clear to me that the various software packages are accomplishing more than one could by using intuitive thresholds, e.g. 2x fold-change with at least 50 reads, things like that. The p-values produced by the methods seem close to meaningless.
ReplyDelete
Replies
Stephen TurnerDecember 21, 2012 at 4:16 PM
This paper was recently featured in GenomeWeb (free "premium" access with a .edu address).
ReplyDelete
Replies
Stephen TurnerDecember 28, 2012 at 5:41 AM
Check out this thread on Seqanswers discussing all the different tools for transcriptome reconstruction and differential splicing, and leave a response if you've used any.

Scripture
Cufflinks
MMSEQ
Alexa-Seq
MISO
IsoLasso
CEM
cuffdiff 2
RSEM
FDM
IsoformEx
NSMAP
IQSEQ
DEXSeq
DSGSeq
IsoEM
IsoInfer
iReckon

This long list doesn't even consider tools like DESeq and edgeR using one of the alternative workflows mentioned above.
ReplyDelete
Replies
UnknownDecember 31, 2012 at 11:17 AM
I prefer the approach of DEXseq, it seems so much more straightforward to think about these things at an exon level rather than at a full transcript level, at least until longer paired end reads are the norm.
ReplyDelete
Replies
AnonymousMarch 22, 2013 at 8:26 AM
It looks like there is totally overlooked that there are two types of uncertainties when doing DEG in eukaryotes (that is that one has alternative splicing and threfore more isoforms/transcripts per gene).

If one uses, edgeR/DESeq/... methods then the lists of DE genes will contain also genes which actually are not differentially expressed but they are alternatively spliced diferrently (that means that in condition 1 one has the isoform A of gene X is expressed and in condition 2 one has that the isoform B of the same gene X is expressed where, for example, the length of isoform A is much shorter than isoform 2). That means in real life that if one goes to the wetlab to validate usinq qPCR those lists of DEG found using edgeR/DESeq then will be very surprised to find out that they do not get validated and actually those are laternatively spliced genes.

The alternative splicing events are quite common. For example it is known fact the the AR gene in prostate cancer expressed a different and shorter isoform when it is treated to anti-AR drugs!!! Here is more info about this:
Z. Guo. A New Trick of an Old Molecule: Androgen Receptor Splice Variants Taking the Stage?! Int J Biol Sci. 2011; 7(6): 815–822. ttp://www.ncbi.nlm.nih.gov/pmc/articles/PMC3133889/

Most likely the edgeR/DEseq/... and other count-based methods would find in this case of prostate cancer that the AR gene is DE when in fact it is not (prostate cancer really need AR gene to be expressed)!
ReplyDelete
Replies

Add comment

Note: Only a member of this blog may post a comment.

This blog has moved!

Monday, December 17, 2012

Differential Isoform Expression With RNA-Seq: Are We Really There Yet?

24 comments: