There were lots of interesting developments this month that didn't make their way into a full blog post. Here is an incomplete list of what I've been tweeting about over the last few weeks. But first I want to draw your attention to the latest manuscript for a new Bioconductor package for RNA-seq analysis in R.
DEXSeq vs Cuffdiff. See the pre-publication manuscript from Simon Anders, Alejandro Reyes, and Wolfgang Huber: "Detecting differential usage of exons from RNA-Seq data."
DEXSeq is an R package by the same guys who developed the DESeq R package and the HTSeq Python scripts. (Incidentally, both DESeq and DEXSeq are rare examples of Bioconductor vignettes that are well developed and a pleasure to read.) I often use Cufflinks/Cuffdiff in the bioinformatics core because many other tools and methods only allow you to interrogate differential expression at the gene level. Using Cufflinks for transcriptome assembly enables you to interrogate transcript/isoform expression, differential splicing, differential coding output, differential promoter usage, etc. DEXSeq uses methodology similar to DESeq's, but can give you exon-level differential expression without going through all the assembly business that Cufflinks does (a minimal code sketch of the workflow follows the numbers below). In one of the supplementary tables of their pre-pub manuscript, they compare several versions of Cuffdiff to DEXSeq on two datasets. Both datasets had biological replicates for treatment and control conditions. When they compared treatment to control, DEXSeq gave more significant hits than Cuffdiff. When they compared controls to other controls (which ideally should yield zero hits), Cuffdiff reported far more hits. See p13, p23, tables S1 and S2.
Proper comparison treatment vs control, # significant hits:
DEXSeq: 159
Cuffdiff 1.1: 145
Cuffdiff 1.2: 69
Cuffdiff 1.3: 50
Mock comparison controls vs controls, # significant hits:
DEXSeq: 8
Cuffdiff 1.1: 314
Cuffdiff 1.2: 650
Cuffdiff 1.3: 639
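For readers curious what this looks like in practice, here is a minimal sketch of an exon-usage analysis with DEXSeq, loosely following the package vignette. The count files, sample table, and flattened annotation file below are hypothetical, and the DEXSeq API has changed between versions, so treat this as a sketch and consult the vignette for your release.

```r
library(DEXSeq)

# Per-exon counts from the dexseq_count.py script, one file per sample
# (file names are hypothetical)
countFiles <- c("ctrl1.txt", "ctrl2.txt", "trt1.txt", "trt2.txt")
sampleTable <- data.frame(
  row.names = c("ctrl1", "ctrl2", "trt1", "trt2"),
  condition = c("control", "control", "treated", "treated")
)

dxd <- DEXSeqDataSetFromHTSeq(
  countFiles,
  sampleData    = sampleTable,
  design        = ~ sample + exon + condition:exon,
  flattenedfile = "genes.flattened.gff"  # from dexseq_prepare_annotation.py
)

dxd <- estimateSizeFactors(dxd)  # library-size normalization, as in DESeq
dxd <- estimateDispersions(dxd)  # per-exon dispersion estimates
dxd <- testForDEU(dxd)           # test for differential exon usage
res <- DEXSeqResults(dxd)
head(res[order(res$padj), ])     # exons ranked by adjusted p-value
```

Note that this never assembles transcripts: the annotation is "flattened" into disjoint exon counting bins, reads are tallied per bin, and the test asks whether a bin's share of its gene's reads changes with condition.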
In the UVA Bioinformatics core we strive for reproducibility, scalability, and transparency using the most robust tools and methodology available. It gives me pause to see such alarmingly different results with each new version and each
new protocol of a particular tool. What are your thoughts and experiences with using Cufflinks/Cuffdiff, DESeq/DEXSeq, or the many, many other tools for RNA-Seq (MISO, ExpressionPlot, edgeR, RSEM, easyRNASeq, etc.)? Please share in the comments.
Everything else:
Webinar from @goldenhelixinc: Learning From Our GWAS Mistakes: From experimental design to scientific method https://t.co/KkxAn18p
[bump] Questions on cutoff setting of FPKM value & known genes filtering in Cuffmerge result http://t.co/iKMZ7Dsd #bioinformatics
Very cool: DNase-seq + RNA-seq used to show DNaseI sensitivity eQTLs are a major determinant of gene expression variation http://t.co/nPo3xHVa
Beware using UCSC GTFs in HTSeq/CoverageBed for counting RNA-seq reads. "transcript_id" is repeated as "gene_id"! https://t.co/ADg1Pi6U
Identification of allele-specific alternative mRNA processing via RNA-seq http://t.co/fig9cLlH #bioinformatics @myen
prepub on arXiv + analysis tutorial/walkthrough + AWS EC2 AMI + git repo + ipython notebook = reproducible research done right http://t.co/GPNmpdJD
NSF-NIH Interagency Initiative: Core Techniques and Technologies for Advancing Big Data Science and Engineering http://t.co/W3LUdCsG
New approach from @MarylynRitchie lab to collapsing/combining: using biological pathways rather than positional info http://t.co/ywWj0MNn
Cloud BioLinux: pre-configured and on-demand #bioinformatics computing for the genomics community. @myen http://t.co/3kCE0ktH
SCOTUS remands AMP v Myriad (BRCA) patent case to CAFC to consider in light of the Prometheus decision http://t.co/7CkTa4l0
57-year experiment: Drosophila kept in the dark for 1400 generations, many evolutionary changes (record longest postdoc!) http://t.co/wukq8fAf
IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly http://t.co/FN1sYM8f #bioinformatics
Complex disease genetics is complex. Imagine that. Hirschhorn, Visscher, & the usual consortium suspects: http://t.co/Bwopxlx6
MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets http://t.co/pffuHIlO #bioinformatics
Nat Protocols: Differential gene & transcript expression analysis of RNA-seq w/ TopHat & Cufflinks http://t.co/U1ZpSE7V #bioinformatics
I work as a bioinformatician in a genomics core and I agree that it is somewhat terrifying to see results diverge so strongly, not only between different software packages but also between different releases of the same software (or in some cases, different runs of the same release). In many cases there are good reasons for the discrepancies, but that doesn't help us core bioinformaticians, who are more interested than most in providing reliable, reproducible results.
Nowadays, we tend to use BaySeq or edgeR for RNA-seq DE analysis, mostly because these allow for flexible specification of the experimental design: you can set up things like biological and technical replicates, experimental batches, treatments, etc. as separate factors in the model. Perhaps DESeq allows for this as well nowadays (it used to be that it didn't). We do not use Cuffdiff, both because we have seen several comparisons like the one you posted and because it's unclear how it tests for differential expression, while edgeR, BaySeq, and DESeq are well documented. Also, these three are count-based methods, which should give more statistical power, as the FPKM (or RPKM) measure masks the difference between long genes with many mapped reads and short genes with few mapped reads. Presumably Cuffdiff works on FPKM, though I seem to recall having heard that it somehow uses the counts as well in the testing; again, it's not documented how this is done.
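To make the flexible-design point concrete, here is a minimal sketch of an edgeR GLM setup with a batch factor; the counts, factor names, and tested coefficient are all hypothetical:

```r
library(edgeR)

# Hypothetical data: 500 genes x 8 samples of raw read counts
counts    <- matrix(rpois(4000, lambda = 50), ncol = 8)
treatment <- factor(rep(c("control", "treated"), each = 4))
batch     <- factor(rep(c("A", "B"), times = 4))

# Batch enters the model as a blocking factor alongside treatment
design <- model.matrix(~ batch + treatment)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)                # TMM normalization
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)

fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = "treatmenttreated")  # test the treatment effect
topTags(lrt)
```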
By the way, you don't *have* to go through the assembly steps to use Cuffdiff: run Cufflinks with the -G flag to quantify against an annotation GTF file. Or did I misunderstand your post?
A slightly different issue is that different packages may be good at different kinds of experiments; e.g., the relatively new SAMSeq (a nonparametric method) seems to perform really well when you have many biological replicates, and it's possible that certain methods are better for lower/higher sequencing depth, short RNAs, etc.
In view of all these things, I am thinking of setting up some kind of automated testing system for RNA-seq DE analysis tools. You would be able to submit actual or synthetic data, have them analyzed by all available software packages and versions, and get information about the consistency between the outputs of the different tools. Eventually, given statistics from many projects, this would also give some hints about which packages are better for which types of experiments. What do you think?
The fact that Cufflinks reports FPKM values does not mean that internally it masks the difference between long genes with many reads and short genes with few reads. My understanding is that Cufflinks doesn't just calculate the FPKM of each transcript, but also some measure of the uncertainty of that FPKM. So while it may be true that DEXSeq and other raw-count-based methods may have more statistical power than cuffdiff, I don't think you can automatically attribute any differences in statistical power to the use of counts vs FPKM. Or to put it another way, the transformation from counts to FPKM results in a loss of information (and a loss of statistical power), but the transformation from counts to FPKM estimate with uncertainty does not necessarily do so.
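For a concrete picture of that information loss, here is a toy calculation with the standard FPKM formula (all numbers are made up): two genes end up with identical FPKMs from very different read counts, and only the counts reveal which estimate is more precise.

```r
# FPKM = fragments / (transcript length in kb * millions of mapped fragments)
total_mapped <- 20e6  # total mapped fragments in the library

fpkm <- function(fragments, length_bp) {
  fragments / (length_bp / 1e3) / (total_mapped / 1e6)
}

fpkm(fragments = 1000, length_bp = 10000)  # long gene, 1000 fragments -> 5
fpkm(fragments = 10,   length_bp = 100)    # short gene, 10 fragments  -> 5
# Both report FPKM = 5, but the 1000-fragment estimate is far less noisy.
# A bare FPKM carries no trace of that difference; an FPKM paired with an
# uncertainty estimate, as described above, can retain it.
```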
Count-based methods have weaker statistical models because they do not consider how the length of a transcript affects DE!
Mikael, thanks for the very detailed feedback. Re: your last comment - if you were to implement such a testing framework, I think you'd have no problem whatsoever getting it published. Many in the field are in the same boat: similar to the early days of microarrays, we have lots of tools but few standards, and something like this that can benchmark and compare tools in an unbiased fashion is sorely needed!
It's interesting that Cuffdiff calls hundreds of differentially expressed exons in the control-vs-control comparison when the same version detected only 50 in the real comparison. Does this suggest that Cuffdiff is somehow amplifying noise in the absence of signal?
ReplyDeleteFirst if all I think that the comparison is slightly bias because there are three versions of CuffDiff and onky one version of DEXSeq (there should be 3 vs 3 versions).
ReplyDeleteAlso I bewildered how come it is considered that using raw counts is considered to give more statistical power than when using RPKM-like approach which uses a better statistical model (because it takes into consideration also the gene length)?
The way I see it is this:
raw counts => more statistical power, but with a weak statistical model that totally ignores gene length and alternative splicing events
cuffdiff => weaker statistical power, but with a more powerful statistical model that accounts for transcript length and alternative splicing events
Packages like edgeR, DESeq, etc. are meant for DE genes, but genes that are alternatively spliced will be called DE even when they are not. For example, take the same gene in two samples: in one sample only two exons are expressed, while in the second sample three exons of the same gene are expressed, with the gene expressed at the same level in both samples. This gene will be found DE by edgeR even though it is not, simply because edgeR ignores the length of the transcript. This is partially fixed in DEXSeq, but it still looks like a hack (we are doing RNA-seq, not Gene-seq or Exon-seq). We are measuring RNAs when we do RNA-seq! I think that for RNA-seq, any statistical model should model the transcripts. Cuffdiff is a step in the right direction, and so is the BitSeq package (its authors show that RPKM actually correlates better with ground-truth expression than raw counts; BitSeq works at the transcript level).
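To put toy numbers on the isoform-switching scenario above (all made up): the gene is expressed at the same molar level in both samples, but the longer isoform in sample 2 soaks up proportionally more reads, so a length-blind gene-level count sees a spurious 1.5-fold change.

```r
molecules    <- 1000          # transcript molecules per sample (identical!)
exon_bp      <- 500           # each exon is 500 bp
len_s1       <- 2 * exon_bp   # sample 1 expresses the 2-exon isoform
len_s2       <- 3 * exon_bp   # sample 2 expresses the 3-exon isoform
reads_per_bp <- 0.1           # identical sequencing depth in both samples

count_s1 <- molecules * len_s1 * reads_per_bp  # 100,000 reads
count_s2 <- molecules * len_s2 * reads_per_bp  # 150,000 reads
count_s2 / count_s1            # 1.5-fold "change" despite equal expression
```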