Getting Genetics Done: DESeq vs edgeR Comparison

Tuesday, September 18, 2012

DESeq vs edgeR Comparison

Update (Dec 18, 2012): Please see this related post I wrote about differential isoform expression analysis with Cuffdiff 2.

DESeq and edgeR are two methods and R packages for analyzing quantitative readouts (in the form of counts) from high-throughput experiments such as RNA-seq or ChIP-seq. After alignment, reads are assigned to a feature, where each feature represents a target transcript, in the case of RNA-Seq, or a binding region, in the case of ChIP-Seq. An important summary statistic is the count of the number of reads in a feature (for RNA-Seq, this read count is a good approximation of transcript abundance).

Methods used to analyze array-based data assume a normally distributed, continuous response variable. However, response variables for digital methods like RNA-seq and ChIP-seq are discrete counts. Thus, both DESeq and edgeR model counts with the negative binomial distribution, which, unlike the Poisson, allows the variance to exceed the mean, as is typical of counts from biological replicates.
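To make the Poisson versus negative binomial distinction concrete, here is a minimal sketch of the mean-variance relationship. It is written in Python, although the analysis itself was done in R. It assumes the common NB parameterization in which a count with mean mu and dispersion alpha has variance mu + alpha * mu^2. The dispersion of 0.04 is a hypothetical value chosen for illustration.

```python
# Minimal sketch: Poisson vs negative binomial (NB) mean-variance relationship.
# Assumed NB parameterization: variance = mu + alpha * mu**2,
# where alpha is the dispersion; alpha = 0 recovers the Poisson case.


def poisson_variance(mu: float) -> float:
    """Variance of a Poisson count with mean mu (equal to the mean)."""
    return mu


def nb_variance(mu: float, alpha: float) -> float:
    """Variance of a negative binomial count with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


if __name__ == "__main__":
    alpha = 0.04  # hypothetical dispersion (biological CV of 20%)
    for mu in (10, 100, 1000):
        print(f"mean={mu:>5}  Poisson var={poisson_variance(mu):>8}  "
              f"NB var={nb_variance(mu, alpha):>10.1f}")
```

At a mean of 1,000 reads, this NB variance is 41 times the Poisson variance. A Poisson model would therefore tend to flag genes that merely vary between biological replicates.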

I often see these two tools used interchangeably, and I wanted to see how they stack up against one another in terms of performance, ease of use, and speed. This isn't meant to be a comprehensive evaluation or "bake-off" between the two methods; that would require complex simulations, parameter sweeps, and evaluation with multiple well-characterized real RNA-seq datasets. This is only a start.

Here, I used the newest versions of both edgeR and DESeq with the well-characterized Pasilla dataset, available in the pasilla Bioconductor package. The dataset is from an experiment in Drosophila investigating the effect of RNAi knockdown of the splicing factor pasilla. I used the GLM functionality of both packages, as recommended by the vignettes, to handle a multifactorial experiment (condition: treated vs. untreated; library type: single-end and paired-end).

Both packages provide built-in functions for assessing overall similarity between samples using either PCA (DESeq) or MDS (edgeR), although these methods operate on the same underlying data and could easily be switched.
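For reference, my understanding of the edgeR/limma documentation is that edgeR's MDS plot is based on a "leading log-fold-change" distance. For each pair of samples, this is the root-mean-square of the largest absolute log2 differences, over 500 genes by default. The Python sketch below computes that distance matrix from raw counts. The simplified log2-CPM transform and its prior count of 2 are assumptions, not edgeR's exact implementation. The classical-MDS step that turns distances into plot coordinates is omitted.

```python
import math
from itertools import combinations


def log2_cpm(counts, prior=2.0):
    """Simplified log2 counts-per-million for one sample.
    The prior count keeps zero counts finite (assumed value, not edgeR's exact formula)."""
    lib_size = sum(counts)
    return [math.log2((c + prior) * 1e6 / lib_size) for c in counts]


def leading_logfc_distance(x, y, top=500):
    """Root-mean-square of the `top` largest absolute log2 differences between two samples."""
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)[:top]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def distance_matrix(samples, top=500):
    """Pairwise leading-logFC distances.
    `samples` maps sample name -> raw counts (same gene order in every sample)."""
    logged = {name: log2_cpm(counts) for name, counts in samples.items()}
    dist = {}
    for p, q in combinations(logged, 2):
        d = leading_logfc_distance(logged[p], logged[q], top)
        dist[(p, q)] = dist[(q, p)] = d
    return dist


if __name__ == "__main__":
    # Hypothetical toy counts for four genes in three samples.
    toy = {"untreated1": [500, 20, 0, 80],
           "untreated2": [480, 25, 1, 90],
           "treated1":   [150, 22, 30, 85]}
    for (p, q), d in sorted(distance_matrix(toy, top=2).items()):
        print(p, q, round(d, 3))
```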

[Figure: PCA plot on variance-stabilized data from DESeq]

[Figure: MDS plot from edgeR]

[Figure: Per-gene dispersion estimates from DESeq]

[Figure: Biological coefficient of variation versus abundance (edgeR)]
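The per-gene dispersion and BCV plots above are closely related: edgeR's biological coefficient of variation (BCV) is the square root of the NB dispersion. As a rough illustration only, the sketch below estimates a per-gene dispersion by the method of moments, solving variance = mean + dispersion * mean^2 for the dispersion. The replicate counts are hypothetical and assumed to be already normalized. Neither package estimates dispersion this way: DESeq fits a mean-dispersion trend, and edgeR uses empirical Bayes moderation. Treat this as a conceptual sketch.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Crude method-of-moments NB dispersion for one gene's replicate counts.
    Solves variance = mu + alpha * mu**2 for alpha and clips negative values to 0."""
    mu = mean(counts)
    if mu == 0:
        return 0.0  # no information for an all-zero gene
    alpha = (variance(counts) - mu) / mu ** 2
    return max(alpha, 0.0)


def bcv(alpha):
    """Biological coefficient of variation: square root of the NB dispersion."""
    return math.sqrt(alpha)


if __name__ == "__main__":
    # Hypothetical normalized counts for three replicates of one condition.
    genes = {"geneA": [8, 10, 12], "geneB": [50, 100, 150]}
    for name, counts in genes.items():
        a = moment_dispersion(counts)
        print(name, round(a, 3), round(bcv(a), 3))
```

The first toy gene varies less than a Poisson count would, so its dispersion is clipped to zero. The second has a dispersion of 0.24, a BCV of roughly 0.49.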

Now, let's see how many genes each method calls statistically significant (FDR<0.05).
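edgeR's `topTags` adjusts p-values with the Benjamini-Hochberg (BH) method by default. As far as I know, the DESeq workflow uses the same BH adjustment via R's `p.adjust`. "Significant at FDR<0.05" therefore means the BH-adjusted value falls below 0.05. The Python sketch below implements the BH step-up adjustment on made-up p-values, mirroring what `p.adjust(p, method="BH")` computes. It is illustrative, not either package's code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.
    Intended to match R's p.adjust(p, method = "BH")."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # of p * n / rank so the adjusted values stay monotone in p.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # ascending rank of pvalues[i], starting at 1
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted, fdr=0.05):
    """Number of genes whose adjusted p-value is below the FDR threshold."""
    return sum(q < fdr for q in adjusted)


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.5]  # made-up p-values for four genes
    adj = benjamini_hochberg(pvals)
    print([round(q, 4) for q in adj], count_significant(adj))
```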

In this simple example, DESeq finds 820 genes significantly differentially expressed at FDR<0.05, while edgeR finds these 820 and an additional 371. Let's take a look at the detected fold changes from both methods:

Here, if genes were found differentially expressed by edgeR only, they're colored red; if found by both, colored green. What's striking here is that for a handful of genes, DESeq is (1) reporting massive fold changes, and (2) not calling them statistically significant. What's going on here?

It turns out that these genes have extremely low counts (usually one or two counts in only one or two samples). The DESeq vignette walks through the logic of independent filtering. It shows that the likelihood of a gene being called significantly differentially expressed is related to how strongly it is expressed. It then advocates discarding extremely lowly expressed genes, because differential expression is unlikely to be statistically detectable for them.
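The lowest-count genes produce such extreme fold changes because, when one group's mean count is zero or nearly zero, the ratio of group means is unbounded. The Python sketch below computes a naive log2 ratio of group means for hypothetical counts, with and without a small pseudocount. This is not how DESeq or edgeR estimate fold changes; both work through their GLM fits. The pseudocount of 0.5 is an assumed moderation device, shown only for contrast.

```python
import math


def log2_fold_change(treated, untreated, pseudocount=0.0):
    """Naive log2 ratio of group mean counts, with an optional pseudocount
    added to each mean. Returns +/-inf when exactly one mean is zero and
    nan when both are zero."""
    t = sum(treated) / len(treated) + pseudocount
    u = sum(untreated) / len(untreated) + pseudocount
    if t == 0 and u == 0:
        return math.nan
    if u == 0:
        return math.inf
    if t == 0:
        return -math.inf
    return math.log2(t / u)


if __name__ == "__main__":
    # Hypothetical counts: a barely detected gene and a well-expressed gene.
    low = ([2, 1, 0, 0], [0, 0, 0, 0])
    high = ([100, 120], [50, 60])
    print(log2_fold_change(*low))                   # unbounded
    print(log2_fold_change(*low, pseudocount=0.5))  # moderated
    print(log2_fold_change(*high))
```

With the pseudocount, the barely detected gene gets a log2 fold change of about 1.3 instead of infinity. The well-expressed gene is barely affected.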

Count-based filtering can be done in two ways. The DESeq vignette demonstrates filtering based on quantiles, while I used the method demonstrated in the edgeR vignette: removing genes without at least 2 counts per million in at least two samples. In my R code, this filtering step is commented out; uncomment it to filter.
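The edgeR-style rule above keeps a gene only if it has at least 2 counts per million in at least two samples, which is simple enough to sketch directly. The Python below is a stand-in for the R code, not a copy of it. Library sizes are taken as column totals of the count table, and the toy counts are hypothetical.

```python
def filter_by_cpm(counts, min_cpm=2.0, min_samples=2):
    """Keep genes with at least `min_cpm` counts per million in at least
    `min_samples` samples. `counts` maps gene ID -> per-sample counts
    (all lists in the same sample order)."""
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = total counts in that column.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        cpms = [c * 1e6 / lib for c, lib in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpms) >= min_samples:
            kept[gene] = row
    return kept


if __name__ == "__main__":
    # Hypothetical counts in three samples (each library totals 1,000,000 reads).
    toy = {"high": [999_000, 999_000, 999_000],
           "mid":  [999, 1_000, 998],
           "low":  [1, 0, 2]}
    print(sorted(filter_by_cpm(toy)))  # "low" is dropped
```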

After filtering, all of the genes shown above with apparently large DESeq fold changes are removed, and the fold changes correlate much better between the two methods. edgeR still detects ~50% more differentially expressed genes, and it's unclear to me (1) why this is the case, and (2) whether this is necessarily a good thing.
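When comparing the two result tables, genes have to be matched by gene ID rather than by row position. Each table is typically sorted by its own p-values, a point raised in the comments below. The Python sketch below uses hypothetical dictionaries in place of the R data frames. It cross-tabulates significance calls and computes the Pearson correlation of log2 fold changes over the shared genes. Genes with a missing adjusted p-value (None) or a non-finite fold change are skipped; that handling is an assumption, not something either package prescribes.

```python
import math
from collections import Counter


def overlap_table(edger_fdr, deseq_padj, fdr=0.05):
    """Cross-tabulate significance calls, matching genes by ID.
    Keys are (significant in edgeR, significant in DESeq); genes absent from
    either table or with a missing (None) adjusted p-value are skipped."""
    table = Counter()
    for gene, q_e in edger_fdr.items():
        q_d = deseq_padj.get(gene)
        if q_e is None or q_d is None:
            continue
        table[(q_e < fdr, q_d < fdr)] += 1
    return table


def logfc_correlation(edger_logfc, deseq_logfc):
    """Pearson correlation of log2 fold changes over genes present in both
    tables with finite values in each, matched by gene ID."""
    shared = [g for g in edger_logfc
              if g in deseq_logfc
              and math.isfinite(edger_logfc[g]) and math.isfinite(deseq_logfc[g])]
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [edger_logfc[g] for g in shared]
    y = [deseq_logfc[g] for g in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical results; note the two tables are in different orders.
    edger = {"g1": 0.01, "g2": 0.02, "g3": 0.50, "g4": 0.03}
    deseq = {"g4": 0.20, "g2": 0.01, "g1": 0.04, "g3": 0.90}
    print(overlap_table(edger, deseq))
    fc_e = {"g1": 1.0, "g2": 2.0, "g3": 3.0}
    fc_d = {"g3": 6.0, "g1": 2.0, "g2": 4.0}
    print(round(logfc_correlation(fc_e, fc_d), 3))
```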

Conclusions:

Unfortunately, I may have oversold the title here: this is such a cursory comparison of the two methods that I would hesitate to draw any conclusions about which one is better. edgeR found more significantly differentially expressed genes (again, not necessarily a good thing). It was also much faster than DESeq at fitting GLMs, but took slightly longer to estimate the dispersion. Further, without any independent filtering, edgeR gave me moderated fold changes for the extremely lowly expressed genes for which DESeq returned logFCs in the 20-30 range. These transcripts were so lowly expressed, though, that they should have been filtered out before any evaluation.

If there's one thing that will make me use edgeR over DESeq (until I have time to do a more thorough evaluation), it's the fact that using edgeR seems much more natural than DESeq, especially if you're familiar with the limma package (pretty much the standard for analyzing microarray data and other continuously distributed gene expression data). Setting up the design matrix and specifying contrasts feels natural if you're familiar with using limma. Further, the edgeR user guide weighs in at 67 pages, filled with many case studies that will help you in putting together a design matrix for nearly any experimental design: paired designs, time courses, batch effects, interactions, etc. The DESeq documentation is still fantastic, but could benefit from a few more case studies / examples.

What do you think? Anyone want to fork my R code and help do this comparison more comprehensively (more examples, simulated data, speed benchmarking)? Is the analysis above fair? What do you find easier to use, or is ease of use (and thus, reproducibility) even important when considering data analysis?

25 comments:

Stephen Turner, September 18, 2012 at 3:27 PM
Following up on this post: I really liked the analysis done in the DEXSeq supplement comparing DEXSeq to cufflinks 1.1, 1.2, 1.3, etc. (DEXSeq is developed by some of the same folks as DESeq). Here the authors did a "proper" and a "mock" comparison, where the mock comparison was between multiple replicates of the same control group. Ideally, a perfect method would find zero differentially expressed genes in the mock comparison. I would love to see this kind of analysis done for newer versions of cufflinks (2.x), side-by-side with DESeq, edgeR, DEXSeq, and possibly others.
Brandon Hurr, September 19, 2012 at 7:13 AM
Some of the functions you're using aren't actually included in DESeq.
> plotDispEsts(d, main="DESeq: Per-gene dispersion estimates")
Error: could not find function "plotDispEsts"

>print(plotPCA(varianceStabilizingTransformation(d), intgroup=c("condition", "libType")))
Error in print(plotPCA(varianceStabilizingTransformation(d), intgroup = c("condition", :
could not find function "plotPCA"

I'm getting a lot of other errors too.

> efit <- glmLRT(efit, coef="conditiontreated")
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
dims [product 0] do not match the length of object [13]

> etable <- topTags(efit, n=nrow(e))$table
Error in abs(object$table$logFC) :
Non-numeric argument to mathematical function
Jermdemo, September 19, 2012 at 8:15 AM
Great post. Thanks for the code.

It is a little odd that edgeR called every single DESeq gene. Usually there are some (or many) genes that are mutually exclusive (see the very recent "The bench scientist's guide to statistical analysis of RNA-Seq data" http://www.biomedcentral.com/content/pdf/1756-0500-5-506.pdf).

I find DESeq and edgeR equally easy to use, but also equally difficult to comprehend. I'm not sure either has really distinguished itself. I have read that Cuffdiff returns a radically different set of genes (or transcripts).

I think most people will just choose whatever package and parameters return the best results for them, using a faith-based methodology (i.e., the genes they know in their heart are differentially expressed).
chad, September 19, 2012 at 9:54 PM
There have been several more formal comparisons of DESeq and edgeR in the literature. DESeq almost always seems to be the more conservative of the two. I tend to see this as a good thing: I would rather live with more false negatives than false positives.

But in comparison to Cufflinks/Cuffdiff, they are far more alike than different. The amount of variation from one version of Cufflinks to the next is rather disturbing. Furthermore, if you dig into the descriptions of Cuffdiff, the authors actually say that they base their models on DESeq. I think the problems with Cufflinks/Cuffdiff have more to do with how they calculate FPKM, an approach I have always found questionable and so have avoided. In contrast, while the authors of DESeq continue to add to and improve it (and change the defaults), they maintain the original methods as options; with Cufflinks, you have to go back to older versions to replicate results.

Sorry for ranting more about Cufflinks than talking about DESeq and edgeR. I have just developed a profound distrust of that program (love Bowtie and TopHat), but I love DESeq... and edgeR to a lesser extent ;)
chad, September 19, 2012 at 10:00 PM
Spike-ins... I recall a few early RNA-seq papers using them, but the practice seems to be buried and unused now. Frankly, that is one thing I wish I had done earlier on (and will from now on), as spike-ins really would help assess results.

But then you still have RNA-seq papers regularly being published with no biological replicates, along with the development of methods for assessing DE without replicates. This simply shocks me.
Stephen Turner, September 20, 2012 at 5:46 AM
A comparison of DESeq and edgeR was just published, using simulated data:

Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing
Anonymous, September 24, 2012 at 10:20 PM
Even more comprehensive comparison:

http://bib.oxfordjournals.org/content/early/2012/09/15/bib.bbs046.long
Seth, October 31, 2012 at 7:03 AM
This might be the wrong place to ask, but does anyone know how to get the genes that contribute most to the axes in the MDS plot in edgeR?
Anonymous, December 7, 2012 at 2:28 PM
Has anyone used or tested BitSeq?
Anonymous, December 10, 2012 at 7:24 AM
The majority of genes are alternatively spliced, so one gene expresses several isoforms at once.
See the figure http://massgenomics.org/wp-content/uploads/2012/10/gene-number-isoforms.jpg from the paper: "Gene isoforms expressed by the number annotated", ENCODE, Nature 2012, Fig. 4A!

The majority of genes found differentially expressed by DESeq and edgeR are actually differentially spliced! Please note that differential splicing is different from differential expression.

Cuffdiff and BitSeq take alternative splicing into consideration!
Anonymous, March 5, 2013 at 6:59 PM
Regarding the PCA plot on variance-stabilized data from DESeq, would it be possible to add the sample names to the graph?

If it were another PCA (e.g., generated by affycoretools), I would use the text function, for example: text(pca$x[,1], pca$x[,2], colnames(d), pos=2)

But I can't figure out a way to display sample names in DESeq's PCA plot. Any ideas?

Many thanks,
Zaki
unygo, May 9, 2014 at 1:38 AM
Hello Stephen Turner,
Very nice post, great to learn from!
I have one question regarding the last part, where you compare the results from edgeR and DESeq.

Since you sort etable and dtable by adjusted p-value, their row orders should differ.
But if you just make a table with this command,

addmargins(table(sig_edgeR=etable$FDR<0.05, sig_DESeq=dtable$padj<0.05))

do you think it actually finds the overlapping genes correctly?

When I tested this, I actually compared the overlapping genes from edgeR and DESeq,
and the significant genes seem to differ between the two. What I mean is this:

Suppose I have

> addmargins(table(sig_edgeR=etable$FDR<0.05, sig_DESeq=dtable$padj<0.05))
         sig_DESeq
sig_edgeR FALSE TRUE Sum
    FALSE   100    0 100
    TRUE      5    6  11
    Sum     105    6 111

When I looked at those 6 overlapping genes, I expected all 6 DESeq-significant genes to also be significant in edgeR. But they are not! Do you have any idea? Am I missing something?

Thanks in advance
Linda, April 6, 2015 at 4:01 PM
Thank you very much for this post.
I am trying to do a similar comparison between edgeR and DESeq2.
My question is about the last part of the code:

with(merged, plot(logFC, conditiontreated, xlab="logFC edgeR", ylab="logFC DESeq", pch=20, col="black", main="Fold change for DESeq vs edgeR"))

1) Where did conditiontreated come from here?
2) Why use conditiontreated instead of log2FoldChange?

Thank you very much!
Linda