Friday, January 8, 2016

Repel overlapping text labels in ggplot2

A while back I showed you how to make volcano plots in base R for visualizing gene expression results. This is just one of many genome-scale plots where you might want to show all individual results but highlight or call out important results by labeling them, for example, with a gene name.
But if you want to annotate lots of points, the annotations usually get so crowded that they overlap one another and become illegible. There are ways around this, such as reducing the font size or adjusting the position or angle of the text, but these usually don’t completely solve the problem and can even make the visualization worse. Here’s the plot again, this time reading the results directly from GitHub and drawing the plot with ggplot2 and an out-of-the-box geom_text().
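Here is a minimal sketch of this kind of plot. It assumes a results table with columns Gene, log2FoldChange, pvalue and padj; those column names are an assumption. The table is filled with simulated values rather than the real results from GitHub, so every gene name and number is hypothetical. Significant genes (BH-adjusted p < 0.05) are colored red and labeled with plain geom_text():

```r
library(ggplot2)

# Hypothetical stand-in for the differential expression results:
# simulated data, NOT the results file used in the post.
set.seed(42)
n <- 200
results <- data.frame(
  Gene = paste0("gene", seq_len(n)),
  log2FoldChange = rnorm(n, mean = 0, sd = 2)
)
# Larger fold changes get smaller p-values, with some random scatter
z <- abs(results$log2FoldChange) * runif(n, min = 1.5, max = 3)
results$pvalue <- 2 * pnorm(-z)
results$padj <- p.adjust(results$pvalue, method = "BH")
results$sig <- ifelse(results$padj < 0.05, "FDR<0.05", "Not Sig")

# Volcano plot: significant genes in red, everything else in black
p <- ggplot(results, aes(log2FoldChange, -log10(pvalue))) +
  geom_point(aes(col = sig)) +
  scale_color_manual(values = c("FDR<0.05" = "red", "Not Sig" = "black"))

# Label only the significant genes, using plain geom_text()
sig_genes <- subset(results, padj < 0.05)
p_text <- p + geom_text(data = sig_genes, aes(label = Gene))
print(p_text)
```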

What a mess. It’s difficult to see what any of those downregulated genes on the left are. Enter the ggrepel package, a new extension of ggplot2 that repels text labels away from one another. Just swap in geom_text_repel() for geom_text(), and the extension is smart enough to try to figure out how to place the labels so that they don’t interfere with each other. Here it is in action.
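Here is a sketch of the repelled version. It uses the same kind of hypothetical simulated data so that it runs on its own; the real results file is not used. The only change to the plotting code is replacing geom_text() with ggrepel's geom_text_repel(). That function pushes labels apart and draws a short segment back to a point when its label ends up far from it:

```r
library(ggplot2)
library(ggrepel)

# Hypothetical simulated results (NOT the real data from the post)
set.seed(42)
n <- 200
results <- data.frame(Gene = paste0("gene", seq_len(n)),
                      log2FoldChange = rnorm(n, sd = 2))
results$pvalue <- 2 * pnorm(-abs(results$log2FoldChange) * runif(n, 1.5, 3))
results$padj <- p.adjust(results$pvalue, method = "BH")
results$sig <- ifelse(results$padj < 0.05, "FDR<0.05", "Not Sig")
sig_genes <- subset(results, padj < 0.05)

p <- ggplot(results, aes(log2FoldChange, -log10(pvalue))) +
  geom_point(aes(col = sig)) +
  scale_color_manual(values = c("FDR<0.05" = "red", "Not Sig" = "black"))

# The only change from the geom_text() version: geom_text_repel()
# moves overlapping labels apart and connects them back to their points
p_repel <- p + geom_text_repel(data = sig_genes, aes(label = Gene))
print(p_repel)
```

In recent ggrepel releases, labels that would overlap too many other things are dropped with a warning. The max.overlaps argument (default 10) controls this, and setting max.overlaps = Inf keeps every label. geom_label_repel() is a drop-in alternative that draws each label in a box.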

And the result (much better!):
See the ggrepel package vignette for more.

8 comments:

  1. Very helpful, thanks! Overlapping labels have always been a problem, and playing with "nudge_x/y", "check_overlap", "h/vjust" and the like was suboptimal at best. "geom_label_repel" from the ggrepel package is also nice.

  2. Thanks for the post, Stephen! I had exactly this use case in mind when I developed ggrepel.

  3. Thank you for the post. Unfortunately, as with many other packages, I get "package ‘ggrepel’ is not available (for R version 3.1.2)". When I update R to a newer version, I lose even more packages. Not sure if there is a way around this; so far, I haven't found one.

    1. What do you mean "lose packages?" You'll just need to reinstall them.

  4. My guess is you're trying to install the package from a CRAN repo, but I don't believe it is available there at this point in time. You'll have to install it from source using the devtools package, as mentioned in the comments above:

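    # devtools provides install_github() for installing packages from GitHub
    install.packages("devtools")
    # install the development version of ggrepel from its GitHub repository
    devtools::install_github("slowkow/ggrepel")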

    1. Looks like it just hit CRAN: https://cran.r-project.org/web/packages/ggrepel/index.html

  5. Dear Stephen, thank you so much for all your help! I am a beginner in R, and thanks to you I managed to get nice volcano plots for my RNA-seq results.
    I have followed all your steps, but in my case I only want to label a subset of the significant genes (shown in red), not all of them. How could I do that?



Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.