Friday, January 8, 2016

Repel overlapping text labels in ggplot2

A while back I showed you how to make volcano plots in base R for visualizing gene expression results. This is just one of many genome-scale plots where you might want to show all individual results but highlight or call out important results by labeling them, for example, with a gene name.
But if you want to annotate lots of points, the annotations usually get so crowded that they overlap one another and become illegible. There are ways around this, such as reducing the font size or adjusting the position or angle of the text, but these usually don’t completely solve the problem, and can even make the visualization worse. Here’s the plot again, reading the results directly from GitHub, and drawing the plot with ggplot2 and geom_text out of the box.

# Load packages
library(dplyr)
library(ggplot2)
# Read data from the web
url = "https://gist.githubusercontent.com/stephenturner/806e31fce55a8b7175af/raw/1a507c4c3f9f1baaa3a69187223ff3d3050628d4/results.txt"
results = read.table(url, header=TRUE)
# Flag genes that pass the FDR threshold
results = mutate(results, sig=ifelse(padj<0.05, "FDR<0.05", "Not Sig"))
# Volcano plot: fold change vs. significance, colored by FDR status
p = ggplot(results, aes(log2FoldChange, -log10(pvalue))) +
  geom_point(aes(col=sig)) +
  scale_color_manual(values=c("red", "black"))
p
# Label the significant genes with plain geom_text
p+geom_text(data=filter(results, padj<0.05), aes(label=Gene))




What a mess. It’s difficult to read any of the downregulated gene names on the left. Enter the ggrepel package, a new extension of ggplot2 that repels text labels away from one another. Just sub in geom_text_repel() in place of geom_text() and the extension tries to position the labels so that they don’t interfere with each other. Here it is in action.

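# Install ggrepel package if needed
# install.packages("ggrepel")  # from CRAN, or the development version:
# install.packages("devtools")
# devtools::install_github("slowkow/ggrepel")
library(ggrepel)
# Same plot, but with labels repelled away from one another
p+geom_text_repel(data=filter(results, padj<0.05), aes(label=Gene))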


And the result (much better!):
See the ggrepel package vignette for more.
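As one illustration of what you can tweak, here is a minimal sketch that labels only the most significant genes and adjusts a few geom_text_repel() arguments. It runs on simulated data rather than the results above: the `sim` data frame, its values, the cutoff of 15 labels, and the specific padding and size settings are all assumptions chosen for illustration, so treat them as starting points rather than recommendations.

```r
library(ggplot2)
library(ggrepel)

# Simulated stand-in for a results table (hypothetical values, not real data)
set.seed(42)
n <- 200
sim <- data.frame(
  Gene = paste0("gene", seq_len(n)),
  log2FoldChange = rnorm(n, sd = 1.5)
)
# Make p-values tend to be smaller for larger absolute fold changes
sim$pvalue <- runif(n)^(1 + 2 * abs(sim$log2FoldChange))
sim$padj <- p.adjust(sim$pvalue, method = "BH")
sim$sig <- ifelse(sim$padj < 0.05, "FDR<0.05", "Not Sig")

# Label only the 15 genes with the smallest p-values (arbitrary cutoff)
top <- head(sim[order(sim$pvalue), ], 15)

p_sim <- ggplot(sim, aes(log2FoldChange, -log10(pvalue))) +
  geom_point(aes(col = sig)) +
  # Named values tie each color to its category regardless of level order
  scale_color_manual(values = c("FDR<0.05" = "red", "Not Sig" = "black")) +
  geom_text_repel(
    data = top,
    aes(label = Gene),
    size = 3,                 # smaller label text
    box.padding = 0.4,        # extra space around each label
    point.padding = 0.2,      # extra space around each labeled point
    segment.color = "grey50"  # color of the line from label to point
  )
p_sim
```

Labeling a fixed number of top hits keeps the plot readable even when many genes pass the FDR threshold, since the repulsion has less to do when there are fewer labels competing for space.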
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.