Getting Genetics Done: qqman: an R package for creating Q-Q and manhattan plots from GWAS results

Thursday, May 15, 2014

qqman: an R package for creating Q-Q and manhattan plots from GWAS results

Three years ago I wrote a blog post on how to create manhattan plots in R. After hundreds of comments pointing out bugs and other issues, I've finally cleaned up this code and turned it into an R package.

The qqman R package is on CRAN: http://cran.r-project.org/web/packages/qqman/

The source code is on GitHub: https://github.com/stephenturner/qqman

If you'd like to cite the qqman package (appreciated but not required), please cite this pre-print: Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).

Something wrong? Please file bug reports, feature requests, or anything else related to the code as an issue on GitHub rather than commenting here. Also, please post the code you're using and some example data causing a failure in a publicly accessible place, such as a GitHub gist (no registration required). It's difficult to troubleshoot if I can't see the data where the code is failing. Want to contribute? Awesome! Send me a pull request.

Note: This release is substantially simplified for the sake of maintainability and creating an R package. The old code that allows confidence intervals on the Q-Q plot and allows more flexible annotation and highlighting is still available at the version 0.0.0 release on GitHub.

Here's a shout-out to all the blog commenters on the previous post for pointing out bugs and other issues, and a special thanks to Dan Capurso and Tim Knutsen for useful contributions and bugfixes.

qqman package tutorial

First things first. Install the package (do this only once), then load the package (every time you start a new R session)

# only once:
install.packages("qqman")

# each time:
library(qqman)

You can access this help any time from within R by accessing the vignette:

vignette("qqman")

The qqman package includes functions for creating manhattan plots and q-q plots from GWAS results. The gwasResults data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:

str(gwasResults)

'data.frame':   16470 obs. of  4 variables:
 $ SNP: chr  "rs1" "rs2" "rs3" "rs4" ...
 $ CHR: int  1 1 1 1 1 1 1 1 1 1 ...
 $ BP : int  1 2 3 4 5 6 7 8 9 10 ...
 $ P  : num  0.915 0.937 0.286 0.83 0.642 ...

head(gwasResults)

  SNP CHR BP      P
1 rs1   1  1 0.9148
2 rs2   1  2 0.9371
3 rs3   1  3 0.2861
4 rs4   1  4 0.8304
5 rs5   1  5 0.6417
6 rs6   1  6 0.5191

tail(gwasResults)

          SNP CHR  BP      P
16465 rs16465  22 530 0.5644
16466 rs16466  22 531 0.1383
16467 rs16467  22 532 0.3937
16468 rs16468  22 533 0.1779
16469 rs16469  22 534 0.2393
16470 rs16470  22 535 0.2630

How many SNPs on each chromosome?

as.data.frame(table(gwasResults$CHR))

   Var1 Freq
1     1 1500
2     2 1191
3     3 1040
4     4  945
5     5  877
6     6  825
7     7  784
8     8  750
9     9  721
10   10  696
11   11  674
12   12  655
13   13  638
14   14  622
15   15  608
16   16  595
17   17  583
18   18  572
19   19  562
20   20  553
21   21  544
22   22  535

Creating manhattan plots

Now, let's make a basic manhattan plot.

manhattan(gwasResults)

We can also pass in other graphical parameters. Let's add a title (main=), reduce the point size to 50%(cex=), and reduce the font size of the axis labels to 80% (cex.axis=):

manhattan(gwasResults, main = "Manhattan Plot", cex = 0.5, cex.axis = 0.8)

Let's change the colors and increase the maximum y-axis:

manhattan(gwasResults, col = c("blue4", "orange3"), ymax = 12)

Let's remove the suggestive and genome-wide significance lines:

manhattan(gwasResults, suggestiveline = F, genomewideline = F)

Let's look at a single chromosome:

manhattan(subset(gwasResults, CHR == 1))

Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called snpsOfInterest. You'll get a warning if you try to highlight SNPs that don't exist.

str(snpsOfInterest)

 chr [1:100] "rs3001" "rs3002" "rs3003" "rs3004" "rs3005" ...

manhattan(gwasResults, highlight = snpsOfInterest)

We can combine highlighting and limiting to a single chromosome:

manhattan(subset(gwasResults, CHR == 3), highlight = snpsOfInterest, main = "Chr 3")

A few notes on creating manhattan plots:

Run str(gwasResults). Notice that the gwasResults data.frame has SNP, chromosome, position, and p-value columns named SNP, CHR, BP, and P. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to the chr=, bp=, p=, and snp= arguments. See help(manhattan) for details.
The chromosome column must be numeric. If you have “X,” “Y,” or “MT” chromosomes, you'll need to rename these 23, 24, 25, etc.
If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for col="blue", col="red", or col="green3" to modify the suggestive line, genomewide line, and highlight colors, respectively.

Creating Q-Q plots

Creating Q-Q plots is straightforward - simply supply a vector of p-values to the qq() function. You can optionally provide a title.

qq(gwasResults$P, main = "Q-Q plot of GWAS p-values")

59 comments:

AnonymousMay 15, 2014 at 5:21 PM
Great work Stephen!
Really like the updated source code with friendly comments.
For my own work I'll try to implement colouring of the SNPs based on LD with the leading SNP.

Tim
ReplyDelete
Replies
jesse HoffMay 28, 2014 at 9:44 AM
This is a really useful form of your previous work. One thing that many might like tweaked and is good for some low level users is an argument to not -logten transform the data. There are many types of genome wide results that this function is great for. it was pretty easy to tweek the function myself, but not everyone is comfortable doing that. Thanks!
ReplyDelete
Replies
hatun kisiJune 12, 2014 at 1:04 AM
Hi Stephen,
Thanks a lot for sharing this. Is there any information that you know about the default resolution of the plots?
ReplyDelete
Replies
bruceJuly 3, 2014 at 9:30 AM
Hi Stephen,

I am trying to bastardise your fine work above to plot differences in alt alleles in SNPs between phenotypes (rightly or wrongly...). I had a look at the code but it wasn't immediately apparent where you plot and colour CHR so as to be able to visually discriminate between each in turn. I am a little short on time so thought I would ask if you could point out to me? Sorry for not RTFMing=P

Bruce.
ReplyDelete
Replies
morsJuly 10, 2014 at 7:45 AM
Hi Stephen,

Thanks for the useful code and package. Sorry for the simple question - but how do I go about calculating the lambda and adding to the plot ?

Thanks in advance,

Luanna
ReplyDelete
Replies
AthenaJuly 25, 2014 at 6:05 AM
Brilliant, just in time to cite it in a paper :D
ReplyDelete
Replies
UnknownJuly 29, 2014 at 6:55 PM
Hi Stephen,

Is there an option to modify the width of the points?, I looked into the code and it says pch= 20 but I don't know how to change it, I tried to enter "pch = 10 " in the function but an error appeared saying that "pch" matched by multiple actual arguments.

Berna
ReplyDelete
Replies
UnknownJuly 30, 2014 at 10:15 PM
Hi Stephen

Some chromosome may have blank gaps as the SNPs within some region are not genotyped, you need to address that in your script...
ReplyDelete
Replies
UnknownAugust 6, 2014 at 10:19 AM
Hi Stephen,

Thanks for such a beautiful script, and for such clear commenting! I am wondering if there is a way to alter the source code so that the ticks demarcating chromosomes fall at the outer boundaries of the chromosomes, rather than the middle, and the chromosome numbers fall in the middle? In the plots I am preparing I am not alternating colours of the chromosomes, and it is difficult to discern the boundaries between chromosomes with the ticks falling in the centre. I have tried fiddling around with the source code, but I am still fairly new to R and have not had any success.

Jennie
ReplyDelete
Replies
MilesSeptember 22, 2014 at 12:14 AM
Hi Stephen,

Thanks for the useful package - I have been hacking the GenABEL plot function on and off over the years and yours is really nice and clean.

In one of my OCD moments I noticed that the x axes tick markers weren't centered on the chromosomes, so I went digging through the source. It seems that you've made an assumption that SNPs are evenly distributed across chromosomes, which is probably correct 99.9% of the time. However a student in our lab was interested in a subset of ~16K SNPs in specific protein coding genes, which don't follow an even distribution (obviously a very specific use case).

I believe that a small change to line 127;
from:

ticks=c(ticks, d[d$index==i, ]$pos[floor(length(d[d$index==i, ]$pos)/2)+1])
# this takes the middle value of the number of observed positions, which isn't always going to be the 'mean' chromosomal position

to:

ticks = c(ticks, (min(d[d$CHR == i,]$pos) + max(d[d$CHR == i,]$pos))/2 + 1)
# this modification should identify the 'true' mean chromosomal position

This fixes the issue with various data I've run through. I don't imagine the modification would cause issues down the track... but don't quote me on that!

Please disregard if I'm talking nonsense!

- Miles
ReplyDelete
Replies
RosadosaOctober 15, 2014 at 3:37 AM
Hi Stephen,

thanks for the great package!
I'm using it to plot my GWAS data and I was wondering if there is a command to add the rs numbers to SNPs that are above a certain threshold. (in my case the p=10^-5 line)
I have a basic understanding of R, but I cannot do any programming myself.
Could you help my with this?
Thanks in advance,

Roos
ReplyDelete
Replies
AnonymousOctober 15, 2014 at 4:19 PM
Hi Stephan,
Very nice. I found the single chromosome results came out smashed on a vertical line.
This seems to be because "d$pos=d$BP/1e6" in manhattan.R (the x axis is in megabytes, even if x limits have been specified)
I know you have highlighted ggplot2 in the past. Any reason for not using it here?
Thanks,
Aaron
ReplyDelete
Replies
jmtoungNovember 13, 2014 at 1:18 AM
Hi Stephan,

For some reason, when I try to change the colors for my plot, I get the following error. Do you know why?

manhattan(data_test, ymax=8, col=c("blue4","orange3"))

Error in localWindow(xlim, ylim, log, asp, ...) :
formal argument "col" matched by multiple actual arguments
ReplyDelete
Replies
RMSJanuary 15, 2015 at 2:21 PM
Hi Stephen,

Thank you for creating such useful code! I am trying to plot Likelihood values from a non-human species. I am finding that the option "chrlabs=c(1:38)" is not changing the chromosome labels on my plot. It also does not work if I use the option "limitchromosomes=1:38". Also, I am using the logp=FALSE flag since I want to plot just the likelihood values. There is no difference in my plot if I use logp=FALSE or logp=TRUE and the log10 values are being plotted in both cases.

Rena
ReplyDelete
Replies
UnknownJanuary 23, 2015 at 10:13 AM
Hi Stephen,

Thank you for creating usefull tool. I am new to R and manhattan plot is my first assignment. I am trying to adjust y axis for plot and its giving me warning " In plot.xy(xy.coords(x, y), type = type, ...) :
"ymin" is not a graphical parameter"

How do i adjust Y axis ?
Thank you,
Lata
ReplyDelete
Replies
UnknownMarch 26, 2015 at 8:07 AM
Hi Stephen,

Useful package! Maybe you can add an option to the manhattan plot to show the rsid's of the highlighted SNPs (at least if there not too much SNPs). For example add:

with(d.highlight, text(pos, logp, SNP, col = "green3",
...))

Or use an additional argument to the manhattan function.

Cheers,
Maarten
ReplyDelete
Replies
Guillermo FerrerMay 5, 2015 at 5:44 AM
Thanks a lot!!!!

This it's going to be very usefull to me (and easier that using the entire code)
ReplyDelete
Replies
UnknownSeptember 11, 2015 at 7:34 AM
Hi Stephen,
Thanks for this useful package.
I have a persistent error "Error in plot.window(...) : need finite 'ylim' values" which I cant seem to eliminate.
Some of my previous examples worked fine.

ReplyDelete
Replies
JackOctober 15, 2015 at 11:46 AM
Thanks for a very useful package. I have a naive question in that I am not entirely certain what the convention is for a suggestive threshold (maybe because I work with plants?). Any comments or direction to literature would be appreciated.
ReplyDelete
Replies
BárbaraOctober 19, 2015 at 5:11 PM
This comment has been removed by the author.
ReplyDelete
Replies
xyzOctober 22, 2015 at 9:56 PM
Hi. while trying qqman I'm running into the following error "Error in plot.window(...) : need finite 'ylim' values". I don't have any NAs in my data. Is there something i'm doing wrong here?
ReplyDelete
Replies
UnknownOctober 23, 2015 at 12:19 PM
Hello Stephen,
Thank you for this useful package (I'm really happy with mine manhattan plots). =)
I want to know if is possible to create a qq plot with multiples pvalue vectors? I mean, I'm trying to make a plot to compare different models for association analysis similar to the one in: http://www.sciencedirect.com/science/article/pii/S0168945212001628#fig0020
Thanks!!!

ReplyDelete
Replies
xyzOctober 30, 2015 at 3:57 AM
It works fine with this small data. But I can't see any highlighted SNPs with a bigger data file. Does it have something to do with the P-value or do i need to scale the graph? In the attached file "Highlight.txt" I can only see the top 5 SNPs in the manhattan plot and not the bottom 5.
https://gist.github.com/4cea1b9ba9bf3081a065.git
ReplyDelete
Replies
KellyNovember 16, 2015 at 11:38 PM
This is a great package, thank you so much. I'm just wondering if there is a way to annotate the genome-wide significant probes/SNPs with specific data from another column in my dataframe (for example, closest gene or rsID)?
Many thanks
ReplyDelete
Replies
Stephen TurnerJune 9, 2016 at 12:52 PM
I'd recommend everyone go over and look at manhattanly - an R package based on qqman that allows for interactive manhattan plots. http://moderndata.plot.ly/manhattanly-r-package-for-interactive-manhattan-plots/
ReplyDelete
Replies
hexheadJune 21, 2016 at 6:54 PM
Nice little package Stephen! Can't believe this is my first time doing these plots after so many bioinformatics years, but this made the intro easy.
ReplyDelete
Replies
Yuri NatividadeFebruary 2, 2017 at 2:30 PM
Hello Stephen I liked your package so much, but I found a problem when I use the manhattan(subset(gwasResults, CHR == 1)). I would like to now if when I am working with the subset of a CHR, I cant use the option to use another variable in the place of P-Value. Like in the
manhattan(gwasResults, p = "zscore", logp = FALSE, ylab = "Z-score", genomewideline = FALSE, suggestiveline = FALSE, main = "Manhattan plot of Z-scores").
Thanks in advance!
ReplyDelete
Replies