The qqman R package is on CRAN: http://cran.r-project.org/web/packages/qqman/
The source code is on GitHub: https://github.com/stephenturner/qqman
If you'd like to cite the qqman package (appreciated but not required), please cite this pre-print: Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).
Something wrong? Please file bug reports, feature requests, or anything else related to the code as an issue on GitHub rather than commenting here. Also, please post the code you're using and some example data causing a failure in a publicly accessible place, such as a GitHub gist (no registration required). It's difficult to troubleshoot if I can't see the data where the code is failing. Want to contribute? Awesome! Send me a pull request.
Note: This release is substantially simplified for the sake of maintainability and creating an R package. The old code that allows confidence intervals on the Q-Q plot and allows more flexible annotation and highlighting is still available at the version 0.0.0 release on GitHub.
Here's a shout-out to all the blog commenters on the previous post for pointing out bugs and other issues, and a special thanks to Dan Capurso and Tim Knutsen for useful contributions and bugfixes.
qqman package tutorial
First things first. Install the package (do this only once), then load the package (every time you start a new R session)# only once:
install.packages("qqman")
# each time:
library(qqman)
You can access this help any time from within R by accessing the vignette:vignette("qqman")
The qqman package includes functions for creating manhattan plots and q-q plots from GWAS results. The gwasResults
data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:str(gwasResults)
'data.frame': 16470 obs. of 4 variables:
$ SNP: chr "rs1" "rs2" "rs3" "rs4" ...
$ CHR: int 1 1 1 1 1 1 1 1 1 1 ...
$ BP : int 1 2 3 4 5 6 7 8 9 10 ...
$ P : num 0.915 0.937 0.286 0.83 0.642 ...
head(gwasResults)
SNP CHR BP P
1 rs1 1 1 0.9148
2 rs2 1 2 0.9371
3 rs3 1 3 0.2861
4 rs4 1 4 0.8304
5 rs5 1 5 0.6417
6 rs6 1 6 0.5191
tail(gwasResults)
SNP CHR BP P
16465 rs16465 22 530 0.5644
16466 rs16466 22 531 0.1383
16467 rs16467 22 532 0.3937
16468 rs16468 22 533 0.1779
16469 rs16469 22 534 0.2393
16470 rs16470 22 535 0.2630
How many SNPs on each chromosome?as.data.frame(table(gwasResults$CHR))
Var1 Freq
1 1 1500
2 2 1191
3 3 1040
4 4 945
5 5 877
6 6 825
7 7 784
8 8 750
9 9 721
10 10 696
11 11 674
12 12 655
13 13 638
14 14 622
15 15 608
16 16 595
17 17 583
18 18 572
19 19 562
20 20 553
21 21 544
22 22 535
Creating manhattan plots
Now, let's make a basic manhattan plot.manhattan(gwasResults)
We can also pass in other graphical parameters. Let's add a title (
main=
), reduce the point size to 50%(cex=
), and reduce the font size of the axis labels to 80% (cex.axis=
):manhattan(gwasResults, main = "Manhattan Plot", cex = 0.5, cex.axis = 0.8)
Let's change the colors and increase the maximum y-axis:
manhattan(gwasResults, col = c("blue4", "orange3"), ymax = 12)
Let's remove the suggestive and genome-wide significance lines:
manhattan(gwasResults, suggestiveline = F, genomewideline = F)
Let's look at a single chromosome:
manhattan(subset(gwasResults, CHR == 1))
Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called
snpsOfInterest
. You'll get a warning if you try to highlight SNPs that don't exist.str(snpsOfInterest)
chr [1:100] "rs3001" "rs3002" "rs3003" "rs3004" "rs3005" ...
manhattan(gwasResults, highlight = snpsOfInterest)
We can combine highlighting and limiting to a single chromosome:
manhattan(subset(gwasResults, CHR == 3), highlight = snpsOfInterest, main = "Chr 3")
A few notes on creating manhattan plots:
- Run
str(gwasResults)
. Notice that thegwasResults
data.frame has SNP, chromosome, position, and p-value columns namedSNP
,CHR
,BP
, andP
. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to thechr=
,bp=
,p=
, andsnp=
arguments. Seehelp(manhattan)
for details. - The chromosome column must be numeric. If you have “X,” “Y,” or “MT” chromosomes, you'll need to rename these 23, 24, 25, etc.
- If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for
col="blue"
,col="red"
, orcol="green3"
to modify the suggestive line, genomewide line, and highlight colors, respectively.
Creating Q-Q plots
Creating Q-Q plots is straightforward - simply supply a vector of p-values to theqq()
function. You can optionally provide a title.qq(gwasResults$P, main = "Q-Q plot of GWAS p-values")
Great work Stephen!
ReplyDeleteReally like the updated source code with friendly comments.
For my own work I'll try to implement colouring of the SNPs based on LD with the leading SNP.
Tim
Great - and don't hesitate to send a PR if you do. And I fully intend on re-integrating your and Dan's contributions - I just had to simplify code to the point where I felt comfortable documenting and maintaining before I could release as a package.
DeleteThis is a really useful form of your previous work. One thing that many might like tweaked and is good for some low level users is an argument to not -logten transform the data. There are many types of genome wide results that this function is great for. it was pretty easy to tweek the function myself, but not everyone is comfortable doing that. Thanks!
ReplyDeleteGood call. Looking at raw p-values in a manhattan plot isn't very useful, but looking at other types of scores, e.g., test statistics, peak heights, bayes factors, etc. might be useful when not -log10 transformed. Would you mind submitting this as an issue on GitHub? Thanks.
DeleteThis issue is now resolved on the dev branch on github.
DeleteHi Stephen,
ReplyDeleteThanks a lot for sharing this. Is there any information that you know about the default resolution of the plots?
That's controlled with the options used for saving the image
Deletepng("file.png", width=2000, height=1000, pointsize=18, ...)
manhattan(...)
dev.off()
Hi Stephen,
ReplyDeleteI am trying to bastardise your fine work above to plot differences in alt alleles in SNPs between phenotypes (rightly or wrongly...). I had a look at the code but it wasn't immediately apparent where you plot and colour CHR so as to be able to visually discriminate between each in turn. I am a little short on time so thought I would ask if you could point out to me? Sorry for not RTFMing=P
Bruce.
Bruce, take a look at the source code. I believe you'll need to mod lines 149 and 156-160. Basically, I'm creating a vector of color choices longer than I'll need on line 149, then alternating through them with the for loop. Inelegant, but it works.
DeleteGreat, thanks for that Stephen.
DeleteHi Stephen,
ReplyDeleteThanks for the useful code and package. Sorry for the simple question - but how do I go about calculating the lambda and adding to the plot ?
Thanks in advance,
Luanna
Brilliant, just in time to cite it in a paper :D
ReplyDeleteHi Stephen,
ReplyDeleteIs there an option to modify the width of the points?, I looked into the code and it says pch= 20 but I don't know how to change it, I tried to enter "pch = 10 " in the function but an error appeared saying that "pch" matched by multiple actual arguments.
Berna
Can you post this question as an issue on the github repo please?
DeleteHi Stephen
ReplyDeleteSome chromosome may have blank gaps as the SNPs within some region are not genotyped, you need to address that in your script...
Yuan: how about posting some example data as a github gist on the qqman issues page? Thanks.
DeleteHi Stephen,
ReplyDeleteThanks for such a beautiful script, and for such clear commenting! I am wondering if there is a way to alter the source code so that the ticks demarcating chromosomes fall at the outer boundaries of the chromosomes, rather than the middle, and the chromosome numbers fall in the middle? In the plots I am preparing I am not alternating colours of the chromosomes, and it is difficult to discern the boundaries between chromosomes with the ticks falling in the centre. I have tried fiddling around with the source code, but I am still fairly new to R and have not had any success.
Jennie
Hi Jennie. I believe it's line 127 in commit ccb08c that you want to change.
DeleteHi Stephen,
ReplyDeleteThanks for the useful package - I have been hacking the GenABEL plot function on and off over the years and yours is really nice and clean.
In one of my OCD moments I noticed that the x axes tick markers weren't centered on the chromosomes, so I went digging through the source. It seems that you've made an assumption that SNPs are evenly distributed across chromosomes, which is probably correct 99.9% of the time. However a student in our lab was interested in a subset of ~16K SNPs in specific protein coding genes, which don't follow an even distribution (obviously a very specific use case).
I believe that a small change to line 127;
from:
ticks=c(ticks, d[d$index==i, ]$pos[floor(length(d[d$index==i, ]$pos)/2)+1])
# this takes the middle value of the number of observed positions, which isn't always going to be the 'mean' chromosomal position
to:
ticks = c(ticks, (min(d[d$CHR == i,]$pos) + max(d[d$CHR == i,]$pos))/2 + 1)
# this modification should identify the 'true' mean chromosomal position
This fixes the issue with various data I've run through. I don't imagine the modification would cause issues down the track... but don't quote me on that!
Please disregard if I'm talking nonsense!
- Miles
Miles, thanks so much for the fix. I haven't tested it out yet. Would you mind uploading some sample data to gist.github.com so I can test this out to see what you're dealing with? Or you can just email me if it's not too large. I'd love to include this in the next update. Thanks.
DeleteNo problem Stephen. I have sent through some example data via email, I hope it is useful. Please let me know if you would like me to test anything.
DeleteHi Stephen,
ReplyDeletethanks for the great package!
I'm using it to plot my GWAS data and I was wondering if there is a command to add the rs numbers to SNPs that are above a certain threshold. (in my case the p=10^-5 line)
I have a basic understanding of R, but I cannot do any programming myself.
Could you help my with this?
Thanks in advance,
Roos
Roos - this is literally next on my to-do list with this code. I'll post here when it's updated. Thanks.
DeleteThat would be great!!
DeleteLooking forward to it.
Thanks!
Did you succeed?
DeleteVersion 0.1.3 now has the ability to highlight SNPs below a particular threshold or annotate top SNPs on each chromosome. The release is already live on GitHub, and should propagate to CRAN within a day or two.
DeleteIs there a way to increase the point size of just the highlighted SNPs? They're slightly obscured when plotting millions of SNPs.
DeleteHi Stephan,
ReplyDeleteVery nice. I found the single chromosome results came out smashed on a vertical line.
This seems to be because "d$pos=d$BP/1e6" in manhattan.R (the x axis is in megabytes, even if x limits have been specified)
I know you have highlighted ggplot2 in the past. Any reason for not using it here?
Thanks,
Aaron
Hm, didn't consider that when merging this PR. Will revisit.
DeleteAny update on this? It still displays single chromosomes in a vertical line due to the position on the x axis being in Mb.
DeleteHi Stephan,
ReplyDeleteFor some reason, when I try to change the colors for my plot, I get the following error. Do you know why?
manhattan(data_test, ymax=8, col=c("blue4","orange3"))
Error in localWindow(xlim, ylim, log, asp, ...) :
formal argument "col" matched by multiple actual arguments
hi @jmtoung, please post this as an issue on the issue tracker on GitHub. Thanks.
DeleteHi Stephen,
ReplyDeleteThank you for creating such useful code! I am trying to plot Likelihood values from a non-human species. I am finding that the option "chrlabs=c(1:38)" is not changing the chromosome labels on my plot. It also does not work if I use the option "limitchromosomes=1:38". Also, I am using the logp=FALSE flag since I want to plot just the likelihood values. There is no difference in my plot if I use logp=FALSE or logp=TRUE and the log10 values are being plotted in both cases.
Rena
See the conversation on GitHub
DeleteHi Stephen,
ReplyDeleteThank you for creating usefull tool. I am new to R and manhattan plot is my first assignment. I am trying to adjust y axis for plot and its giving me warning " In plot.xy(xy.coords(x, y), type = type, ...) :
"ymin" is not a graphical parameter"
How do i adjust Y axis ?
Thank you,
Lata
it's ylim.
DeleteDear Stephen, many thanks for a useful and practical package, but the ylim=c(a,b) seems to be unresponsive to any values I feed it?
Deletemany thanks, Isi
Please post your example data and code as an issue on the github issue tracker. Thanks.
DeleteHi Stephen,
ReplyDeleteUseful package! Maybe you can add an option to the manhattan plot to show the rsid's of the highlighted SNPs (at least if there not too much SNPs). For example add:
with(d.highlight, text(pos, logp, SNP, col = "green3",
...))
Or use an additional argument to the manhattan function.
Cheers,
Maarten
Thanks a lot!!!!
ReplyDeleteThis it's going to be very usefull to me (and easier that using the entire code)
Hi Stephen,
ReplyDeleteThanks for this useful package.
I have a persistent error "Error in plot.window(...) : need finite 'ylim' values" which I cant seem to eliminate.
Some of my previous examples worked fine.
open an issue on github and leave some example code and data as a github gist.
DeleteI am having the same issue with my dataset. My dataset is an imputed list of snps and I consistently get the Error in plot.window(...) : need finite 'ylim' values -- error message
Deletealso using ymax = 10 I get the following response from R In plot.window(...) : "ymax" is not a graphical parameter
I can't see where the person has placed the problem on github
Please open an issue on github and leave a reproducible example as a github gist, with data.
DeleteFredrick and "Unknown"
DeleteI had similar issues which I figured out.
The Error in plot window.... is because you probably have NAs in your data.
Solution: Get rid of them with >results <- na.omit(results)
"ymax" error
Solution: use "ylim" not "ymax"
Thanks for a very useful package. I have a naive question in that I am not entirely certain what the convention is for a suggestive threshold (maybe because I work with plants?). Any comments or direction to literature would be appreciated.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteHi. while trying qqman I'm running into the following error "Error in plot.window(...) : need finite 'ylim' values". I don't have any NAs in my data. Is there something i'm doing wrong here?
ReplyDeletePlease open an issue on github and leave a reproducible example as a github gist, with data.
DeleteHello Stephen,
ReplyDeleteThank you for this useful package (I'm really happy with mine manhattan plots). =)
I want to know if is possible to create a qq plot with multiples pvalue vectors? I mean, I'm trying to make a plot to compare different models for association analysis similar to the one in: http://www.sciencedirect.com/science/article/pii/S0168945212001628#fig0020
Thanks!!!
Pamela, certainly possible, but would require some retooling. The easiest way to do this now would be to use ggplot2.
DeleteIt works fine with this small data. But I can't see any highlighted SNPs with a bigger data file. Does it have something to do with the P-value or do i need to scale the graph? In the attached file "Highlight.txt" I can only see the top 5 SNPs in the manhattan plot and not the bottom 5.
ReplyDeletehttps://gist.github.com/4cea1b9ba9bf3081a065.git
This is a great package, thank you so much. I'm just wondering if there is a way to annotate the genome-wide significant probes/SNPs with specific data from another column in my dataframe (for example, closest gene or rsID)?
ReplyDeleteMany thanks
I think you could just pass in the other variable as a vector to highlight. e.g., qqman(..., highlight=mydata$mycol)
DeleteMany thanks for your quick reply. I've realised that I didn't actually explain my question very well. What I mean by 'annotate' is that I would like text on the manhattan plot. For example, for the highly significant probes/SNPs I would like text stating the gene or rsID. Thanks for your help.
DeleteAh, I see what you're saying. Unfortunately, no, that isn't implemented yet. But I'd recommend taking a look at textxy in the calibrate package - you could probably hack this into the manhattan source code somewhere that would work... http://www.inside-r.org/packages/cran/calibrate/docs/textxy
DeleteI'd recommend everyone go over and look at manhattanly - an R package based on qqman that allows for interactive manhattan plots. http://moderndata.plot.ly/manhattanly-r-package-for-interactive-manhattan-plots/
ReplyDeleteNice little package Stephen! Can't believe this is my first time doing these plots after so many bioinformatics years, but this made the intro easy.
ReplyDeleteHello Stephen I liked your package so much, but I found a problem when I use the manhattan(subset(gwasResults, CHR == 1)). I would like to now if when I am working with the subset of a CHR, I cant use the option to use another variable in the place of P-Value. Like in the
ReplyDeletemanhattan(gwasResults, p = "zscore", logp = FALSE, ylab = "Z-score", genomewideline = FALSE, suggestiveline = FALSE, main = "Manhattan plot of Z-scores").
Thanks in advance!