Monday, April 25, 2011

Annotated Manhattan plots and QQ plots for GWAS using R, Revisited

**** UPDATE, May 15 2014 *****
The functions described here have now been wrapped into an R package. View the updated blog post or see the online package vignette for how to install and use. If you'd still like to use the old code described here, you can access this at version 0.0.0 on GitHub. The code below likely won't work.
*****************************

Last year I showed you how to create manhattan plots, and later how to highlight regions of interest, using ggplot2 in R. The code was slow, required a lot of memory, and was difficult to maintain and modify.

I finally found time to rewrite the code using base graphics rather than ggplot2. The code is now much faster, and if you're familiar with base R's plot options and graphical parameters, most of these can now be passed to the functions to tweak the plots' appearance. The code also behaves differently depending on whether you have results for one or more than one chromosome.

Here's a quick demo.

First, either copy and paste the code from GitHub, or run the following commands in R to download and source the code from GitHub (you can't directly read from https in R, so you have to download the file first, the source it). Note the command is different on Mac vs Windows.

Download the function code on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")

Download the function code on Windows, leave out the method="curl" argument:

download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r")

Now, source the script containing the functions.

source("./qqman.r")


Next, load some GWAS results, and take a look at the relevant columns (same as above, download first then read locally from disk). This is standard output from PLINK's --assoc option.

Download example data on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz", method="curl")

Download example data on Windows:

download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz")

Read in the sample results, and take a look at the first few lines:

results = read.table(gzfile("./plink.assoc.txt.gz"), header=T)

head(subset(results, select=c(SNP, CHR, BP, P)))

The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R. Do this with a quick sed command. Here's what the data looks like:

         SNP CHR        BP       P
1 rs10495434   1 235800006 0.62220
2  rs6689417   1  46100028 0.06195
3  rs3897197   1 143700035 0.10700
4  rs2282450   1 202300047 0.47280
5   rs567279   1  66400050      NA
6 rs11208515   1  64900051 0.53430


Now, create a basic manhattan plot (click the image to enlarge):

manhattan(results)



If you type args(manhattan) you can see the options you can set. Here are a few:

colors: this is a character vector specifying the colors to cycle through for coloring each point. Here's a PDF chart of R's color names.

ymax: this is the y-axis limit. If ymax="max" (default), the y-axis will always be a little bit higher than the most significant -log10(p-value). Otherwise you can set this value yourself.

cex.x.axis: this can be used to shrink the x-axis labels by setting this value less than 1. This is handy if some of the tick labels aren't showing up because the plot region is too small.

*** Update March 9 2012*** cex.x.axis is deprecated. To change the x-axis size, use the default base graphics argument cex.axis.

limitchromosomes: you can limit which chromosomes you want to display. By default this restricts the plot to chromosomes 1-23(x).

suggestiveline and genomewideline: set these to FALSE if you don't want threshold lines, or change the thresholds yourself.

annotate: by default this is undefined. If you supply a character vector of SNP names (e.g. rs numbers), any SNPs in the results data frame that also show up here will be highlighted in green by default. example below.

... : The dot-dot-dot means you can pass most other plot or graphical parameters to these functions (e.g. main, cex, pch, etc).

Make a better looking manhattan plot. Change the plot colors, point shape, and remove the threshold lines:

manhattan(results, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)


Now, read in a text file with SNP names that you want to highlight, then make a manhattan plot highlighting those SNPs, and give the plot a title:

Download a SNP list on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt", method="curl")

Download a SNP list on Windows:

download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt")

Read in the SNP list, and create an annotated plot:

snps_to_highlight = scan("./snps.txt", character())

manhattan(results, annotate=snps_to_highlight, pch=20, main="Manhattan Plot")



Finally, zoom in and plot only the results for chromosome 11, still highlighting those results. Notice that the x-axis changes from chromosome to genomic coordinate.

manhattan(subset(results, CHR==11), pch=20, annotate=snps_to_highlight, main="Chr11")



Finally, make a quantile-quantile plot of the p-values. To make a basic qq-plot of the p-values, pass the qq() function a vector of p-values:

qq(results$P)


Perhaps we should have made the qq-plot first, as it looks like we might have some unaccounted-for population stratification or other bias.

The code should run much faster and use less memory than before. All the old functions that use ggplot2 are still available, now prefixed with "gg." Please feel free to use, modify, and redistribute, but kindly link back to this post. The function source code, data, SNPs of interest, and example code used in this post are all available in the qqman GitHub repository.

203 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. This is fantastic! Just what I was looking for. I stumbled across your earlier version this afternoon, and was having difficulty getting it to work. The new version worked beautifully and gave me the plot I was dreading having to code up myself from scratch. Could not have come a at a better time! Thanks!

    ReplyDelete
  3. Just wanted to add: great! Really, thanks a bunch. It works like a charm...
    I've added some code so it'll automatically outputs .EPS-files.

    Only thing I'm looking for now, is to annotate the top hits (-log10(P)>=8) with the gene name... That would be great!

    Thanks!

    ReplyDelete
  4. I have run it and this code is very cooool
    Also, a quick question, when I creat my own results and followed the format of “SNP CHR BP P”
    However, R returns
    “Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 2”

    Anybody knows why?
    Thanks

    ReplyDelete
    Replies
    1. I'm getting the same error message :/ I thought that maybe it was to do with my SNP names, as I'm working with another species, but even after renaming them to rsnumber I still get
      Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
      replacement has 0 rows, data has 7627

      Delete
    2. Joanna,

      This is a known issue. Do you "skip" a chromosome? I.e., have SNPs for chrs 1-3, 5-7, skipping 4, for example? I think this causes a problem. A solution is in the works. Stay tuned.

      Delete
    3. I also got the same problem. Probably error comes from old version of qqman.r -- I reloaded the latest version of qqman.r and problem got solved.
      Many many thanks for these codes. Great!

      Delete
    4. I'm also getting the issue, any idea how to fix this? I have the new version of qqman.r
      Error in `$<-.data.frame`(`*tmp*`, "pos", value = NA) :
      replacement has 1 row, data has 0

      Delete
    5. I also had this problem.

      > head(subset(results, select=c(SNP, CHR, BP, P)))
      SNP CHR BP P
      1 BTA-28471-no-rs 0 0 NA
      2 BTA-28495-no-rs 0 0 NA
      3 BTA-28466-no-rs 0 0 NA
      4 ARS-USMARC-Parent-DQ650635-rs29012174 0 0 NA
      5 ARS-USMARC-Parent-DQ451555-rs29010795 0 0 NA
      6 BPI-1 0 0 NA
      > manhattan(results)
      Error in `$<-.data.frame`(`*tmp*`, "pos", value = NA) :
      replacement has 1 row, data has 0


      For the P values, they are all NA. The assignment is due tomorrow morning and I've been banging my head over this all day.

      Delete
  5. I'm guessing the error might be because you have something other than a number in the p-value column (maybe a NaN?). This column can only be a number between 0 and 1 or NA for missing. Next time I update the code I'll add in some checks for this.

    ReplyDelete
    Replies
    1. Hi Stephen,
      Thanks for posting a very nice code. I am entirely new to R and GWAS. I ran this script and got this same error as above & below (my pvalues are in exponential forms. Please see below).

      > source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
      > results <-read.table("LDLtopHits1.txt", T)
      > head(subset(results, select=c(SNP, CHR, BP, P)))
      SNP CHR BP P
      1 kgp11202561 19 45426792 2.200786e-48
      2 kgp7807118 19 45413576 1.488841e-35
      3 kgp8411531 19 45414399 2.522903e-24
      4 rs7528419 1 109817192 5.896030e-13
      5 rs1160985 19 45403412 1.611327e-12
      6 kgp807278 19 45389596 1.075362e-11
      > manhattan(results)
      Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
      replacement has 0 rows, data has 3
      Calls: manhattan -> $<- -> $<-.data.frame
      Execution halted

      Thanks
      Segun

      Delete
    2. I've double checked that I only have values between 0 and 1 in my p-value column and still get the same error...

      Delete
  6. Thank you!

    I started learning how to run GWAS two hours ago, and thanks to this I'm already looking at some preliminary results.

    Science has never been simpler! ;)

    ReplyDelete
  7. Thanks for the nice script! While using it I found out, that it gives an error when there are no snps for a specific Chromosome.

    ReplyDelete
  8. Many thanks! This is great!
    One thing though: when I try to use the "ymax" in order to limit the Y axis values, it doesn't really work. maybe I'm using it the wrong way.
    Here's what I typed:

    manhattan(results, ymax=5, limitchromosomes=F, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)

    Thanks!

    ReplyDelete
  9. Sagi, I think I see your problem here. These two lines gave it the behavior I desired:

    if (ymax=="max") ymax<-ceiling(max(d$logp))
    if (ymax<8) ymax <- 8

    ... which is to say, if ymax="max", make the maximum value on the y axis at least one point higher on the -log10p scale than the most significant SNP. But if the max y was less than 8, set it at 8 (because genome-wide significance is conventionally 5e-8).

    If you want this code to behave the way you want, remove the second line that resets ymax to 8 if it's less than that.

    ReplyDelete
    Replies
    1. Is it also possible to change the y-axis min? If i start the y-axis from 1 it will decrease the output and result in smaller pdf files. Thank you

      Delete
    2. Sure, just change the ylim=c(0,ymax) line to ylim=c(1,ymax) or whatever you want to start it at.

      Delete
    3. Thank you for your quick response!

      Do you need to do this twice in the script?
      Is it possible to do this via:

      manhattan(results, ylim=c(1,ymax))?

      Somehow i don't get it working, right.

      Thank you,
      Jonas

      Delete
    4. Make sure to make the change on line 59, on the points() call in the else {} block

      Delete
    5. This comment has been removed by the author.

      Delete
  10. Hi,

    As many wrote: thanks a bunch! I'm having difficulty getting the genome-wide significance at 5e-8. It just won't get to 8, it's always a little off... Any clues?
    I changed your code a little, but only in making the line for instance dotted. Here's the code:

    # manhattan plot using base graphics
    manhattan = function(dataframe, colors=c("black", "gray50"), ymax="max", cex.x.axis=0.8, limitchromosomes=1:23, suggestiveline=-log10(1e-4), genomewideline=-log10(5e-8), annotate=NULL, ...) {

    d=dataframe
    if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")

    if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]
    d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0<P<=1
    d$logp = -log10(d$P)
    d$pos=NA
    ticks=NULL
    lastbase=0
    colors <- rep(colors,max(d$CHR))[1:max(d$CHR)]
    if (ymax=="max") ymax<-ceiling(max(d$logp))
    if (ymax<9) ymax<-9

    numchroms=length(unique(d$CHR))
    if (numchroms==1) {
    d$pos=d$BP
    ticks=floor(length(d$pos))/2+1
    } else {
    for (i in unique(d$CHR)) {
    if (i==1) {
    d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP
    } else {
    lastbase=lastbase+tail(subset(d,CHR==i-1)$BP, 1)
    d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP+lastbase
    }
    ticks=c(ticks, d[d$CHR==i, ]$pos[floor(length(d[d$CHR==i, ]$pos)/2)+1])
    }
    }

    if (numchroms==1) {
    with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab=paste("Chromosome",unique(d$CHR),"position"), ...))
    } else {
    with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab="Chromosome", xaxt="n", type="n", ...))
    axis(1, at=ticks, lab=unique(d$CHR), cex.axis=cex.x.axis)
    icol=1
    for (i in unique(d$CHR)) {
    with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
    icol=icol+1
    }
    }

    if (!is.null(annotate)) {
    d.annotate=d[which(d$SNP %in% annotate), ]
    with(d.annotate, points(pos, logp, col="darkorange", ...))
    }

    if (suggestiveline) abline(h=suggestiveline, col="royalblue4", lty=5)
    if (genomewideline) abline(h=genomewideline, col="red",lty=5)
    }

    Thanks!

    ReplyDelete
  11. @swvanderlaan - the reason the line won't get to 8 on the y-axis is because genomewideline is set to -log10(5e-8), which is about 7.30 on the -log10 scale. The p=5e-8 number comes from the notion that there are ~1 million independent tests in the genome, so by a Bonferroni correction, 0.05/1e6=5e-8, the widely accepted holy grail p-value for "genome-wide significance."

    If you want the line to show up at 8 on the y-axis, set genomewideline=-log10(1e-8), or simply, genomewideline=8.

    Hope this helps.

    ReplyDelete
  12. Hey Stephen,

    So, you stare for minutes, hours at the code, and you just don't see it... Haha, thanks!

    Now, what I wrote earlier: I want names with my top hits. And I'll figure that one out, I'm sure. And when I do, I'll be sure to get back to this post, to share it!

    Thanks.

    Sander

    ReplyDelete
  13. Try using the text() function to add text to the plot. It probably wouldn't be too hard to work this into the function to annotate SNPs with gene names using the text() function. You'll probably want to limit it to SNPs having annotation information AND having p-value less than a certain threshold, so that it's readable and looks nice.

    ReplyDelete
  14. Does Dr. swvanderlaan or Dr. Stephen Turner can help to solve how to make top 10 SNPs name on the plots with R?

    ReplyDelete
  15. Can anyone tell me how to adjust the size of the points on the plot?

    ReplyDelete
  16. Use the cex= option. See here: http://www.inside-r.org/r-doc/graphics/par

    ReplyDelete
  17. hmm...Read through that but didn't understand a whole lot. par(pch=20, cex=0.5) kind of does what I want but it also resizes the axis label fonts. One would think there would be something like pch.cex to do this... Can you give me any clearer guidance please?

    ReplyDelete
  18. Oh, I see what's going on here now. I included a graphical parameter that I named cex.x.axis to resize x-axis labels. When using the cex= argument, it conflicts with the argument I supply. Source this version, and you should be able to change the point size with the cex option.

    manhattan(results) #normal size
    manhattan(results, cex=.5) #half size

    https://gist.github.com/1094160

    ReplyDelete
  19. Thanks Stephen that worked.

    ReplyDelete
  20. Hi Stephen,

    I ran this script and got the following error message :
    "Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 36720"

    a similiar problem somebody had.

    I double-checked the P values, all fall in (0,1] and no missing values.

    Do you know how this would happen? anything is wrong in my dataset?

    Many thanks,

    ReplyDelete
  21. Hmm, not sure without seeing your data. Are you sure that all the other relevant columns contain numeric values only? How many chromosomes do you have?

    ReplyDelete
  22. Hi Stephen,

    I'm trying to run this script on some results from mice, and am getting this error: Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors.
    I thought this may be due to the choromosome number being different, but I have now changed the script to chromosome 1:19 and it still has the same error. Do you know what might be causing this problem? Thanks.

    ReplyDelete
  23. Hi Stephen,

    I posted a problem a few days ago when I ran this script and the error message was:
    :"Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 36720"

    I double-checked the varaibles in the dataset and all the other relevant columns contain numeric value. However, there was only 21 chromosomes (chromomsome 3 was missing). I am wondering if this is the reason which caused the problem?

    Thanks,

    ReplyDelete
  24. Anonymous 1 and anonymous 2: I'll try to take a look at the code sometime soon but I've got a major grant deadline coming up that's sucking much of my time away. Anonymous 2: That could easily be the problem - for a dirty fix, try putting in a single line in your results file between chr 2 and 4, where the chr is 3, position is 1, and p-value is 1, see if it works and let me know. Anonymous 1 - it's difficult for me to troubleshoot without some example data. Could you use something like http://jotonce.com/ to copy a few lines out of your results file, and email me the password to access it?

    ReplyDelete
  25. Stephen,

    after adding a redundant line for chr 3, the function works well! thank you so much!

    ReplyDelete
  26. Thanks for helping me figure that one out. I'm soon going to be updating the code to fix a few other things and add things like automatic annotation based on gene region, SNP features, etc. I'll add this to the list of things to fix.

    ReplyDelete
  27. Hey Stephen,

    I'm running into the following error when trying to create a manhattan plot:
    Error in Math.factor(d$P) : log10 not meaningful for factors In addition: Warning messages: 1: In Ops.factor(P, 0) : > not meaningful for factors 2: In Ops.factor(P, 1) : <= not meaningful for factors

    Any insight into what might be the source of this error would be greatly appreciated as I'm stumped!

    ReplyDelete
  28. Do you have something other than numbers between 0 and 1 (not including 0)?

    ReplyDelete
  29. Thank you for sharing the code! I need to prepare one of these plots quickly and your programming is pretty straightforward. Nice work!

    ReplyDelete
  30. Is there a way to get a chromosome 0 for scaffolds not incorporated into CHR?

    ReplyDelete
  31. I'll add it to the growing list of changes for the next update. For now, what happens if you simply have a chromosome 0, or chromosome 24? Make sure you appropriately change the limitchromosomes argument.

    ReplyDelete
  32. If I change the limit chromosome argument to 0:8, and include CHR 0 in my source file I get this error:

    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 213

    I get the same error if I set limitchromosomes=F

    I have not tried it with 24 as my organism only has a few chromosomes.

    ReplyDelete
  33. Also, if I leave it as 1:8 it plots perfectly, but just excludes the CHR 0 data.

    ReplyDelete
  34. Changing this i==1 to i ==0 allows for CHR 0,but for some reason the last chromosome has not points.

    + } else {
    + for (i in unique(d$CHR)) {
    + if (i==0) {

    ReplyDelete
  35. and changing icol=icol+0 restores those points for the last CHR, but the colors by CHR are missing

    + for (i in unique(d$CHR)) {
    + with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
    + icol=icol+0

    ReplyDelete
  36. Finally got it with changing from i==1 to i==0 like so:

    + } else {
    + for (i in unique(d$CHR)) {
    + if (i==0) {

    and then changing type= "n" to type = "p"


    + with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(p))), xlab="Chromosome", xaxt="n", type="p", ...))

    ReplyDelete
  37. Excellent! Thanks for hammering away at this. I'll try to make the code more flexible on my next round of updates.

    ReplyDelete
  38. Hi Stephen,

    Thanks. This is great !

    I have tried to add the SNP name in the plot using text. Using the BP for that SNP, how can I find the x-coordinate in the plot? For example, what is x-coordinate in the plot for CHR=2, BP=53606599)?

    Thanks in advance.

    ReplyDelete
  39. If you look at the code here, in lines 45-53 I'm creating a new variable in the data frame called d$pos. On chromosome 1 d$pos is set to d$BP. For every remaining chromosome, d$pos is set to the BP position on that chromosome, added to the last base position on the previous chromosome. For instance, if I have 3 SNPs on chromosome 1 at positions 10, 20, and 30, and 3 SNPs on chromosome 2 at positions 5, 15, 25, the X coordinates for the six SNPs would be 10, 20, 30, 35, 45, 55. The x-axis uses the d$pos variable, and you can mod the code to use d$pos as the x coordinate for your text. I'd love to see what you do with this, please post the code to https://gist.github.com/. Thanks!

    ReplyDelete
  40. We were just using plink the other day and I wanted to create a Manhattan plot for our data. I am running into this error:
    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 23

    My plink.assoc file is from a small dataset so it only contains two CHR. I've searched the P column for non-0 or 1 but didn't find any.

    Thanks for writing this code! I appreciate the time and help!

    I can e-mail the the plink.assoc file if you need it.

    ara

    ReplyDelete
  41. I actually figured it out using the args(manhattan). When you have less then 23 CHR you need to use the limitchromosomes=#,# . Where #s are your CHRs that are present in the file.

    ReplyDelete
  42. Well I thought I had it solved but...

    So I have two CHR 11 and 17. I use the limitchromosomes to get it to plot 11 but how to let R know I want just 11 and 17?

    ReplyDelete
  43. Thanks for figuring this out. As with the many other little inconsistencies, I'll try to iron this out and make the code a little more robust in the next version.

    ReplyDelete
  44. Just another quick note. I tried to use the limitchromosomes=11:12 (basically turned my 17 into 12 so it's consecutive) the same error pops up. It looks like you can do a single CHR using limitchromosomes but not consecutive CHRs unless you have all 1:23 present.

    ReplyDelete
  45. Is there any way to change the colour of the annotation from green to red? I don't quite understand how the dot-dot-dot function is supposed to work?

    ReplyDelete
  46. Hi Stephen,

    Just a quick question: how do you usually select you annotation SNPs? Do you say (using PLINK): pick a window of 100 SNPs around my hit? Or do make it broader?

    Thanks!

    Sander

    ReplyDelete
  47. Where I've needed this in the past is to highlight known hits based on RS number from the GWAS catalog or something. If you wanted to highlight "skyscrapers" in the manhattan plot, you could just select all the SNPs within a certain window like you said.

    ReplyDelete
  48. Exactly, but what range would you take 100-250kb, or would rather take 500-1500kb?

    ReplyDelete
  49. It's hard to say really. You might base it on how big the haplotype blocks are in that region. But if you're just coloring neighboring SNPs to highlight the peak on the plot, you probably just want to eyeball it, and pick whatever size looks good.

    ReplyDelete
  50. It works great! Thanks a lot!!

    I also encountered that error when there are no snps for a specific Chromosome. I solved it with a minor adjustment to the loop on line 45. Below is the code.

    CHRs <- unique(d$CHR)
    numchroms=length(CHRs)
    if (numchroms==1) {
    d$pos=d$BP
    ticks=floor(length(d$pos))/2+1
    } else {
    for (i in 1:numchroms) {
    if (i==1) {
    d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP
    } else {
    lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)
    d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP+lastbase
    }
    ticks=c(ticks, d[d$CHR==CHRs[i], ]$pos[floor(length(d[d$CHR==CHRs[i], ]$pos)/2)+1])
    }
    }

    ReplyDelete
  51. Bahar - thanks for sharing your solution. I think others have had this problem before but I never knew what was causing it.

    ReplyDelete
  52. This is great. Thanks a lot Dr. Turner.
    I was trying to figure it how to Zoom in a specific location of Chromosome. For example, if I want to have Manhattan plot only for BP "1e+07 to 2e+07" of CHR1. Any suggestions?

    ReplyDelete
  53. Hi Stephen,

    Thanks for your code. It's awesome. I did a MH plot using your code. In the x-axis, all the chr # didn't show up. How do I resolve this issue?
    All the #s came up between 1 and 9 and then I have 11,13, 15, 19 etc.

    Thanks,

    -Joey

    ReplyDelete
  54. Joey - make a bigger plog. Save the plot to a file and make it wider.

    ReplyDelete
  55. Hi Stephen,

    This function is really helpful. I love reading your blog, really very informative, for people like me who have to learn statistics to be in field of human genetics.

    I used the function to plot Manhattan but, I am having trouble with pch, dots are very big and when I plot 6million SNP's with imputed data big dots merge with each other and it looks messy. I tried to change size with cex, but it changed the size of x axis and dots remained of same size. Can you suggest something to deal with this?

    Thanks,

    ReplyDelete
  56. Ah, I see what's going on here. I named the argument "cex.x.axis" so I could control this at line 60 in the code. Copy the function, and rename the "cex.x.axis" to something like xcex, and when you pass in the cec= argument, it should work properly to resize the points, not the x-axis labels.

    ReplyDelete
  57. Great codes! Very useful. Thank you Stephen

    ReplyDelete
  58. Hi Stephen,

    This code is so useful - thank you for sharing it!

    I have been adapting your script, but I'm having some problems with adding a title to the Manhattan plot (although I can successfully change the colours):

    # manhattan plot using base graphics
    manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
    main="Viral load in cell lines", ...) {

    If you could suggest what I should do to add a title, that would be much appreciated!

    ReplyDelete
  59. I'm also having trouble changing the size of the points using pch:

    # manhattan plot using base graphics
    manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
    main="Viral load in cell lines", pch=20, ...) {

    ReplyDelete
  60. Hello Anonymous from July 19 - I believe I solved your cex problem.

    ReplyDelete
  61. Hi Stephen,

    Awesome work! I am using R studio to run the code. I am having trouble getting the last command to execute. manhattan(results) gives an Error: could not find function "manhattan".

    Any ideas?

    Thanks

    ReplyDelete
  62. It looks like you didn't source in the function. Copy and paste the function yourself, or source it from the web directly using source("http://www.StephenTurner.us/qqman.r")

    ReplyDelete
  63. Hello Stephen,
    Its a very nice piece of code you have compiled. I am using R to plot my GWAS data. I was wondering if we could plot lines instead of dots with colour intensity proportional to LOD scores. In that way, it'll be easy to draw region specific conclusion about association.

    Regards,
    Amol

    Best Wishes

    ReplyDelete
  64. Amol - that would be pretty simple with the old ggplot2 code - just use geom="line" and colour=LOD or the like. I'm not sure how to build this into the current implementation, but contributions are welcome.

    ReplyDelete
  65. Hi Stephen,

    I want to have grids at beginning and end of each chromosome on X-axis in my Manhattan plot. Can you tell how should I specify that?

    Thanks,

    Manav

    ReplyDelete
  66. This should be easy to do with the abline function.

    ReplyDelete
  67. Hi Stephen,

    Great code! Thank you so much!

    I am trying to replicate the manhattan plot in this paper, http://www.pnas.org/content/107/39/16910.full.pdf+html?with-ds=yes, (Fig.S8) with the data provided, http://heim.ifi.uio.no/bioinf/Projects/ASCAT/Version1/AllelicSkewness.txt.

    I had the same error as some described above:
    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 401

    So, I tried to input the data bit by bit to determine where the error is.

    I located the first error in this line in the data:
    rs12526698 6 2610972 4 2 4 2 2 6 12 8 0.6 0.503444672

    at Chromosome 6 with p-value 0.503444672. When I changed this p-value to NA it gave me this message:

    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 400

    indicating that a p-value other than NA would result in an error. (Notice the change from 401 to 400). Don't know why it is only accepting "NA".

    There are 23 chromosomes by the way.

    Thank you,
    Miah

    ReplyDelete
  68. Hi Stephen,

    Great code! Thank you so much!

    I am trying to replicate the manhattan plot in this paper, http://www.pnas.org/content/107/39/16910.full.pdf+html?with-ds=yes, (Fig.S8) with the data provided, http://heim.ifi.uio.no/bioinf/Projects/ASCAT/Version1/AllelicSkewness.txt.

    I had the same error as some described above:
    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 401

    So, I put in the data bit by bit to determine where the error is.

    I located the first error in this line in the data:
    rs12526698 6 2610972 4 2 4 2 2 6 12 8 0.6 0.503444672

    at Chromosome 6 with p-value 0.503444672. When I changed this p-value to NA it gave me this message:

    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 400

    indicating that a p-value other than NA would result in an error. (Notice the change from 401 to 400)

    There are 23 chromosomes by the way.

    Thank you,
    Miah

    ReplyDelete
  69. I'm guessing you changed the column headers? The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R.

    I suppose I should add in some error checking but it's been a while since I've looked at this code. Hopefully I can update it soon. Let me know if this helps.

    Does it work otherwise if you put say all the data from chromosomes 1-5 in there? If so, something's odd with the data at that point on chr 6.

    ReplyDelete
  70. Yes, I did change the corresponding headers. It does work when I put all the data from chromosome 1-5 until that point on chromosome 6. When I changed the p value at that point to NA, it worked fine.

    Thanks,

    Miah

    ReplyDelete
  71. i have used "source("http://www.StephenTurner.us/qqman.r")" server several times before, but suddenly it's not working for last few days ! can you please suggest some way ?

    Best,
    Sabyasachi

    ReplyDelete
    Replies
    1. Sabyasachi - I had these files hosted under my google sites account, but it looks like they changed something with how they handle redirects. I moved the files to my server at UVA and updated the code. You should be good to go with the updated code above.

      Delete
    2. Oh, and you can always download the code from GitHub.

      Delete
  72. Hi Stephen, thanks for this amazing code!
    I am just wondering if I can change the label for chr X ("X" instead of "23").

    Is it possible?

    Thank you!

    Diego

    ReplyDelete
  73. Diego - the quick and dirty solution would be to change line 60 of the code where it says lab=unique(d$CHR), change that to lab=c(1:22,"X").

    ReplyDelete
  74. Stephen:

    I have used the code successfully in the past. I am now obtaining the error code below. Do you have any guidance on this issue. I have attached by data.

    Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
    replacement has 0 rows, data has 1


    CHR SNP BP A1 TEST NMISS OR SE L95 U95 STAT P
    1 rs3813605 84743535 A DOM 2200 1.109 0.09787 0.9154 1.343 1.056 0.2908
    2 rs4953223 45370846 C DOM 2201 1.323 0.09777 1.092 1.602 2.862 0.004208
    2 rs34058885 49417012 A DOM 2200 0.9881 0.09331 0.8229 1.186 -0.1288 0.8975
    2 rs7599054 135257016 G DOM 2201 0.7543 0.09637 0.6245 0.9111 -2.926 0.003435
    3 rs3896023 173034387 A DOM 2197 0.883 0.09549 0.7323 1.065 -1.303 0.1925
    4 rs1554668 114576711 T DOM 2199 1.28 0.1163 1.019 1.608 2.125 0.03362
    5 rs12520934 67162619 A DOM 2199 1.38 0.09121 1.154 1.65 3.528 0.0004186
    6 rs9469084 32188361 T DOM 2201 0.591 0.1319 0.4564 0.7653 -3.988 6.672e-005
    6 rs1123969 155276341 A DOM 2201 1.299 0.09375 1.081 1.561 2.789 0.005294
    8 rs1016646 20307688 A DOM 2200 0.7924 0.1284 0.6161 1.019 -1.811 0.07009
    10 rs10788275 124051491 T DOM 2200 1.376 0.09063 1.152 1.643 3.521 0.0004295
    11 rs733454 76155369 A DOM 2200 1.335 0.1226 1.05 1.698 2.358 0.01838
    20 rs2865873 38808983 T DOM 2182 1.28 0.09351 1.066 1.537 2.64 0.008289

    ReplyDelete
  75. John Paul,

    This is a known issue - when you skip over chromosomes you run into this problem. It will work fine on the first 9 rows of your data. I haven't had time to fix this. I'd welcome any contributions!

    Best,

    Stephen

    ReplyDelete
    Replies
    1. Stephen:

      Thanks for your response on this matter. Unfortunately I don't have the expertise to help with the program. With the information you've provided we can move ahead to generate the figure. Thanks for making the product available -- it has helped us a great deal.

      Sincerely,

      John Paul

      Delete
    2. I have a couple of suggestions to work around this. One would require a change to the code the other would not.

      1) I assume that in the provided code, the min and max values for the SNP positions are used as the length of the associated chromosome and that no SNP data for a particular chromosome would mean no min and max and thus generate an error. It might be a good idea to allow the user to provide a coordinates file as an input option to the function that details the expected lengths of each chromosome to plot. Then these more accurate widths can be used in place of the min and max values.

      2) If you don't have data to plot for a particular chromosome but do for subsequent chromosomes and wanted to avoid this error you could create two fake entries in your input file each defining a fake SNP call and other required data for the start of the chromosome and one for the expected end. Then you could use the annotate option to annotate this SNPs in white. They would be plotted, but you wouldn't see them and the correct width would be maintained in the figure.

      Not that I've not tested this out myself but logically it should work. Maybe the author could comment?
      Cheers!

      Delete
    3. Anon - Thanks for the suggestion. I've known about this error for some time but have not had the time to make the necessary code updates. Hopefully will soon. Thanks for leaving these tips here.

      Delete
  76. Hello,

    I am using the manhattan() function to plot the results of THREE different analyses for the same subset of 20 snps (ie 3 sets of p-values for the 20 snps). The function seems to have no problem plotting multiple SNPs at the same position. However, I would like to attribute a different color to each set of P-values. Would this be possible with the manhattan() function? I've tried including a fourth column ("COL") with the colors - ie

    SNP CHR POS P COL
    rs1 8 123 0.05 green
    ...

    and calling it with

    manhattan(results,col=results$COL)

    but the colors do not come up.

    Any suggestions?

    Thank you.

    ReplyDelete
  77. Hello.

    I think it is a good program to make manhattan in r. But i had a problem
    When I made manhattan plot in cattle(chromosome = 29) using your manhattan function, It had a one problem. I can not see whole chromesome number in x-axis
    (for example, x-axis: 1,2,3,4,5,6,7,---, 23,25,27,28,29)
    How to change font size on x-axis about numeric value(here, chromosome number)

    Thank you.

    ReplyDelete
  78. Stephen,

    Thanks for the great code! How can I specify the size of the plot? I'd like to reduce the height and make it wider.

    ReplyDelete
    Replies
    1. png("filename.png", width=2000, height=500); manhattan(...); dev.off()

      Delete
  79. I couldn't figure out how to run the QQ plot without the HLA region MB25-MB35 on chromosome 6. Any suggestions? Thanks!

    ReplyDelete
    Replies
    1. Use R's subset() command to get rid of those regions before running.

      Delete
  80. Dear Stephen,

    On the QQ plot, I'm trying to adjust the point size, and the maximum on the x- and y-axes using:

    > myQQ<-qq(mydata$P, pch=20, xmax=10, ymax=10)

    However R doesn't like this.

    1: In plot.window(...) : "xmax" is not a graphical parameter
    2: In plot.window(...) : "ymax" is not a graphical parameter
    Error in localWindow(xlim, ylim, log, asp, ...) :
    formal argument "pch" matched by multiple actual arguments

    I might be missing something fundamental to R as I haven't used it much (yet...!)

    Cheers,

    Blue

    P.S. Thanks for posting the scripts and hope the grant application went well.

    ReplyDelete
    Replies
    1. Because you're trying to use the qq() function, not the manhattan() function. ymax and xmax are specific to manhattan() only.

      Delete
  81. Hi Stephen,

    First of all: Thanks for the useful code!

    I've used it several times before without problems but now qq(results) comes up with the "P value vector is not numeric" error. Worryingly, the problem persists even when I use your example dataset, whether it's using the R desktop interface or R in a unix environment.

    manhattan(results) works fine, but gg.qq(results) also apppears broken, with the following error: "Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : undefined columns selected"

    Do you have any idea what this might be about?

    Thanks,
    Laura

    ReplyDelete
  82. You actually need to provide the P-value vector. I.e., qq(results$P). Just tried this using my example dataset.

    ReplyDelete
  83. D'oh. Sorry, I'm an idiot. Thanks..

    ReplyDelete
  84. Guys THIS does not map for X, MT Y DNA
    So, you can convert Cromossome X to a 23 , and y yo 24 and MT to 25

    For that, on the manhattan function, put on the line after

    d=data...

    paste this:

    d$CHR[d$CHR=="X"] <- "23"
    d$CHR[d$CHR=="Y"] <- "24"
    d$CHR[d$CHR=="MT"] <- "25"

    Then all your problems will be solved.

    Also , there is an erro i think in the line , lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)

    the i-1 is not correct, since it outputs erro in my data, you can put just
    lastbase=lastbase+tail(subset(d,CHR==CHRs[i])$BP, 1) ?

    Try out, see if this changes Works good!

    ReplyDelete
  85. I believe I used the CHRs[i-1] bit to get the last base from the *previous*, i.e., i-1'th chromosome, so as to start numbering from there on the x-axis for the next chromsome.

    Thanks for providing code for X, Y, and MT. I didn't want to make assumptions about how users were naming these chromosomes. I.e., some name them 23, 24, 25 as you did. Others have PAR regions from Y, not sure how they're naming those.

    ReplyDelete
  86. To annotate the rs number for pvalues >0.001 if you can use this:

    for (i in 1:length(d$P)){
    if(d[i,"logp"]> -log10(0.001)){
    text(d[i,"pos"],d[i,"logp"],d[i,"SNP"],pos=3)
    }}

    before the line " if (!is.null(annotate)) { "

    ReplyDelete
  87. The dataframe does not accept X, Y, MT strings characters as a numerical, so its easy to see it wont work, and will give error, when plotting the value "X on the axis x". So thats the only aproach i see right now.

    About the i-1 you probably right, but somehow, it gives me error.. i have to check again.

    ReplyDelete
  88. About the last chromossome, the function is not very ... well contructed, because it assumes your data as the chromossomes from 1 to 23 all of them.
    And what happens is if you have chromossome 7 and then only the 9..
    the function as it is, will call chrom[i] where i is the 9 and then you make i-1 which is 8, then on the list there is no chromossome 8, so it gives error, thats the problem i found. So i have to rewritte the code, and now its working like i want.

    I had to call a new list, with the unique chromossomes, then use that array as reference for the i , so then it will call the correct index-1 chromossome on the list of "chromossomes that exist only".

    So this is the code:

    for the whole manhattan function. Although the axis has some weird bars that dont get named, dont know if thats normal.


    # manhattan plot using base graphics
    manhattan <- function(dataframe, colors=c("#0174DF", "#DF013A"), ymax="max", limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL, ...) {

    d=dataframe

    d$CHR[d$CHR=="X"] <- "23"
    d$CHR[d$CHR=="Y"] <- "24"
    d$CHR[d$CHR=="MT"] <- "25"
    d$CHR <- sapply(d$CHR, as.numeric)

    if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")
    if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]

    d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0 -log10(0.001)){
    text(d[i,"pos"],d[i,"logp"],paste(d[i,"SNP"],",",d[i,"Allele"]),pos=2,cex=0.6)
    }
    }

    if (!is.null(annotate)) {
    d.annotate=d[which(d$SNP %in% annotate), ]
    with(d.annotate, points(pos, logp, col="green3", ...))
    }

    if (suggestiveline) abline(h=suggestiveline, col="blue")
    if (genomewideline) abline(h=genomewideline, col="red")
    }

    ReplyDelete
  89. This comment has been removed by the author.

    ReplyDelete
  90. Here is the google Drive Link to the code:

    Somehow posting the whole code here, removes some parts (some protection from blogger perhaps).


    Link: https://docs.google.com/document/pub?id=1YM8HXVZ_gBRdKe3Yp5aVcA6EI6z428adHgT8NXd7nDA

    ReplyDelete
  91. MQ - thanks for updating this. FYI you can easily fork my code on GitHub, which will syntax-highlight if you name the Gist with a .R extension.

    ReplyDelete
  92. Dear Stephen,
    I start using R recently, and I need to plot some chromosome wise values in manhattan plot.
    but my values are greater-than 1 (Max 11, Min 0). But I am confused how to change in the R script, you provided for manhattan plot. Can you please help me, I'll be very thankful to you.

    ReplyDelete
  93. Dear Stephen,
    I really love your script, however I cannot get it to work. That is probably because I have five chromosomes called 2L, 2R, 3L, 3R and X. Is there a way to get around this?
    Best, Palle

    ReplyDelete
  94. Hello,
    I tried to run the function "manhattan" but I got this error:
    Error: could not find function "manhattan".
    Which package has this function?
    I couldn't find it in the ggplot2.
    Can anyone help?
    Maybe it's because I am using a newer version of R?
    Thanks,
    Einat

    ReplyDelete
    Replies
    1. Looks like you didn't define the functions. You can copy and paste the code from the github gist, or source the function in from here: source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")

      Delete
  95. hello stephen. i have a (hopefully simple) question. can you explain how i can make sure my manhattan plot y-axis is always high enough to show the suggestive and genomewide lines?

    thanks in advance, and happy new year!

    ReplyDelete
  96. I'm trying to plot a manhattan plot for 431885 SNPs. I 've already excluded chromossomes X, Y, MT and 0, and the following error appears:

    Error in Math.factor(d$P) : log10 not meaningful for factors
    In addition: Warning messages:
    1: In Ops.factor(P, 0) : > not meaningful for factors
    2: In Ops.factor(P, 1) : <= not meaningful for factors

    What do I have to change on the original script? Is it possible to plot only with data from chromosomes and p-value?
    Thank you.

    ReplyDelete
  97. Hi Stephen,

    I wonder if you can help.

    The simple file below (which has lost its formatting here but was tab delimited) fails when producing a manhattan plot but remove the chr 21 line and all is well. I can't for the life of me work out why.

    The error message I'm getting is Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) : and in fact changing the 21 to a 19 makes it all work again.

    The code I was using is straight from your demo:
    source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
    results <- read.table("file", header=T, sep="\t")
    manhattan(results, suggestiveline=F, genomewideline=F, pch=20, main="Manhattan Plot")

    Do you have any suggestions?

    CHR SNP BP P
    19 LbrM.19.embl_731631 731631 0.934
    19 LbrM.19.embl_731734 731734 0.246
    19 LbrM.19.embl_731758 731758 0.098
    21 LbrM.21.embl_3445 3445 0.277
    19 LbrM.19.embl_731898 731898 0.534

    ReplyDelete
    Replies
    1. In case anyone is also struggling with this and like me didn't see the earlier post about this same issue (oops). This error can appear when there is no data present for a particular chromosome. In my case it was chr20.

      I've replied above to make a suggestion to prevent this error.
      Cheers!

      Delete
  98. Hi!
    I can't run the function 'annotate' and I don't have a clue... Could you help me?

    ReplyDelete
  99. Hello,

    Thank you very much for your great code.

    I am having a problem, I am using this code for cattle dataset with 29 chromosomes. I have changed the limitchromosomes=1:29 but when I plot it, a gap appears between chromosomes 23 and 24, it would be great if you could help me with that.

    Cheers!

    ReplyDelete
  100. Thanks for sharing the code! Any advice on how to add a shaded region around the abline corresponding to 95% confidence intervals?

    ReplyDelete
  101. Does anyone here have the time or interest to help me go through all these bugs and feature requests and clean up the code, perhaps even create an R package?

    ReplyDelete
  102. a question for more than 23 chromosomes plotting how I can do since the genomoma handling of cattle that is 30.

    greetings.

    ReplyDelete
  103. When I tried to open the output Rplots.pdf from the manhattan function, it said that the file is damaged and cannot be repaired. I thought it's because it's too big (600MB!) and tried to convert it to PNG, but it still tells me that the pdf is damaged. I tried to rerun it several times and also ran the qqplot, yet it never worked. Any insight?

    Thanks!

    ReplyDelete
  104. Hello, I would like to print my SNP names in the Axis X and not the BP. How can I do it?
    Thanks!
    Julie

    ReplyDelete
  105. Just want to say thanks, really helpful

    ReplyDelete
  106. I am having problems annotating with SNPs I want to highlight. My SNPs are in both files (all the ones to plot first and then the chosen SNPs to highlight in green) characters (not a string of characters). I get no error message when I make the manhattan function plot with my annotation data.frame but I also get no green dots.. Anyone know how to solve this? Thanks!

    ReplyDelete
  107. Hi Stephan,

    I'm newbie here. I just start learning R a month ago.
    I was trying to make manhattan plot with 32 chromosomes and the last chromosomes is chromosome X. I tried to change your script to chromosome 1:32 and it gave an error like this. "Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors."

    I saw there was someone on August 15, 2011 at 5:08 AM also having the same problem with me, but I couldn't find the solution *or maybe I missed it*

    Can you help me to solve this problem? Thanks!

    ReplyDelete
  108. I wrote yesteray about annotating, I have now fixed it by transposing the list of SNPs to highligt. AO. Herlino: I think you have to make sure that the column with CHR is numeric.

    ReplyDelete
    Replies
    1. Hi Anonymous,

      Thanks for the reply! I was sure CHR column is numeric, I don't know what happen actually with this? I just change the script from limitation of 1:23 into 1:32, but it didn't work. Do you have any idea with it?

      Thanks!

      Delete
  109. Hello Stephen,
    great code! I have found it very helpful in my continuing quest to "get genetics done" I am slowly getting to the point where I have somewhat of an understanding of what the code is doing. I would like to create a plot in which the SNP surrounding a particular lead SNP are color coded by their LD with that SNP. My initial thought was to add an additional variable to the annotation file and then use that on line 70 in place of green3. Am I even on the right track here?

    Thanks

    Darryl

    ReplyDelete
    Replies
    1. Hi Darryl. My strategy would be to look at the snps in the snps annotate file and get their positions, then add more snps to that file extending out by a certain distance, a kb for example. Other ways to do this. I hope to get back to this code at some point to clean up all the bugs listed above, and add this as a feature. Stay tuned.

      Delete
  110. > snps_to_highlight <- scan("http://www.StephenTurner.us/snps.txt", character())


    Error in file(file, "r") : cannot open the connection
    In addition: Warning message:
    In file(file, "r") : cannot open: HTTP status was '404 Not Found'

    Is anyone else getting this error?

    ReplyDelete
    Replies
    1. Hannah: So sorry about that. I should take my own advice and upload the code and data somewhere other than my personal website. All the code and data is now available on GitHub, and I've modified the code in the post above to read data from GitHub directly.

      Delete
    2. I still cannot seem to get it..

      Warning message:
      In download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", :download had nonzero exit status

      I have also tried manually downloading the file and importing it, but I also cannot get it to work!

      Delete
    3. Hannah,

      That's one single line that unfortunately wraps in the blog display text. Did you type the entire line?

      download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")

      Delete
    4. Yes, I am typing in the entire line of code provided. I am using rstudio..

      Delete
    5. Hannah - thanks for pointing this out. I just tried this on a Windows machine and got the same error. If you're using Windows, omit the method="curl" argument and that should work. I updated the code.

      Delete
  111. Hi Stephen,
    Hope all is well! I am making several Q-Q plots and I am trying to compare different association methods that correct for population stratification. Anyway I would like the y-axis on the plots to be the same. Fro example, I have some plots that have p-values of 1.0E-24 and some with p-values of 1.0E-8. Is there a way to use ymax with the QQ command? I tried but it didn't work for me.
    Thanks!
    Janina

    ReplyDelete
    Replies
    1. Hey Janina. I added an extra argument called ymax. Copy/paste this function and the examples to see it in action.

      https://gist.github.com/stephenturner/5321887

      Delete
    2. Thanks Stephen. Everything worked just fine :)

      Delete
  112. Hi Stephen,

    Thanks a lot for the code! I'm new to R and it made my life easier. One quick question please: Is it possible to plot a Manhattan plot with two p values for two phenotypes for the same SNP ? What do I need to change in the code ?

    Thanks gain for the post !
    Maen

    ReplyDelete
    Replies
    1. Several ways to do this. Not really recommended to modify results files, but the easiest way is to make one of them have a position 1bp higher or lower than the other, and if you're hilighting, add a _1 and _2 to the rs numbers.

      Delete
  113. Hmm.. hard to say without seeing your data. Are the columns still named "CHR", "SNP", "BP", and "P"?

    ReplyDelete
  114. Hi,
    I am trying to use this code to make a Manhattan plot, but mine does not look the same as your example. Instead of having spread dots like I would expect when plotting the BP, I get just a single straight line as though I am plotting only the CHR number. My R skills are not good enought to understand the code so could someone please help me?

    ReplyDelete
    Replies
    1. Does the example code and data work for you? If not, you might try poking around your data to make sure everything looks as expected.

      Delete
  115. Hi Stephen,

    Is there an easy way to make this plot with a list of SNPs to remove?

    Thanks,

    Charlotte

    ReplyDelete
    Replies
    1. Yeah, just remove before plotting. E.g. !(x %in% y)

      Delete
  116. Hi,
    Thank you for the R functions!
    On June 10, 2013 ther was a major update in the code, which caused probllems when trying to run it as described in the blogpost. Before it was running perfectly, but now I am getting messages like ""limitchromosomes" is not a graphical parameter", "formal argument "pch" matched by multiple actual arguments", "colors" not working any more.
    Is it possible to get a pre-June,10 version again, e.g. as a separate file or function? I am using R 2.15.1, Windows computer

    ReplyDelete
    Replies
    1. I am also encountering these same problems...

      Delete
    2. Gabor, Nano: Yes, there have been a few big updates, but we're adding back in some "legacy" option names, and I'll commit the code and update the tutorial when I'm back in town next week. Very sorry for the issue. In the meantime, you can copy and paste the old code from here. I'll comment again when I've updated the tutorial. Thanks.

      Delete
    3. Thanks for the code, Stephen! Also, for the annotate argument, do you know if I can use it for multiple groups of SNPs? (eg. I would like to highlight one group of SNPs green like you did in the tutorial but also highlight ANOTHER group of SNPs blue in the same plot, etc.)

      Delete
  117. Hi stephen,Thank you very much for your codes.I am wondering if I can use your codes to plot a mahanttan plot for my results that are found from the analyses of R and ASRemel,to make it clear not from PLINk, I have tried to prepare the files in the same name of yours SNP,CHR,BP and P, but my SNPs are not in rs they are just in order of numbers(1,2,3.....).Could you please give me some suggestions on how I can use your codes.Thank you in advance.

    ReplyDelete
  118. Hi Stephen,
    Its Janina again. I am plotting imputed data (~15 million SNPs) and I am getting the following error...
    Error in Math.factor(d$P) : log10 not meaningful for factors
    In addition: Warning messages:
    1: In Ops.factor(P, 0) : > not meaningful for factors
    2: In Ops.factor(P, 1) : <= not meaningful for factors

    I have have checked the files and all of the P values fall between 0 and 1 but I think there is a spacing issue or extra characters present. Below are the first few rows that I have loaded into R.
    Janina
    CHR SNP BP A1 A2 FRQ INFO OR SE P
    1 rs58108140 10583 G A 0.9352 0.3790 0.8756 0.3220 0.68
    1 rs189107123 10611 C G 0.9774 0.4527 0.8466 0.4791 0.7281
    1 rs180734498 13302 C T 0.8016 0.4482 0.9310 0.1793 0.69
    1 rs144762171 13327 G C 0.9750 0.4409 1.7412 0.4941 0.2617
    1 chr1:13957:D 13957 TC T 0.9726 0.4434 0.7246 0.4318 0.4557
    1 rs151276478 13980 T C 0.9773 0.4896 1.2930 0.4670 0.5822
    1 rs140337953 30923 G T 0.4590 0.4688 1.0767 0.1423 0.6035
    1 chr1:47190:I 47190 G GA 0.9602 0.3153 2.1181 0.4668 0.1079
    1 rs116400033 51479 T A 0.9381 0.4766 0.6665 0.2920 0.1647

    ReplyDelete
    Replies
    1. Janina,

      You have a data integrity issue. If "mydata" is your data frame containing those variables, you should check the class of those variables:

      sapply(mydata, class)

      The P column should be "numeric". If it isn't, you should take a look and see what went wrong with the software that gave you those p-values. Perhaps it's outputting something like "nan", "NaN", "Inf", or something else when you have a statistical test that fails to converge, and R is reading that variable as a character/factor, instead of a numeric value. You could hack around in R to fix or ignore those, but better to see what went wrong in the first place. awk, cut, grep, and Google are your friends ;-).

      Delete
  119. Hi Stephen, It`s Robel again. I am still looking for your suggestions to my comment on June 20.Thank you very much for your support.

    ReplyDelete
  120. Hi Stephen, I'm having the same trouble as Palle did in the post on December 7, 2012 at 8:56 AM. I have 5 chromosomes, 2L, 2R, 3L, 3R and X and I can't for the life of me work out how to avoid the errors I'm getting. Whats the best approach to be able to account for this and get the script working?

    cheers,

    Joe

    ReplyDelete
    Replies
    1. I keep getting errors such as below when running the plot. I have the CHR, BP and P columns all fine but I'm thinking its because of the different chromosomes.

      Error in Summary.factor(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, :
      max not meaningful for factors

      Any help would be very appreciated!

      cheers

      Delete
  121. Hi Stephen,

    Im using the qq function but I obtain the following error:

    Error in seq.default(10000, length(e), 100) : wrong sign in 'by' argument

    Any clues?

    Cheers

    ReplyDelete
    Replies
    1. Getting the same error. Did you ever find a fix?

      Delete
    2. Hello everyone,

      I'm getting the same error (Error in seq.default(10000, length(e), 100) : wrong sign in 'by' argument). Has anyone figured this one out?

      Thanks for the really helpful tool!

      Delete
    3. Hello again,

      I think I understand what's the problem and since no one has addressed this issue yet, I'll present here what I think is the problem and how to fix it.

      The issue is in seq(from,to,by): from has to be < than to.
      My understanding is that length(e) represents the total number of P-values that are usable (numeric and between 0 and 1). So, if one has less than 10000 P-values, the basic conditions for seq( ) are not met and there's a failure.

      I tried reducing the first number in seq( ) and I started getting the qq-plots with no problems.

      Please, let me know if I misinterpreted the reason for the error of if you think I'm doing something unreasonable.

      Delete
  122. Hi Stephen,
    I wanted to change chromosome labels to "1A, 1B, 1D, 2A, 2B, 2D... etc from "1, 2, 3...."..

    Any help please?

    Regards

    ReplyDelete
  123. Hi,

    I've read through all the comments. As I understand the old code (https://raw.github.com/stephenturner/qqman/master/qqman.r) has been updated, right?. There are some items we all would like to have added and some suggestions to that have been made:
    -add a limit to the y-axis for QQ-plots - DONE
    -add confidence intervals to the QQ-plots - DONE
    -annotate with SNP-id and/or gene name/loci of genome-wide significant and/or target sites/loci/SNPs - MQ added a suggestion - NOT SURE THIS DONE?
    -add special chromosomes (X,Y, MT) - MQ added a suggestion - NOT SURE THIS IS DONE?
    -get the dots bigger above a certain threshold (-log10(p)=8 e.g.) - PENDING?

    Additionally there are some bugs that have been fixed:
    -fix issue with a missing chromosome

    Stephen: if you need help, I can certainly try to, but I'm a newbie with R...

    Best,

    Sander

    ReplyDelete
    Replies
    1. Oh, just popped to mind:
      -calculate the lambda on the fly based on the p-values
      -plot a legend on the qq-plot (top left corner or something) displaying number SNPs and lambda

      Delete
    2. The annotation function seems to be not working. Is did the following but I get an error message. What am I missing?

      > download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz", method="curl")
      % Total % Received % Xferd Average Speed Time Time Time Current
      Dload Upload Total Spent Left Speed
      0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 4558k 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0 32 4558k 32 1483k 0 0 726k 0 0:00:06 0:00:02 0:00:04 886k 39 4558k 39 1819k 0 0 541k 0 0:00:08 0:00:03 0:00:05 608k 40 4558k 40 1843k 0 0 452k 0 0:00:10 0:00:04 0:00:06 497k 41 4558k 41 1889k 0 0 339k 0 0:00:13 0:00:05 0:00:08 363k 41 4558k 41 1913k 0 0 308k 0 0:00:14 0:00:06 0:00:08 381k 43 4558k 43 1991k 0 0 280k 0 0:00:16 0:00:07 0:00:09 100k 44 4558k 44 2022k 0 0 250k 0 0:00:18 0:00:08 0:00:10 44151 48 4558k 48 2202k 0 0 243k 0 0:00:18 0:00:09 0:00:09 73969 59 4558k 59 2718k 0 0 270k 0 0:00:16 0:00:10 0:00:06 184k 71 4558k 71 3264k 0 0 295k 0 0:00:15 0:00:11 0:00:04 279k 80 4558k 80 3679k 0 0 305k 0 0:00:14 0:00:12 0:00:02 341k 91 4558k 91 4179k 0 0 320k 0 0:00:14 0:00:13 0:00:01 433k100 4558k 100 4558k 0 0 329k 0 0:00:13 0:00:13 --:--:-- 490k
      > download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt", method="curl")
      % Total % Received % Xferd Average Speed Time Time Time Current
      Dload Upload Total Spent Left Speed
      0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 946 100 946 0 0 4798 0 --:--:-- --:--:-- --:--:-- 5375
      > download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")
      % Total % Received % Xferd Average Speed Time Time Time Current
      Dload Upload Total Spent Left Speed
      0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 19135 100 19135 0 0 87816 0 --:--:-- --:--:-- --:--:-- 97131
      > source('./qqman.r')
      > snps_to_highlight = scan("./snps.txt", character())
      Read 92 items
      > results = read.table(gzfile("./plink.assoc.txt.gz"), header=T)
      > head(results)
      CHR SNP BP A1 F_A F_U A2 CHISQ P OR
      1 1 rs10495434 235800006 T 0.16670 0.1951 G 0.2428 0.62220 0.8250
      2 1 rs6689417 46100028 C 0.04255 0.0000 T 3.4840 0.06195 NA
      3 1 rs3897197 143700035 G 0.03191 0.0000 A 2.5980 0.10700 NA
      4 1 rs2282450 202300047 T 0.41840 0.3659 A 0.5154 0.47280 1.2470
      5 1 rs567279 66400050 0 0.00000 0.0000 T NA NA NA
      6 1 rs11208515 64900051 C 0.23960 0.2805 G 0.3861 0.53430 0.8082
      > manhattan(results,annotate=snps_to_highlight)
      Error in manhattan(results, annotate = snps_to_highlight) :
      D'oh! Annotate vector must be a subset of the SNP column.

      Delete
  124. Soooo, I've been going over the comments again. It's not clear to me: does it or does it not handle X and Y and MT chromosomes? If so:
    -how must it be coded in the data? As X or 23, Y or 24, MT or 25?
    -does a number or a letter appear on the x-axis? So 23 or X, 24 or Y, 25 or MT?
    -do I just plug into the command: manhattan (datafile, limitchromosomes=24,...) to get all chromosomes in the data?

    Thanks!
    P.S. Still willing to help on the code and the bugs and all the requests...

    ReplyDelete
  125. My apologies swvanderlaan - I need to find time to get back to this and completely rewrite the post.

    ReplyDelete
  126. Also, keep on getting the same error:
    Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
    max not meaningful for factors

    What does that even mean? It's only with plotting Manhattans... I've got 2.56m rows and four columns (SNP CHR BP P) and 22 chromosomes (1-22 human). The lowest p-value ~ 5e-60.

    Thanks

    S.

    ReplyDelete
    Replies
    1. What class are your variables? E.g. run sapply(yourdataframe, class)

      Delete
    2. Hi,
      Here's the result:
      sapply(meta_Pfixed_model1,class)
      SNP CHR BP P
      "factor" "factor" "integer" "numeric"

      So apparently my CHR is a factor...doh. I distinctly remember throwing out X and Y...

      Thanks!

      Delete
  127. Hey Stephen,
    I am running this package on one of UVA's servers. When I run the final script to create the plot, I don't get anything. There is a file called "Rplots.pdf" that is in my local directory but I cannot open it (it's also 300mb). Is there a library that needs to be installed for these plots to be stored or do they go to the local directory by default? Have you had a problem opening these files when running from a server?

    ReplyDelete
  128. Hi Stephen,

    I always use qqman for Manhattan and QQ plots (source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")). Now for the updated version, it looks that 'annotate' and 'colors' functions are not working. When I used my old code, SNPs selected were highlighted in black with rs numbers and also the following message showed:

    warning message:
    In plot.xy(xy.coords(x, y), type = type, ...) :
    "colors" is not a graphical parameter

    Thanks,

    Jian

    ReplyDelete
    Replies
    1. Use highlight.col= insted of annotate. This should work.

      For the "colors-problem" switch on a mac. This solved the problem in my case.

      Delete
  129. Hi Stephen,

    This is so helpful - many thanks!

    I am new to R so I am not sure how to trouble-shoot this issue. For some reason my Manhattan plot looks truncated. It looks as if it is not wide enough. The chromosome numbers appear stacked, not right next to each other. After chromosome 14 I don't see any more chromosome names. Do you know how I can change this?

    Thanks,
    Erin

    ReplyDelete
  130. In the manhattan function:

    What's the best way of labelling the top SNPs with rs ID, p-value and gene name ?

    What's the best way of labelling the horizontal threshold lines with the p-values that they represent ?

    ReplyDelete
  131. I am new to R and trying to make a manhattan plot and QQ plot following the example described here. I have understood most part of it, but I am not able to highlight SNPs listed in the snp.txt file. I did exactly as written in the example, but do not see green dots. Any help would be highly appreciated.

    ReplyDelete
  132. The map file in not integrated by default into the qassoc. So, how to run that code on qassoc files?

    ReplyDelete
  133. You'll have to join the map information to the stats column manually (use an INNER JOIN in SQL speak). Or the merge() function in R should do it, if both are indexed by rs-number.

    ReplyDelete
  134. Hi Stephen,

    Thanks for the excellent code. I have a quick question: much like Tauqeer Alam, I am not seeing green dots on my Manhattan when feeding a list of SNPs into R. Rather, I'm getting an odd-looking column of text, as seen in the screenshot below:

    http://imgur.com/eIQAIcI

    I'm following your code to a tee and have tested this error on both a Mac and PC. Any suggestions?

    ReplyDelete
    Replies
    1. Crazy looking plot. The arguments have changed slightly since the last blog post. To change the color of SNPs, use the "highlight" argument. To add a text label (rs ID) above a SNP, use the "annotate" argument.

      Delete
  135. I also had this problem.

    > head(subset(results, select=c(SNP, CHR, BP, P)))
    SNP CHR BP P
    1 BTA-28471-no-rs 0 0 NA
    2 BTA-28495-no-rs 0 0 NA
    3 BTA-28466-no-rs 0 0 NA
    4 ARS-USMARC-Parent-DQ650635-rs29012174 0 0 NA
    5 ARS-USMARC-Parent-DQ451555-rs29010795 0 0 NA
    6 BPI-1 0 0 NA
    > manhattan(results)
    Error in `$<-.data.frame`(`*tmp*`, "pos", value = NA) :
    replacement has 1 row, data has 0


    For the P values, they are all NA. The assignment is due tomorrow morning and I've been banging my head over this all day.

    ReplyDelete
    Replies
    1. Try filtering out the NA p-values. If they're all NA, you're doing something wrong on the testing side, not the viz side.

      Delete
    2. This comment has been removed by the author.

      Delete
  136. Thank you Dr. Turner, this script is great. I was having one issue I was hoping to get help with. I am trying to generate the plot with different colored points:
    manhattan(results, colors=c("black","grey50","orangered1"), pch=20, genomewideline=F, suggestiveline=F)

    but get the following warning:
    "colors" is not a graphical parameter

    I was wondering where the script could be modified to account for this?

    Thanks!
    Sri

    ReplyDelete
  137. Use pt.col to change the colors. I need to update the tutorial, many apologies.

    ReplyDelete
    Replies
    1. Hi Stephen,

      Thanks for the code!
      I tried using pt.col to adjust my colors and I am still getting the grey and black manhattan plot. I even changed it in the manhattan= section so that grey isn't even there at all and am still getting grey and black plots with no errors.

      Any suggestions?

      Thanks,
      Ray

      Delete
    2. So you're calling manhattan(data, pt.col=c("orange", "red")), for example?

      Delete
  138. I've had great luck using this code in the past, but am having some issues with the ticks/labels of the x axis with the newest code. I only get a single tick mark/label right at the beginning (which is equal to the value of ticks=floor(length(d$pos))/2+1). I feel like there's a rep argument of sorts missing in there. I've tried manipulating things, but am not making progress. Any suggestions?

    ReplyDelete
  139. Hi,
    I like your confidence interval plotting on the QQ-plot. I want to add this to our qq-plot function in MANTEL (meta-analysis of GWAS package made by De Bakker Lab).
    Just wondering: what does the "xspace = 0.078" function/line actually do?

    Thanks and looking forward to the tweaks of this great script, Steven! :-)

    Best,

    Sander

    ReplyDelete
  140. This comment has been removed by the author.

    ReplyDelete
  141. Hi, Stephen

    I have done GWAS analysis with different models (PCA, PCA+K, Q, Q+K). So, I have 4 sets of p-values for the same trait. How can I plot a QQ plot for 4 sets of p-values together?

    ReplyDelete

Note: Only a member of this blog may post a comment.

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.