Getting Genetics Done: Annotated Manhattan plots and QQ plots for GWAS using R, Revisited

Monday, April 25, 2011

Annotated Manhattan plots and QQ plots for GWAS using R, Revisited

**** UPDATE, May 15 2014 *****
The functions described here have now been wrapped into an R package. View the updated blog post or see the online package vignette for how to install and use. If you'd still like to use the old code described here, you can access this at version 0.0.0 on GitHub. The code below likely won't work.
*****************************

Last year I showed you how to create manhattan plots, and later how to highlight regions of interest, using ggplot2 in R. The code was slow, required a lot of memory, and was difficult to maintain and modify.

I finally found time to rewrite the code using base graphics rather than ggplot2. The code is now much faster, and if you're familiar with base R's plot options and graphical parameters, most of these can now be passed to the functions to tweak the plots' appearance. The code also behaves differently depending on whether you have results for one or more than one chromosome.

Here's a quick demo.

First, either copy and paste the code from GitHub, or run the following commands in R to download and source the code from GitHub (you can't directly read from https in R, so you have to download the file first, then source it). Note that the command differs between Mac and Windows.

Download the function code on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r", method="curl")

Download the function code on Windows, leave out the method="curl" argument:

download.file("https://raw.github.com/stephenturner/qqman/master/qqman.r", destfile="./qqman.r")

Now, source the script containing the functions.

source("./qqman.r")

Next, load some GWAS results, and take a look at the relevant columns (same as above, download first then read locally from disk). This is standard output from PLINK's --assoc option.

Download example data on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz", method="curl")

Download example data on Windows:

download.file("https://raw.github.com/stephenturner/qqman/master/plink.assoc.txt.gz", destfile="./plink.assoc.txt.gz")

Read in the sample results, and take a look at the first few lines:

results = read.table(gzfile("./plink.assoc.txt.gz"), header=T)

head(subset(results, select=c(SNP, CHR, BP, P)))

The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R. Do this with a quick sed command. Here's what the data looks like:

         SNP CHR        BP       P
1 rs10495434   1 235800006 0.62220
2  rs6689417   1  46100028 0.06195
3  rs3897197   1 143700035 0.10700
4  rs2282450   1 202300047 0.47280
5   rs567279   1  66400050      NA
6 rs11208515   1  64900051 0.53430
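
A quick sanity check before plotting can catch the most common input problems (missing columns, non-numeric values, p-values outside the valid range). The sketch below is not part of the qqman code; `check_gwas_input` is a hypothetical helper, and the toy data frame only stands in for real PLINK output.

```r
# Hypothetical helper (not part of qqman): check that a results data frame
# has the columns and value ranges that manhattan() expects.
check_gwas_input <- function(d) {
  required <- c("SNP", "CHR", "BP", "P")
  missing_cols <- setdiff(required, names(d))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(d$CHR) || !is.numeric(d$BP) || !is.numeric(d$P)) {
    stop("CHR, BP, and P must be numeric")
  }
  # p-values must lie in (0, 1]; NA is allowed and marks a missing result
  bad <- !is.na(d$P) & (d$P <= 0 | d$P > 1)
  if (any(bad)) {
    stop(sum(bad), " p-value(s) outside (0, 1]")
  }
  invisible(TRUE)
}

# Toy data in the same layout as the example above
toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  CHR = c(1, 1, 2),
  BP  = c(100, 200, 150),
  P   = c(0.62, NA, 0.01)
)
ok <- check_gwas_input(toy)
```

Running a check like this on your own results before calling manhattan() gives a readable message instead of a cryptic data frame error.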

Now, create a basic manhattan plot (click the image to enlarge):

manhattan(results)

If you type args(manhattan) you can see the options you can set. Here are a few:

colors: this is a character vector specifying the colors to cycle through for coloring each point. Here's a PDF chart of R's color names.

ymax: this is the y-axis limit. If ymax="max" (default), the y-axis will always be a little bit higher than the most significant -log10(p-value). Otherwise you can set this value yourself.

cex.x.axis: this can be used to shrink the x-axis labels by setting this value less than 1. This is handy if some of the tick labels aren't showing up because the plot region is too small.

*** Update March 9 2012*** cex.x.axis is deprecated. To change the x-axis size, use the default base graphics argument cex.axis.

limitchromosomes: you can limit which chromosomes you want to display. By default this restricts the plot to chromosomes 1-23(x).

suggestiveline and genomewideline: set these to FALSE if you don't want threshold lines, or change the thresholds yourself.

annotate: by default this is undefined. If you supply a character vector of SNP names (e.g. rs numbers), any SNPs in the results data frame that also show up here will be highlighted in green by default. See the example below.

... : The dot-dot-dot means you can pass most other plot or graphical parameters to these functions (e.g. main, cex, pch, etc).

Make a better looking manhattan plot. Change the plot colors, point shape, and remove the threshold lines:

manhattan(results, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)

Now, read in a text file with SNP names that you want to highlight, then make a manhattan plot highlighting those SNPs, and give the plot a title:

Download a SNP list on Mac:

download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt", method="curl")

Download a SNP list on Windows:

download.file("https://raw.github.com/stephenturner/qqman/master/snps.txt", destfile="./snps.txt")

Read in the SNP list, and create an annotated plot:

snps_to_highlight = scan("./snps.txt", character())

manhattan(results, annotate=snps_to_highlight, pch=20, main="Manhattan Plot")

Next, zoom in and plot only the results for chromosome 11, still highlighting those SNPs. Notice that the x-axis changes from chromosome to genomic coordinate.

manhattan(subset(results, CHR==11), pch=20, annotate=snps_to_highlight, main="Chr11")

Finally, make a quantile-quantile plot of the p-values. To make a basic qq-plot of the p-values, pass the qq() function a vector of p-values:

qq(results$P)

Perhaps we should have made the qq-plot first, as it looks like we might have some unaccounted-for population stratification or other bias.
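
For readers curious about what a p-value QQ plot actually compares, here is a minimal base R sketch. It uses simulated uniform p-values rather than the example data, and it is an illustration of the general technique, not necessarily what qq() does internally.

```r
# Sketch of how a p-value QQ plot can be built with base R.
# Simulated uniform p-values stand in for real GWAS results.
set.seed(42)
p <- runif(1000)

# Drop missing or out-of-range values before transforming
p <- p[!is.na(p) & p > 0 & p <= 1]

observed <- -log10(sort(p))             # smallest p-value first
expected <- -log10(ppoints(length(p)))  # quantiles expected under the null

plot(expected, observed, pch = 20,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")               # points on this line match the null
```

Points that rise above the diagonal across most of the range, rather than only in the tail, are the pattern that hints at stratification or other systematic bias.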

The code should run much faster and use less memory than before. All the old functions that use ggplot2 are still available, now prefixed with "gg." Please feel free to use, modify, and redistribute, but kindly link back to this post. The function source code, data, SNPs of interest, and example code used in this post are all available in the qqman GitHub repository.

203 comments:

Unknown, April 25, 2011 at 4:22 PM
This is fantastic! Just what I was looking for. I stumbled across your earlier version this afternoon, and was having difficulty getting it to work. The new version worked beautifully and gave me the plot I was dreading having to code up myself from scratch. Could not have come at a better time! Thanks!

swvanderlaan, April 26, 2011 at 7:20 AM
Just wanted to add: great! Really, thanks a bunch. It works like a charm...
I've added some code so that it automatically outputs .EPS files.

Only thing I'm looking for now, is to annotate the top hits (-log10(P)>=8) with the gene name... That would be great!

Thanks!

Anonymous, April 26, 2011 at 9:52 AM
I have run it and this code is very cooool
Also, a quick question: when I create my own results following the format “SNP CHR BP P”, R returns
“Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 2”

Anybody knows why?
Thanks

Stephen Turner, April 26, 2011 at 12:10 PM
I'm guessing the error might be because you have something other than a number in the p-value column (maybe a NaN?). This column can only be a number between 0 and 1 or NA for missing. Next time I update the code I'll add in some checks for this.

Anonymous, May 24, 2011 at 3:39 AM
Thank you!

I started learning how to run GWAS two hours ago, and thanks to this I'm already looking at some preliminary results.

Science has never been simpler! ;)

Erik, June 1, 2011 at 8:29 AM
Thanks for the nice script! While using it I found out, that it gives an error when there are no snps for a specific Chromosome.

Sagi, June 20, 2011 at 3:15 AM
Many thanks! This is great!
One thing though: when I try to use the "ymax" in order to limit the Y axis values, it doesn't really work. maybe I'm using it the wrong way.
Here's what I typed:

manhattan(results, ymax=5, limitchromosomes=F, colors=c("black","#666666","#CC6600"), pch=20, genomewideline=F, suggestiveline=F)

Thanks!

Stephen Turner, June 20, 2011 at 9:28 AM
Sagi, I think I see your problem here. These two lines gave it the behavior I desired:

if (ymax=="max") ymax<-ceiling(max(d$logp))
if (ymax<8) ymax <- 8

... which is to say, if ymax="max", round the top of the y axis up to the next whole number above the most significant SNP on the -log10(p) scale. But if that maximum is less than 8, set it to 8 (because genome-wide significance is conventionally 5e-8).

If you want this code to behave the way you want, remove the second line that resets ymax to 8 if it's less than that.
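
An alternative to deleting the second line is to apply the floor only when the limit is chosen automatically. The sketch below shows that idea in isolation; `resolve_ymax` and its `floor_at` argument are hypothetical names, not part of the posted function.

```r
# Hypothetical helper: decide the upper y-axis limit.
# `floor_at` is an illustrative argument, not one the original function has.
resolve_ymax <- function(logp, ymax = "max", floor_at = 8) {
  if (identical(ymax, "max")) {
    ymax <- ceiling(max(logp))
    # Only apply the floor when the limit was chosen automatically
    if (ymax < floor_at) ymax <- floor_at
  }
  ymax
}

logp <- c(0.5, 2.1, 4.7)
auto_limit <- resolve_ymax(logp)            # automatic: floor applies
user_limit <- resolve_ymax(logp, ymax = 5)  # explicit value is kept
```

With this arrangement a user-supplied value such as ymax=5 is respected, while the default still leaves room for the genome-wide line.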

swvanderlaan, June 28, 2011 at 8:38 AM
Hi,

As many wrote: thanks a bunch! I'm having difficulty getting the genome-wide significance at 5e-8. It just won't get to 8, it's always a little off... Any clues?
I changed your code a little, but only in making the line for instance dotted. Here's the code:

# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("black", "gray50"), ymax="max", cex.x.axis=0.8, limitchromosomes=1:23, suggestiveline=-log10(1e-4), genomewideline=-log10(5e-8), annotate=NULL, ...) {

    d=dataframe
    if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")

    if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]
    d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0<P<=1
    d$logp = -log10(d$P)
    d$pos=NA
    ticks=NULL
    lastbase=0
    colors <- rep(colors,max(d$CHR))[1:max(d$CHR)]
    if (ymax=="max") ymax<-ceiling(max(d$logp))
    if (ymax<9) ymax<-9

    numchroms=length(unique(d$CHR))
    if (numchroms==1) {
        d$pos=d$BP
        ticks=floor(length(d$pos))/2+1
    } else {
        for (i in unique(d$CHR)) {
            if (i==1) {
                d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP
            } else {
                lastbase=lastbase+tail(subset(d,CHR==i-1)$BP, 1)
                d[d$CHR==i, ]$pos=d[d$CHR==i, ]$BP+lastbase
            }
            ticks=c(ticks, d[d$CHR==i, ]$pos[floor(length(d[d$CHR==i, ]$pos)/2)+1])
        }
    }

    if (numchroms==1) {
        with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab=paste("Chromosome",unique(d$CHR),"position"), ...))
    } else {
        with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(P))), xlab="Chromosome", xaxt="n", type="n", ...))
        axis(1, at=ticks, lab=unique(d$CHR), cex.axis=cex.x.axis)
        icol=1
        for (i in unique(d$CHR)) {
            with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
            icol=icol+1
        }
    }

    if (!is.null(annotate)) {
        d.annotate=d[which(d$SNP %in% annotate), ]
        with(d.annotate, points(pos, logp, col="darkorange", ...))
    }

    if (suggestiveline) abline(h=suggestiveline, col="royalblue4", lty=5)
    if (genomewideline) abline(h=genomewideline, col="red",lty=5)
}

Thanks!

Stephen Turner, June 28, 2011 at 1:31 PM
@swvanderlaan - the reason the line won't get to 8 on the y-axis is because genomewideline is set to -log10(5e-8), which is about 7.30 on the -log10 scale. The p=5e-8 number comes from the notion that there are ~1 million independent tests in the genome, so by a Bonferroni correction, 0.05/1e6=5e-8, the widely accepted holy grail p-value for "genome-wide significance."

If you want the line to show up at 8 on the y-axis, set genomewideline=-log10(1e-8), or simply, genomewideline=8.

Hope this helps.
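
The arithmetic above is easy to reproduce in R. The figure of one million independent tests is the assumption stated in the comment, not something computed from the data.

```r
alpha   <- 0.05   # family-wise error rate
n_tests <- 1e6    # assumed number of independent tests genome-wide

threshold <- alpha / n_tests      # Bonferroni-corrected p-value (5e-8)
line_y    <- -log10(threshold)    # height of the line on the plot (about 7.30)

# A line drawn exactly at 8 corresponds to a stricter p-value of 1e-8
line_at_8 <- -log10(1e-8)
```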

swvanderlaan, June 28, 2011 at 1:37 PM
Hey Stephen,

So, you stare for minutes, hours at the code, and you just don't see it... Haha, thanks!

Now, what I wrote earlier: I want names with my top hits. And I'll figure that one out, I'm sure. And when I do, I'll be sure to get back to this post, to share it!

Thanks.

Sander

Stephen Turner, June 28, 2011 at 2:03 PM
Try using the text() function to add text to the plot. It probably wouldn't be too hard to work this into the function to annotate SNPs with gene names using the text() function. You'll probably want to limit it to SNPs having annotation information AND having p-value less than a certain threshold, so that it's readable and looks nice.
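
As a rough illustration of the text() approach, the sketch below labels only points that have a gene name and pass a threshold. The data are simulated, and the `GENE` column and gene names are made up for the example; none of this is built into the manhattan function.

```r
set.seed(1)
d <- data.frame(
  SNP  = paste0("rs", 1:200),
  pos  = sort(sample(1e6, 200)),
  P    = runif(200),
  GENE = NA_character_
)
# Give three SNPs a strong signal and a made-up gene name
d$P[c(50, 120, 180)]    <- c(1e-9, 5e-10, 2e-8)
d$GENE[c(50, 120, 180)] <- c("GENE1", "GENE2", "GENE3")
d$logp <- -log10(d$P)

# Leave headroom above the tallest point so labels are not clipped
plot(d$pos, d$logp, pch = 20, ylim = c(0, max(d$logp) + 1),
     xlab = "Position", ylab = "-log10(p)")

# Label only SNPs that have a gene name and pass the threshold
top <- d[!is.na(d$GENE) & d$logp >= -log10(5e-8), ]
text(top$pos, top$logp, labels = top$GENE, pos = 3, cex = 0.8)
```

The `pos = 3` argument places each label just above its point; with many neighbouring hits the labels will overlap, so a stricter threshold may be needed.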

Anonymous, July 7, 2011 at 11:05 AM
Can Dr. swvanderlaan or Dr. Stephen Turner help with how to show the names of the top 10 SNPs on the plot in R?

Anonymous, July 19, 2011 at 5:06 PM
Can anyone tell me how to adjust the size of the points on the plot?

Stephen Turner, July 19, 2011 at 5:11 PM
Use the cex= option. See here: http://www.inside-r.org/r-doc/graphics/par

Anonymous, July 19, 2011 at 7:12 PM
hmm...Read through that but didn't understand a whole lot. par(pch=20, cex=0.5) kind of does what I want but it also resizes the axis label fonts. One would think there would be something like pch.cex to do this... Can you give me any clearer guidance please?

Stephen Turner, July 19, 2011 at 8:37 PM
Oh, I see what's going on here now. I included a graphical parameter that I named cex.x.axis to resize x-axis labels. When using the cex= argument, it conflicts with the argument I supply. Source this version, and you should be able to change the point size with the cex option.

manhattan(results) #normal size
manhattan(results, cex=.5) #half size

https://gist.github.com/1094160
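
The conflict is most likely a result of R's partial argument matching: a named argument that comes before `...` in a function definition can be matched by any unambiguous prefix of its name. The toy functions below (`demo` and `demo_fixed` are hypothetical, not taken from the gist) show the effect and one way around it.

```r
# Toy function with the same argument layout as the old manhattan():
# a formal named cex.x.axis that comes before `...`
demo <- function(x, cex.x.axis = 1, ...) {
  list(cex.x.axis = cex.x.axis, dots = list(...))
}

# `cex` partially matches `cex.x.axis`, so it never reaches `...`
res <- demo(1, cex = 0.5)

# Formals placed after `...` must be matched exactly, so `cex` passes through
demo_fixed <- function(x, ..., cex.x.axis = 1) {
  list(cex.x.axis = cex.x.axis, dots = list(...))
}
res_fixed <- demo_fixed(1, cex = 0.5)
```

In the first call `cex.x.axis` ends up as 0.5 and nothing is forwarded; in the second, `cex.x.axis` keeps its default and `cex = 0.5` is available to pass on to plot() or points().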

Mark Aquino, July 20, 2011 at 8:30 AM
Thanks Stephen that worked.

Anonymous, August 10, 2011 at 4:41 PM
Hi Stephen,

I ran this script and got the following error message :
"Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 36720"

Somebody else had a similar problem.

I double-checked the P values; all fall in (0,1] and there are no missing values.

Do you know how this could happen? Is anything wrong in my dataset?

Many thanks,

Stephen Turner, August 10, 2011 at 5:09 PM
Hmm, not sure without seeing your data. Are you sure that all the other relevant columns contain numeric values only? How many chromosomes do you have?

Anonymous, August 15, 2011 at 5:08 AM
Hi Stephen,

I'm trying to run this script on some results from mice, and am getting this error: Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors.
I thought this may be due to the chromosome number being different, but I have now changed the script to chromosome 1:19 and it still has the same error. Do you know what might be causing this problem? Thanks.

Anonymous, August 15, 2011 at 10:05 AM
Hi Stephen,

I posted a problem a few days ago when I ran this script and the error message was:
:"Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 36720"

I double-checked the variables in the dataset and all the other relevant columns contain numeric values. However, there were only 21 chromosomes (chromosome 3 was missing). I am wondering if this is what caused the problem?

Thanks,

Stephen Turner, August 15, 2011 at 3:40 PM
Anonymous 1 and anonymous 2: I'll try to take a look at the code sometime soon but I've got a major grant deadline coming up that's sucking much of my time away. Anonymous 2: That could easily be the problem - for a dirty fix, try putting in a single line in your results file between chr 2 and 4, where the chr is 3, position is 1, and p-value is 1, see if it works and let me know. Anonymous 1 - it's difficult for me to troubleshoot without some example data. Could you use something like http://jotonce.com/ to copy a few lines out of your results file, and email me the password to access it?
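
The placeholder-row workaround can also be done inside R instead of editing the results file by hand. This is only a sketch of that dirty fix; `pad_missing_chromosomes` is a hypothetical helper and the toy data are invented.

```r
# Hypothetical helper: add one placeholder row (BP = 1, P = 1) for every
# chromosome number missing between 1 and the largest chromosome present.
pad_missing_chromosomes <- function(d) {
  missing_chr <- setdiff(seq_len(max(d$CHR)), unique(d$CHR))
  if (length(missing_chr) == 0) return(d)
  filler <- data.frame(
    SNP = paste0("placeholder_chr", missing_chr),
    CHR = missing_chr,
    BP  = 1,
    P   = 1
  )
  out <- rbind(d[, c("SNP", "CHR", "BP", "P")], filler)
  out[order(out$CHR, out$BP), ]
}

# Toy results with chromosome 3 missing
toy <- data.frame(SNP = c("rs1", "rs2", "rs3"),
                  CHR = c(1, 2, 4),
                  BP  = c(100, 200, 300),
                  P   = c(0.5, 0.01, 0.2))
padded <- pad_missing_chromosomes(toy)
```

A placeholder with P = 1 sits at zero on the -log10 scale, so it does not add a visible signal to the plot.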

Anonymous, August 17, 2011 at 9:03 AM
Stephen,

after adding a redundant line for chr 3, the function works well! thank you so much!

Stephen Turner, August 17, 2011 at 1:24 PM
Thanks for helping me figure that one out. I'm soon going to be updating the code to fix a few other things and add things like automatic annotation based on gene region, SNP features, etc. I'll add this to the list of things to fix.

Anonymous, August 31, 2011 at 4:25 PM
Hey Stephen,

I'm running into the following error when trying to create a manhattan plot:
Error in Math.factor(d$P) : log10 not meaningful for factors In addition: Warning messages: 1: In Ops.factor(P, 0) : > not meaningful for factors 2: In Ops.factor(P, 1) : <= not meaningful for factors

Any insight into what might be the source of this error would be greatly appreciated as I'm stumped!

Stephen Turner, August 31, 2011 at 6:02 PM
Do you have something other than numbers between 0 and 1 (not including 0)?
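
A "not meaningful for factors" error usually means the column was read as text because at least one entry is not a number. The sketch below, using a made-up vector, shows one way to find the offending entries and convert the rest.

```r
# A stray non-numeric entry ("." here) turns the whole column into text;
# older versions of R then store it as a factor by default.
p_raw <- factor(c("0.62", "0.0619", ".", "0.107"))

# Convert via character; anything that is not a number becomes NA
p_num <- suppressWarnings(as.numeric(as.character(p_raw)))

# Show which original entries failed to convert
bad_entries <- as.character(p_raw)[is.na(p_num)]
```

Going through as.character() matters: calling as.numeric() directly on a factor returns the internal level codes rather than the values.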

Juan Vivar, September 20, 2011 at 12:38 PM
Thank you for sharing the code! I need to prepare one of these plots quickly and your programming is pretty straightforward. Nice work!

Anonymous, September 24, 2011 at 11:02 AM
Is there a way to get a chromosome 0 for scaffolds not incorporated into CHR?

Stephen Turner, September 24, 2011 at 9:33 PM
I'll add it to the growing list of changes for the next update. For now, what happens if you simply have a chromosome 0, or chromosome 24? Make sure you appropriately change the limitchromosomes argument.

Anonymous, September 24, 2011 at 9:58 PM
If I change the limit chromosome argument to 0:8, and include CHR 0 in my source file I get this error:

Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 213

I get the same error if I set limitchromosomes=F

I have not tried it with 24 as my organism only has a few chromosomes.

Anonymous, September 24, 2011 at 10:07 PM
Also, if I leave it as 1:8 it plots perfectly, but just excludes the CHR 0 data.

Anonymous, September 25, 2011 at 7:19 AM
Changing this i==1 to i==0 allows for CHR 0, but for some reason the last chromosome has no points.

+ } else {
+ for (i in unique(d$CHR)) {
+ if (i==0) {

Anonymous, September 25, 2011 at 7:28 AM
and changing icol=icol+0 restores those points for the last CHR, but the colors by CHR are missing

+ for (i in unique(d$CHR)) {
+ with(d[d$CHR==i, ],points(pos, logp, col=colors[icol], ...))
+ icol=icol+0

Anonymous, September 25, 2011 at 3:31 PM
Finally got it with changing from i==1 to i==0 like so:

+ } else {
+ for (i in unique(d$CHR)) {
+ if (i==0) {

and then changing type= "n" to type = "p"

+ with(d, plot(pos, logp, ylim=c(0,ymax), ylab=expression(-log[10](italic(p))), xlab="Chromosome", xaxt="n", type="p", ...))

Stephen Turner, September 26, 2011 at 1:14 PM
Excellent! Thanks for hammering away at this. I'll try to make the code more flexible on my next round of updates.

Anonymous, September 27, 2011 at 6:34 AM
Hi Stephen,

Thanks. This is great !

I have tried to add the SNP name in the plot using text. Using the BP for that SNP, how can I find the x-coordinate in the plot? For example, what is x-coordinate in the plot for CHR=2, BP=53606599)?

Thanks in advance.

Stephen Turner, September 27, 2011 at 11:42 AM
If you look at the code here, in lines 45-53 I'm creating a new variable in the data frame called d$pos. On chromosome 1 d$pos is set to d$BP. For every remaining chromosome, d$pos is set to the BP position on that chromosome, added to the last base position on the previous chromosome. For instance, if I have 3 SNPs on chromosome 1 at positions 10, 20, and 30, and 3 SNPs on chromosome 2 at positions 5, 15, 25, the X coordinates for the six SNPs would be 10, 20, 30, 35, 45, 55. The x-axis uses the d$pos variable, and you can mod the code to use d$pos as the x coordinate for your text. I'd love to see what you do with this, please post the code to https://gist.github.com/. Thanks!
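
The worked example above can be reproduced with a few lines of R. This is a standalone sketch of the cumulative-position idea, written to loop over the chromosomes actually present; it is not a copy of the function's code.

```r
# Three SNPs on chromosome 1 and three on chromosome 2
d <- data.frame(
  CHR = c(1, 1, 1, 2, 2, 2),
  BP  = c(10, 20, 30, 5, 15, 25)
)
d <- d[order(d$CHR, d$BP), ]

d$pos    <- NA
lastbase <- 0
chrs     <- unique(d$CHR)
for (k in seq_along(chrs)) {
  rows <- d$CHR == chrs[k]
  if (k > 1) {
    # Offset by the last base position of the previous chromosome in the data
    lastbase <- lastbase + max(d$BP[d$CHR == chrs[k - 1]])
  }
  d$pos[rows] <- d$BP[rows] + lastbase
}

# x coordinate to use in text() for the SNP at CHR 2, BP 15
x_label <- d$pos[d$CHR == 2 & d$BP == 15]
```

Here `d$pos` comes out as 10, 20, 30, 35, 45, 55, and `x_label` is 45, which is the value to pass as the x coordinate of a label for that SNP.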

Anonymous, October 5, 2011 at 10:12 PM
We were just using plink the other day and I wanted to create a Manhattan plot for our data. I am running into this error:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 23

My plink.assoc file is from a small dataset so it only contains two CHR. I've searched the P column for non-0 or 1 but didn't find any.

Thanks for writing this code! I appreciate the time and help!

I can e-mail the plink.assoc file if you need it.

ara

Anonymous, October 5, 2011 at 10:20 PM
I actually figured it out using args(manhattan). When you have fewer than 23 CHR you need to use limitchromosomes=#,# , where the #s are the CHRs that are present in the file.

Anonymous, October 5, 2011 at 10:37 PM
Well I thought I had it solved but...

So I have two CHR 11 and 17. I use the limitchromosomes to get it to plot 11 but how to let R know I want just 11 and 17?

Stephen Turner, October 6, 2011 at 6:27 AM
Thanks for figuring this out. As with the many other little inconsistencies, I'll try to iron this out and make the code a little more robust in the next version.

Anonymous, October 7, 2011 at 10:11 AM
Just another quick note. I tried to use the limitchromosomes=11:12 (basically turned my 17 into 12 so it's consecutive) the same error pops up. It looks like you can do a single CHR using limitchromosomes but not consecutive CHRs unless you have all 1:23 present.

Stephen Turner, October 7, 2011 at 10:33 AM
Thanks for the note!

Anonymous, October 25, 2011 at 8:54 PM
Is there any way to change the colour of the annotation from green to red? I don't quite understand how the dot-dot-dot function is supposed to work?

Stephen Turner, October 26, 2011 at 8:06 AM
Change the color on line 70.

swvanderlaan, November 8, 2011 at 10:42 AM
Hi Stephen,

Just a quick question: how do you usually select your annotation SNPs? Do you (using PLINK) pick a window of 100 SNPs around your hit? Or do you make it broader?

Thanks!

Sander

Stephen Turner, November 8, 2011 at 11:32 AM
Where I've needed this in the past is to highlight known hits based on RS number from the GWAS catalog or something. If you wanted to highlight "skyscrapers" in the manhattan plot, you could just select all the SNPs within a certain window like you said.

swvanderlaan, November 9, 2011 at 6:20 AM
Exactly, but what range would you take: 100-250kb, or would you rather take 500-1500kb?

Stephen Turner, November 9, 2011 at 6:29 AM
It's hard to say really. You might base it on how big the haplotype blocks are in that region. But if you're just coloring neighboring SNPs to highlight the peak on the plot, you probably just want to eyeball it, and pick whatever size looks good.
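
If you do want to pick neighbours by physical distance, something along these lines would work. The data are simulated and the 250 kb half-width is an arbitrary choice for illustration, not a recommendation.

```r
set.seed(7)
res <- data.frame(
  SNP = paste0("rs", 1:500),
  CHR = 11,
  BP  = sort(sample(5e6, 500)),
  P   = runif(500)
)
res$P[250] <- 1e-9     # plant one strong hit

window_bp <- 250e3     # half-width of the window; an arbitrary choice
hit       <- res[which.min(res$P), ]

# Keep SNPs on the same chromosome within window_bp of the top hit
in_window <- res$CHR == hit$CHR & abs(res$BP - hit$BP) <= window_bp
snps_in_window <- as.character(res$SNP[in_window])
```

The resulting character vector can then be supplied to the annotate argument to colour the whole peak.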

Bahar Erar, December 2, 2011 at 3:08 PM
It works great! Thanks a lot!!

I also encountered that error when there are no snps for a specific Chromosome. I solved it with a minor adjustment to the loop on line 45. Below is the code.

CHRs <- unique(d$CHR)
numchroms=length(CHRs)
if (numchroms==1) {
    d$pos=d$BP
    ticks=floor(length(d$pos))/2+1
} else {
    for (i in 1:numchroms) {
        if (i==1) {
            d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP
        } else {
            lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)
            d[d$CHR==CHRs[i], ]$pos=d[d$CHR==CHRs[i], ]$BP+lastbase
        }
        ticks=c(ticks, d[d$CHR==CHRs[i], ]$pos[floor(length(d[d$CHR==CHRs[i], ]$pos)/2)+1])
    }
}

Stephen Turner, December 2, 2011 at 3:16 PM
Bahar - thanks for sharing your solution. I think others have had this problem before but I never knew what was causing it.

Anonymous, December 20, 2011 at 1:47 PM
This is great. Thanks a lot Dr. Turner.
I was trying to figure out how to zoom in on a specific location of a chromosome. For example, if I want to have a Manhattan plot only for BP "1e+07 to 2e+07" of CHR1. Any suggestions?

Stephen Turner, December 20, 2011 at 2:14 PM
Use LocusZoom.

Anonymous, January 13, 2012 at 5:42 PM
Hi Stephen,

Thanks for your code. It's awesome. I did a MH plot using your code. In the x-axis, all the chr # didn't show up. How do I resolve this issue?
All the #s came up between 1 and 9 and then I have 11,13, 15, 19 etc.

Thanks,

-Joey

Stephen Turner, January 13, 2012 at 6:11 PM
Joey - make a bigger plot. Save the plot to a file and make it wider.
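
One way to do that is to open a wide graphics device before plotting. The sketch below uses a placeholder plot with 23 tick labels in place of a real manhattan() call, writes to a temporary file, and picks dimensions that are purely illustrative.

```r
out_file <- file.path(tempdir(), "manhattan_wide.pdf")

# Open a wide PDF device; width and height are in inches
pdf(out_file, width = 16, height = 6)

# Placeholder plot with 23 tick labels; swap in manhattan(results) here
plot(1:23, rep(1, 23), xaxt = "n", pch = 20,
     xlab = "Chromosome", ylab = "")
axis(1, at = 1:23, labels = 1:23)

# Close the device so the file is written
invisible(dev.off())
```

R drops axis labels that would overlap, so giving the plot more horizontal room is usually enough for all chromosome numbers to appear.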

Anonymous, January 17, 2012 at 10:48 PM
Hi Stephen,

This function is really helpful. I love reading your blog, really very informative, for people like me who have to learn statistics to be in field of human genetics.

I used the function to plot Manhattan but, I am having trouble with pch, dots are very big and when I plot 6million SNP's with imputed data big dots merge with each other and it looks messy. I tried to change size with cex, but it changed the size of x axis and dots remained of same size. Can you suggest something to deal with this?

Thanks,

Stephen Turner, January 18, 2012 at 6:19 AM
Ah, I see what's going on here. I named the argument "cex.x.axis" so I could control this at line 60 in the code. Copy the function, and rename the "cex.x.axis" to something like xcex, and when you pass in the cex= argument, it should work properly to resize the points, not the x-axis labels.

Nasir, February 24, 2012 at 10:09 AM
Great codes! Very useful. Thank you Stephen

Athena, March 9, 2012 at 5:52 AM
Hi Stephen,

This code is so useful - thank you for sharing it!

I have been adapting your script, but I'm having some problems with adding a title to the Manhattan plot (although I can successfully change the colours):

# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
main="Viral load in cell lines", ...) {

If you could suggest what I should do to add a title, that would be much appreciated!

Athena, March 9, 2012 at 6:22 AM
I'm also having trouble changing the size of the points using pch:

# manhattan plot using base graphics
manhattan = function(dataframe, colors=c("#8A2BE2", "#5F9EA0", "#483D8B" ), ymax="max", cex.x.axis=1, limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL,
main="Viral load in cell lines", pch=20, ...) {

Stephen Turner, March 9, 2012 at 2:32 PM
Hello Anonymous from July 19 - I believe I solved your cex problem.

Anonymous, March 30, 2012 at 9:42 AM
Hi Stephen,

Awesome work! I am using R studio to run the code. I am having trouble getting the last command to execute. manhattan(results) gives an Error: could not find function "manhattan".

Any ideas?

Thanks

Stephen Turner, April 2, 2012 at 7:34 AM
It looks like you didn't source in the function. Copy and paste the function yourself, or source it from the web directly using source("http://www.StephenTurner.us/qqman.r")

Amol, May 3, 2012 at 7:58 AM
Hello Stephen,
It's a very nice piece of code you have compiled. I am using R to plot my GWAS data. I was wondering if we could plot lines instead of dots, with colour intensity proportional to LOD scores. That way, it would be easy to draw region-specific conclusions about association.

Regards,
Amol

Best Wishes

Stephen Turner, May 3, 2012 at 8:27 AM
Amol - that would be pretty simple with the old ggplot2 code - just use geom="line" and colour=LOD or the like. I'm not sure how to build this into the current implementation, but contributions are welcome.

Manav Kapoor, June 5, 2012 at 4:51 PM
Hi Stephen,

I want to have grids at beginning and end of each chromosome on X-axis in my Manhattan plot. Can you tell how should I specify that?

Thanks,

Manav

Stephen Turner, June 6, 2012 at 6:29 AM
This should be easy to do with the abline function.
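
For example, once cumulative plotting positions are available, vertical lines can be drawn at the first and last position of each chromosome. The sketch below uses a small invented data frame with a `pos` column standing in for the positions the function computes.

```r
# Toy data: cumulative plotting positions for three chromosomes
d <- data.frame(
  CHR  = rep(1:3, each = 4),
  pos  = seq(10, 120, by = 10),
  logp = c(0.3, 1.2, 0.8, 2.5, 0.1, 3.4, 1.9, 0.7, 2.2, 0.4, 1.1, 0.9)
)

# First and last plotted position of each chromosome
chr_start <- tapply(d$pos, d$CHR, min)
chr_end   <- tapply(d$pos, d$CHR, max)

plot(d$pos, d$logp, pch = 20, xaxt = "n",
     xlab = "Chromosome", ylab = "-log10(p)")

# Dotted vertical grid lines at the chromosome boundaries
abline(v = c(chr_start, chr_end), col = "gray70", lty = 3)
```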

Anonymous, June 26, 2012 at 12:36 PM
Hi Stephen,

Great code! Thank you so much!

I am trying to replicate the manhattan plot in this paper, http://www.pnas.org/content/107/39/16910.full.pdf+html?with-ds=yes, (Fig.S8) with the data provided, http://heim.ifi.uio.no/bioinf/Projects/ASCAT/Version1/AllelicSkewness.txt.

I had the same error as some described above:
Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 401

So, I tried to input the data bit by bit to determine where the error is.

I located the first error in this line in the data:
rs12526698 6 2610972 4 2 4 2 2 6 12 8 0.6 0.503444672

at Chromosome 6 with p-value 0.503444672. When I changed this p-value to NA it gave me this message:

Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 400

indicating that a p-value other than NA would result in an error. (Notice the change from 401 to 400). Don't know why it is only accepting "NA".

There are 23 chromosomes by the way.

Thank you,
Miah

Stephen Turner, July 2, 2012 at 1:59 PM
I'm guessing you changed the column headers? The manhattan function assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, genomic coordinate, and p-value. Missing values (where regression failed to converge, for instance) should be recoded NA before reading into R.

I suppose I should add in some error checking but it's been a while since I've looked at this code. Hopefully I can update it soon. Let me know if this helps.

Does it work otherwise if you put say all the data from chromosomes 1-5 in there? If so, something's odd with the data at that point on chr 6.

Anonymous, July 3, 2012 at 6:21 PM
Yes, I did change the corresponding headers. It does work when I put all the data from chromosome 1-5 until that point on chromosome 6. When I changed the p value at that point to NA, it worked fine.

Thanks,

Miah

Anonymous, July 11, 2012 at 7:33 AM
i have used "source("http://www.StephenTurner.us/qqman.r")" server several times before, but suddenly it's not working for last few days ! can you please suggest some way ?

Best,
Sabyasachi

Anonymous, July 18, 2012 at 10:30 AM
Hi Stephen, thanks for this amazing code!
I am just wondering if I can change the label for chr X ("X" instead of "23").

Is it possible?

Thank you!

Diego

Stephen Turner, July 21, 2012 at 12:09 PM
Diego - the quick and dirty solution would be to change line 60 of the code where it says lab=unique(d$CHR), change that to lab=c(1:22,"X").
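
As a rough illustration of the same idea outside the function, the sketch below draws an axis with one tick per chromosome and labels chromosome 23 as "X". It assumes chromosomes are coded 1 to 23 in the data; the variable names and the output file are made up for the example, and this is not the code from line 60 of qqman.r.

```r
# Assumption: chromosomes are coded 1-23, with 23 meaning X
chr_codes  <- 1:23
chr_labels <- c(1:22, "X")   # coerced to a character vector

# Draw a toy axis to a temporary PDF so the labels can be inspected
out <- file.path(tempdir(), "chr_axis_demo.pdf")
pdf(out, width = 10, height = 4)
plot(chr_codes, rep(0, length(chr_codes)), xaxt = "n", pch = 20,
     xlab = "Chromosome", ylab = "")
# cex.axis shrinks the tick labels so that more of them fit
axis(1, at = chr_codes, labels = chr_labels, cex.axis = 0.7)
invisible(dev.off())
```

Base R skips axis labels that would overlap, so a smaller cex.axis (or a wider device) is one way to get every chromosome label to show.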

jpsangio, July 31, 2012 at 10:17 AM
Stephen:

I have used the code successfully in the past. I am now getting the error below. Do you have any guidance on this issue? I have attached my data.

Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) :
replacement has 0 rows, data has 1

CHR SNP BP A1 TEST NMISS OR SE L95 U95 STAT P
1 rs3813605 84743535 A DOM 2200 1.109 0.09787 0.9154 1.343 1.056 0.2908
2 rs4953223 45370846 C DOM 2201 1.323 0.09777 1.092 1.602 2.862 0.004208
2 rs34058885 49417012 A DOM 2200 0.9881 0.09331 0.8229 1.186 -0.1288 0.8975
2 rs7599054 135257016 G DOM 2201 0.7543 0.09637 0.6245 0.9111 -2.926 0.003435
3 rs3896023 173034387 A DOM 2197 0.883 0.09549 0.7323 1.065 -1.303 0.1925
4 rs1554668 114576711 T DOM 2199 1.28 0.1163 1.019 1.608 2.125 0.03362
5 rs12520934 67162619 A DOM 2199 1.38 0.09121 1.154 1.65 3.528 0.0004186
6 rs9469084 32188361 T DOM 2201 0.591 0.1319 0.4564 0.7653 -3.988 6.672e-005
6 rs1123969 155276341 A DOM 2201 1.299 0.09375 1.081 1.561 2.789 0.005294
8 rs1016646 20307688 A DOM 2200 0.7924 0.1284 0.6161 1.019 -1.811 0.07009
10 rs10788275 124051491 T DOM 2200 1.376 0.09063 1.152 1.643 3.521 0.0004295
11 rs733454 76155369 A DOM 2200 1.335 0.1226 1.05 1.698 2.358 0.01838
20 rs2865873 38808983 T DOM 2182 1.28 0.09351 1.066 1.537 2.64 0.008289

Stephen Turner, July 31, 2012 at 10:44 AM
John Paul,

This is a known issue - when you skip over chromosomes you run into this problem. It will work fine on the first 9 rows of your data. I haven't had time to fix this. I'd welcome any contributions!

Best,

Stephen
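
One possible way around the skipped-chromosome problem is to loop over the chromosomes that are actually present rather than over chromosome numbers. The sketch below is not the qqman.r implementation; add_cumulative_pos is a hypothetical helper that only assumes numeric CHR and BP columns.

```r
# Hypothetical helper: compute a cumulative x-axis position per SNP,
# iterating over the chromosomes present in the data (gaps are fine).
add_cumulative_pos <- function(d) {
  d <- d[order(d$CHR, d$BP), ]
  d$pos <- NA_real_
  offset <- 0
  for (chr in unique(d$CHR)) {
    idx <- d$CHR == chr
    d$pos[idx] <- d$BP[idx] + offset
    # Shift the next chromosome past the last base of this one
    offset <- offset + max(d$BP[idx])
  }
  d
}

# Toy data that skips chromosomes 3-7 and 9-19
toy <- data.frame(CHR = c(1, 2, 2, 8, 20),
                  BP  = c(100, 50, 80, 30, 10),
                  P   = c(0.2, 0.01, 0.5, 0.03, 0.9))
res <- add_cumulative_pos(toy)
print(res)
```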

Anonymous, August 2, 2012 at 3:05 AM
Hello,

I am using the manhattan() function to plot the results of THREE different analyses for the same subset of 20 snps (ie 3 sets of p-values for the 20 snps). The function seems to have no problem plotting multiple SNPs at the same position. However, I would like to attribute a different color to each set of P-values. Would this be possible with the manhattan() function? I've tried including a fourth column ("COL") with the colors - ie

SNP CHR POS P COL
rs1 8 123 0.05 green
...

and calling it with

manhattan(results,col=results$COL)

but the colors do not come up.

Any suggestions?

Thank you.

Stephen Turner, August 2, 2012 at 6:31 AM
Try modifying line 63 of the code

Anonymous, August 5, 2012 at 11:16 PM
Hello.

I think this is a good program for making Manhattan plots in R, but I ran into one problem.
When I made a Manhattan plot for cattle (29 chromosomes) using your manhattan function, I could not see every chromosome number on the x-axis
(for example, x-axis: 1,2,3,4,5,6,7,---, 23,25,27,28,29).
How can I change the font size of the x-axis labels (here, the chromosome numbers)?

Thank you.

Anonymous, August 17, 2012 at 11:17 AM
Stephen,

Thanks for the great code! How can I specify the size of the plot? I'd like to reduce the height and make it wider.
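
Since the functions use base graphics, one general approach is to open a file device with explicit dimensions before plotting. The sketch below uses toy data and a made-up file name; with real results, the plot() call would be replaced by the manhattan() call.

```r
# Toy -log10(p) values standing in for real GWAS results
set.seed(1)
logp <- -log10(runif(500))

# pdf() takes width and height in inches: wide and short here
out <- file.path(tempdir(), "wide_plot_demo.pdf")
pdf(out, width = 12, height = 4)
plot(seq_along(logp), logp, pch = 20,
     xlab = "Index", ylab = "-log10(p)")
invisible(dev.off())   # close the device so the file is written
```

The png() device works the same way but takes its dimensions in pixels by default, and it tends to produce much smaller files than PDF when plotting millions of points.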

Unknown, August 21, 2012 at 8:56 AM
I couldn't figure out how to run the QQ plot without the HLA region MB25-MB35 on chromosome 6. Any suggestions? Thanks!
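
One way to do this is to drop the region from the data frame before passing the p-values to qq(). The sketch below uses toy data and assumes numeric, non-missing CHR and BP columns, with BP in base pairs.

```r
# Toy results: three SNPs fall inside chr6:25-35 Mb
toy <- data.frame(SNP = paste0("rs", 1:5),
                  CHR = c(6, 6, 6, 5, 6),
                  BP  = c(24e6, 30e6, 35e6, 30e6, 36e6),
                  P   = c(0.5, 1e-9, 1e-8, 0.2, 0.7),
                  stringsAsFactors = FALSE)

# Flag SNPs inside the region, then keep everything else
in_hla <- toy$CHR == 6 & toy$BP >= 25e6 & toy$BP <= 35e6
no_hla <- toy[!in_hla, ]
print(no_hla)

# With qqman.r sourced, the QQ plot would then be:
# qq(no_hla$P)
```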

Anonymous, September 26, 2012 at 8:55 AM
Dear Stephen,

On the QQ plot, I'm trying to adjust the point size, and the maximum on the x- and y-axes using:

> myQQ<-qq(mydata$P, pch=20, xmax=10, ymax=10)

However R doesn't like this.

1: In plot.window(...) : "xmax" is not a graphical parameter
2: In plot.window(...) : "ymax" is not a graphical parameter
Error in localWindow(xlim, ylim, log, asp, ...) :
formal argument "pch" matched by multiple actual arguments

I might be missing something fundamental to R as I haven't used it much (yet...!)

Cheers,

Blue

P.S. Thanks for posting the scripts and hope the grant application went well.

psique, October 16, 2012 at 6:08 AM
Hi Stephen,

First of all: Thanks for the useful code!

I've used it several times before without problems but now qq(results) comes up with the "P value vector is not numeric" error. Worryingly, the problem persists even when I use your example dataset, whether it's using the R desktop interface or R in a unix environment.

manhattan(results) works fine, but gg.qq(results) also appears broken, with the following error: "Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : undefined columns selected"

Do you have any idea what this might be about?

Thanks,
Laura

Stephen Turner, October 16, 2012 at 6:17 AM
You actually need to provide the P-value vector. I.e., qq(results$P). Just tried this using my example dataset.


psique, October 23, 2012 at 5:32 AM
D'oh. Sorry, I'm an idiot. Thanks..

MQ, November 7, 2012 at 11:22 PM
Guys, this does not handle the X, Y, and MT chromosomes.
So you can convert chromosome X to 23, Y to 24, and MT to 25.

To do that, in the manhattan function, on the line after

d=data...

paste this:

d$CHR[d$CHR=="X"] <- "23"
d$CHR[d$CHR=="Y"] <- "24"
d$CHR[d$CHR=="MT"] <- "25"

Then all your problems will be solved.

Also, I think there is an error in the line lastbase=lastbase+tail(subset(d,CHR==CHRs[i-1])$BP, 1)

The i-1 does not seem correct, since it produces an error with my data. Could you just use
lastbase=lastbase+tail(subset(d,CHR==CHRs[i])$BP, 1) ?

Try it out and see whether this change works well.

Stephen Turner, November 8, 2012 at 4:52 AM
I believe I used the CHRs[i-1] bit to get the last base from the *previous*, i.e., i-1'th chromosome, so as to start numbering from there on the x-axis for the next chromosome.

Thanks for providing code for X, Y, and MT. I didn't want to make assumptions about how users were naming these chromosomes. I.e., some name them 23, 24, 25 as you did. Others have PAR regions from Y, not sure how they're naming those.

MQ, November 8, 2012 at 9:12 AM
To annotate the rs number for SNPs with p-values < 0.001, you can use this:

for (i in 1:length(d$P)){
if(d[i,"logp"]> -log10(0.001)){
text(d[i,"pos"],d[i,"logp"],d[i,"SNP"],pos=3)
}}

before the line " if (!is.null(annotate)) { "

MQ, November 8, 2012 at 9:16 AM
The data frame does not accept the strings X, Y, and MT as numeric values, so it's easy to see that it won't work and will give an error when plotting the value "X" on the x-axis. So that's the only approach I see right now.

About the i-1, you are probably right, but somehow it gives me an error. I have to check again.


MQ, November 8, 2012 at 9:59 AM
About the last chromosome: the function is not very well constructed, because it assumes your data contain every chromosome from 1 to 23.
Consider what happens if you have chromosome 7 and then only chromosome 9.
The function, as it is, will call chrom[i] where i is 9, and then compute i-1, which is 8. There is no chromosome 8 in the list, so it gives an error. That's the problem I found. So I had to rewrite the code, and now it's working the way I want.

I had to create a new list of the unique chromosomes, and then use that array as the reference for i, so that it calls the correct index-1 chromosome in the list of chromosomes that actually exist.

So this is the code for the whole manhattan function. The axis has some weird bars that don't get named, though; I don't know if that's normal.

# manhattan plot using base graphics
manhattan <- function(dataframe, colors=c("#0174DF", "#DF013A"), ymax="max", limitchromosomes=1:23, suggestiveline=-log10(1e-5), genomewideline=-log10(5e-8), annotate=NULL, ...) {

d=dataframe

d$CHR[d$CHR=="X"] <- "23"
d$CHR[d$CHR=="Y"] <- "24"
d$CHR[d$CHR=="MT"] <- "25"
d$CHR <- sapply(d$CHR, as.numeric)

if (!("CHR" %in% names(d) & "BP" %in% names(d) & "P" %in% names(d))) stop("Make sure your data frame contains columns CHR, BP, and P")
if (any(limitchromosomes)) d=d[d$CHR %in% limitchromosomes, ]

d=subset(na.omit(d[order(d$CHR, d$BP), ]), (P>0 & P<=1)) # remove na's, sort, and keep only 0 -log10(0.001)){
text(d[i,"pos"],d[i,"logp"],paste(d[i,"SNP"],",",d[i,"Allele"]),pos=2,cex=0.6)
}
}

if (!is.null(annotate)) {
d.annotate=d[which(d$SNP %in% annotate), ]
with(d.annotate, points(pos, logp, col="green3", ...))
}

if (suggestiveline) abline(h=suggestiveline, col="blue")
if (genomewideline) abline(h=genomewideline, col="red")
}


MQ, November 8, 2012 at 10:10 AM
Here is the Google Drive link to the code:

Somehow, posting the whole code here removes some parts (some protection from Blogger, perhaps).

Link: https://docs.google.com/document/pub?id=1YM8HXVZ_gBRdKe3Yp5aVcA6EI6z428adHgT8NXd7nDA

Stephen Turner, November 8, 2012 at 11:58 AM
MQ - thanks for updating this. FYI you can easily fork my code on GitHub, which will syntax-highlight if you name the Gist with a .R extension.

Unknown, November 20, 2012 at 4:09 AM
Dear Stephen,
I started using R recently, and I need to plot some chromosome-wise values in a Manhattan plot.
However, my values are greater than 1 (max 11, min 0), and I am not sure what to change in the R script you provided for the Manhattan plot. Can you please help me? I would be very thankful.

Anonymous, December 7, 2012 at 8:56 AM
Dear Stephen,
I really love your script, however I cannot get it to work. That is probably because I have five chromosomes called 2L, 2R, 3L, 3R and X. Is there a way to get around this?
Best, Palle
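
A possible workaround, sketched here under the assumption that manhattan() needs a numeric CHR column, is to map the chromosome arm names to integers in the desired plotting order before calling the function. The toy data and the CHR_NAME column are made up for the example.

```r
# Desired plotting order of the chromosome arms
chr_names <- c("2L", "2R", "3L", "3R", "X")

toy <- data.frame(SNP = paste0("s", 1:6),
                  CHR = c("X", "2L", "3R", "2R", "3L", "2L"),
                  BP  = c(10, 20, 30, 40, 50, 60),
                  P   = c(0.5, 0.01, 0.2, 0.3, 0.04, 0.9),
                  stringsAsFactors = FALSE)

# Keep the original names, then recode CHR as 1..5 ("2L" -> 1, "X" -> 5)
toy$CHR_NAME <- toy$CHR
toy$CHR <- match(toy$CHR_NAME, chr_names)
print(toy)
```

The x-axis would then show 1 to 5 unless the axis labels are also replaced with chr_names, along the lines of the lab=c(1:22,"X") suggestion earlier in this thread.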

Unknown, January 2, 2013 at 7:41 AM
Hello,
I tried to run the function "manhattan" but I got this error:
Error: could not find function "manhattan".
Which package has this function?
I couldn't find it in the ggplot2.
Can anyone help?
Maybe it's because I am using a newer version of R?
Thanks,
Einat

Anonymous, January 6, 2013 at 12:10 PM
hello stephen. i have a (hopefully simple) question. can you explain how i can make sure my manhattan plot y-axis is always high enough to show the suggestive and genomewide lines?

thanks in advance, and happy new year!
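
One simple approach is to compute the y-axis maximum from both the data and the two thresholds. The sketch below uses toy p-values; passing the result as ymax to manhattan() is an assumption based on the function signature MQ quotes elsewhere in this thread.

```r
# Default thresholds as used in the manhattan() signature quoted in this thread
suggestiveline <- -log10(1e-5)
genomewideline <- -log10(5e-8)

# Toy p-values, none of which reaches either threshold
p <- c(0.2, 0.004, 3e-4, 0.6)
logp <- -log10(p)

# Take the largest of the data and both lines, rounded up
ymax <- ceiling(max(logp, suggestiveline, genomewideline))
print(ymax)

# Assumed usage with qqman.r sourced:
# manhattan(results, ymax = ymax)
```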

Unknown, January 17, 2013 at 11:13 AM
I'm trying to plot a Manhattan plot for 431885 SNPs. I've already excluded chromosomes X, Y, MT and 0, and the following error appears:

Error in Math.factor(d$P) : log10 not meaningful for factors
In addition: Warning messages:
1: In Ops.factor(P, 0) : > not meaningful for factors
2: In Ops.factor(P, 1) : <= not meaningful for factors

What do I have to change on the original script? Is it possible to plot only with data from chromosomes and p-value?
Thank you.
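
Errors of this kind usually mean the P column was read in as a factor, for example because of stray text or whitespace in the file. A minimal sketch of the conversion, using made-up toy data, could look like this:

```r
# Toy data where P was read as a factor because of a stray "NA " entry
toy <- data.frame(CHR = c(1, 1, 2),
                  BP  = c(10, 20, 30),
                  P   = factor(c("0.05", "NA ", "1e-4")))
print(sapply(toy, class))   # shows which columns are factors

# as.numeric() on a factor returns level codes, so convert via character
# and trim whitespace first; unparseable entries become NA
toy$P <- suppressWarnings(as.numeric(trimws(as.character(toy$P))))
print(toy)
```

Only the CHR, BP, and P columns are needed for the conversion shown here; the same pattern applies to a CHR column that was read as a factor.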

Anonymous, January 20, 2013 at 9:03 AM
Hi Stephen,

I wonder if you can help.

The simple file below (which has lost its formatting here but was tab delimited) fails when producing a manhattan plot but remove the chr 21 line and all is well. I can't for the life of me work out why.

The error message I'm getting is Error in `$<-.data.frame`(`*tmp*`, "pos", value = numeric(0)) : and in fact changing the 21 to a 19 makes it all work again.

The code I was using is straight from your demo:
source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")
results <- read.table("file", header=T, sep="\t")
manhattan(results, suggestiveline=F, genomewideline=F, pch=20, main="Manhattan Plot")

Do you have any suggestions?

CHR SNP BP P
19 LbrM.19.embl_731631 731631 0.934
19 LbrM.19.embl_731734 731734 0.246
19 LbrM.19.embl_731758 731758 0.098
21 LbrM.21.embl_3445 3445 0.277
19 LbrM.19.embl_731898 731898 0.534

Anonymous, January 22, 2013 at 6:30 AM
Hi!
I can't run the function 'annotate' and I don't have a clue... Could you help me?

Unknown, February 6, 2013 at 4:11 AM
Hello,

Thank you very much for your great code.

I am having a problem, I am using this code for cattle dataset with 29 chromosomes. I have changed the limitchromosomes=1:29 but when I plot it, a gap appears between chromosomes 23 and 24, it would be great if you could help me with that.

Cheers!

Anonymous, February 7, 2013 at 5:32 PM
Thanks for sharing the code! Any advice on how to add a shaded region around the abline corresponding to 95% confidence intervals?
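
One common way to draw such a band uses the fact that, under the null, the k-th smallest of n uniform p-values follows a Beta(k, n - k + 1) distribution. The sketch below is an illustration with simulated p-values, not the method used in qqman.r; qq_band and plot_qq_band are hypothetical helper names.

```r
# Hypothetical helper: expected/observed -log10(p) plus a pointwise band
qq_band <- function(p, conf = 0.95) {
  p <- sort(p[!is.na(p) & p > 0 & p <= 1])
  n <- length(p)
  k <- seq_len(n)
  alpha <- 1 - conf
  data.frame(
    expected = -log10(ppoints(n)),
    observed = -log10(p),
    # a small p-value quantile is a large value on the -log10 scale
    upper = -log10(qbeta(alpha / 2, k, n - k + 1)),
    lower = -log10(qbeta(1 - alpha / 2, k, n - k + 1))
  )
}

# Hypothetical helper: shaded band, points, and the y = x line
plot_qq_band <- function(b) {
  plot(b$expected, b$observed, type = "n",
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
       ylim = c(0, max(b$observed, b$upper)))
  polygon(c(b$expected, rev(b$expected)), c(b$upper, rev(b$lower)),
          col = "grey85", border = NA)
  points(b$expected, b$observed, pch = 20)
  abline(0, 1, col = "red")
}

set.seed(42)
band <- qq_band(runif(1000))   # simulated null p-values
out <- file.path(tempdir(), "qq_band_demo.pdf")
pdf(out)
plot_qq_band(band)
invisible(dev.off())
```

The band is pointwise, so a few points falling outside it by chance is expected even under the null.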

Stephen Turner, February 7, 2013 at 6:42 PM
Does anyone here have the time or interest to help me go through all these bugs and feature requests and clean up the code, perhaps even create an R package?

Anonymous, February 12, 2013 at 1:49 PM
A question: how can I plot more than 23 chromosomes? I work with the cattle genome, which has 30.

greetings.

Anonymous, February 19, 2013 at 6:27 PM
When I tried to open the output Rplots.pdf from the manhattan function, it said that the file is damaged and cannot be repaired. I thought it's because it's too big (600MB!) and tried to convert it to PNG, but it still tells me that the pdf is damaged. I tried to rerun it several times and also ran the qqplot, yet it never worked. Any insight?

Thanks!

Unknown, March 4, 2013 at 1:04 PM
Hello, I would like to print my SNP names on the x-axis instead of the BP. How can I do it?
Thanks!
Julie

Anonymous, March 19, 2013 at 5:33 AM
Just want to say thanks, really helpful

Anonymous, March 20, 2013 at 10:14 AM
I am having problems annotating the SNPs I want to highlight. In both files (the full set to plot, and the chosen SNPs to highlight in green), my SNPs are stored as characters (not as a single string of characters). I get no error message when I make the Manhattan plot with my annotation data.frame, but I also get no green dots. Does anyone know how to solve this? Thanks!

Astri Herlino, March 21, 2013 at 12:57 AM
Hi Stephen,

I'm a newbie here. I just started learning R a month ago.
I was trying to make a Manhattan plot with 32 chromosomes, where the last chromosome is chromosome X. I tried to change your script to chromosomes 1:32 and it gave an error like this: "Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :max not meaningful for factors."

I saw that someone on August 15, 2011 at 5:08 AM had the same problem as me, but I couldn't find the solution *or maybe I missed it*

Can you help me to solve this problem? Thanks!

Anonymous, March 21, 2013 at 5:48 AM
I wrote yesterday about annotating; I have now fixed it by transposing the list of SNPs to highlight. A. Herlino: I think you have to make sure that the CHR column is numeric.

Anonymous, March 21, 2013 at 11:27 AM
Hello Stephen,
Great code! I have found it very helpful in my continuing quest to "get genetics done". I am slowly getting to the point where I have somewhat of an understanding of what the code is doing. I would like to create a plot in which the SNPs surrounding a particular lead SNP are color coded by their LD with that SNP. My initial thought was to add an additional variable to the annotation file and then use that on line 70 in place of green3. Am I even on the right track here?

Thanks

Darryl

Unknown, April 1, 2013 at 7:12 PM
> snps_to_highlight <- scan("http://www.StephenTurner.us/snps.txt", character())

Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") : cannot open: HTTP status was '404 Not Found'

Is anyone else getting this error?

Janina Jeff, April 5, 2013 at 12:49 PM
Hi Stephen,
Hope all is well! I am making several Q-Q plots and I am trying to compare different association methods that correct for population stratification. Anyway, I would like the y-axis on the plots to be the same. For example, I have some plots that have p-values of 1.0E-24 and some with p-values of 1.0E-8. Is there a way to use ymax with the QQ command? I tried but it didn't work for me.
Thanks!
Janina

Maen, April 29, 2013 at 2:39 AM
Hi Stephen,

Thanks a lot for the code! I'm new to R and it made my life easier. One quick question please: Is it possible to plot a Manhattan plot with two p values for two phenotypes for the same SNP ? What do I need to change in the code ?

Thanks again for the post!
Maen

Stephen Turner, May 2, 2013 at 6:59 AM
Hmm.. hard to say without seeing your data. Are the columns still named "CHR", "SNP", "BP", and "P"?

Unknown, May 28, 2013 at 7:38 AM
Hi,
I am trying to use this code to make a Manhattan plot, but mine does not look the same as your example. Instead of having spread-out dots like I would expect when plotting the BP, I get just a single straight line, as though I am plotting only the CHR number. My R skills are not good enough to understand the code, so could someone please help me?

Athena, May 29, 2013 at 4:10 AM
Hi Stephen,

Is there an easy way to make this plot with a list of SNPs to remove?

Thanks,

Charlotte
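
One straightforward option is to filter the data frame before plotting. The sketch below uses toy data and a made-up exclusion list; in practice the list could be read from a file with scan(), as the post does for the SNPs to highlight.

```r
toy <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                  CHR = c(1, 1, 2, 2),
                  BP  = c(100, 200, 50, 75),
                  P   = c(0.5, 0.01, 0.2, 1e-6),
                  stringsAsFactors = FALSE)

# Hypothetical exclusion list
snps_to_remove <- c("rs2", "rs4")

# Keep only the rows whose SNP name is not in the list
filtered <- toy[!(toy$SNP %in% snps_to_remove), ]
print(filtered)

# With qqman.r sourced:
# manhattan(filtered)
```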

Gabor Meszaros, June 14, 2013 at 6:33 PM
Hi,
Thank you for the R functions!
On June 10, 2013 there was a major update to the code, which caused problems when trying to run it as described in the blog post. Before, it ran perfectly, but now I am getting messages like ""limitchromosomes" is not a graphical parameter" and "formal argument "pch" matched by multiple actual arguments", and "colors" is not working any more.
Is it possible to get a pre-June 10 version again, e.g. as a separate file or function? I am using R 2.15.1 on a Windows computer.

Unknown, June 20, 2013 at 4:15 PM
Hi Stephen, thank you very much for your code. I am wondering if I can use it to make a Manhattan plot of my results, which come from analyses in R and ASReml rather than PLINK. I have tried to prepare the file with the same column names as yours (SNP, CHR, BP and P), but my SNPs do not have rs numbers; they are just numbered in order (1, 2, 3, ...). Could you please give me some suggestions on how I can use your code? Thank you in advance.

Unknown, June 24, 2013 at 10:03 AM
Hi Stephen,
It's Janina again. I am plotting imputed data (~15 million SNPs) and I am getting the following error...
Error in Math.factor(d$P) : log10 not meaningful for factors
In addition: Warning messages:
1: In Ops.factor(P, 0) : > not meaningful for factors
2: In Ops.factor(P, 1) : <= not meaningful for factors

I have checked the files and all of the P values fall between 0 and 1, but I think there is a spacing issue or extra characters present. Below are the first few rows that I have loaded into R.
Janina
CHR SNP BP A1 A2 FRQ INFO OR SE P
1 rs58108140 10583 G A 0.9352 0.3790 0.8756 0.3220 0.68
1 rs189107123 10611 C G 0.9774 0.4527 0.8466 0.4791 0.7281
1 rs180734498 13302 C T 0.8016 0.4482 0.9310 0.1793 0.69
1 rs144762171 13327 G C 0.9750 0.4409 1.7412 0.4941 0.2617
1 chr1:13957:D 13957 TC T 0.9726 0.4434 0.7246 0.4318 0.4557
1 rs151276478 13980 T C 0.9773 0.4896 1.2930 0.4670 0.5822
1 rs140337953 30923 G T 0.4590 0.4688 1.0767 0.1423 0.6035
1 chr1:47190:I 47190 G GA 0.9602 0.3153 2.1181 0.4668 0.1079
1 rs116400033 51479 T A 0.9381 0.4766 0.6665 0.2920 0.1647

Unknown, June 25, 2013 at 2:02 PM
Hi Stephen, it's Robel again. I am still looking for your suggestions on my comment from June 20. Thank you very much for your support.


Unknown, July 7, 2013 at 8:13 AM
Hi Stephen, I'm having the same trouble as Palle did in the post on December 7, 2012 at 8:56 AM. I have 5 chromosomes, 2L, 2R, 3L, 3R and X, and I can't for the life of me work out how to avoid the errors I'm getting. What's the best approach to account for this and get the script working?

cheers,

Joe

Unknown, July 17, 2013 at 6:37 PM
Hi Stephen,

I'm using the qq function but I get the following error:

Error in seq.default(10000, length(e), 100) : wrong sign in 'by' argument

Any clues?

Cheers

Awais Rasheed, July 30, 2013 at 1:18 PM
Hi Stephen,
I want to change the chromosome labels from "1, 2, 3, ..." to "1A, 1B, 1D, 2A, 2B, 2D, ..." etc.

Any help please?

Regards

swvanderlaan, August 8, 2013 at 4:31 AM
Hi,

I've read through all the comments. As I understand it, the old code (https://raw.github.com/stephenturner/qqman/master/qqman.r) has been updated, right? There are some items we would all like to have added, and some suggestions that have been made:
-add a limit to the y-axis for QQ-plots - DONE
-add confidence intervals to the QQ-plots - DONE
-annotate with SNP-id and/or gene name/loci of genome-wide significant and/or target sites/loci/SNPs - MQ added a suggestion - NOT SURE THIS IS DONE?
-add special chromosomes (X,Y, MT) - MQ added a suggestion - NOT SURE THIS IS DONE?
-get the dots bigger above a certain threshold (-log10(p)=8 e.g.) - PENDING?

Additionally there are some bugs that have been fixed:
-fix issue with a missing chromosome

Stephen: if you need help, I can certainly try to, but I'm a newbie with R...

Best,

Sander

swvanderlaan, August 14, 2013 at 7:14 AM
Soooo, I've been going over the comments again. It's not clear to me: does it or does it not handle X and Y and MT chromosomes? If so:
-how must it be coded in the data? As X or 23, Y or 24, MT or 25?
-does a number or a letter appear on the x-axis? So 23 or X, 24 or Y, 25 or MT?
-do I just plug into the command: manhattan (datafile, limitchromosomes=24,...) to get all chromosomes in the data?

Thanks!
P.S. Still willing to help on the code and the bugs and all the requests...

Stephen Turner, August 14, 2013 at 7:17 AM
My apologies swvanderlaan - I need to find time to get back to this and completely rewrite the post.

swvanderlaan, August 14, 2013 at 8:11 AM
Also, I keep getting the same error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
max not meaningful for factors

What does that even mean? It's only with plotting Manhattans... I've got 2.56m rows and four columns (SNP CHR BP P) and 22 chromosomes (1-22 human). The lowest p-value ~ 5e-60.

Thanks

S.

Stephen, August 15, 2013 at 2:59 PM
Hey Stephen,
I am running this package on one of UVA's servers. When I run the final script to create the plot, I don't get anything. There is a file called "Rplots.pdf" that is in my local directory but I cannot open it (it's also 300mb). Is there a library that needs to be installed for these plots to be stored or do they go to the local directory by default? Have you had a problem opening these files when running from a server?

Unknown, August 26, 2013 at 12:44 AM
Hi Stephen,

I always use qqman for Manhattan and QQ plots (source("http://people.virginia.edu/~sdt5z/0STABLE/qqman.r")). Now, with the updated version, it looks like the 'annotate' and 'colors' arguments are not working. When I used my old code, the selected SNPs were highlighted in black with rs numbers, and the following message was also shown:

warning message:
In plot.xy(xy.coords(x, y), type = type, ...) :
"colors" is not a graphical parameter

Thanks,

Jian

Unknown, September 11, 2013 at 9:09 PM
Hi Stephen,

This is so helpful - many thanks!

I am new to R so I am not sure how to trouble-shoot this issue. For some reason my Manhattan plot looks truncated. It looks as if it is not wide enough. The chromosome numbers appear stacked, not right next to each other. After chromosome 14 I don't see any more chromosome names. Do you know how I can change this?

Thanks,
Erin

Unknown, September 23, 2013 at 5:10 AM
In the manhattan function:

What's the best way of labelling the top SNPs with rs ID, p-value and gene name?

What's the best way of labelling the horizontal threshold lines with the p-values that they represent?

Unknown, September 30, 2013 at 9:44 PM
I am new to R and trying to make a manhattan plot and QQ plot following the example described here. I have understood most part of it, but I am not able to highlight SNPs listed in the snp.txt file. I did exactly as written in the example, but do not see green dots. Any help would be highly appreciated.

Mohamed Fakhry, October 2, 2013 at 9:36 AM
The map file is not integrated into the qassoc output by default. So, how can I run this code on qassoc files?

Stephen Turner, October 2, 2013 at 2:24 PM
You'll have to join the map information to the stats column manually (use an INNER JOIN in SQL speak). Or the merge() function in R should do it, if both are indexed by rs-number.
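
A minimal sketch of the merge() approach, using small made-up data frames in place of the real map and association files, might look like this. The column names assigned to the map data are an assumption for the example; real files would be read with read.table() first.

```r
# Toy map information: chromosome, SNP name, genetic distance, position
map <- data.frame(CHR = c(1, 1, 2),
                  SNP = c("rs1", "rs2", "rs3"),
                  CM  = c(0, 0, 0),
                  BP  = c(100, 200, 50),
                  stringsAsFactors = FALSE)

# Toy association results keyed by SNP name; rs9 has no map entry
stats <- data.frame(SNP = c("rs3", "rs1", "rs9"),
                    P   = c(0.01, 0.5, 0.2),
                    stringsAsFactors = FALSE)

# merge() keeps only SNPs present in both inputs by default (inner join)
results <- merge(map[, c("SNP", "CHR", "BP")], stats, by = "SNP")
print(results)
```

The merged data frame has the SNP, CHR, BP, and P columns that manhattan() expects.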

Unknown, October 14, 2013 at 7:33 AM
Hi Stephen,

Thanks for the excellent code. I have a quick question: much like Tauqeer Alam, I am not seeing green dots on my Manhattan when feeding a list of SNPs into R. Rather, I'm getting an odd-looking column of text, as seen in the screenshot below:

http://imgur.com/eIQAIcI

I'm following your code to a tee and have tested this error on both a Mac and PC. Any suggestions?

Chaos, October 22, 2013 at 3:00 AM
I also had this problem.

> head(subset(results, select=c(SNP, CHR, BP, P)))
SNP CHR BP P
1 BTA-28471-no-rs 0 0 NA
2 BTA-28495-no-rs 0 0 NA
3 BTA-28466-no-rs 0 0 NA
4 ARS-USMARC-Parent-DQ650635-rs29012174 0 0 NA
5 ARS-USMARC-Parent-DQ451555-rs29010795 0 0 NA
6 BPI-1 0 0 NA
> manhattan(results)
Error in `$<-.data.frame`(`*tmp*`, "pos", value = NA) :
replacement has 1 row, data has 0

For the P values, they are all NA. The assignment is due tomorrow morning and I've been banging my head over this all day.

Sri, October 31, 2013 at 2:57 PM
Thank you Dr. Turner, this script is great. I was having one issue I was hoping to get help with. I am trying to generate the plot with different colored points:
manhattan(results, colors=c("black","grey50","orangered1"), pch=20, genomewideline=F, suggestiveline=F)

but get the following warning:
"colors" is not a graphical parameter

I was wondering where the script could be modified to account for this?

Thanks!
Sri

Stephen Turner, October 31, 2013 at 3:19 PM
Use pt.col to change the colors. I need to update the tutorial, many apologies.

Molly, November 7, 2013 at 10:10 AM
I've had great luck using this code in the past, but am having some issues with the ticks/labels of the x axis with the newest code. I only get a single tick mark/label right at the beginning (which is equal to the value of ticks=floor(length(d$pos))/2+1). I feel like there's a rep argument of sorts missing in there. I've tried manipulating things, but am not making progress. Any suggestions?

swvanderlaan, December 5, 2013 at 3:46 AM
Hi,
I like your confidence interval plotting on the QQ-plot. I want to add this to our qq-plot function in MANTEL (meta-analysis of GWAS package made by De Bakker Lab).
Just wondering: what does the "xspace = 0.078" line actually do?

Thanks and looking forward to the tweaks of this great script, Stephen! :-)

Best,

Sander

Unknown, March 13, 2014 at 12:50 AM
Hi, Stephen

I have done GWAS analysis with different models (PCA, PCA+K, Q, Q+K). So, I have 4 sets of p-values for the same trait. How can I plot a QQ plot for 4 sets of p-values together?
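
One way to overlay several sets of p-values with base graphics is to compute the expected and observed values for each set and add them to a single plot with points(). The sketch below uses simulated p-values and a hypothetical helper qq_xy; it does not use the qq() function from qqman.r.

```r
# Hypothetical helper: expected and observed -log10(p) for one set
qq_xy <- function(p) {
  p <- sort(p[!is.na(p) & p > 0 & p <= 1])
  data.frame(expected = -log10(ppoints(length(p))),
             observed = -log10(p))
}

# Simulated p-values standing in for the four models
set.seed(7)
pvals <- list("PCA" = runif(200), "PCA+K" = runif(200),
              "Q" = runif(200), "Q+K" = runif(200))
xy <- lapply(pvals, qq_xy)
cols <- c("black", "red", "blue", "darkgreen")

# Shared axis limits so that all four sets fit on one plot
xmax <- max(sapply(xy, function(d) max(d$expected)))
ymax <- max(sapply(xy, function(d) max(d$observed)))

out <- file.path(tempdir(), "qq_overlay_demo.pdf")
pdf(out)
plot(0, 0, type = "n", xlim = c(0, xmax), ylim = c(0, ymax),
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
for (i in seq_along(xy)) {
  points(xy[[i]]$expected, xy[[i]]$observed, pch = 20, col = cols[i])
}
abline(0, 1, lty = 2)
legend("topleft", legend = names(xy), col = cols, pch = 20, bty = "n")
invisible(dev.off())
```

Using the same ylim across separate plots is also a simple way to keep the y-axis comparable between methods.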

203 comments: