Wednesday, February 10, 2010

LocusZoom: Plot regional association results from GWAS

Update Friday, May 14, 2010: See this newer post on LocusZoom.



If you caught Cristen Willer's seminar here a few weeks ago, you saw several beautiful figures in the style of a Manhattan plot, but zoomed in on a region of interest and with several other useful overlays.

Take a look at the plot below, which shows the APOE region from an Alzheimer's disease GWAS:

It's a simple plot of the -log10(p-values) for SNPs in a given region, but it also shows:

1. LD information (based on HapMap), shown by color-coded points (there's not much LD here).
2. Recombination rates (the blue line running through the plot). Peaks are recombination hotspots.
3. The positions of the SNPs you plotted (running across the top).
4. Genes! The overlay along the bottom shows UCSC genes in the region.
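
The y-axis is just -log10 of each SNP's p-value, so a SNP with p = 1e-8 plots at 8. As a quick sketch for checking which SNPs should stand out before you plot, awk can compute the same values straight from PLINK output, since its default field splitting already tolerates PLINK's irregular whitespace. The helper name neglog10p is hypothetical, and it assumes the header labels the marker and p-value columns "SNP" and "P", as PLINK's .assoc and .qassoc files do:

```shell
# Hypothetical helper: print each SNP and its -log10(p), the value LocusZoom
# plots on the y-axis. Assumes a PLINK .assoc/.qassoc file whose header names
# the columns "SNP" and "P"; awk's default field splitting already copes with
# PLINK's irregular whitespace, so no reformatting is needed here.
neglog10p() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "SNP") s = i; if ($i == "P") p = i }
      if (!s || !p) { print "SNP or P column not found" > "/dev/stderr"; exit 1 }
      next
    }
    # skip NA and zero p-values, which have no finite -log10
    $p != "NA" && $p + 0 > 0 { printf "%s\t%.2f\n", $s, log(1 / $p) / log(10) }
  ' "$1"
}
```

For example, neglog10p plink.assoc | sort -k2,2nr | head lists the ten strongest signals, which should be the tallest points on the plot.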

Cristen and others at Michigan developed a tool called LocusZoom that makes it easy to take a PLINK output file (or results in another format) and create a plot like this for any SNP, gene, or region of interest. LocusZoom is written in R, with a Python wrapper, and runs from an easy-to-use web interface.

All the program needs is a list of SNP names and their associated p-values. If you're using PLINK, your *.assoc or *.qassoc files already contain this information, but you'll first need to reformat them, because PLINK's default output is delimited by irregular runs of whitespace. Run this command I discussed in a previous post to convert your PLINK output into a comma-delimited CSV file:

sed -r 's/^\s+//; s/\s+$//; s/\s+/,/g' plink.assoc > plink.assoc.csv
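
Since LocusZoom only needs SNP names and p-values, you could also shrink the upload by keeping just those two columns. Here is a minimal sketch, assuming a header that labels the columns "SNP" and "P" (true of both .assoc and .qassoc files). The helper name snp_p_csv is hypothetical, and a two-column file should be enough given what the program needs, though I haven't checked every form option against it:

```shell
# Hypothetical helper: write only the SNP and P columns as comma-delimited text.
# Columns are located by header name, so the same call should work for
# .assoc and .qassoc files (both label them "SNP" and "P").
snp_p_csv() {
  awk '
    BEGIN { OFS = "," }
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "SNP") s = i; if ($i == "P") p = i }
      if (!s || !p) { print "SNP or P column not found" > "/dev/stderr"; exit 1 }
    }
    { print $s, $p }   # the header row prints as "SNP,P"
  ' "$1"
}
```

Usage: snp_p_csv plink.assoc | gzip > plink.assoc.csv.gz produces a compressed two-column file ready to upload.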

Next, you'll want to compress this file so that it doesn't take forever to upload.

gzip plink.assoc.csv
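
If you'd rather skip the intermediate CSV, the two steps can be combined. This sketch assumes GNU sed (the -r flag and \s are GNU extensions, as in the command above) and uses a hypothetical helper name, assoc_to_csv_gz. It converts and compresses in one pass, tests the archive with gzip -t, and prints the first three lines so you can eyeball the header before uploading:

```shell
# Hypothetical helper: convert PLINK output to CSV and gzip it in one pass.
# Assumes GNU sed (-r and \s are GNU extensions).
assoc_to_csv_gz() {
  local in="$1" out="${2:-$1.csv.gz}"
  [ -r "$in" ] || { echo "cannot read $in" >&2; return 1; }
  # strip leading and trailing whitespace, then turn each whitespace run into a comma
  sed -r 's/^\s+//; s/\s+$//; s/\s+/,/g' "$in" | gzip > "$out" || return 1
  gzip -t "$out" || return 1        # test the archive's integrity
  gzip -dc "$out" | head -n 3       # show the header and first two rows
}
```

Running assoc_to_csv_gz plink.assoc writes plink.assoc.csv.gz next to the original file. Stripping trailing whitespace as well as leading whitespace keeps an empty extra column from appearing at the end of each row.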

Now upload your new file (plink.assoc.csv.gz) to the LocusZoom website. Tell it that your p-value column is named "P" and your marker column is named "SNP" (or whatever they're called if you're not using PLINK). Change the delimiter type to "comma", then enter a region of interest. I chose APOE, but you could also use a SNP name (include the "rs" before the number). Then click "Plot your Data"; the plot should take about a minute.
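
Before uploading, it can save a round trip to confirm that the compressed file really is comma-delimited and contains the column names you'll type into the form. The sketch below uses a hypothetical helper, check_upload, and assumes a gzipped CSV like the one above. Pass a SNP name (with its "rs" prefix) as the optional second argument to confirm that SNP is actually in your results:

```shell
# Hypothetical helper: check a gzipped CSV for "SNP" and "P" columns and,
# optionally, for one SNP. A file that is not comma-delimited fails the
# header check because its whole header line becomes a single field.
check_upload() {
  local file="$1" snp="$2"
  gzip -dc "$file" | awk -F, -v snp="$snp" '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "SNP") s = i; if ($i == "P") p = i }
      if (!s || !p) { print "missing SNP or P column"; bad = 1; exit }
      if (snp == "") { print "SNP and P columns found"; exit }
      next
    }
    $s == snp { found = 1; print snp " found, P = " $p; exit }
    END {
      if (bad) exit 1
      if (snp != "" && !found) { print snp " not found"; exit 1 }
    }'
}
```

For example, check_upload plink.assoc.csv.gz rsXXXX, substituting the SNP you plan to plot, prints its p-value or exits with an error if it's missing.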

There are some other options below, but I've had bad luck with all of them. For instance, I can never get it to output a PNG properly; only PDF worked the last time I tried. I also couldn't make a plot with the recombination rate overlay turned off. I know this is a very early version, but hopefully they'll clean up some of the code and document its features soon. I could see this being a very useful tool, especially once it's available to download for local use. (Update: some of these bugs have been fixed. See this newer post on LocusZoom.)


3 comments:

  1. You might also want to check out SNAP from the Broad Institute. It includes a similar regional association plot function (although it does not accept full result sets, only results for the selected region). SNAP also lets you make LD plots, find tag SNPs, and filter SNPs by SNP microarray type. http://www.broadinstitute.org/mpg/snap/

  2. Thanks for the publicity! Vanderbilt was actually the first place to hear about LocusZoom and we had to really scramble to get it ready in time. Consequently, a few bugs slipped through. Sorry! We have been working hard to clean them up and have addressed all of the ones we know about. If you find more, please let us know and we'll fix it right away (cristen@umich.edu).

    From what I understand, our plots have a few advantages over the Broad plots:
    i) the ability to use batch mode to generate dozens of plots at once
    ii) the ability to plot 1000 Genomes SNPs (or any SNPs in chr9:10056770 hg18 format)
    iii) annotation of SNPs into functional categories, including 1000 Genomes SNPs
    iv) automated spacing of gene names to avoid overlap

    Thanks again for the publicity and please let us know if you find any bugs or have requests for new features. Documentation is now up and the batch mode is ready for action!

    Cristen Willer
    cristen@umich.edu

  3. Really that's nice...Vanderbilt was actually the first place to hear about LocusZoom and we had to really scramble to get it ready in time. Consequently, a few bugs slipped through. Sorry! We have been working hard to clean them up and have addressed all of the ones we know about. If you find more, please let us know and we'll fix it right away...



Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.