Wednesday, March 31, 2010

Wellcome Trust: Common CNVs Unlikely to Influence Common Disease

A new paper in Nature by the Wellcome Trust Case Control Consortium examining 16,000 cases of 8 common diseases and 3,000 shared controls finds that common CNVs probed on existing arrays are well tagged by SNPs and are unlikely to contribute much to common human disease. As for where the missing heritability may lie, Peter Donnelly was quoted in the Times Online as saying "my position now is to be very skeptical about the role of common CNVs...we have shown it wasn't Colonel Mustard in the ballroom with the candlestick. It narrows down the search for what is responsible."

Nature: Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls

Abstract: Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed ~19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated ~50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease—IRGM for Crohn’s disease, HLA for Crohn’s disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes—although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.

Tuesday, March 30, 2010

Federal Courts Invalidate Myriad's Breast Cancer Gene Patents

A District Court handed down a summary judgment invalidating most of Myriad's claims to both the BRCA1 DNA sequence and the method of testing for early-onset familial breast and ovarian cancer. See Genetic Future and Genomics Law Report for analysis.

Wednesday, March 24, 2010

Vanderbilt-Ingram Cancer Center Retreat

VICC's retreat this year is Tuesday, May 11, 2010. The theme of the retreat looks interesting: "Genomic Approaches for Personalized Medicine." You can register free here.

VICC retreat: Genomic Approaches for Personalized Medicine

Tuesday, March 23, 2010

Video: ggplot2 Creator Hadley Wickham's Short Course on Data Visualization Using R

Hadley Wickham, creator of ggplot2, has posted a 2 hour video on data visualization using R. You can find links to the videos and slides over at Revolutions Blog.

Check back here soon. I am working with Hadley to arrange a day-long ggplot2 short course here at Vanderbilt this summer. I'll post the date and registration info once everything is set up.

Video: Hadley Wickham gives a short course on graphics with R (via Revolutions)

Thursday, March 18, 2010

Create annotated GWAS manhattan plots using ggplot2 in R

*** Update April 25, 2011: This code has gone through a major revision. Please see the updated code and tutorial here. ***

A few months ago I showed you in this post how to use some code I wrote to produce manhattan plots in R using ggplot2. The qqman() function I described in the previous post actually calls another function, manhattan(), which has a few options you can set. I recently had to update this function to allow me to color code SNPs of interest, similar to the plots shown in figure 1 of Cristen Willer's 2008 Nature Genetics paper on lipids. I'll try to explain how to use that feature here.

The only extra thing you'll need here is a list of SNPs that you want to highlight. There is one catch: the SNPs in that list can't have the "rs" prefix on the rs numbers. They must be integers. E.g. if you want to highlight rs1234 and rs5678, you would create a vector containing the integers 1234 and 5678. If you already have a list of rs IDs, use the substr() command to strip the "rs" prefix and keep only the digits.
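Here's a rough sketch of that prefix-stripping step. The vector names rs_ids and snp_numbers are made up for illustration, and sub() is shown alongside substr() simply as an alternative; use whichever you prefer.

```r
# Made-up example input: rs IDs stored as character strings
rs_ids <- c("rs1234", "rs5678")

# Option 1: substr() keeps everything from the 3rd character onward
snp_numbers <- as.integer(substr(rs_ids, 3, nchar(rs_ids)))

# Option 2: sub() removes a leading "rs" if present
snp_numbers_alt <- as.integer(sub("^rs", "", rs_ids))

snp_numbers
```

Both approaches should give you the same integer vector, which is the form the SNP list needs to be in.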

Once you load in your PLINK results and your vector containing the rs numbers you want to highlight, simply call the manhattan() function with the options annotate=T and SNPlist=x, where x is the name of the vector containing rs numbers.

Here's some example code:

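# This requires ggplot2
library(ggplot2)

# First, load these functions from source:

# Next, load your PLINK results file to a data frame:
mydata=read.table("plink.qassoc", header=TRUE)

# Assuming you already have a vector of rs numbers to highlight
# (the name snps_to_highlight is only illustrative)
snps_to_highlight=c(3821815, 1851665, 1621816, 1403694, 1656922, 166479)

# Call the manhattan function, with annotate=T.
# The SNPlist argument takes the list of SNPs to highlight.
# Save the plot to an object
annotated_plot=manhattan(mydata, annotate=T, SNPlist=snps_to_highlight)

# Finally, save the plot in the current directory using ggsave()
# (the file name is only illustrative)
ggsave("annotated_manhattan.png", annotated_plot)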

If all goes well, you should have a manhattan plot with SNPs of interest highlighted. It might look something like this:

A few tips: You can use the UCSC genome browser to look up coordinates for genes, then select rs numbers based on that range, if you want to highlight certain genes. The default color is green but you can change this on line 118 of the code at the URL above.
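To make the first tip a bit more concrete, here's a minimal sketch of picking SNPs by position. It assumes your results have the CHR, SNP and BP columns that PLINK association output normally includes. The toy data frame and the gene coordinates are invented for illustration, so substitute your own results and the coordinates you look up.

```r
# Toy stand-in for PLINK results (real output has CHR, SNP and BP columns)
results <- data.frame(
  CHR = c(1, 1, 2, 2),
  SNP = c("rs1234", "rs5678", "rs111", "rs222"),
  BP  = c(1000, 5000, 2000, 9000),
  stringsAsFactors = FALSE
)

# Made-up gene coordinates; look up real ones in the UCSC genome browser
gene_chr   <- 2
gene_start <- 1500
gene_end   <- 3000

# Keep SNPs that fall inside the gene, then strip the "rs" prefix
in_gene <- results$CHR == gene_chr &
  results$BP >= gene_start & results$BP <= gene_end
gene_snps <- as.integer(sub("^rs", "", results$SNP[in_gene]))
gene_snps
```

The resulting integer vector is in the form the SNPlist option expects.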

**** UPDATE, May 15 2014 *****
The functions described here have now been wrapped into an R package. View the updated blog post or see the online package vignette for how to install and use. If you'd still like to use the old code described here, you can access this at version 0.0.0 on GitHub. The code below likely won't work.

Francis Collins: Computational biologists are "breakthrough artists"

Just caught this on the OpenHelix Blog. In an interview with Charlie Rose, NIH director Francis Collins said computational biologists will be the "breakthrough artists" of the future.

CHARLIE ROSE: You have said if you were starting over you would be a computational biologist.

FRANCIS COLLINS: I did say that. I still say that. Computational biologists are having a really good time and it’s going to get better.

CHARLIE ROSE: Their day is coming?

FRANCIS COLLINS: Their day is here, but it’s going to be even more here in a few years. So what do they do? They are people who are jointly trained in studying biology in all of its complexes, but they’re also very capable at computation analysis of huge data sets, because — in part because of NIH and the ethic that was adopted by the genome project, huge amounts of data are being made publicly accessible everyday about all kinds of disease questions.

CHARLIE ROSE: So they’re going to be the breakthrough artists of the future?

FRANCIS COLLINS: They’re going to be the breakthrough artists...

Tuesday, March 16, 2010

$25 Plate Centrifuge

While reading through an article on job hunting success on Bitesize Bio I stumbled upon another piece there that's definitely in the spirit of "getting things done" in genetics research.

I had always halfway considered going into business manufacturing lab supplies. Take a $10 Easy-Bake Oven and add a little bit tighter temperature regulation, call it a hybridization oven, and sell it for thousands. It wasn't long ago, before GWAS was all the rage, that I was doing some TaqMan genotyping, and I remember how awful the results would be when I would forget to spin down the plates before starting the PCR. I haven't a clue how many tens of thousands of dollars a real plate centrifuge and rotors would set you back, but check out the post on Bitesize Bio below, where a few resourceful folks show you how to make a plate centrifuge from a salad spinner in 5 minutes for $25.

Bitesize Bio: How to Build a Plate Centrifuge for $25

Monday, March 15, 2010

Seminar: Pathway-based analysis for genome-wide association studies

Vanderbilt Epidemiology Center, Institute for Medicine and Public Health presents:

"Pathway-based analysis for genome-wide association studies"

Steven Chen, Ph.D.
Assistant Professor of Biostatistics

Tuesday, March 16, 2010
9:00 AM - 10:00 AM
2525 West End Avenue 6th Floor Boardroom

Tuesday, March 9, 2010

Papers from March 8, 2010 Journal Club

Here are the papers we talked about in yesterday's Journal Club:

Genome Biol. 2009; 10(11): R134
Searching for SNPs with cloud computing.
Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL.

Table of Contents, Nature Methods, Visualization Supplement.

Am J Hum Genet. 2010 Feb 12; 86(2): 113-25
Functional gene group analysis reveals a role of synaptic heterotrimeric G proteins in cognitive ability.
Ruano D, Abecasis GR, Glaser B, Lips ES, Cornelisse LN, de Jong AP, Evans DM, Smith DG, Timpson NJ, Smit AB, Heutink P, Verhage M, Posthuma D.

Kathy Giacomini: Personalizing Anti-diabetic Drug Therapy

This Thursday's discovery lecture looks interesting. In case you missed last week's Nobel laureate, you can watch the recording from Cech's talk or any of the previous lectures in the discovery series here.

Kathy Giacomini
Professor and Co-Chair Department of Bioengineering and Therapeutic Sciences, Schools of Pharmacy and Medicine, University of California, San Francisco

March 11, 2010
"Personalizing Anti-Diabetic Drug Therapy"

Sponsor: Division of Clinical Pharmacology and Department of Pharmacology

208 Light Hall / 4:00 p.m. (CST)

Monday, March 8, 2010

Nature Methods: Visualization

Check out this month's table of contents in Nature Methods. It contains a series of five commissioned Reviews that discuss the challenges of visualizing biological data and the visualization tools available to biologists working with genomes, alignments and phylogenies, macromolecular structures, images and systems biology data.



Supplement on visualizing biological data pS1

Daniel Evanko


Visualizing biological data—now and in the future ppS2 - S4

Seán I O'Donoghue, Anne-Claude Gavin, Nils Gehlenborg, David S Goodsell, Jean-Karim Hériché, Cydney B Nielsen, Chris North, Arthur J Olson, James B Procter, David W Shattuck, Thomas Walter & Bang Wong
Methods and tools for visualizing biological data have improved considerably over the last decades, but they are still inadequate for some high-throughput data sets. For most users, a key challenge is to benefit from the deluge of data without being overwhelmed by it. This challenge is still largely unfulfilled and will require the development of truly integrated and highly useable tools.


Visualizing genomes: techniques and challenges ppS5 - S15

Cydney B Nielsen, Michael Cantor, Inna Dubchak, David Gordon & Ting Wang


Visualization of multiple alignments, phylogenies and gene family evolution ppS16 - S25

James B Procter, Julie Thompson, Ivica Letunic, Chris Creevey, Fabrice Jossinet & Geoffrey J Barton

Visualization of image data from cells to organisms ppS26 - S41

Thomas Walter, David W Shattuck, Richard Baldock, Mark E Bastin, Anne E Carpenter, Suzanne Duce, Jan Ellenberg, Adam Fraser, Nicholas Hamilton, Steve Pieper, Mark A Ragan, Jurgen E Schneider, Pavel Tomancak & Jean-Karim Hériché

Visualization of macromolecular structures ppS42 - S55

Seán I O'Donoghue, David S Goodsell, Achilleas S Frangakis, Fabrice Jossinet, Roman A Laskowski, Michael Nilges, Helen R Saibil, Andrea Schafferhans, Rebecca C Wade, Eric Westhof & Arthur J Olson

Visualization of omics data for systems biology ppS56 - S68

Nils Gehlenborg, Seán I O'Donoghue, Nitin S Baliga, Alexander Goesmann, Matthew A Hibbs, Hiroaki Kitano, Oliver Kohlbacher, Heiko Neuweger, Reinhard Schneider, Dan Tenenbaum & Anne-Claude Gavin

Searching for SNPs with cloud computing

Suppose you have billions of reads from a hot new sequencing machine and you want to align these reads and call SNPs at the same time, quickly and cheaply. Check out an open source tool called Crossbow and the recent paper in Genome Biology. Crossbow is a Hadoop-based software tool that combines the speed of the short read aligner Bowtie with the accuracy of the SNP caller SOAPsnp to perform alignment and SNP calling for multiple human whole-genome datasets per day. In the demonstration in the paper, the authors aligned and called SNPs from 2.7 billion short reads from a Han Chinese male with 98% concordance to the calls from an Illumina genotyping chip. The whole process took 3 hours on a 320-core parallel computing cluster rented from the Amazon Elastic Compute Cloud (EC2) for a total cost of $85. Since everything is open-source, there should be nothing stopping you from downloading all the necessary software and running it on your own cluster if you have access to one.

Crossbow: Genotyping from short reads using cloud computing

Wednesday, March 3, 2010

Arrange multiple ggplot2 plots in the same image window

In a previous tutorial I showed you how to create plots faceted by the level of a third variable using ggplot2. A commenter asked about using faceted plots and viewports and reminded me of this function I found in the ggplot2 Google group. The arrange function below is similar to using par(mfrow=c(r,c)) in base graphics to put more than one plot in the same image window.

The basic idea is that you assign ggplot2 plots to an object, and then use the arrange function to display two or more. Here's an example. First copy and paste the code above (or put in your Rprofile). Next install and/or load ggplot2 as described in a previous ggplot2 tutorial.

# Load the diamonds dataset
data(diamonds)

# Create a histogram, assign to "plot1"
plot1 <- qplot(price,data=diamonds,binwidth=1000)

# Create a scatterplot
plot2 <- qplot(carat,price,data=diamonds)

# Arrange and display the plots into a 2x1 grid
# (assumes the arrange function takes the plots followed by ncol)
arrange(plot1,plot2,ncol=1)

And here's what you should get:
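If you don't have the arrange function handy, a similar layout can be sketched directly with viewports from the grid package. This is not the arrange function from the Google group, just a minimal stand-in under my own assumptions: the helper names cell_for and show_in_grid are made up, and plots are filled in row by row.

```r
library(grid)

# Hypothetical helper: row/column cell for the i-th plot, filling row by row
cell_for <- function(i, ncol) {
  c(row = (i - 1) %/% ncol + 1, col = (i - 1) %% ncol + 1)
}

# Hypothetical helper: draw a list of ggplot2 plots in a grid of viewports
show_in_grid <- function(plots, ncol = 1) {
  nrow <- ceiling(length(plots) / ncol)
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow, ncol)))
  for (i in seq_along(plots)) {
    cell <- cell_for(i, ncol)
    # print() on a ggplot2 plot accepts a viewport through its vp argument
    print(plots[[i]], vp = viewport(layout.pos.row = cell[["row"]],
                                    layout.pos.col = cell[["col"]]))
  }
  popViewport()
  invisible(NULL)
}

# Usage, with plot1 and plot2 as created above:
# show_in_grid(list(plot1, plot2), ncol = 1)
```

With ncol = 1 the two plots are stacked vertically, which is the same idea as par(mfrow=c(2,1)) in base graphics.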


Tuesday, March 2, 2010

Wiley Essential Biochemistry Online

I joined the Ritchie Lab back in 2007, and even though it's only been three years away from the bench, I've forgotten much of what I learned back in biochem classes.  I'm giving a talk on lipid genetics next week, and I found the Wiley Essential Biochemistry website very helpful for brushing up on some basic lipoprotein biology. There are 27 chapters covering a broad range of topics from enzyme kinetics to phosphofructokinase regulation. Many of them have short animations and optional exercises to test your knowledge. It's a great resource for brushing up on some fundamental biochemical concepts when you need to.

Monday, March 1, 2010

Seminar: GWAS, Lipid Genetics, and EMR-Linked Biobanks

Time for a little shameless self-promotion. I'll be giving a talk in genetics interest group next week.

"Using GWAS in an EMR-linked biobank to explore genetic and environmental determinants of HDL cholesterol"

Thursday, March 11, 2010
206 PRB
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.