Tuesday, April 13, 2010

Efficient Mixed-Model Association in GWAS using R

I recently did an analysis for the eMERGE network in which many of the subjects came from a small town in central Wisconsin and were related to one another. The subjects could not be treated as independent, but I could not use a family-based design either. I ended up using a mixed-model approach implemented in GenABEL, which I have mentioned here before. You can read about the method here (PubMed).

While researching which methods to use, I ran into a potential problem. All of the methods that account for relatedness (including the one mentioned above) assume you have an ethnically homogeneous population. Yet all of the methods that look for population stratification (Eigenstrat, Structure, etc.) assume the samples are unrelated. So what do you do if you have both population stratification AND a high level of relatedness among your samples?

A few weeks ago our graduate student association invited and hosted Dr. Elaine Ostrander here from the NIH to talk about her work with gene mapping in dogs. She mentioned a method she used called Efficient Mixed-Model Association (EMMA) for performing association mapping while simultaneously correcting for relatedness and population structure. Using multiple highly inbred dog breeds represents the extreme case of simultaneously having to deal with substructure, inbreeding, and relatedness. If this method works for association mapping combining several purebred dog breeds, it should work for a less problematic human dataset as well.
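For readers who want to see what "simultaneously correcting for relatedness and population structure" means in practice, here is the standard linear mixed model that approaches like EMMA fit, written out as a sketch. The symbols are my own notation for this post: y is the phenotype vector, X holds the fixed effects (including the SNP being tested), Z maps individuals to strains or subjects, and K is a kinship matrix estimated from the genotypes.

```latex
\begin{aligned}
y &= X\beta + Zu + e, \\
\operatorname{Var}(u) &= \sigma_g^2 K, \qquad \operatorname{Var}(e) = \sigma_e^2 I, \\
\operatorname{Var}(y) &= \sigma_g^2\, Z K Z^{\top} + \sigma_e^2 I .
\end{aligned}
```

Because K is estimated from genome-wide genotype sharing, it picks up both close family relationships and broader population structure, so the SNP effect in beta is tested against a background that already accounts for both.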

EMMA is also implemented in R. You can download the necessary R package from the project's website below.

Efficient Mixed-Model Association (EMMA) website

PubMed: Efficient control of population structure in model organism association mapping.

Abstract: Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.
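The correction described in the abstract depends on a kinship (relatedness) matrix computed from the genotypes. To give a feel for what that matrix looks like, here is a minimal Python sketch of one simple choice, an identity-by-state (IBS) similarity matrix. This is an illustration only, not the EMMA package's own code: the function name `ibs_kinship`, the 0/1/2 allele-count coding, the use of `None` for missing calls, and the toy genotypes are all assumptions made for this example.

```python
def ibs_kinship(genotypes):
    """Return an n x n identity-by-state similarity matrix.

    genotypes: one list per individual of minor-allele counts (0, 1, or 2).
    None marks a missing call; a locus is skipped for a pair if either
    individual is missing at that locus.
    """
    n = len(genotypes)
    kinship = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = 0.0
            compared = 0
            for a, b in zip(genotypes[i], genotypes[j]):
                if a is None or b is None:
                    continue
                # Similarity at one locus: 1 if identical, 0.5 if the
                # counts differ by one allele, 0 if opposite homozygotes.
                shared += 1.0 - abs(a - b) / 2.0
                compared += 1
            value = shared / compared if compared else 0.0
            kinship[i][j] = kinship[j][i] = value
    return kinship


# Toy data (hypothetical): 4 individuals typed at 4 SNPs.
genotypes = [
    [0, 2, 1, 0],
    [0, 2, 1, 0],      # identical to the first individual
    [2, 0, 1, 2],      # mostly opposite homozygotes
    [0, 2, None, 1],   # one missing call
]
K = ibs_kinship(genotypes)
for row in K:
    print(["%.3f" % v for v in row])
```

In this toy example the first two individuals get a similarity of 1, while the first and third get 0.25. A matrix like this is what a mixed model takes as its description of how the samples are related; a real analysis would use many thousands of SNPs.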


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.