Thursday, May 6, 2010

Mixed linear model approach adapted for genome-wide association studies

A few weeks ago I covered an R package for efficient mixed model regression that can simultaneously account for both population stratification and relatedness, giving unbiased estimates of standard errors and p-values in genetic association studies. Fitting linear mixed effects models at GWAS scale can be very time-consuming, however. Another group recently reported a method that fits a mixed linear model very efficiently by clustering individuals into groups and by eliminating the need to recompute variance components. They showed that with these modifications they were able to reduce computation time by more than 800-fold over SAS proc mixed / SAS proc cluster. Check out the paper for more details.

Nature Genetics: Mixed linear model approach adapted for genome-wide association studies

Abstract: Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods both independently and combined in selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
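
To make the compression idea in the abstract a little more concrete, here is a minimal Python sketch of grouping individuals by kinship and collapsing the kinship matrix to the group level. This is not taken from the paper or from TASSEL: the function names (`cluster_by_kinship`, `compress_kinship`), the toy kinship matrix, the greedy average-linkage merging, and the choice to average all member pairs (including self-kinship) are assumptions made purely for illustration.

```python
from itertools import combinations


def mean_kinship(kinship, a, b):
    """Average kinship over all pairs (i in group a, j in group b)."""
    return sum(kinship[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_by_kinship(kinship, n_groups):
    """Hypothetical helper: greedy average-linkage clustering.

    Starts with one group per individual and repeatedly merges the two
    groups with the highest average kinship until n_groups remain.
    """
    groups = [[i] for i in range(len(kinship))]
    while len(groups) > n_groups:
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda p: mean_kinship(kinship, groups[p[0]], groups[p[1]]),
        )
        groups[a] = groups[a] + groups[b]  # merge group b into group a
        del groups[b]                      # b > a, so index a is unaffected
    return groups


def compress_kinship(kinship, groups):
    """Hypothetical helper: group-by-group kinship matrix.

    Each entry is the average individual-level kinship between the
    members of two groups (assumed summary, for illustration only).
    """
    return [[mean_kinship(kinship, a, b) for b in groups] for a in groups]


# Toy kinship matrix (made-up values): individuals 0/1 and 2/3 are close relatives.
kinship = [
    [1.0, 0.5, 0.0, 0.1],
    [0.5, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.5],
    [0.1, 0.0, 0.5, 1.0],
]

groups = cluster_by_kinship(kinship, n_groups=2)
compressed = compress_kinship(kinship, groups)

print(groups)      # [[0, 1], [2, 3]]
print(compressed)  # [[0.75, 0.05], [0.05, 0.75]]
```

The point of the sketch is only that the random-effect part of the model would then work with a 2 x 2 group kinship matrix instead of the original 4 x 4 one. With thousands of individuals, that reduction in dimension is where the time savings would come from.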

2 comments:

  1. Thank you for bringing the work of my colleague Chao Lai to the attention of your readers. He has thought long and hard on how to deal with family relations in population genetics data and was the driving force behind the work you cite above.

    We intend to fully implement this analysis method when we look at our GWAS data for the GOLDN (fenofibrate intervention) study.

  2. I really love those articles that they can make your work easier and they don't mess out your statistic calculations.



Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.