Wednesday, September 29, 2010

Vanderbilt Genetics Symposium: Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes

About a year ago I reiterated a point made nicely in a Nature Reviews Genetics article: that there is no such thing as a common disorder - only extremes of quantitative traits. Such is the theme of this year's Annual Vanderbilt Genetics Symposium, "Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes." This is a day-long event held at the Vanderbilt Student Life Center on Wednesday, October 13, 8am-4pm. Registration is free but required to attend. Students in our program will be presenting posters, and students in other programs are welcome to submit an abstract as well. You can check out the full agenda at the link below. Here is the speaker lineup:

Keynote Speakers

Molly Losh, Ph.D.
Jane and Michael Hoffman Assistant Professor of Communication Sciences & Disorders
Northwestern University

Charles R. Farber, Ph.D.
Assistant Professor of Medicine
University of Virginia

Andrew J. Saykin, PsyD, ABCN
Raymond C. Beeler Professor of Radiology and Imaging Sciences
Professor of Medical and Molecular Genetics
Director, Center for Neuroimaging
Indiana University School of Medicine

Vanderbilt Speakers

Roger Cone, Ph.D.
Professor and Chairman, Department of Molecular Physiology & Biophysics

Dana Crawford, Ph.D.
Assistant Professor, Department of Molecular Physiology & Biophysics
Investigator, Center for Human Genetics Research

Karoly Mirnics, Ph.D.
Professor and Vice Chair for Basic Research, Department of Psychiatry

Vanderbilt Genetics Symposium: Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes

Monday, September 27, 2010

Towards a More Rigorous Approach to Personalized Medicine

Frank Harrell, chair of our Biostats department, will be giving a seminar entitled "Towards a More Rigorous Approach to Personalized Medicine." Dr. Harrell is a champion of methods and strategies for reproducible research, so his lecture on personalized medicine should be interesting.

Frank E Harrell Jr, Professor and Chair, Department of Biostatistics

Wednesday, 29 Sep 10, 1:30-2:30pm, MRBIII Conference Room 1220

Intended Audience: Persons interested in personalized medicine, biomarkers, reproducible research, clinical epidemiology


There are many ways to personalize the diagnosis and treatment of diseases, pharmacogenomics being one of them. Personalization can be based on routinely collected information, molecular signatures, or on repeated trials on the patient whose treatment plan is being devised. However, current emphases in personalized medicine research often ignore characteristics known to impact treatment benefit, in favor of tests that either generate more revenue or are developed with research that is perhaps easier to fund than "low-tech" research. Failure of the research community to fully utilize rich datasets generated by randomized clinical trials only heightens this concern.

Research supporting personalized medicine can be made more rigorous and relevant. For example in acute diseases, multi-period crossover studies can be used to measure individual response to therapy, and these studies can provide an upper bound on the genome by treatment interaction. When patient by treatment interaction is demonstrated, crossover studies can form an ideal basis for pharmacogenomics. However, even with the best within-patient data, group average treatment effects need to be incorporated in order for predictions for individual patients to have high precision.

There are a few ways to do personalized medicine well but a multitude of ways to do it poorly. Biomarker research in particular has not fulfilled its early promises, a major reason being flawed methodology. The flaws include faulty experimental design, bias, overfitting, weak validation, irreproducible research, data processing and analysis practices, and failure to rigorously show that the new markers add information to readily available clinical data. This will be discussed in terms of Platt's concept of "strong inference", seeking alternative explanations of findings, and sensitivity analysis.

This talk is also a call for the biostatistics and clinical epidemiology communities to be more integrally involved in research related to personalized medicine.

Tuesday, September 21, 2010

Install and load R package "Rcmdr" to quickly install lots of other packages

I recently reformatted my laptop and needed to reinstall R and all the packages that I regularly use. In a previous post I covered R Commander, a nice GUI for R that includes a decent data editor and menus for graphics and basic statistical analysis. Since Rcmdr depends on many other packages, installing and loading Rcmdr like this...

install.packages("Rcmdr", dependencies=TRUE)
library(Rcmdr)

...will also install and load nearly every other package you've ever needed to use (except ggplot2, Hmisc, and rms/design). This saved me a lot of time trying to remember which packages I normally use and installing them one at a time. Specifically, installing and loading Rcmdr will install the following packages from CRAN: fBasics, bitops, ellipse, mix, tweedie, gtools, gdata, caTools, Ecdat, scatterplot3d, ape, flexmix, gee, mclust, rmeta, statmod, cubature, kinship, gam, MCMCpack, tripack, akima, logspline, gplots, maxLik, miscTools, VGAM, sem, mlbench, randomForest, SparseM, kernlab, HSAUR, Formula, ineq, mlogit, np, plm, pscl, quantreg, ROCR, sampleSelection, systemfit, truncreg, urca, oz, fUtilities, fEcofin, RUnit, quadprog, mlmRev, MEMSS, coda, party, ipred, modeltools, e1071, vcd, AER, chron, DAAG, fCalendar, fSeries, fts, its, timeDate, timeSeries, tis, tseries, xts, foreach, DBI, RSQLite, mvtnorm, lme4, robustbase, mboost, coin, xtable, sandwich, zoo, strucchange, dynlm, biglm, rgl, relimp, multcomp, lmtest, leaps, effects, aplpack, abind, RODBC.
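
For the packages Rcmdr doesn't pull in, one option is to keep a short list of favorites and install only whatever is missing. This is a minimal sketch rather than a polished tool: the `wanted` list and the helper names `missing_packages` and `install_missing` are made up for illustration, and the install step assumes a CRAN mirror is reachable.

```r
# Packages to restore on a fresh installation (an illustrative list; edit to taste).
wanted <- c("Rcmdr", "ggplot2", "Hmisc", "rms")

# Return the entries of `pkgs` that are not in `installed`.
# By default `installed` is the set of packages R can currently find.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Install whatever is missing; does nothing when everything is already present.
# Extra arguments (e.g. repos) are passed through to install.packages().
install_missing <- function(pkgs, ...) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, dependencies = TRUE, ...)
  }
  invisible(todo)
}

# install_missing(wanted)   # uncomment to actually install (needs a CRAN mirror)
```

Keeping the check separate from the install step means you can call `missing_packages(wanted)` on its own to see what a new machine lacks before downloading anything.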

Anyone else have a solution for batch-installing packages you use on a new machine or fresh R installation? Leave it in the comments!

Monday, September 13, 2010

Empowering Personal Genomics by Considering Regulatory Cis-Epistasis and Heterogeneity

Will Bush and I just heard that our paper "Multivariate Analysis of Regulatory SNPs: Empowering Personal Genomics by Considering Cis-Epistasis and Heterogeneity" was accepted for publication and a talk at the Personal Genomics session of the 2011 Pacific Symposium on Biocomputing.

Your humble GGD contributors embarked on our first collaborative paper using genome-wide transcriptome data and genome-wide SNP data from HapMap lymphoblastoid cell lines to examine an alternative mechanism for how epistasis might affect human traits. Many human traits are driven by alterations in gene expression, and it's known that common genetic variation affects the expression of nearby genes. We also know that epistasis is ubiquitous and affects human traits. Combining these three ideas, is it possible that genetic variation can interact epistatically to exert a cis-regulatory effect on the expression of nearby genes? If so, what is the genomic and statistical structure of these epistatically interacting multilocus models? Are genes which are affected by cis-epistasis associated with complex human disease or morphological phenotypes? If so, how might we use this knowledge to guide the reanalysis of existing datasets? We addressed these questions here using experimental data from HapMap cell lines. If you're interested in seeing the paper please email me, or try to catch our talk at PSB (a meeting worth going to!).

Abstract: Understanding how genetic variants impact the regulation and expression of genes is important for forging mechanistic links between variants and phenotypes in personal genomics studies.  In this work, we investigate statistical interactions among variants that alter gene expression and identify 79 genes showing highly significant interaction effects consistent with genetic heterogeneity.  Of the 79 genes, 28 have been linked to phenotypes through previous genomic studies.  We characterize the structural and statistical nature of these 79 cis-epistasis models, and show that interacting regulatory SNPs often lie far apart from each other and can be quite distant from the gene they regulate.  By using cis-epistasis models that account for more variance in gene expression, investigators may improve the power and replicability of their genomics studies, and more accurately estimate an individual's gene expression level, improving phenotype prediction.
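
To make the idea of a statistical interaction between two regulatory SNPs concrete, here is a minimal sketch on simulated data. It is not the analysis from the paper: the genotypes, effect sizes, and variable names are all invented, and a plain linear model with an interaction term simply stands in for whatever multilocus model one prefers.

```r
# Simulated data only: two hypothetical cis-SNPs coded as minor-allele counts (0/1/2)
set.seed(1)
n <- 200
snp1 <- rbinom(n, 2, 0.3)
snp2 <- rbinom(n, 2, 0.3)

# Expression depends on the product of the two genotypes (invented effect sizes)
expr <- 0.2 * snp1 + 0.2 * snp2 + 0.8 * snp1 * snp2 + rnorm(n)

# Additive model versus a model that adds the SNP x SNP interaction term
fit_additive    <- lm(expr ~ snp1 + snp2)
fit_interaction <- lm(expr ~ snp1 * snp2)

# F-test for whether the interaction term explains additional variance
comparison    <- anova(fit_additive, fit_interaction)
interaction_p <- comparison[["Pr(>F)"]][2]

# Extra variance in expression explained by allowing the interaction
r2_gain <- summary(fit_interaction)$r.squared - summary(fit_additive)$r.squared
```

Comparing the two nested models is what "accounting for more variance in gene expression" means in practice: `r2_gain` is the additional share of expression variance captured once the SNPs are allowed to interact.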

Pacific Symposium on Biocomputing 2011

Tuesday, September 7, 2010

Embed R Code with Syntax Highlighting on your Blog

Note 2010-11-17: there's more than one way to do this. See the updated post from 2010-11-17.

If you use Blogger or even WordPress you've probably found that it's complicated to post code snippets with spacing preserved and syntax highlighting (especially for R code). I've discovered a few workarounds that involve hacking the Blogger HTML template and linking to someone else's JavaScript, but it isn't pretty and I'm relying on someone else to perpetually host and maintain the necessary JavaScript. GitHub Gists make this really easy. GitHub is a source code hosting and collaborative/social coding website, and makes it very easy to post, share, and embed code snippets with syntax highlighting for almost any language you can think of.

Here's an example of some R code I posted a few weeks ago on making QQ plots of p-values using R base graphics.

The Perl highlighter also works well. Here's some code I posted recently to help clean up PLINK output:

Simply head over to GitHub Gist, paste in your code, select a language for syntax highlighting, and hit "Create Public Gist." The embed button will give you a line of HTML that you can paste into your blog to embed the code directly.

Finally, if you're using WordPress you can get the GitHub Gist plugin for WordPress to get things done even faster. A big tip of the hat to economist J.D. Long (blogger at Cerebral Mastication) for pointing this out to me.

Thursday, September 2, 2010

Rebecca Skloot (HeLa) to speak at Vanderbilt September 7

Rebecca Skloot, author of the bestselling The Immortal Life of Henrietta Lacks (Amazon, $14), will be speaking here at Vanderbilt next Tuesday at noon in 208 Light Hall. This is one you don't want to miss. Be sure to get there a few minutes early. When 208 fills up they'll have overflow in 202 with a live webcast. RSVP for a free lunch. On a related note, apparently Oprah and Alan Ball (screenwriter, Six Feet Under, True Blood) will be teaming up with HBO to produce a movie based on the book. No doubt this will stir up a much-needed dialog about the nature of informed consent in scientific research. If you don't know the story about the origin of HeLa cells, you can get the quick summary on Wikipedia. Better yet, buy the book.
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.