Friday, October 29, 2010

Reproducible Research in the Omics Era: A Presentation and Panel Discussion

Seminar announcement for Vanderbilt folks:


Vanderbilt-Ingram Cancer Center
Quantitative Sciences Seminar Series

Presents

Reproducible Research in the Omics Era:
A Presentation and Panel Discussion
Kevin R. Coombes, PhD
Deputy Chair, Bioinformatics, and Professor of Bioinformatics and Computational Biology
M.D. Anderson Cancer Center

and

Keith Baggerly, PhD
Associate Professor, Dept. of Bioinformatics and Computational Biology
M.D. Anderson Cancer Center


Panel Discussion at 1 p.m., following presentations:
Featuring Drs. Baggerly and Coombes, along with
Vanderbilt University School of Medicine’s
Dr. William Pao, Dr. Frank Harrell, and Dr. Yu Shyr

Friday, November 19, 2010
12 noon – 2 PM
214 Light Hall


Thursday, October 28, 2010

PacBio Film, Discussion & Reception/Dinner at ASHG 2010

Pacific Biosciences is hosting a reception and dinner and screening their film, The New Biology, at this year's ASHG meeting. According to a flyer they mailed me, the film will showcase their SMRT sequencing technology and how it can be used to "create predictive models of living systems and gain wisdom about the fundamental nature of life itself." While the last bit is perhaps an overstatement, the event should nonetheless be worth attending. It includes a reception, dinner, and a moderated discussion featuring individuals from the film. Unfortunately this conflicts with the previously mentioned 1000 Genomes Tutorial, but if you get waitlisted at the tutorial, sign up for this event at the link below!

Date
Wednesday, November 3, 2010

Time
7-10pm

Location
Smithsonian National Air and Space Museum
Independence Ave at 6th St SW
Washington, DC 20560

RSVP here - pacificbiosciences.com/newbiology

Wednesday, October 27, 2010

Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application

While writing my thesis, I came across this nice review by Rita Cantor, Kenneth Lange, and Janet Sinsheimer at UCLA, "Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application." Skip the introduction unless you're new to GWAS, in which case you'll probably want to start with this more recent review by Teri Manolio. After the intro you'll find a succinct introduction to meta-analysis for GWAS with lots of very good references, including these among others:

DerSimonian R., Laird N. Meta-analysis in clinical trials. Control. Clin. Trials. 1986;7:177–188.

Fleiss J.L. The statistical basis of meta-analysis. Stat. Methods Med. Res. 1993;2:121–145.

Yesupriya A., Yu W., Clyne M., Gwinn M., Khoury M.J. The continued need to synthesize the results of genetic associations across multiple studies. Genet. Med. 2008;10:633–635.

Lau J., Ioannidis J.P., Schmid C.H. Quantitative synthesis in systematic reviews. Ann. Intern. Med. 1997;127:820–826.

Allison D.B., Schork N.J. Selected methodological issues in meiotic mapping of obesity genes in humans: Issues of power and efficiency. Behav. Genet. 1997;27:401–421.

Ioannidis J.P., Gwinn M., Little J., Higgins J.P., Bernstein J.L., Boffetta P., Bondy M., Bray M.S., Brenchley P.E., Buffler P.A., Human Genome Epidemiology Network and the Network of Investigator Networks. A road map for efficient and reliable human genome epidemiology. Nat. Genet. 2006;38:3–5.

de Bakker P.I., Ferreira M.A., Jia X., Neale B.M., Raychaudhuri S., Voight B.F. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2008;17(R2):R122–R128.

Sagoo G.S., Little J., Higgins J.P., Human Genome Epidemiology Network. Systematic reviews of genetic association studies. PLoS Med. 2009;6:e28.

Zeggini E., Ioannidis J.P. Meta-analysis in genome-wide association studies. Pharmacogenomics. 2009;10:191–201.

Egger M., Smith G.D., Phillips A.N. Meta-analysis: Principles and procedures. BMJ. 1997;315:1533–1537.

Ioannidis J.P., Patsopoulos N.A., Evangelou E. Heterogeneity in meta-analyses of genome-wide association investigations. PLoS ONE. 2007;2:e841.

This section covers using imputation in meta-analysis, fixed effects versus random effects meta-analysis, canned software for meta-analysis (such as METAL), Bayesian hierarchical approaches, and references to many applications of meta-analysis in GWAS.
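To make the fixed-effects versus random-effects distinction concrete, here is a minimal base-R sketch of inverse-variance fixed-effects meta-analysis and the DerSimonian-Laird random-effects estimator (the first reference above), applied to per-study effect estimates for one SNP (e.g., log odds ratios) and their standard errors. The function name `meta_fixed_random` and the example numbers are hypothetical; this is not how METAL or any particular package implements it, so use tested software for real analyses.

```r
# Minimal sketch: inverse-variance fixed-effects and DerSimonian-Laird
# random-effects meta-analysis for a single SNP. Hypothetical helper.
meta_fixed_random <- function(beta, se) {
  k <- length(beta)
  w <- 1 / se^2                         # inverse-variance weights
  beta_fe <- sum(w * beta) / sum(w)     # fixed-effects pooled estimate
  se_fe <- sqrt(1 / sum(w))
  Q <- sum(w * (beta - beta_fe)^2)      # Cochran's Q heterogeneity statistic
  # DerSimonian-Laird method-of-moments estimate of between-study variance
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (se^2 + tau2)             # random-effects weights
  beta_re <- sum(w_re * beta) / sum(w_re)
  se_re <- sqrt(1 / sum(w_re))
  I2 <- if (Q > 0) max(0, (Q - (k - 1)) / Q) else 0
  list(beta_fe = beta_fe, se_fe = se_fe,
       p_fe = 2 * pnorm(-abs(beta_fe / se_fe)),
       Q = Q, p_Q = pchisq(Q, df = k - 1, lower.tail = FALSE),
       tau2 = tau2, I2 = I2,
       beta_re = beta_re, se_re = se_re,
       p_re = 2 * pnorm(-abs(beta_re / se_re)))
}

# Three made-up studies reporting log odds ratios and standard errors
meta_fixed_random(beta = c(0.12, 0.25, 0.08), se = c(0.05, 0.07, 0.06))
```

When the studies agree (Q near its degrees of freedom or below), tau2 is zero and the two estimates coincide; as heterogeneity grows, the random-effects weights flatten and the pooled standard error widens.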

After the meta-analysis section there's a nice section on modeling epistasis (gene-gene interactions) to prioritize associations. It links to other reviews of statistical methods and briefly covers data mining procedures like CART, MDR, random forests, conditional entropy methods, neural networks, genetic programming, logic regression, pattern mining, Bayesian partitioning, and penalized regression approaches, again with lots of references. This section also covers the parameterization of epistatic models and some of the computational and statistical issues you'll face with the dimensionality problem.
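To give a feel for the parameterization and dimensionality issues, here is a minimal base-R sketch of the simplest parametric approach: logistic regression with a SNP-by-SNP product term under additive (0/1/2) genotype coding, tested with a 1-df likelihood-ratio test, run exhaustively over all SNP pairs. This is not one of the data mining methods listed above, and other parameterizations (e.g., dominant coding, or the 4-df genotypic interaction) are equally valid. The function names, SNP names, and simulated data are hypothetical.

```r
# 1-df likelihood-ratio test for a g1 x g2 interaction in logistic regression.
lrt_interaction <- function(y, g1, g2) {
  fit_main <- glm(y ~ g1 + g2, family = binomial)
  fit_int  <- glm(y ~ g1 * g2, family = binomial)   # adds the g1:g2 product term
  lr <- max(0, deviance(fit_main) - deviance(fit_int))
  pchisq(lr, df = 1, lower.tail = FALSE)
}

# Exhaustive scan over all choose(m, 2) pairs of columns in a genotype matrix.
scan_pairs <- function(y, geno) {
  pairs <- combn(ncol(geno), 2)
  p <- apply(pairs, 2, function(ij) lrt_interaction(y, geno[, ij[1]], geno[, ij[2]]))
  data.frame(snp1 = colnames(geno)[pairs[1, ]],
             snp2 = colnames(geno)[pairs[2, ]],
             p = p, stringsAsFactors = FALSE)
}

# Simulated case-control data: interaction between the first two SNPs only
set.seed(42)
n <- 1000; m <- 5
geno <- matrix(rbinom(n * m, 2, 0.3), n, m,
               dimnames = list(NULL, paste0("snp_hypothetical", 1:m)))
y <- rbinom(n, 1, plogis(-1 + 0.6 * geno[, 1] * geno[, 2]))

res <- scan_pairs(y, geno)
res[order(res$p), ]

# The dimensionality problem: 10 tests here, but roughly 1.25e11 for 500K SNPs
choose(m, 2)
choose(5e5, 2)
```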

Finally, the review concludes with a section on pathway analysis. As the review admits, pathway analysis in GWAS has no set of strict guidelines or best practices, and new approaches arise every day.

While this review is nearly a year old at this point, I think it's a real gem because of all the references it offers, especially in the meta-analysis and epistasis sections.

AJHG: Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application

Thursday, October 14, 2010

Tutorial on the 1000 Genomes Project Data

There will be a free tutorial on the 1000 Genomes Project at this year's ASHG meeting on Wednesday, November 3, 7:00 – 9:30pm. You can register online at the link below. The tutorial will describe the 1000 Genomes data, how to access it, and what to do with it. The speakers and topics are:

1. Introduction
2. Description of the 1000 Genomes data -- Gabor Marth
3. How to access the data -- Steve Sherry
4. How to use the browser -- Paul Flicek
5. Structural variants -- Jan Korbel
6. How to use the data in disease studies -- Jeff Barrett
7. Q&A

Online registration for 1000 genomes tutorial

Hopefully I'll see some of you there. I'm not sure if imputation is covered in this tutorial. If not, I will cover it here in a future post. I'll soon be using Goncalo Abecasis's 1000 Genomes Imputation Cookbook to impute my own data to the 1kG SNPs, and I'll share any tips I discover along the way.

Wednesday, October 6, 2010

Random forests for high-dimensional genomics data

I know I've been MIA for a while. My defense date is December 3, and I've still got a thesis to write! I'll try to post more soon, but in the meantime follow me on Twitter for things that won't make it into a full blog post.

For those at Vanderbilt and the surrounding environs: I saw this announcement for the next cancer biostatistics workshop that looked interesting.

2010 Cancer Biostatistics Workshop

Friday, October 15, 2010
1:00 to 2:00 PM
898B Preston Research Building

Random forests for high-dimensional genomics data

Xi (Steven) Chen, PhD
Assistant Professor
Department of Biostatistics
Cancer Biostatistics Center, Vanderbilt-Ingram Cancer Center

Wednesday, September 29, 2010

Vanderbilt Genetics Symposium: Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes

About a year ago I reiterated a point made nicely in a Nature Reviews Genetics article: there is no such thing as a common disorder, only extremes of quantitative traits. That is the theme of this year's Annual Vanderbilt Genetics Symposium, "Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes." This day-long event will be held at the Vanderbilt Student Life Center on Wednesday, October 13, 8am-4pm. Registration is free but required. Students in our program will be presenting posters, and students in other programs are welcome to submit an abstract as well. You can check out the full agenda at the link below. Here is the speaker lineup:

Keynote Speakers

Molly Losh, Ph.D.
Jane and Michael Hoffman Assistant Professor of Communication Sciences & Disorders
Northwestern University

Charles R. Farber, Ph.D.
Assistant Professor of Medicine
University of Virginia

Andrew J. Saykin, PsyD, ABCN
Raymond C. Beeler Professor of Radiology and Imaging Sciences
Professor of Medical and Molecular Genetics
Director, Center for Neuroimaging
Indiana University School of Medicine
  

Vanderbilt Speakers

Roger Cone, Ph.D.
Professor and Chairman, Department of Molecular Physiology & Biophysics

Dana Crawford, Ph.D.
Assistant Professor, Department of Molecular Physiology & Biophysics
Investigator, Center for Human Genetics Research

Karoly Mirnics, Ph.D.
Professor and Vice Chair for Basic Research, Department of Psychiatry

Vanderbilt Genetics Symposium: Beyond Disease Dichotomy - Quantitative Traits and Intermediate Phenotypes

Monday, September 27, 2010

Towards a More Rigorous Approach to Personalized Medicine

Frank Harrell, chair of our Biostats department, will be giving a seminar entitled "Towards a More Rigorous Approach to Personalized Medicine." Dr. Harrell is a champion of methods and strategies for reproducible research, so his lecture on personalized medicine should be interesting.

Frank E Harrell Jr, Professor and Chair, Department of Biostatistics


Wednesday, September 29, 2010, 1:30-2:30pm, MRBIII Conference Room 1220


Intended Audience: Persons interested in personalized medicine, biomarkers, reproducible research, clinical epidemiology

Description:

There are many ways to personalize the diagnosis and treatment of diseases, pharmacogenomics being one of them. Personalization can be based on routinely collected information, on molecular signatures, or on repeated trials on the patient whose treatment plan is being devised. However, current emphases in personalized medicine research often ignore characteristics known to affect treatment benefit in favor of tests that either generate more revenue or come from research that is perhaps easier to fund than "low-tech" research. The research community's failure to fully use the rich datasets generated by randomized clinical trials only heightens this concern.

Research supporting personalized medicine can be made more rigorous and relevant. For example, in acute diseases, multi-period crossover studies can be used to measure individual response to therapy, and these studies can provide an upper bound on the genome-by-treatment interaction. When a patient-by-treatment interaction is demonstrated, crossover studies can form an ideal basis for pharmacogenomics. However, even with the best within-patient data, group-average treatment effects need to be incorporated for predictions for individual patients to have high precision.

There are a few ways to do personalized medicine well but a multitude of ways to do it poorly. Biomarker research in particular has not fulfilled its early promise, a major reason being flawed methodology. The flaws include faulty experimental design, bias, overfitting, weak validation, irreproducible research, data processing, and analysis practices, and failure to rigorously show that new markers add information to readily available clinical data. This will be discussed in terms of Platt's concept of "strong inference", seeking alternative explanations of findings, and sensitivity analysis.

This talk is also a call for the biostatistics and clinical epidemiology communities to be more integrally involved in research related to personalized medicine.
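As a concrete illustration of the last flaw in the abstract, one standard check of whether a new marker adds information to readily available clinical data is to fit nested models with and without the marker and compare them with a likelihood-ratio test. The base-R sketch below does this on simulated data; the variables (`age`, `stage`, `marker`) and effect sizes are made up, the marker is deliberately simulated as pure noise, and this is not necessarily the approach Dr. Harrell will present.

```r
# Does a hypothetical marker add information beyond clinical covariates?
set.seed(2010)
n <- 500
age    <- rnorm(n, 60, 10)
stage  <- rbinom(n, 1, 0.4)
marker <- rnorm(n)                 # hypothetical marker, simulated as noise
y <- rbinom(n, 1, plogis(-6 + 0.08 * age + 1.0 * stage))

fit_clin <- glm(y ~ age + stage, family = binomial)           # clinical only
fit_full <- glm(y ~ age + stage + marker, family = binomial)  # clinical + marker

# LR chi-square for adding the marker (1 df)
lr <- max(0, deviance(fit_clin) - deviance(fit_full))
p_added <- pchisq(lr, df = 1, lower.tail = FALSE)

# Share of the full model's LR chi-square that is due to the marker
lr_full <- fit_full$null.deviance - deviance(fit_full)
frac_new <- lr / lr_full

c(LR = lr, p = p_added, fraction_new_information = frac_new)
```

A marker that only looks predictive on its own, without beating this clinical baseline, has not been shown to add anything.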

Tuesday, September 21, 2010

Install and load R package "Rcmdr" to quickly install lots of other packages

I recently reformatted my laptop and needed to reinstall R and all the packages that I regularly use. In a previous post I covered R Commander, a nice GUI for R that includes a decent data editor and menus for graphics and basic statistical analysis. Since Rcmdr depends on many other packages, installing and loading Rcmdr like this...

install.packages("Rcmdr", dependencies=TRUE)
library(Rcmdr)

...will also install nearly every other package you've ever needed to use (except ggplot2, Hmisc, and rms/Design). This saved me a lot of time trying to remember which packages I normally use and installing them one at a time. Specifically, installing and loading Rcmdr will install the following packages from CRAN: fBasics, bitops, ellipse, mix, tweedie, gtools, gdata, caTools, Ecdat, scatterplot3d, ape, flexmix, gee, mclust, rmeta, statmod, cubature, kinship, gam, MCMCpack, tripack, akima, logspline, gplots, maxLik, miscTools, VGAM, sem, mlbench, randomForest, SparseM, kernlab, HSAUR, Formula, ineq, mlogit, np, plm, pscl, quantreg, ROCR, sampleSelection, systemfit, truncreg, urca, oz, fUtilities, fEcofin, RUnit, quadprog, mlmRev, MEMSS, coda, party, ipred, modeltools, e1071, vcd, AER, chron, DAAG, fCalendar, fSeries, fts, its, timeDate, timeSeries, tis, tseries, xts, foreach, DBI, RSQLite, mvtnorm, lme4, robustbase, mboost, coin, xtable, sandwich, zoo, strucchange, dynlm, biglm, rgl, relimp, multcomp, lmtest, leaps, effects, aplpack, abind, RODBC.
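Another approach is to keep your own list of packages and install only the ones that are missing. Here is a minimal sketch; the vector `my_packages`, the helper `install_missing()`, and the file name `my_packages.rds` are hypothetical, and the calls at the bottom are commented out so nothing gets installed until you run them yourself.

```r
# Hypothetical personal package list; edit to match what you actually use.
my_packages <- c("Rcmdr", "ggplot2", "Hmisc", "rms")

# Install only the packages in `pkgs` that are not already installed.
# Extra arguments (e.g. repos = ...) are passed through to install.packages().
install_missing <- function(pkgs, ...) {
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE, ...)
  }
  invisible(to_install)
}

# On the old machine, save the add-on (non-base, non-recommended) packages:
# saveRDS(rownames(installed.packages(priority = "NA")), "my_packages.rds")

# On the new machine:
# my_packages <- readRDS("my_packages.rds")
# install_missing(my_packages)
# invisible(lapply(my_packages, library, character.only = TRUE))
```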

Anyone else have a solution for batch-installing packages you use on a new machine or fresh R installation? Leave it in the comments!
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.