Getting Genetics Done: My thoughts on ICHG 2011

I’m a bit exhausted from a week of excellent science at ICHG. First, let me say that Montreal is a truly remarkable city with fantastic food and a fascinating blend of architectural styles, all making the meeting a fun place to be…. Now on to the genomics – I’ll recap a few of the most exciting sessions I attended. You can find a live-stream of tweets from the meeting by searching the #ICHG2011 and #ICHG hashtags.

On Wednesday, Marylyn Ritchie(@MarylynRitchie) and Nancy Cox organized “Beyond Genome-wide association studies”. Nancy Cox presented some ideas on how to integrate multiple “intermediate” associations for SNPs, such as expression QTLs and newly discovered protein QTLs (More on pQTLs later). This approach which she called a Functional Unit Analysis would group signals together based on the genes they influence. Nicholas Shork presented some nice examples of pros and cons of sequence level annotation algorithms. Trey Idekker gave a very nice talk illustrating some of the properties of epistasis in yeast protein interaction networks. One of the more striking points he made was that epistasis tends to occur between full protein complexes rather than within elements of the complexes themselves. Marylyn Ritchie presented the ideas behind her ATHENA software for machine learning analysis of genetic data, and Manuel Mattesian from Tim Becker’s group presented the methods in their INTERSNP software for doing large-scale interaction analysis. What was most impressive with this session is that there were clear attempts to incorporate underlying biological complexity into data analysis.

On Thursday, I attended the second Statistical Genetics section called “Expanding Genome-wide Association Studies”, organized by Saurabh Ghosh and Daniel Shriner. Having recently attended IGES, I feel pretty “up” on newer analysis techniques, but this session had a few talks that sparked my interest. The first three talks were related to haplotype phasing and the issues surrounding computational accuracy and speed. The basic goal of all these methods is to efficiently estimate genotypes for a common set of loci for all samples of a study using a set of reference haplotypes, usually from the HapMap or 1000 genomes data. Despite these advances, it seems like phasing haplotypes for thousands of samples is still a massive undertaking that requires a high-performance computing cluster. There were several talks about ongoing epidemiological studies, including the Kaiser Permanente UCSF cohort. Neil Risch presented an elegant study design implementing four custom GWAS chips for the four targeted populations. Looks like the data hasn't started to flow from this yet, but when it does we’re sure to learn about lots of interesting ethnic-specific disease effects. My good friend and colleague Dana Crawford presented an in silico GWAS study of hypothyroidism. In her best NPR voice, Dana showed how electronic medical records with GWAS data in the EMERGE network can be re-used to construct entirely new studies nested within the data collected for other specific disease purposes. Her excellent Post-Doc, Logan Dumitrescu presented several gene-environment interactions between Lipid levels and vitamin A and E from Dana’s EAGLE study. Finally Paul O’Reilly presented a cool new way to look at multiple phenotypes by essentially flipping a typical regression equation around, estimating coefficients that relate each phenotype in a study to a single SNP genotype as an outcome. This rather clever approach called MultiPhen is similar to log-linear models I’ve seen used for transmission-based analysis, and allows you to model the “interaction” among phenotypes in much the same way you would look at SNP interactions.

By far the most interesting talks of the meeting (for me) were in the Genomics section on Gene Expression, organized by Tomi Pastinen and Mark Corbett. Chris Mason started the session off with a fantastic demonstration of the power of RNA-seq. Examining transcriptomes of 14 non-human primate species, they validated many of the computational predictions in the AceView gene build, and illustrated that most “exome” sequencing is probably examining less than half of all transcribed sequences. Rupali Patwardhan talked about a system for examining the impact of promoter and enhancer mutations in whole mice, essentially using mutagenesis screens to localize these regions. Ron Hause presented work on the protein QTLs that Nancy Cox alluded to earlier in the conference. Using a high-throughput form of western blots, they systematically examined levels for over 400 proteins in the Yoruba HapMap cell lines. They also illustrate that only about 50% of eQTLs identified in these lines actually alter protein levels. Stephen Montgomery spoke about the impact of rare genetic variants within a transcript on transcript levels. Essentially he showed an epistatic effect on expression, where transcripts with deleterious alleles are less likely to be expressed – an intuitive and fascinating finding, especially for those considering rare-variant analysis. Athma Pai presented a new QTL that influences mRNA decay rates. By measuring multiple time points using RNA-seq, she found individual-level variants that alter decay, which she calls dQTLs. Veronique Adoue looked at cis-eQTLs relative to transcription factor binding sites using ChIP, and Alfonso Buil showed how genetic variants influence gene expression networks (or correlation among gene expression) across tissue types.

I must say despite all the awesome work presented in this session, Michael Snyder stole the show with his talk on the “Snyderome” – his own personal –omics profile collected over 21 months. His whole-genome was sequenced by Complete Genomics, and processed using Rong Chen and Atul Butte’s risk-o-gram to quantify his disease risk. His profile predicted increased risk of T2D, so he began collecting glucose measures and low and behold, he saw a sustained spike in blood glucose levels following a few days following a common cold. His interpretation was that an environmental stress knocked him into a pseudo-diabetic state, and his transcriptome and proteome results corroborated this idea. Granted, this is an N of 1, and there is still lots of work to be done before this type of analysis revolutionizes medicine, but the take home message is salient – multiple -omics are better than one, and everyone’s manifestation of a complex disease is different. This was truly thought-provoking work, and it nicely closed an entire session devoted to understanding the intermediate impact of genetic variants to better understand disease complexity.

This is just my take of a really great meeting -- I'm sure I missed lots of excellent talks. If you saw something good please leave a comment and share!

1 comment:

Niall CardinNovember 2, 2011 at 4:20 PM
Thanks very much for writing this! It's a great read and reminds me of things that I'd forgotten.

I also thought that Mike Stratton's talk on their cancer sequencing was very interesting. As I recall they demonstrated that different mutational mechanisms (or failures to repair) where apparent in different cancers. Also, that multiple metastases at the same location (e.g. liver) probably arose from a single invading cell. The talk itself should be available from the ichg2011 website by mid November.

Note: Only a member of this blog may post a comment.

Tuesday, October 18, 2011

My thoughts on ICHG 2011

1 comment: