Thursday, December 16, 2010

Epistasis in New Places

Coming from the lineage of Jason Moore, I am obliged to occasionally remind everyone that biological systems are inherently complex, and to some degree, we should therefore expect statistical models involving those systems to be complex as well.

With the development of GWAS, many approaches to examine epistasis are weighed down by the computational burden of exhaustively conducting billions of statistical tests. With this in mind, several bioinformatics approaches (such as Biofilter and INTERSNP) have focused on looking for gene-gene interactions within biological pathways, ontologies, or protein-protein interaction networks. The assumption underlying these methods is that interactions occur between variants of two different genes – what you could call trans-epistasis.
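To put a rough number on that burden, here is a minimal sketch that counts the tests in an exhaustive two-SNP scan. The panel sizes are hypothetical round numbers chosen only for illustration, and `pairwise_tests` is a helper name made up for this example.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. tests in an exhaustive two-locus scan."""
    return comb(n_snps, 2)


# Hypothetical panel sizes, for illustration only.
for n_snps in (1_000, 100_000, 500_000):
    print(f"{n_snps:>7,} SNPs -> {pairwise_tests(n_snps):,} pairwise tests")
```

The count grows with the square of the number of SNPs, which is why restricting the search to pathways or networks cuts the workload so sharply.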

Considering the epic complexity of the transcription process, the genetics of gene expression seems just as likely to harbor epistasis as biological pathways. Following the excellent work of Barbara Stranger, Jonathan Pritchard, and various other luminaries in this area, Stephen Turner and I examined HapMap genotypes and gene expression levels from corresponding cell lines to look for cis-epistasis.

We found 79 genes where SNP pairs in the gene's regulatory region can interact to influence the gene's expression. What is perhaps most interesting is that there are often large distances between the two interacting SNPs (with minimal LD between them), meaning that most haplotype and sliding window approaches would miss these effects. The full text is available online: "Multivariate analysis of regulatory SNPs: empowering personal genomics by considering cis-epistasis and heterogeneity."

Wednesday, December 15, 2010

Which Reference Management Software do you use? (Reader Poll)

When I started grad school I started using Reference Manager (RefMan), similar to EndNote, to manage my references and bibliographies. It's a real pain, and I often feel like I'm powering my computer with the endless pumping and clicking of the mouse that it takes to import a reference into my library.

Recently I've started using Zotero because of how easy it is to import references, store PDFs, and sync between computers. It also integrates with MS Word and allows you to insert citations and format a bibliography using any of EndNote's styles. And it's free.

Before I make the switch and leave RefMan for good, I would love to see what everyone else here uses to manage references. I know many of you use social bookmarking sites like CiteULike, FriendFeed and others to save and share literature, but I'm really interested to see what software you use while writing to manage references and format bibliographies, and how satisfied you are with what you use.

Thanks for responding! Check back in a few days and I'll summarize what you all said.

Tuesday, December 14, 2010

Sync your Zotero Library with Dropbox using WebDAV

About a year ago I wrote a post about Dropbox - a free, awesome, cross-platform utility that syncs files across multiple computers and securely backs up your files online. Dropbox is indispensable in my own workflow. I store all my R code, Perl scripts, and working manuscripts in my Dropbox. You can also share folders on your computer with other Dropbox users, which makes coauthoring a paper and sharing manuscript files a trivial task. If you're not using it yet, start now.

I've also been using Zotero for some time now to manage my references. What's nice about Zotero over RefMan, EndNote and others is that it runs inside Firefox, and when you're on PubMed or a journal website, you can save a reference and its PDF to your Zotero library with a single click. Zotero also interfaces with both MS Word and OpenOffice.org, and uses all the standard EndNote styles for formatting bibliographies.

You can also sync your Zotero library, including all your references, snapshots of the HTML version of all your articles, and all the PDFs, using the Zotero servers. This syncs your library to every other computer you're using. This is nice when you're away from the office and need to look at a paper, but you're not on your institution's LAN and journal articles are paywalled. The problem with Zotero is its low storage limit - you only get a tiny 100MB of storage space for free. If you want to sync more papers or references than that, you have to pay for it.

That's if you use Zotero's servers. You can also sync your library using your own WebDAV server. Go into Zotero's preferences and you'll see this under the sync pane.

Here's where Dropbox comes in handy. You get 2GB for free when you sign up for Dropbox, and you can add tons more space by referring others, filling out surveys, viewing the help pages, etc. I've bumped my free account up to 19GB. Dropbox doesn't support WebDAV by itself, but a third-party service, DropDAV, adds it. Just give DropDAV your Dropbox credentials, and you now have your own WebDAV server. Now simply point Zotero to sync with your own DropDAV server rather than Zotero's servers, and you can sync gigabytes of references and PDFs using your Dropbox.

Why not simply move the location of your Zotero library to a folder in your Dropbox and forget syncing altogether? I did that for a while, but as long as Firefox is open, Zotero holds your library files open, which means they're not syncing properly. If you have instances of Firefox open on more than one machine you're going to run into trouble. Syncing to Dropbox with DropDAV only touches your Dropbox during a Zotero sync operation.

What you'll need:

1. Dropbox. Sign up for a free 2GB Dropbox account. If you use this special referral link, you'll get an extra 250MB for free. Create a folder in your Dropbox called "zotero."

2. DropDAV. Log in here with your Dropbox credentials and you'll have DropDAV up and running.

3. Firefox + Zotero. First, start using Firefox if you haven't already, then install the Zotero extension.

4. Connect Zotero to DropDAV. Go into Zotero's preferences, sync panel. See the screenshot above to set your Zotero library to sync to your Dropbox via WebDAV using DropDAV.
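If the sync fails, it can help to check that the WebDAV folder is reachable outside of Zotero. Below is a minimal sketch using only Python's standard library. The server address and credentials are hypothetical placeholders, the helper name `build_propfind_request` is made up for this example, and it assumes the server accepts HTTP Basic authentication and that Zotero syncs into the "zotero" folder created in step 1.

```python
import base64
import urllib.request

# Hypothetical placeholders: substitute the WebDAV address DropDAV gives you
# and your own credentials. None of these values come from DropDAV itself.
WEBDAV_URL = "https://webdav.example.invalid"
USERNAME = "you@example.com"
PASSWORD = "your-password"


def build_propfind_request(base_url, username, password, folder="zotero"):
    """Build (but do not send) a WebDAV PROPFIND request for the sync folder."""
    url = base_url.rstrip("/") + "/" + folder + "/"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="PROPFIND",
        headers={
            "Depth": "0",  # ask about the folder itself, not its contents
            "Authorization": "Basic " + token,  # assumes HTTP Basic auth
        },
    )


request = build_propfind_request(WEBDAV_URL, USERNAME, PASSWORD)
print(request.get_method(), request.full_url)

# To actually test the connection, uncomment the lines below. A WebDAV server
# normally answers a successful PROPFIND with status 207 (Multi-Status).
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The network call is left commented out so the sketch runs without touching any server. An authentication error or a "not found" response would point to wrong credentials or a missing "zotero" folder.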

You're done! Now, go out and start saving/syncing gigabytes of papers!

Tuesday, December 7, 2010

Webinar on Revolution R Enterprise

R evangelist David Smith, marketing VP at Revolution R, will be giving a webinar showing off some of the finer features of Revolution R Enterprise - an integrated development environment (IDE) for R that has an enhanced script editor with syntax highlighting, function completion, syntax checking, mouseover help, R code snippets for common tasks, an object browser, a real debugger, and more. Revolution R Enterprise is free for academics. The webinar is tomorrow (Wednesday, December 8) at 9am Pacific time (11am CST), and you can register here.

I've been happy using NppToR - a utility that adds syntax highlighting, code folding, and a hotkey to send lines of R code from Notepad++ (hands down the best text editor for Windows) to the R console. You can read more about NppToR on page 62 of the June issue of the R Journal. But it looks like the Revolution R Enterprise IDE has much more to offer. Here's an example of the debugger with breakpoints set.

Webinar - Revolution R Enterprise - 100% R and More

Monday, December 6, 2010

Using the "Divide by 4 Rule" to Interpret Logistic Regression Coefficients

I was recently reading a bit about logistic regression in Gelman and Hill's book on hierarchical/multilevel modeling when I first learned about the "divide by 4 rule" for quickly interpreting coefficients in a logistic regression model in terms of the predicted probabilities of the outcome. The idea is pretty simple. The logistic curve (predicted probabilities) is steepest at its center, where a+βx=0 and logit^-1(a+βx)=0.5. See the plot below (or use the R code to plot it yourself).

The slope of this curve (the first derivative of the logistic function) is maximized at a+βx=0, where it takes on the value:

βe^0/(1+e^0)^2 = β/4

So you can take the logistic regression coefficients (not including the intercept) and divide them by 4 to get an upper bound on the predictive difference in probability of the outcome y=1 per unit increase in x. This approximation is best near the midpoint of x, where predicted probabilities are close to 0.5, which is where most of the data will lie anyhow.

So if your regression coefficient is 0.8, a rough approximation using the β/4 rule is that a 1 unit increase in x raises the probability of y=1 by at most about 0.8/4=0.2, or 20 percentage points.
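To see how close the rule comes, here is a minimal sketch that compares β/4 with the actual change in predicted probability for the coefficient of 0.8 used above. The intercept of 0 is an assumption made only so that x=0 sits at the midpoint of the curve, and the function names are made up for this example.

```python
import math


def inv_logit(z):
    """Inverse logit: maps the linear predictor a + beta*x to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_slope(a, beta, x):
    """First derivative of inv_logit(a + beta*x) with respect to x."""
    p = inv_logit(a + beta * x)
    return beta * p * (1.0 - p)


a = 0.0     # hypothetical intercept, so the curve is centered at x = 0
beta = 0.8  # the coefficient from the example above

max_slope = logistic_slope(a, beta, x=0.0)  # slope where the curve is steepest
actual_change = inv_logit(a + beta * 1.0) - inv_logit(a + beta * 0.0)

print(f"beta / 4:                     {beta / 4:.3f}")
print(f"slope at the midpoint:        {max_slope:.3f}")
print(f"actual change from x=0 to 1:  {actual_change:.3f}")
```

Under these assumptions the slope at the midpoint equals β/4, and the actual change over one unit of x comes out a little lower (about 0.19), which is what you would expect from an upper bound.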
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.