Thursday, November 17, 2011

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

Gene Expression Omnibus is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the GEOquery bioconductor package written (and well documented) by Sean Davis, and analyze the data using the limma package.

GEO2R is a very nice web-based tool to do this graphically and automatically. Enter the GEO series number in the search box (or use this one for an example). Start by creating groups (e.g. control vs treatment, early vs late time points in a time course, etc), then select samples to add to that group.



Scroll down to the bottom and click Top 250 to run an analysis in limma (the users guide documents this well). GEO2R will automatically fetch the data, group your samples, create your design matrix for your differential expression analysis, run the analysis, and annotate the results. A big complaint with point-and-click GUI and web based applications is the lack of reproducibility. GEO2R obviates this problem by giving you all the R code it generated to run the analysis. Click the R script tab to see the R code it generated, and save it for later.



The options tab allows you to adjust the multiple testing correction method, and the value distribution tab lets you take a look at the distribution gene expression values among the samples that you assigned to your groups.



There's no built-in quality assessment tools in GEO2R, but you can always take the R code it generated and do your own QA/QC. It's also important to verify what values it's pulling from each array into the data matrix. In this example, epithelial cells at various time points were compared to a reference cell line, and the log base 2 fold change was calculated. This was used in the data matrix rather than the actual expression values.

GEO2R is a very nice tool to quickly run an analysis on data in GEO. Now, if we could only see something similar for the European repository, ArrayExpress.

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

6 comments:

  1. Awesome blog man, the things you have mentioned above are really informative and are examples of your awesome writing skills and a good blog.
    .net Obfuscator

    ReplyDelete
  2. Hats off ..to GEO2R groups and stephens for making us aware of it..
    cheers
    Ajay

    ReplyDelete
  3. the problem is....what pvalue Can I used?

    ReplyDelete
  4. How can one analyze data from Agilent two color dye array in GEO2R? The program is asking for two groups to start with.
    Thanks
    babru samal

    ReplyDelete

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.