Wednesday, March 14, 2012

Video Tip: Convert Gene IDs with Biomart

I get asked frequently how to convert from one gene identifier to another. This can be tricky, especially when relying on gene symbols, as Will pointed out in a previous post a few years ago. There are several tools that can do this, including DAVID and the previously mentioned new Biomart ID Converter, but I still prefer using the Ensembl Biomart for this because of its added flexibility and annotation.

I've started putting together video screencasts for things like this, especially when several of the core's clients ask the same question. In this example, I'll show you how to quickly convert Affymetrix Mouse Gene 1.0 ST microarray probeset IDs to Ensembl gene IDs and gene symbols.



You can also do this programmatically in R using the biomaRt package in Bioconductor.
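
If you'd rather script the lookup outside of R, the same idea can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not the biomaRt approach: it assumes you have already exported a tab-delimited table from Biomart, and the column names, the placeholder IDs, and the helper names `load_mapping` and `annotate` are all made up for the example.

```python
import csv
import io

# Made-up stand-in for a tab-delimited Biomart export. The header names and
# every ID below are placeholders, not real annotation.
BIOMART_EXPORT = (
    "Probeset ID\tEnsembl Gene ID\tGene Symbol\n"
    "probe_1\tGENE_ID_A\tSymA\n"
    "probe_2\tGENE_ID_B\tSymB\n"
    "probe_2\tGENE_ID_C\tSymC\n"  # one probeset can map to several genes
)


def load_mapping(tsv_text, key_column="Probeset ID"):
    """Index the export by probeset ID: {probeset_id: [row, ...]}."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = row[key_column]
        if key:  # skip rows with an empty key
            mapping.setdefault(key, []).append(row)
    return mapping


def annotate(probeset_ids, mapping):
    """Inner join: one output tuple per (probeset, gene) match.

    Probesets without any match in the mapping are dropped.
    """
    annotated = []
    for probeset_id in probeset_ids:
        for row in mapping.get(probeset_id, []):
            annotated.append(
                (probeset_id, row["Ensembl Gene ID"], row["Gene Symbol"])
            )
    return annotated


mapping = load_mapping(BIOMART_EXPORT)
for record in annotate(["probe_2", "probe_1", "probe_9"], mapping):
    print("\t".join(record))
```

With a real export you would read the file from disk instead of the inline string. Keeping a list of rows per probeset matters because a probeset can map to more than one gene, and a plain dictionary would silently keep only the last match.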

5 comments:

  1. Nice job. I do not know how experienced your clients are with Excel, but it might be difficult for them to join the two tables. The MATCH and INDEX functions (or VLOOKUP/HLOOKUP if you prefer) will do the job.

    Also, you probably know about this, but there is a nice collection of Ensembl-related videos from Giulietta Spudich (EnsemblHelpDesk), covering Biomart:
    http://www.youtube.com/watch?v=DXPaBdPM2vs

  2. This is the way I also convert gene IDs. To join the two tables I use the join function in the plyr library, which is much easier than moving to Excel, especially as many of my tables are too large for Excel.

  3. Yes, I would also recommend avoiding Excel at all costs. The results page allows you to export in other formats (tab-delimited, CSV, etc.). These are much more database-INNER JOIN-friendly.

    Replies
    1. This very much depends on who your clients/collaborators are.

      Personally, I use (Ensembl) MySQL both for converting IDs and joining the tables. But I am the only biostat postdoc in the department, so either I teach my colleagues how to do the job (with Biomart and Excel) or I have to do it for them every time they need it.

      I prefer the first way and I am grateful for every video that explains how to do it.

  4. BioDBnet is a good tool for ID mapping, which I have used frequently. http://biodbnet.abcc.ncifcrf.gov/db/db2db.php


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.