Tuesday, November 1, 2011

Guide to RNA-seq Analysis in Galaxy

James Taylor came to UVA last week and gave an excellent talk on how Galaxy enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering installing my own Galaxy framework on a local cluster or on the cloud.

If you've used Galaxy in the past you're probably aware that it allows you to share data, workflows, and histories with other users. New to me was the pages section, where an entire analysis is packaged on a single page, and vetting is crowdsourced to other Galaxy users in the form of comments and voting.

I recently found a page published by Galaxy user Jeremy that serves as a guide to RNA-seq analysis using Galaxy. If you've never done RNA-seq before it's a great place to start. The guide has all the data you need to get started on an experiment where you'll use TopHat/Bowtie to align reads to a reference genome, and Cufflinks to assemble transcripts and quantify differential gene expression, alternative splicing, etc. The dataset is small, so the analyses run quickly and you can finish the tutorial in just a few hours. The author was kind enough to include links to relevant sections of the TopHat and Cufflinks documentation where they're needed in the tutorial. Hit the link below to get started.

Galaxy Pages: RNA-seq Analysis Exercise
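
For a rough sense of what the two main steps look like outside of Galaxy, here is a minimal Python sketch that only builds the TopHat and Cufflinks command lines (it does not run them). It assumes the standard invocation forms `tophat -o <dir> <bowtie_index> <reads>` and `cufflinks -o <dir> <alignments>`, and that TopHat writes its alignments to accepted_hits.bam in its output directory. The index name, read file name, and directory names are hypothetical placeholders, not values taken from the tutorial.

```python
from pathlib import PurePosixPath
from typing import List


def tophat_command(bowtie_index: str, reads: List[str], out_dir: str) -> List[str]:
    """Build a TopHat command that aligns reads against a Bowtie index."""
    # TopHat takes multiple read files as one comma-separated argument.
    return ["tophat", "-o", out_dir, bowtie_index, ",".join(reads)]


def cufflinks_command(alignments: str, out_dir: str) -> List[str]:
    """Build a Cufflinks command that assembles transcripts from alignments."""
    return ["cufflinks", "-o", out_dir, alignments]


def pipeline_commands(bowtie_index: str, reads: List[str], work_dir: str) -> List[List[str]]:
    """Return the align-then-assemble commands in the order they would run."""
    # Directory names below are hypothetical; pick whatever layout you like.
    align_dir = PurePosixPath(work_dir) / "tophat_out"
    assembly_dir = PurePosixPath(work_dir) / "cufflinks_out"
    # Assumption: TopHat writes accepted_hits.bam into its output directory.
    alignments = align_dir / "accepted_hits.bam"
    return [
        tophat_command(bowtie_index, reads, str(align_dir)),
        cufflinks_command(str(alignments), str(assembly_dir)),
    ]


if __name__ == "__main__":
    # Hypothetical inputs: a Bowtie index called "hg19" and one FASTQ file.
    for command in pipeline_commands("hg19", ["reads.fastq"], "run1"):
        print(" ".join(command))
```

Each command is returned as a list of arguments, so it could be handed to `subprocess.run` on a machine where TopHat and Cufflinks are installed. In Galaxy the same two steps are chained through the history instead.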

3 comments:

  1. Check out another workflow on RNA-seq here
    http://kevin-gattaca.blogspot.com/2011/02/rna-seq-analysis-workflow-on-galaxy.html

    On a side note, RNA-seq analysis for SOLiD data is currently available only on the Galaxy test server:
    http://test.g2.bx.psu.edu/

  2. Hello Stephen. I'm helping a student with a human bioinformatics project. We're interested in generating a single relative gene expression value for each gene in the human genome (for different cell lines of interest) using the RNA-seq data from the UCSC archive. We can see raw signal values by displaying the following file as a track in the Genome Browser:

    Track name: BJ cell pA+ + 1
    Table name: wgEncodeCshlLongRnaSeqBjCellPapMinusRawSigRep1
    File name: /gbdb/hg19/bbi/wgEncodeCshlLongRnaSeqBjCellPapPlusRawSigRep1.bigWig

    Where we get stuck is in going from here to using this in Galaxy to generate an expression value for each gene. Cufflinks seems like the right tool for the output we want, but we're not sure what the inputs are supposed to be. We'd appreciate any help you can provide.


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.