Getting Genetics Done: ENCODE

Showing posts with label ENCODE. Show all posts

Wednesday, November 7, 2012

RegulomeDB: Identify DNA Features and Regulatory Elements in Non-Coding Regions

Many papers have noted the challenges associated with assigning function to non-coding genetic variation, and since the majority of GWAS hits for common traits are non-coding, resources for providing some mechanism for these associations are desperately needed.

Boyle and colleagues have constructed a database called RegulomeDB to provide functional assignments to variants using data from manual curation, CHiP-seq data, chromatin state information, eQTLs across multiple cell lines, and some computational predictions generated from DNase footprinting and transcription factor binding motifs.

RegulomeDB implements a tiered category system (1-6) where category 1 has an eQTL association in addition to other ENCODE sources of data, 2 -5 have some ENCODE data only with no eQTL associations, and category 6 has evidence of a binding motif change only. As you might imagine, the annotation density increases as you increase category numbers.

Their simple, but impressive interface will accept RS numbers, or whole BED, GFF, or VCF files for annotation. The resulting output (example above) is downloadable, providing both specifics of the annotation (such as the transcription factor binding to the area) and the functional score for the variant.

http://regulome.stanford.edu/

Wednesday, August 1, 2012

Cscan: Finding Gene Expression Regulators with ENCODE ChIP-Seq Data

Recently published in Nucleic Acids Research:

F. Zambelli, G. M. Prazzoli, G. Pesole, G. Pavesi, Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets., Nucleic acids research 40, W510–5 (2012).

Cscan web interface screenshot

This paper presents a methodology and software implementation that allows users to discover a set of transcription factors or epigenetic modifications that regulate a set of genes of interest. A wealth of data about transcription factor binding exists in the public domain, and this is a good example of a group utilizing those resources to develop tools that are of use to the broader computational biology community.

High-throughput gene expression experiments like microarrays and RNA-seq experiments often result in a list of differentially regulated or co-expressed genes. A common follow-up question asks which transcription factors may regulate those genes of interest. The ENCODE project has completed ChIP-seq experiments for many transcription factors and epigenetic modifications for a number of different cell lines in both human and model organisms. These researchers crossed this publicly available data on enriched regions from ChIP-seq experiments with genomic coordinates of gene annotations to create a table of gene annotations (rows) by ChIP-peak signals, with a presence/absence peak in each cell. Given a set of genes of interest (e.g. differentially regulated genes from an RNA-seq experiment), the method evaluates the over-/under-representation of target sites for the DNA binding protein in each ChIP experiment using a Fisher's exact test. Other methods based on motif-enrichment (using position weight matrices derived from databases like TRANSFAC or JASPAR) would miss DNA-binding factors like the Retinoblastoma protein (RB), which lacks a DNA-binding domain and is recruited to promoters by other transcription factors. In addition to overcoming this limitation, the method presented here also has the advantage of considering tissue-specificity and chromatin accessibility.

The web interface is free and doesn't require registration: http://www.beaconlab.it/cscan

Nucleic Acids Research: Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets

Monday, June 11, 2012

The HaploREG Database for Functional Annotation of SNPs

The ENCODE project continues to generate massive numbers of data points on how genes are regulated. This data will be of incredible use for understanding the role of genetic variation, both for altering low-level cellular phenotypes (like gene expression or splicing), but also for complex disease phenotypes. While it is all deposited into the UCSC browser, ENCODE data is not always the easiest to access or manipulate.

To make epigenomic tracks from the ENCODE project more accessible for interpretation in the context of new or existing GWAS hits, Luke Ward and Manolis Kellis at the BROAD Institute have developed a database called HaploREG. HaploREG uses LD and SNP information from the 1000 Genomes project to map known genetic variants onto ENCODE data, providing a potential mechanism for SNP influence. HaploREG will annotate SNPs with evolutionary constraint measures, predicted chromatin states, and how SNPs alter the Positional Weight Matrices of known transcription factors.

Here's a screenshot from SNP associated with HDL cholesterol levels showing summary information for several SNPs in LD at R2>0.9 in CEU. Clicking each SNP link provides further info.

In addition to providing annotations of user-submitted SNPs, HaploREG also provides cross-references from the NHGRI GWAS Catalog, allowing users to explore the mechanisms behind disease associated SNPs. Check out the site here: http://www.broadinstitute.org/mammals/haploreg/haploreg.php and explore the functionality of any SNPs you might find associated in your work. The more functional information we can include in our manuscripts, the more likely they are to be tested in a model system.

HaploReg: Functional Annotation of SNPs

This blog has moved!