Many papers have noted the challenges of assigning function to non-coding genetic variation. Since the majority of GWAS hits for common traits are non-coding, resources that suggest a mechanism for these associations are badly needed.
Boyle and colleagues have built a database called RegulomeDB that assigns putative functions to variants. It draws on manually curated data, ChIP-seq data, chromatin state information, eQTLs from multiple cell lines, and computational predictions derived from DNase footprinting and transcription factor binding motifs.
RegulomeDB uses a tiered category system (1 through 6). Category 1 variants have an eQTL association in addition to other ENCODE data. Categories 2 through 5 have some ENCODE data but no eQTL association. Category 6 variants have evidence of a binding motif change only. As you might imagine, the number of annotated variants grows as the category number increases, because the higher categories require less supporting evidence.
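To make the coarse tiering concrete, here is a minimal Python sketch that maps three yes/no evidence flags onto the tiers described above. It is a simplification under our own assumptions. The published RegulomeDB scheme is finer-grained than this summary, and the `VariantEvidence` class and `coarse_tier` function are hypothetical names, not part of any RegulomeDB interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantEvidence:
    # Hypothetical evidence flags for a single variant (illustration only).
    has_eqtl: bool            # variant is reported as an eQTL
    has_encode_data: bool     # overlaps ENCODE ChIP-seq / DNase / chromatin data
    has_motif_change: bool    # predicted to alter a TF binding motif


def coarse_tier(ev: VariantEvidence) -> Optional[str]:
    """Assign the coarse tier described in the post: "1", "2-5", or "6".

    Combinations the post does not describe (e.g. an eQTL with no ENCODE
    data, or no evidence at all) return None rather than a guessed tier.
    """
    if ev.has_eqtl and ev.has_encode_data:
        return "1"      # eQTL plus other ENCODE evidence
    if ev.has_encode_data and not ev.has_eqtl:
        return "2-5"    # ENCODE evidence only; subcategories not modelled here
    if ev.has_motif_change and not ev.has_eqtl:
        return "6"      # motif change is the only evidence
    return None


# Usage: a variant with ENCODE binding data but no eQTL lands in tiers 2-5.
print(coarse_tier(VariantEvidence(has_eqtl=False, has_encode_data=True, has_motif_change=True)))
```

The checks run from most to least evidence, so a variant is placed in the best-supported tier its flags allow.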
Their simple but impressive interface accepts rsIDs or whole BED, GFF, or VCF files for annotation. The resulting output (example above) can be downloaded. It gives both the specifics of each annotation (such as the transcription factor binding the region) and the functional score for the variant.
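RegulomeDB accepts VCF files directly. If you would rather submit intervals as BED, the main pitfall is the coordinate convention: VCF positions are 1-based, while BED intervals are 0-based and half-open. Below is a minimal Python sketch of that conversion. `vcf_to_bed` is a hypothetical helper, and we have not checked exactly which interval widths RegulomeDB expects for BED input.

```python
def vcf_to_bed(vcf_lines):
    """Convert VCF data lines into BED records (0-based, half-open).

    Header lines (starting with '#') and blank lines are skipped. The BED
    name column uses the VCF ID, or "chrom:pos" when the ID is ".".
    """
    bed = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref = fields[0], int(fields[1]), fields[2], fields[3]
        start = pos - 1              # VCF POS is 1-based; BED start is 0-based
        end = start + len(ref)       # half-open end covers the whole REF allele
        name = vid if vid != "." else f"{chrom}:{pos}"
        bed.append(f"{chrom}\t{start}\t{end}\t{name}")
    return bed


# Usage (file names are hypothetical):
# with open("my_variants.vcf") as fh, open("my_variants.bed", "w") as out:
#     for record in vcf_to_bed(fh):
#         out.write(record + "\n")
```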
http://regulome.stanford.edu/
Wednesday, November 7, 2012
Wednesday, August 1, 2012
Cscan: Finding Gene Expression Regulators with ENCODE ChIP-Seq Data
Recently published in Nucleic Acids Research:
F. Zambelli, G. M. Prazzoli, G. Pesole, G. Pavesi, Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets, Nucleic Acids Research 40, W510–W515 (2012).
Cscan web interface screenshot
This paper presents a method and software implementation that let users find the transcription factors or epigenetic modifications likely to regulate a set of genes of interest. A wealth of transcription factor binding data exists in the public domain, and this is a good example of a group using those resources to build tools for the broader computational biology community.
High-throughput gene expression experiments such as microarrays and RNA-seq often yield a list of differentially regulated or co-expressed genes. A common follow-up question asks which transcription factors may regulate those genes. The ENCODE project has completed ChIP-seq experiments for many transcription factors and epigenetic modifications across a number of cell lines, in both human and model organisms. The authors crossed this publicly available data on enriched regions from ChIP-seq experiments with the genomic coordinates of gene annotations. The result is a table of genes (rows) by ChIP-seq experiments (columns), where each cell records the presence or absence of a peak. Given a set of genes of interest (e.g., differentially regulated genes from an RNA-seq experiment), the method uses Fisher's exact test to evaluate the over- or under-representation of target sites of the protein or modification profiled in each ChIP experiment. Methods based on motif enrichment (using position weight matrices from databases like TRANSFAC or JASPAR) would miss DNA-associated factors like the retinoblastoma protein (RB). RB lacks a DNA-binding domain and is recruited to promoters by other transcription factors. Beyond overcoming this limitation, the method can also account for tissue specificity and chromatin accessibility.
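The core of the enrichment step can be sketched in a few lines of Python. This is not Cscan's code. We assume each gene is represented by a single promoter window and call a gene "bound" in an experiment if any peak overlaps that window; Cscan's own region definitions may differ. `presence_table`, `hypergeom_tails`, and `rank_experiments` are hypothetical names. The one-sided p-values come from the hypergeometric distribution, which is what a one-sided Fisher's exact test on a 2×2 table computes.

```python
from math import comb


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def presence_table(promoters, experiments):
    """Gene x experiment table of peak presence (True) or absence (False).

    promoters:   {gene: (chrom, start, end)}            (assumed promoter windows)
    experiments: {experiment: [(chrom, start, end), ...]} (ChIP-seq peaks)
    """
    return {
        gene: {
            exp: any(c == chrom and overlaps(start, end, ps, pe) for c, ps, pe in peaks)
            for exp, peaks in experiments.items()
        }
        for gene, (chrom, start, end) in promoters.items()
    }


def hypergeom_tails(N, K, n, k):
    """One-sided Fisher's exact test p-values for a 2x2 table.

    N: genes in the universe, K: genes with a peak in this experiment,
    n: genes of interest,     k: genes of interest with a peak.
    Returns (p_over, p_under) = (P[X >= k], P[X <= k]).
    """
    total = comb(N, n)
    lo, hi = max(0, n - (N - K)), min(K, n)
    pmf = {x: comb(K, x) * comb(N - K, n - x) / total for x in range(lo, hi + 1)}
    p_over = sum(p for x, p in pmf.items() if x >= k)
    p_under = sum(p for x, p in pmf.items() if x <= k)
    return p_over, p_under


def rank_experiments(table, genes_of_interest):
    """Rank experiments by over-representation p-value (smallest first)."""
    universe = list(table)
    N, n = len(universe), len(genes_of_interest)
    results = []
    for exp in next(iter(table.values())):
        K = sum(table[g][exp] for g in universe)            # bound genes overall
        k = sum(table[g][exp] for g in genes_of_interest)   # bound genes of interest
        p_over, _ = hypergeom_tails(N, K, n, k)
        results.append((exp, k, K, p_over))
    return sorted(results, key=lambda r: r[3])
```

A real analysis would run this over thousands of genes and hundreds of experiments and correct the p-values for multiple testing. The toy inputs in this sketch only illustrate the mechanics.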
The web interface is free and doesn't require registration: http://www.beaconlab.it/cscan
Monday, June 11, 2012
The HaploReg Database for Functional Annotation of SNPs
The ENCODE project continues to generate massive amounts of data on how genes are regulated. This data will be incredibly useful for understanding the role of genetic variation in low-level cellular phenotypes (like gene expression or splicing) and in complex disease phenotypes. Although all of it is deposited in the UCSC Genome Browser, ENCODE data is not always easy to access or manipulate.
Luke Ward and Manolis Kellis at the Broad Institute have developed a database called HaploReg. It makes epigenomic tracks from the ENCODE project more accessible for interpreting new or existing GWAS hits. HaploReg uses LD and SNP information from the 1000 Genomes Project to map known genetic variants onto ENCODE data, suggesting potential mechanisms through which SNPs act. HaploReg annotates SNPs with evolutionary constraint measures, predicted chromatin states, and how each SNP alters the position weight matrix (PWM) scores of known transcription factors.
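The idea behind a PWM-based allele comparison can be sketched in Python. The reference and alternate sequences are each scored against a log-odds PWM, and the scores are compared. This is not HaploReg's scoring code. The motif counts are made up, the uniform background and pseudocount are our assumptions, only the forward strand is scanned, and `log_odds_pwm`, `best_score`, and `allele_delta` are hypothetical names.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds matrix.

    counts: list of {base: count} dicts, one per motif position.
    Assumes a uniform background; adds a pseudocount to avoid log(0).
    """
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only)."""
    w = len(pwm)
    return max(sum(pwm[i][seq[j + i]] for i in range(w))
               for j in range(len(seq) - w + 1))


def allele_delta(pwm, left, ref, alt, right):
    """Best motif score with the alt allele minus the score with the ref allele."""
    return best_score(pwm, left + alt + right) - best_score(pwm, left + ref + right)


# Usage: a made-up "TGA" motif; the G>C change disrupts it, so delta is negative.
pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])
print(round(allele_delta(pwm, "CT", "G", "C", "AC"), 3))
```

A negative difference means the alternate allele weakens the best motif match in the window. A positive one means it strengthens it.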
Here's a screenshot for a SNP associated with HDL cholesterol levels. It shows summary information for several SNPs in LD with it at r² > 0.9 in the CEU population. Clicking each SNP link provides further information.
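HaploReg draws its LD information from the 1000 Genomes Project, so you do not need to compute it yourself. Still, the r² > 0.9 filter is easy to state precisely. The Python sketch below assumes phased haplotypes coded 0/1 for two biallelic SNPs and uses r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The haplotype vectors are made up, and `r_squared` and `ld_proxies` are hypothetical helpers.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried on the same haplotype i.
    Returns 0.0 if either SNP is monomorphic (r^2 undefined).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # freq of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                  # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def ld_proxies(index_hap, candidates, threshold=0.9):
    """Return {snp_name: r2} for candidate SNPs with r^2 above the threshold."""
    result = {}
    for name, hap in candidates.items():
        r2 = r_squared(index_hap, hap)
        if r2 > threshold:
            result[name] = r2
    return result


# Usage with made-up haplotypes for six chromosomes:
index_snp = [1, 1, 0, 0, 1, 0]
others = {"same": [1, 1, 0, 0, 1, 0], "flipped": [0, 0, 1, 1, 0, 1], "weak": [1, 0, 1, 0, 1, 0]}
print(sorted(ld_proxies(index_snp, others)))
```

Because r² ignores which allele is labelled 1, a SNP in perfect negative correlation with the index SNP ("flipped") is still a perfect proxy.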
In addition to annotating user-submitted SNPs, HaploReg cross-references the NHGRI GWAS Catalog, so users can explore the mechanisms behind disease-associated SNPs. Check out the site here: http://www.broadinstitute.org/mammals/haploreg/haploreg.php and explore the function of any SNPs you find associated in your own work. The more functional information we can include in our manuscripts, the more likely our findings are to be followed up in a model system.
HaploReg: Functional Annotation of SNPs