Ensembl Regulatory Build with cell type specific activity
Description
This track represents the Ensembl Regulatory Annotation of regional function and activity in each of 17 human cell types.
The Ensembl Regulatory Build provides a genome-wide set of regions that are likely to be involved in gene regulation. These regions are classified into six functional types (see below).
On top of these classifications, we add an activity annotation in each cell type by comparing the classification of each region to the cell type specific evidence, i.e. the cell type specific segmentation states and peak calls.
Display Conventions and Configuration
Regions annotated as active are coloured the same as in the multicell Ensembl Regulatory Build. Cell type specific inactive regions are marked in light grey. The colours follow the agreed ENCODE segmentation standard:
- Bright Red - Predicted active promoters
- Light Red - Predicted active promoter flanking regions
- Orange - Predicted active enhancers
- Blue - Active CTCF binding sites
- Gold - Unannotated active transcription factor binding sites
- Yellow - Unannotated active open chromatin regions
- Light Grey - Inactive regions
Methods
Segmentation and annotation of segmentation states
We start by running segmentations across 17 human cell types (A549, DND-41, GM12878, K562, H1-hESC, HepG2, HeLa-S3, HSMM, HSMMtube, HUVEC, Monocytes-CD14+, NH-A, NHDF-AD, NHEK, NHLF and Osteoblasts). Each segmentation annotates the genomes of its designated cell types with a fixed number of states, which are generally identified by a number.
For each state of each segmentation, we create a summary track which represents the number of cell types that have that state at any given base pair of the genome. The overlaps of this summary track with known features (transcription start sites, exons) and experimental features (CTCF binding sites, known chromatin repression marks) are used to assign a preliminary label to that state. For practical purposes, this annotation is manually curated. Each state receives either one of the functional labels listed above or a non-functional label (dead, weak or repressed).
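As an illustration of the summary track, the sketch below counts, at each base, how many cell types carry a given state. It assumes a toy encoding in which each cell type's segmentation is a per-base list of state numbers on a single chromosome (real segmentations are stored as intervals). The helper `overlap_fraction` is a hypothetical overlap measure against a known feature mask, not necessarily the statistic used for labelling.

```python
from typing import Dict, List, Sequence


def summary_track(segmentations: Dict[str, Sequence[int]], state: int) -> List[int]:
    """Count, at each base, how many cell types are in `state`.

    `segmentations` maps a cell type name to a per-base list of state numbers
    (a toy dense encoding chosen for illustration).
    """
    lengths = {len(states) for states in segmentations.values()}
    if len(lengths) != 1:
        raise ValueError("all segmentations must cover the same bases")
    (length,) = lengths
    counts = [0] * length
    for states in segmentations.values():
        for pos, s in enumerate(states):
            if s == state:
                counts[pos] += 1
    return counts


def overlap_fraction(track: Sequence[int], feature_mask: Sequence[bool]) -> float:
    """Hypothetical measure: share of the summary signal lying on feature bases."""
    total = sum(track)
    if total == 0:
        return 0.0
    return sum(c for c, f in zip(track, feature_mask) if f) / total
```

For example, with three cell types segmented as `[1, 1, 2, 3]`, `[1, 2, 2, 3]` and `[3, 1, 2, 2]`, the summary track for state 2 is `[0, 1, 3, 1]`.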
Defining the MultiCell regulatory features
We first determine a cell type independent functional annotation of the genome, referred to as the MultiCell Regulatory Build. This build defines the function of genomic regions.
To determine whether a state is useful in practice, it is compared to the overall density of transcription factor binding (as measured with ChIP-seq). Applying increasing integer cutoffs to the state's summary track, we define progressively smaller regions. If these regions reach a 2-fold enrichment in transcription factor binding signal, the state is retained for the build. This means that although all states are annotated, not all of them are used to construct the Regulatory Build.
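The retention test can be sketched as follows, assuming enrichment is measured as the mean TF ChIP-seq signal inside the thresholded region divided by the genome-wide mean. That definition and the function names `max_enrichment` and `is_informative` are assumptions made for illustration.

```python
from typing import Sequence


def max_enrichment(summary: Sequence[int], tf_signal: Sequence[float]) -> float:
    """Highest TF-binding enrichment over all integer cutoffs of a state summary.

    Enrichment here = mean TF signal inside the region / genome-wide mean
    (an assumed definition).
    """
    if not summary or len(summary) != len(tf_signal):
        raise ValueError("tracks must be non-empty and of equal length")
    background = sum(tf_signal) / len(tf_signal)
    if background == 0:
        return 0.0
    best = 0.0
    for cutoff in range(1, max(summary) + 1):
        # A higher cutoff keeps only bases where more cell types share the state,
        # so the selected region shrinks as the cutoff grows.
        inside = [t for s, t in zip(summary, tf_signal) if s >= cutoff]
        if inside:
            best = max(best, (sum(inside) / len(inside)) / background)
    return best


def is_informative(summary: Sequence[int], tf_signal: Sequence[float],
                   min_fold: float = 2.0) -> bool:
    """Retain the state if any cutoff reaches the required fold enrichment."""
    return max_enrichment(summary, tf_signal) >= min_fold
```

A state whose summary is flat across the genome never concentrates TF signal, so it fails the test however the cutoff is set.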
For any given segmentation, we then define initial regions. For every functional label, the summaries of all states that were assigned that label and judged informative are summed into a single function. Using the overall TF binding signal as the ground truth, we select the threshold on this function which produces the highest F-score.
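A minimal sketch of the threshold selection, assuming the TF binding signal has already been binarised into bound/unbound bases and that "F-score" means the base-level F1 score. Both are assumptions, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def combine(summaries: Sequence[Sequence[int]]) -> List[int]:
    """Sum, base by base, the summaries of all informative states sharing one label."""
    return [sum(values) for values in zip(*summaries)]


def f_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Base-level F1 of a predicted region against bound bases."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(combined: Sequence[int], tf_bound: Sequence[bool]) -> Tuple[int, float]:
    """Return (threshold, F-score) for the integer threshold with the highest F-score."""
    if not combined or max(combined) == 0:
        raise ValueError("combined track has no signal")
    best = (1, -1.0)
    for t in range(1, max(combined) + 1):
        score = f_score([c >= t for c in combined], tf_bound)
        if score > best[1]:
            best = (t, score)
    return best
```

With summaries `[0, 1, 2, 0, 0]` and `[0, 1, 1, 0, 1]`, the combined function is `[0, 2, 3, 0, 1]`; against bound bases `[False, True, True, False, False]` a threshold of 2 recovers exactly the bound bases.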
We then merge the regulatory features across segmentations by annotation.
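One plausible reading of merging by annotation is a per-label union of overlapping intervals pooled from all segmentations. The sketch below makes that assumption, uses half-open `[start, end)` coordinates on one chromosome, and treats touching intervals as mergeable; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def union(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_by_label(
    segmentations: Iterable[Iterable[Tuple[int, int, str]]],
) -> Dict[str, List[Interval]]:
    """Pool (start, end, label) features from every segmentation, then union per label."""
    by_label: Dict[str, List[Interval]] = defaultdict(list)
    for features in segmentations:
        for start, end, label in features:
            by_label[label].append((start, end))
    return {label: union(ivs) for label, ivs in by_label.items()}
```

Features with different labels are never merged with one another at this stage; cross-label overlaps are handled by the simplifications that follow.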
Some simplifications are applied a posteriori:
- Distal enhancers which overlap promoter flanking regions are merged into the latter.
- Promoter flanking regions which overlap transcription start sites are incorporated into the flanking regions of the latter features.
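Both simplifications fold one class of feature into an overlapping feature of another class. The sketch below expresses that with a single hypothetical `absorb` helper, assuming an absorbed feature is dropped and the absorbing feature is extended to cover it. For the second rule, the promoter's outer bounds stand in for "the flanking regions of the latter features"; that representation is an assumption.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def absorb(targets: List[Interval],
           candidates: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Fold each candidate that overlaps a target into the first such target.

    Returns (extended targets, candidates that overlapped no target).
    """
    targets = list(targets)
    kept: List[Interval] = []
    for cand in candidates:
        hit: Optional[int] = next(
            (i for i, t in enumerate(targets) if overlaps(t, cand)), None)
        if hit is None:
            kept.append(cand)
        else:
            t = targets[hit]
            targets[hit] = (min(t[0], cand[0]), max(t[1], cand[1]))
    return targets, kept
```

Applied in order, rule 1 is `flanks, enhancers = absorb(flanks, enhancers)` and rule 2 is `promoters, flanks = absorb(promoters, flanks)`.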
Extra features
In addition to the segmentation states, which are essentially derived from histone marks, we integrate independent experimental evidence:
- Transcription factor binding sites which were observed through ChIP-seq but are covered by none of the newly defined features are added to the Build.
- Open chromatin regions which were experimentally observed but are covered by none of the above annotations are also added to the Build.
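The two rules above can be sketched as interval filters. This assumes "covered" means any overlap, tests TF peaks against the segmentation-derived features only, and tests open chromatin against those features plus the TF sites just added. The label strings and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Feature = Tuple[int, int, str]  # (start, end, label), half-open coordinates


def overlaps_any(start: int, end: int, features: Iterable[Feature]) -> bool:
    return any(start < f_end and f_start < end for f_start, f_end, _ in features)


def add_extra_features(build: List[Feature],
                       tf_sites: List[Tuple[int, int]],
                       open_chromatin: List[Tuple[int, int]]) -> List[Feature]:
    # TF peaks are tested against the segmentation-derived features only.
    tf_added = [(s, e, "TF binding site") for s, e in tf_sites
                if not overlaps_any(s, e, build)]
    with_tf = list(build) + tf_added
    # Open chromatin is tested against everything defined so far.
    oc_added = [(s, e, "Open chromatin") for s, e in open_chromatin
                if not overlaps_any(s, e, with_tf)]
    return sorted(with_tf + oc_added)
```

Checking TF peaks only against the original features means two overlapping TF peaks are both kept; a fuller version would merge them first.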
Cell type specific annotations
The cell type specific regulatory features are identical to the MultiCell ones in position and classification but have an added activity annotation. Currently this activity state is purely binary (on/off), although this could be extended to finer annotations (poised, repressed, information not available, ...).
For each cell type and each functional annotation, we check whether there is segmentation state or experimental evidence which could be used to test the activity of this annotation. If this evidence exists, then all MultiCell features with that label are annotated as on or off by simple overlap analysis. If this evidence is not available, these features are not represented in the cell type specific annotation (Note: this could change in the near future).
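For a single cell type, the overlap analysis can be sketched as below. The sketch assumes the available evidence is given as a mapping from functional label to that cell type's evidence intervals. A label missing from the mapping means no evidence exists, so its features are left out. A label present with an empty list means evidence was collected but nothing overlaps, so its features are marked off. The names are hypothetical.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, int, str]                # (start, end, label), half-open
Evidence = Dict[str, List[Tuple[int, int]]]   # label -> cell type specific intervals


def annotate_activity(features: List[Feature],
                      evidence: Evidence) -> List[Tuple[int, int, str, bool]]:
    """Mark each MultiCell feature on (True) or off (False) in one cell type."""
    annotated = []
    for start, end, label in features:
        if label not in evidence:
            # No evidence to test this label: omit the feature for this cell type.
            continue
        active = any(s < end and start < e for s, e in evidence[label])
        annotated.append((start, end, label, active))
    return annotated
```

Running this once per cell type yields one activity track per cell type, each sharing the MultiCell feature positions.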
References
Zerbino DR, Johnson N, Wilder SP, Juettemann T, et al. Ensembl Regulation Resources. In preparation.
Flicek P, et al. Ensembl 2014. Nucleic Acids Research 2014 Jan;42(Database issue):D749-55.
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.
Contact
Ensembl Helpdesk