Cell Type Activity Track Settings
 
Ensembl Regulatory Build with cell type specific activity

All subtracks:

  •  A549  - Projection of the Ensembl Regulatory Build onto A549
  •  DND41  - Projection of the Ensembl Regulatory Build onto DND41
  •  GM12878  - Projection of the Ensembl Regulatory Build onto GM12878
  •  H1HESC  - Projection of the Ensembl Regulatory Build onto H1HESC
  •  HELAS3  - Projection of the Ensembl Regulatory Build onto HELAS3
  •  HEPG2  - Projection of the Ensembl Regulatory Build onto HEPG2
  •  HMEC  - Projection of the Ensembl Regulatory Build onto HMEC
  •  HSMM  - Projection of the Ensembl Regulatory Build onto HSMM
  •  HSMMT  - Projection of the Ensembl Regulatory Build onto HSMMT
  •  HUVEC  - Projection of the Ensembl Regulatory Build onto HUVEC
  •  K562  - Projection of the Ensembl Regulatory Build onto K562
  •  MONO  - Projection of the Ensembl Regulatory Build onto MONO
  •  NHA  - Projection of the Ensembl Regulatory Build onto NHA
  •  NHDFAD  - Projection of the Ensembl Regulatory Build onto NHDFAD
  •  NHEK  - Projection of the Ensembl Regulatory Build onto NHEK
  •  NHLF  - Projection of the Ensembl Regulatory Build onto NHLF
  •  OSTEO  - Projection of the Ensembl Regulatory Build onto OSTEO

Assembly: Human Feb. 2009 (GRCh37/hg19)

Ensembl Regulatory Build with cell type specific activity

Description

This track represents the Ensembl Regulatory Annotation of regional function and activity in each of 17 human cell types.

The Ensembl Regulatory Build provides a genome-wide set of regions that are likely to be involved in gene regulation. These regions are classified into six functional types (see below). On top of these classifications, in each cell type we add an activity annotation, by comparing the classifications in each region to the cell type specific evidence, i.e. the cell type specific segmentation states and peak calls.

Display Conventions and Configuration

Regions annotated as active are coloured the same as in the multicell Ensembl Regulatory Build. Cell type specific inactive regions are marked in light grey. The colours follow the agreed ENCODE segmentation standard:

  •  Bright Red  - Predicted active promoters
  •  Light Red  - Predicted active promoter flanking regions
  •  Orange  - Predicted active enhancers
  •  Blue  - Active CTCF binding sites
  •  Gold  - Unannotated active transcription factor binding sites
  •  Yellow  - Unannotated active open chromatin regions
  •  Light Grey  - Inactive regions

Methods

Segmentation and annotation of segmentation states

We start by running a segmentation across 17 human cell types (A549, DND-41, GM12878, K562, H1-hESC, HepG2, HeLa-S3, HMEC, HSMM, HSMMtube, HUVEC, Monocytes-CD14+, NH-A, NHDF-AD, NHEK, NHLF and Osteoblasts). Each segmentation annotates the genomes of its designated cell types with a fixed number of states, each of which is generally identified by a number.

For each state of each segmentation, we create a summary track which represents the number of cell types that have that state at any given base pair of the genome. The overlaps of this summary function with known features (transcription start sites, exons) and experimental features (CTCF binding sites, known chromatin repression marks) are used to assign a preliminary label to that state. For practical purposes, this annotation is manually curated. The labels used are either one of the above functional labels, or non-functional labels (dead, weak or repressed).
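As a rough illustration of the summary track described above, the sketch below counts, for every base pair, how many cell types carry a given state. The function name, the half-open interval convention and the toy coordinates are assumptions made for this example; they are not taken from the Ensembl pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome (assumed convention)


def state_summary(segments_by_cell: Dict[str, List[Interval]], length: int) -> List[int]:
    """Return, per base pair, the number of cell types that carry the state."""
    summary = [0] * length
    for intervals in segments_by_cell.values():
        covered = set()  # each cell type contributes at most 1 per position
        for start, end in intervals:
            covered.update(range(max(start, 0), min(end, length)))
        for pos in covered:
            summary[pos] += 1
    return summary


# Toy example: one state observed in three cell types on a 10 bp region.
toy_segments = {
    "A549": [(2, 6)],
    "K562": [(4, 8)],
    "HUVEC": [(5, 7)],
}
toy_summary = state_summary(toy_segments, 10)
print(toy_summary)
```

A per-base list is only practical for toy regions; a genome-scale version would work on interval end points instead.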

Defining the MultiCell regulatory features

We first determine a cell type independent functional annotation of the genome, referred to as the MultiCell Regulatory Build. This build defines the function of genomic regions.

To determine whether a state is useful in practice, we compare its summary to the overall density of transcription factor binding (as measured with ChIP-seq). Applying increasing integer cutoffs to the state summary, we define progressively smaller regions. If these regions reach a 2-fold enrichment in transcription factor binding signal, then the state is retained for the build. This means that although all states are annotated, not all are used to build the Regulatory Build.
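The retention test can be sketched as follows. The text does not say how enrichment is computed, so this example assumes it is the mean transcription factor signal inside the selected regions divided by the mean over the whole region; the function name and the toy signals are likewise illustrative only.

```python
from typing import List


def is_informative(summary: List[int], tf_signal: List[float],
                   min_enrichment: float = 2.0) -> bool:
    """Return True if some integer cutoff yields regions enriched in TF binding."""
    if not summary or len(summary) != len(tf_signal):
        return False
    background = sum(tf_signal) / len(tf_signal)
    if background == 0:
        return False
    # Increasing cutoffs keep only positions shared by more cell types,
    # so the selected regions shrink at every step.
    for cutoff in range(1, max(summary) + 1):
        inside = [tf for count, tf in zip(summary, tf_signal) if count >= cutoff]
        if not inside:
            break
        enrichment = (sum(inside) / len(inside)) / background  # assumed definition
        if enrichment >= min_enrichment:
            return True
    return False


toy_summary = [0, 0, 1, 1, 2, 3, 2, 1, 0, 0]
peaked_signal = [0, 0, 0, 1, 4, 5, 4, 0, 0, 0]  # TF binding concentrated on the state
flat_signal = [1] * 10                          # TF binding unrelated to the state

print(is_informative(toy_summary, peaked_signal))
print(is_informative(toy_summary, flat_signal))
```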

For any given segmentation, we define initial regions. For every functional label, all the state summaries that were assigned that label and judged informative are summed into a single function. Using the overall TF binding signal as true signal, we select the threshold which produces the highest F-score.
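A minimal sketch of the threshold selection, assuming that "F-score" means the F1 score (harmonic mean of precision and recall) computed per base pair against a binary TF binding track. Both of these choices, and all names below, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def best_threshold(summed: List[int], tf_bound: List[bool]) -> Tuple[Optional[int], float]:
    """Pick the integer threshold on the summed function with the highest F1."""
    best_t: Optional[int] = None
    best_f = -1.0
    for t in range(1, max(summed) + 1):
        tp = sum(1 for s, b in zip(summed, tf_bound) if s >= t and b)
        fp = sum(1 for s, b in zip(summed, tf_bound) if s >= t and not b)
        fn = sum(1 for s, b in zip(summed, tf_bound) if s < t and b)
        if tp == 0:
            continue  # precision and recall are both zero or undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f:
            best_t, best_f = t, f1
    return best_t, best_f


summed = [0, 0, 1, 1, 2, 3, 2, 1, 0, 0]
tf_bound = [False, False, False, True, True, True, True, False, False, False]
threshold, score = best_threshold(summed, tf_bound)
print(threshold, round(score, 3))
```

In this toy case a threshold of 1 recovers every bound base but includes unbound ones, while a threshold of 3 is precise but misses most bound bases; the intermediate threshold gives the best balance.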

We then merge the regulatory features across segmentations by annotation.

Some simplifications are applied a posteriori:

  • Distal enhancers which overlap promoter flanking regions are merged into the latter.
  • Promoter flanking regions which overlap transcription start sites are incorporated into the flanking regions of the latter features.
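The first simplification could be sketched as below. The example assumes that "merged into" means the promoter flanking region is extended to the union of the two intervals and the enhancer is dropped; the helper name and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed convention


def absorb(absorbers: List[Interval],
           candidates: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Merge each candidate that overlaps an absorber into that absorber.

    Returns the (possibly extended) absorbers and the untouched candidates.
    """
    merged = [list(iv) for iv in absorbers]
    remaining = []
    for start, end in candidates:
        hit = next((m for m in merged if start < m[1] and m[0] < end), None)
        if hit is None:
            remaining.append((start, end))
        else:
            hit[0] = min(hit[0], start)
            hit[1] = max(hit[1], end)
    return [(m[0], m[1]) for m in merged], remaining


flanks = [(100, 200)]
enhancers = [(180, 250), (400, 450)]
new_flanks, new_enhancers = absorb(flanks, enhancers)
print(new_flanks, new_enhancers)
```

This sketch does not re-merge absorbers that come to overlap each other after being extended; a fuller version would need a final pass for that.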

Extra features

In addition to the segmentation states, which are essentially derived from histone marks, we integrate independent experimental evidence:

  • Transcription factor binding sites which were observed through ChIP-seq but are covered by none of the newly defined features are added to the Build.
  • Open chromatin regions which were experimentally observed but are covered by none of the above annotations are also added to the Build.
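The two additions above amount to keeping only the experimental regions that no existing feature covers. The sketch below assumes that "covered" means any overlap, however small, and applies the two steps in the order listed; names and coordinates are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed convention


def uncovered(candidates: List[Interval], features: List[Interval]) -> List[Interval]:
    """Keep the candidates that overlap none of the existing features."""
    return [
        (start, end)
        for start, end in candidates
        if not any(start < f_end and f_start < end for f_start, f_end in features)
    ]


build = [(100, 250)]                       # features derived from segmentation
tf_sites = [(120, 140), (300, 320)]        # ChIP-seq binding sites
open_chromatin = [(310, 330), (500, 520)]  # open chromatin regions

extra_tf = uncovered(tf_sites, build)
# Open chromatin is checked against everything defined so far,
# including the TF binding sites that were just added.
extra_open = uncovered(open_chromatin, build + extra_tf)
print(extra_tf, extra_open)
```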

Cell type specific annotations

The cell type specific regulatory features are identical to the MultiCell ones in position and classification but have an added activity annotation. Currently this activity state is purely binary (on/off) although this could be extended to finer annotations (poised, repressed, information not available, ...).

For each cell type and each functional annotation, we check whether there is a segmentation state or experimental evidence which could be used to test the activity of this annotation. If this evidence exists, then all MultiCell features with that label are annotated as on or off by simple overlap analysis. If this evidence is not available, these features are not represented in the cell type specific annotation (Note: this could change in the near future).
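For a single cell type, the overlap analysis might look like the sketch below. The data layout is an assumption made for this example: evidence is a mapping from functional label to intervals, where a missing label means no usable evidence exists and an empty list means evidence exists but nothing was observed.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]          # half-open [start, end), assumed convention
Feature = Tuple[int, int, str]      # (start, end, functional label)


def annotate_activity(features: List[Feature],
                      evidence_by_label: Dict[str, List[Interval]]
                      ) -> List[Tuple[int, int, str, bool]]:
    """Add a binary on/off flag to each MultiCell feature for one cell type."""
    annotated = []
    for start, end, label in features:
        evidence = evidence_by_label.get(label)
        if evidence is None:
            continue  # no evidence for this label: feature is not represented
        active = any(start < e_end and e_start < end for e_start, e_end in evidence)
        annotated.append((start, end, label, active))
    return annotated


multicell = [(100, 250, "promoter"), (300, 320, "enhancer"), (500, 520, "ctcf")]
evidence = {"promoter": [(90, 120)], "enhancer": []}  # no CTCF evidence here
cell_type_track = annotate_activity(multicell, evidence)
print(cell_type_track)
```

Positions and labels are copied unchanged from the MultiCell features; only the activity flag is specific to the cell type.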

References

Zerbino DR, Johnson N, Wilder SP, Juettemann T, et al. Ensembl Regulation Resources. (in preparation).

Flicek P, et al. Ensembl 2014. Nucleic Acids Research 2014 Jan;42(Database issue):D749-55.

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.

Contact

Ensembl Helpdesk