DNaseI hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the
discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators,
silencers and locus control regions.
This track displays an extensive map of human DHSs (~2.9 million) identified through
genome-wide profiling in
125 diverse cell and tissue types
by the ENCODE Consortium between September 2007 and January 2011, with follow-on analysis
and results reported in September 2012.
This master list track summarizes the DHSs called separately in each of the 125 cell types.
Each master list element consists of a DHS from at least one of the 125 cell types,
and every DHS from a given cell type overlaps at least one master list DHS.
For further details see the final paragraph of the Methods section below.
The data underlying this track were produced by two ENCODE production groups
(University of Washington and Duke University).
Uniform processing of the individual experiments was performed by the ENCODE Analysis Working
Group, and the results are displayed in the
ENCODE Uniform DNaseI HS track.
The DNaseI HS Clusters
track provides another view of these data.
Display Conventions and Configuration
The display for this track shows DHS locations and scores as grayscale-colored items where
higher scores correspond to darker-colored blocks. The label displayed to the left of
each item indicates the number of cell types with DNaseI sensitivity detected at the site.
Clicking on a displayed block shows a details page that lists the cell types.
Methods
DNaseI hypersensitivity mapping was performed using protocols developed by Duke University
or University of Washington.
Data sets were sequenced on Illumina instruments to an average depth of 30 million
uniquely mapping sequence tags (27 bp for University of Washington and 20 bp for Duke University).
For uniformity of analysis, some cell-type data sets that exceeded a depth of 40 million tags were
randomly subsampled to a depth of 30 million tags.
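The subsampling step can be illustrated with a minimal sketch. The function name, the in-memory list of tags, and the fixed seed are assumptions made for illustration; the tool actually used for subsampling is not stated here, and only some data sets above the threshold were subsampled.

```python
import random


def subsample_tags(tags, threshold=40_000_000, target=30_000_000, seed=0):
    """Randomly subsample a data set that exceeds `threshold` tags.

    `tags` is a list of uniquely mapped tags (any representation).
    Data sets at or below the threshold are returned unchanged.
    The seed is a hypothetical parameter added for reproducibility.
    """
    if len(tags) <= threshold:
        return list(tags)
    rng = random.Random(seed)
    # Draw `target` tags uniformly at random, without replacement.
    return rng.sample(tags, target)
```

With real data the tags would normally be streamed from an alignment file rather than held in a list; the list keeps the sketch short.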
Sequence reads were mapped using the Bowtie aligner, allowing a maximum of two mismatches.
Only reads mapping uniquely to the genome were used in the analyses.
Mappings were to male or female versions of hg19/GRCh37, depending on cell type,
with random regions omitted.
Data were analysed jointly using a single algorithm to identify sites.
The hotspot algorithm (John et al. 2011) was applied uniformly to datasets from both production groups.
Briefly, hotspot is a scan statistic that uses the binomial distribution to
gauge enrichment of tags based on a local background model estimated around every tag.
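As a rough illustration of a binomial enrichment score, the sketch below computes a z-score for the tag count in a small window relative to a larger surrounding background window. The window sizes and the assumption that background tags fall uniformly across the background window are illustrative choices, not parameters taken from this description.

```python
import math


def binomial_z_score(n_window, n_background, window_bp=250, background_bp=50_000):
    """z-score of the tag count in a small window under a binomial null.

    Under the null, each of the `n_background` tags in the local background
    window lands in the small window with probability p = window_bp / background_bp.
    Both window sizes are hypothetical defaults.
    """
    p = window_bp / background_bp
    expected = n_background * p
    sd = math.sqrt(n_background * p * (1 - p))
    if sd == 0:
        return 0.0
    return (n_window - expected) / sd
```

A window holding exactly the expected number of tags scores near zero, and the score grows as the window becomes more enriched relative to its surroundings.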
General-sized regions of enrichment are identified as hotspots, and then 150-bp peaks
within hotspots are called by looking for local maxima in the tag density
profile (sliding window tag count in 150-bp windows, stepping every 20 bp).
Further stringencies are applied to the local maxima detection to prevent overcalling
of spurious peaks.
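The sliding-window density and local-maximum search can be sketched as follows. This is a simplified illustration, not the hotspot implementation: the function names and the `min_count` filter are assumptions, and the further stringencies mentioned above are not reproduced.

```python
from bisect import bisect_left


def sliding_window_counts(tag_positions, start, end, window=150, step=20):
    """Count tags in `window`-bp windows, stepping every `step` bp.

    Returns a list of (window_left, count) pairs; each window covers the
    half-open interval [left, left + window).
    """
    tags = sorted(tag_positions)
    counts = []
    for left in range(start, max(start + 1, end - window + 1), step):
        n = bisect_left(tags, left + window) - bisect_left(tags, left)
        counts.append((left, n))
    return counts


def local_maxima(counts, min_count=1):
    """Return windows whose count is a local maximum of the density profile.

    On a plateau only the first window is reported. `min_count` is a
    hypothetical filter standing in for the extra stringencies.
    """
    peaks = []
    for i, (left, n) in enumerate(counts):
        prev_n = counts[i - 1][1] if i > 0 else -1
        next_n = counts[i + 1][1] if i + 1 < len(counts) else -1
        if n >= min_count and n > prev_n and n >= next_n:
            peaks.append((left, n))
    return peaks
```

In practice the search would be restricted to the hotspot regions identified in the previous step rather than run over an arbitrary interval.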
The hotspot program also includes an FDR (false discovery rate) estimation procedure for thresholding
hotspots and peaks, based on a simulation approach.
Random reads are generated at the same sequencing depth as the target sample, hotspots
are called on the simulated data, and the random and observed hotspots are compared
via their z-scores (based on the binomial model) to estimate the FDR.
Using this procedure, DHSs were identified at an FDR of 1%.
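One simple way to turn such a comparison into an FDR estimate is shown below. It assumes that the FDR at a z-score threshold is the number of simulated (random) hotspots at or above the threshold divided by the number of observed hotspots at or above it; the exact estimator used by the hotspot program may differ, and the function names are illustrative.

```python
def fdr_at_threshold(observed_z, random_z, threshold):
    """Estimated FDR among observed hotspots with z-score >= threshold."""
    n_obs = sum(z >= threshold for z in observed_z)
    n_rand = sum(z >= threshold for z in random_z)
    if n_obs == 0:
        return 0.0
    return min(1.0, n_rand / n_obs)


def z_threshold_for_fdr(observed_z, random_z, target_fdr=0.01):
    """Smallest observed z-score whose estimated FDR is at or below the target.

    Returns None if no observed z-score meets the target.
    """
    for t in sorted(set(observed_z)):
        if fdr_at_threshold(observed_z, random_z, t) <= target_fdr:
            return t
    return None
```

Hotspots and peaks scoring at or above the returned threshold would then be retained at the chosen FDR (1% in this track).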
The DHSs called on individual cell types were consolidated into a master list of 2,890,742 unique,
non-overlapping DHS positions by first merging the FDR 1% peaks across all cell types.
Then, for each resulting interval of merged sites, the DHS with the highest z-score was
selected for the master list.
Any DHSs overlapping the peaks selected for the master list were then discarded.
The remaining DHSs were then merged and the process repeated until each original DHS
was either in the master list, or discarded.
Of these DHSs, 970,100 were specific to a single cell type, 1,920,642 were active in 2
or more cell types, and 3,692 (a small minority) were detected in all cell types.
Each master list DHS is annotated with the number of cell types whose original DHSs
overlap the master list DHS.
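The iterative consolidation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to build the track: the `DHS` record, the function names, and the half-open (BED-style) coordinates are all hypothetical, and the quadratic overlap checks are only suitable for small inputs.

```python
from collections import namedtuple

# Hypothetical record for one per-cell-type DHS; coordinates are half-open.
DHS = namedtuple("DHS", "chrom start end z cell_type")


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge_groups(dhss):
    """Group DHSs that chain together into one merged interval."""
    groups, cur_chrom, cur_end = [], None, None
    for d in sorted(dhss, key=lambda d: (d.chrom, d.start, d.end)):
        if groups and d.chrom == cur_chrom and d.start < cur_end:
            groups[-1].append(d)
            cur_end = max(cur_end, d.end)
        else:
            groups.append([d])
            cur_chrom, cur_end = d.chrom, d.end
    return groups


def build_master_list(dhss):
    """Return (master DHS, number of overlapping cell types) pairs."""
    remaining, master = list(dhss), []
    while remaining:
        # Within each merged interval, keep the DHS with the highest z-score.
        selected = [max(g, key=lambda d: d.z) for g in merge_groups(remaining)]
        master.extend(selected)
        # Discard every DHS overlapping a selected one (including itself),
        # then repeat on whatever is left.
        remaining = [d for d in remaining
                     if not any(overlaps(d, s) for s in selected)]
    master.sort(key=lambda d: (d.chrom, d.start))
    return [(m, len({d.cell_type for d in dhss if overlaps(d, m)}))
            for m in master]
```

For example, with four chained DHSs on one chromosome, the highest-scoring one is selected in the first pass, its overlapping neighbours are discarded, and the DHS left over is selected in the second pass:

```python
dhss = [
    DHS("chr1", 100, 200, 1.0, "A"),
    DHS("chr1", 190, 300, 2.0, "B"),
    DHS("chr1", 290, 400, 9.0, "C"),
    DHS("chr1", 390, 500, 3.0, "A"),
]
for m, n_cell_types in build_master_list(dhss):
    print(m.chrom, m.start, m.end, n_cell_types)
```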
Credits
The master list was generated by the University of Washington ENCODE group on behalf of the ENCODE Analysis Working Group, based on uniformly processed DNaseI peaks (ENCODE Uniform DNaseI HS). Credits for the primary data underlying this track and the uniform peak calls are included in track description pages listed in the Description section of the Uniform DNaseI HS track.
Contact: Robert Thurman (University of Washington)
References
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al.
The accessible chromatin landscape of the human genome.
Nature. 2012 Sep 6;489(7414):75-82.
PMID: 22955617; PMC: PMC3721348
John S, Sabo PJ, Thurman RE, Sung MH, Biddie SC, Johnson TA, Hager GL, Stamatoyannopoulos JA.
Chromatin accessibility pre-determines glucocorticoid receptor binding patterns.
Nat Genet. 2011 Mar;43(3):264-8.
See also the references and credit sections in the related
ENCODE Uniform DNaseI HS,
ENCODE UW DNaseI HS, and
ENCODE Duke DNaseI HS tracks.