Schema for Master DNaseI HS - DNaseI Hypersensitive Site Master List (125 cell types) from ENCODE/Analysis
  Database: hg19    Primary Table: wgEncodeAwgDnaseMasterSites    Row Count: 2,890,742   Data last updated: 2014-06-26
Format description: BED5+ with a float data value field and a list of sources for combined data
On download server: MariaDB table dump directory
field        example                         SQL type          description
bin          585                             int(10) unsigned  Indexing field to speed chromosome range queries.
chrom        chr1                            varchar(255)      Reference sequence chromosome or scaffold
chromStart   10120                           int(10) unsigned  Start position in chromosome
chromEnd     10270                           int(10) unsigned  End position in chromosome
name         37                              varchar(255)      Name of item
score        380                             int(10) unsigned  Display score (0-1000)
floatScore   39.23                           float             Data value
sourceCount  37                              int(10) unsigned  Number of sources
sourceIds    1,3,14,15,16,23,24,34,35,38...  longblob          Source ids
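The bin field can be recomputed from chromStart and chromEnd. The following is a minimal sketch that assumes this table uses the standard UCSC binning scheme (five levels, smallest bins of 128 kb), which the example row is consistent with. The function name bin_from_range is illustrative and not part of the schema.

```python
# Sketch of the standard UCSC binning scheme (assumed to apply to this table).
# Offsets of the first bin at each level, from the smallest bins (128 kb) up.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up holds bins 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin containing the 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range extends beyond 512 Mb; the extended scheme is needed")


# Example row from the schema: chromStart 10120, chromEnd 10270.
print(bin_from_range(10120, 10270))
```

For the example row this gives 585, matching the bin value shown in the table.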

Connected Tables and Joining Fields (via wgEncodeAwgDnaseMasterSites.sourceIds)


Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive, so chromEnd - chromStart gives the length of an item.
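As an illustration of the coordinate convention, the sketch below converts a table row to the 1-based, closed position form used in browser position strings. The helper names are illustrative only.

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Convert a 0-based, half-open interval to a 1-based, closed position."""
    # Only the start shifts by one; the exclusive 0-based end equals the
    # inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def item_length(chrom_start: int, chrom_end: int) -> int:
    """Length in bases of a 0-based, half-open interval."""
    return chrom_end - chrom_start


# Example row from the schema: chr1, chromStart 10120, chromEnd 10270.
print(to_one_based("chr1", 10120, 10270))
print(item_length(10120, 10270))
```

For the example row the position is chr1:10121-10270 and the length is 150 bases.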

Master DNaseI HS (wgEncodeAwgDnaseMasterSites) Track Description


Description

DNaseI hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. This track displays an extensive map of human DHSs (~2.9 million) identified through genome-wide profiling in 125 diverse cell and tissue types by the ENCODE Consortium between September 2007 and January 2011, with follow-on analysis and results reported in September 2012.

This master list track represents a summary of the 125 separate cell type DHSs. Each master list element consists of a DHS from at least one of the 125 cell types, and every DHS from a given cell type overlaps at least one master list DHS. For further details see the final paragraph of the Methods section below.

The data underlying this track were produced by two ENCODE production groups (University of Washington and Duke University). Uniform processing of the individual experiments was performed by the ENCODE Analysis Working Group, and the results are displayed in the ENCODE Uniform DNaseI HS browser track. The DNaseI HS Clusters track provides another view of this data.

Display Conventions and Configuration

The display for this track shows DHS locations and scores as grayscale-colored items, where higher scores correspond to darker-colored blocks. The label displayed to the left of each item indicates the number of cell types with DNaseI sensitivity detected at the site. Clicking on a displayed block shows a details page that lists the cell types.


Methods

DNaseI hypersensitivity mapping was performed using protocols developed by Duke University or University of Washington. Data sets were sequenced on Illumina instruments to an average depth of 30 million uniquely mapping sequence tags (27bp for University of Washington and 20bp for Duke University) per replicate. For uniformity of analysis, some cell-type data sets that exceeded a depth of 40 million tags were randomly subsampled to a depth of 30 million tags. Sequence reads were mapped using the Bowtie aligner, allowing a maximum of two mismatches. Only reads mapping uniquely to the genome were used in the analyses. Mappings were to male or female versions of hg19/GRCh37, depending on cell type, with random regions omitted. Data were analysed jointly using a single algorithm to identify sites.
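The subsampling step can be pictured with a short sketch. The text does not say how the subsampled data sets were chosen or which tool was used, so the function below is only an assumption-laden illustration: it draws uniformly without replacement whenever a data set exceeds the threshold. The name subsample_tags and the seed parameter are hypothetical.

```python
import random


def subsample_tags(tags, threshold=40_000_000, target=30_000_000, seed=0):
    """Randomly subsample tags to `target` if there are more than `threshold`.

    Assumption: uniform sampling without replacement. The source only states
    that some data sets above the threshold were randomly subsampled.
    """
    if len(tags) <= threshold:
        return list(tags)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(tags, target)


# Toy usage with small numbers standing in for the 40M / 30M depths.
toy_tags = list(range(100))
kept = subsample_tags(toy_tags, threshold=40, target=30)
print(len(kept))
```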

The hotspot algorithm (John et al., 2011) was applied uniformly to datasets from both protocols. Briefly, hotspot is a scan statistic that uses the binomial distribution to gauge enrichment of tags based on a local background model estimated around every tag. General-sized regions of enrichment are identified as hotspots, and then 150-bp peaks within hotspots are called by looking for local maxima in the tag density profile (sliding window tag count in 150-bp windows, stepping every 20 bp). Further stringencies are applied to the local maxima detection to prevent overcalling of spurious peaks. The hotspot program also includes an FDR (false discovery rate) estimation procedure for thresholding hotspots and peaks, based on a simulation approach. Random reads are generated at the same sequencing depth as the target sample, hotspots are called on the simulated data, and the random and observed hotspots are compared via their z-scores (based on the binomial model) to estimate the FDR. Using this procedure, DHSs were identified at an FDR of 1%.
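Two of the ingredients described above, the sliding-window tag count and a binomial z-score, can be sketched as follows. This is not the hotspot implementation: the background model, window sizes for the background, and extra stringencies are not given here, so the probability p and both function names are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def binomial_z(n_window, n_background, p):
    """z-score of observing n_window tags under Binomial(n_background, p).

    p is a hypothetical per-tag probability of falling in the small window.
    """
    mean = n_background * p
    sd = math.sqrt(n_background * p * (1.0 - p))
    return (n_window - mean) / sd


def window_counts(tag_positions, region_start, region_end, width=150, step=20):
    """Count tags in each window [s, s + width), with s stepping by `step`."""
    tags = sorted(tag_positions)
    counts = []
    for s in range(region_start, region_end - width + 1, step):
        lo = bisect_left(tags, s)
        hi = bisect_left(tags, s + width)
        counts.append((s, hi - lo))
    return counts


# Toy usage: four tag positions in a 400-bp region.
profile = window_counts([100, 105, 110, 300], 0, 400)
best_start, best_count = max(profile, key=lambda sc: sc[1])
print(best_start, best_count)
```

A peak caller would then look for local maxima in the resulting profile; the toy usage simply takes the first window with the highest count.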

The DHSs called on individual cell types were consolidated into a master list of 2,890,742 unique, non-overlapping DHS positions by first merging the FDR 1% peaks across all cell types. Then, for each resulting interval of merged sites, the DHS with the highest z-score was selected for the master list. Any DHSs overlapping the peaks selected for the master list were then discarded. The remaining DHSs were then merged and the process repeated until each original DHS was either in the master list, or discarded. Of these DHSs, 970,100 were specific to a single cell type, 1,920,642 were active in 2 or more cell types, and 3,692 (a small minority) were detected in all cell types. Each master list DHS is annotated with the number of cell types whose original DHSs overlap the master list DHS.
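The consolidation procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the code used to build the track: the Dhs record, the function names, and the tie-breaking rule (the first DHS wins when z-scores are equal) are all assumptions.

```python
from collections import namedtuple

# Hypothetical record for one per-cell-type DHS call.
Dhs = namedtuple("Dhs", "chrom start end z cell")


def overlaps(a, b):
    """True if two 0-based, half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge_into_groups(dhss):
    """Group DHSs whose intervals merge into one contiguous interval."""
    groups = []
    for d in sorted(dhss, key=lambda d: (d.chrom, d.start, d.end)):
        last = groups[-1] if groups else None
        if last and last["chrom"] == d.chrom and d.start < last["end"]:
            last["members"].append(d)
            last["end"] = max(last["end"], d.end)
        else:
            groups.append({"chrom": d.chrom, "end": d.end, "members": [d]})
    return [g["members"] for g in groups]


def build_master_list(dhss):
    """Repeat merge / select highest z / discard overlaps until none remain."""
    master, remaining = [], list(dhss)
    while remaining:
        leftover = []
        for group in merge_into_groups(remaining):
            best = max(group, key=lambda d: d.z)
            master.append(best)
            # Keep only DHSs that do not overlap the selected one.
            leftover.extend(d for d in group if not overlaps(d, best))
        remaining = leftover
    return sorted(master, key=lambda d: (d.chrom, d.start))


def count_cell_types(master_dhs, dhss):
    """Number of distinct cell types with an original DHS overlapping the site."""
    return len({d.cell for d in dhss if overlaps(d, master_dhs)})
```

In this sketch, a DHS that overlaps a selected peak is dropped, while a DHS that was merged into the same interval without touching the selected peak survives to the next round, which is why the process has to repeat.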


Credits

The master list was generated by the University of Washington ENCODE group on behalf of the ENCODE Analysis Working Group, based on uniformly processed DNaseI peaks (ENCODE Uniform DNaseI HS). Credits for the primary data underlying this track and the uniform peak calls are included in track description pages listed in the Description section of the Uniform DNaseI HS track.

Contact: Robert Thurman (University of Washington)


References

Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PMID: 22955617; PMC: PMC3721348

John S, Sabo PJ, Thurman RE, Sung MH, Biddie SC, Johnson TA, Hager GL, Stamatoyannopoulos JA. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet. 2011 Mar;43(3):264-8. PMID: 21258342

See also the references and credit sections in the related ENCODE Uniform DNaseI HS, ENCODE UW DNaseI HS and ENCODE Duke DNaseI HS tracks.