Schema for Uniform DNaseI HS - DNaseI Hypersensitivity Uniform Peaks from ENCODE/Analysis
  Database: hg19    Primary Table: wgEncodeAwgDnaseUwGm12865UniPk    Row Count: 143,882   Data last updated: 2012-12-08
Format description: BED6+4 Peaks of signal enrichment based on pooled, normalized (interpreted) data.
On download server: MariaDB table dump directory
field        example  SQL type              info    description
bin          585      smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
chrom        chr1     varchar(255)          values  Reference sequence chromosome or scaffold
chromStart   10180    int(10) unsigned      range   Start position in chromosome
chromEnd     10330    int(10) unsigned      range   End position in chromosome
name         .        varchar(255)          values  Name given to a region (preferably unique). Use . if no name is assigned
score        101      int(10) unsigned      range   Indicates how dark the peak will be displayed in the browser (0-1000)
strand       .        char(1)               values  + or - or . for unknown
signalValue  25       float                 range   Measurement of average enrichment for the region
pValue       -1       float                 range   Statistical significance of signal value (-log10). Set to -1 if not used.
qValue       -1       float                 range   Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used.
peak         -1       int(11)               range   Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called.
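
The bin field can be recomputed from chromStart and chromEnd. The following is a minimal sketch that assumes this table uses the standard UCSC binning scheme (five levels, finest bins of 128 kb, each coarser level 8 times larger); the schema above does not state this, but the sample rows are consistent with it. The function and constant names are illustrative.

```python
# Constants of the standard UCSC binning scheme (assumed to apply to this table).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times larger


def bin_from_range(chrom_start: int, chrom_end: int) -> int:
    """Return the smallest bin that fully contains [chrom_start, chrom_end)."""
    start_bin = chrom_start >> BIN_FIRST_SHIFT
    end_bin = (chrom_end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a boundary at this level; try the coarser one.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit in the standard binning scheme")


# First sample row: chr1:10180-10330 falls in bin 585.
print(bin_from_range(10180, 10330))
```

A range query then only needs to inspect the bins that can overlap the query interval instead of scanning every row on the chromosome.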

Sample Rows
 
bin  chrom  chromStart  chromEnd  name  score  strand  signalValue  pValue  qValue  peak
585  chr1   10180       10330     .     101    .       25           -1      -1      -1
586  chr1   237720      237870    .     103    .       80           -1      -1      -1
588  chr1   521480      521630    .     100    .       16           -1      -1      -1
589  chr1   565560      565710    .     101    .       36           -1      -1      -1
589  chr1   565860      566010    .     101    .       39           -1      -1      -1
589  chr1   566440      566590    .     101    .       31           -1      -1      -1
589  chr1   566760      566910    .     101    .       35           -1      -1      -1
589  chr1   566980      567130    .     101    .       30           -1      -1      -1
589  chr1   567440      567590    .     720    .       15355        -1      -1      -1
589  chr1   567740      567890    .     102    .       60           -1      -1      -1
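
As an illustration of how the eleven columns fit together, here is a minimal parsing sketch. It assumes the rows arrive as tab-separated text (as in a typical table dump); the PeakRow class and parse_row function are hypothetical names, not part of any UCSC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeakRow:
    bin: int
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float   # -log10 scale; -1 means "not used"
    q_value: float   # -log10 scale; -1 means "not used"
    peak: int        # offset from chrom_start; -1 means "no point-source called"

    @property
    def width(self) -> int:
        # Start is 0-based and the interval is half-open, so no +1 is needed.
        return self.chrom_end - self.chrom_start

    @property
    def summit(self) -> Optional[int]:
        """Absolute 0-based summit position, or None if no point-source was called."""
        return None if self.peak < 0 else self.chrom_start + self.peak


def parse_row(line: str) -> PeakRow:
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return PeakRow(int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]),
                   f[6], float(f[7]), float(f[8]), float(f[9]), int(f[10]))


row = parse_row("585\tchr1\t10180\t10330\t.\t101\t.\t25\t-1\t-1\t-1")
print(row.width, row.summit)
```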

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are 1-based, so chromEnd - chromStart gives the length of an item. See explanation here.
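
The practical consequence is an off-by-one shift when moving between table coordinates and the 1-based, fully closed positions shown in the browser's position box. A small sketch of the conversion follows; the function names are illustrative.

```python
def to_browser_position(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Convert a 0-based, half-open table interval to a 1-based, closed position."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def from_browser_position(position: str):
    """Convert a 1-based, closed position string back to table coordinates."""
    chrom, span = position.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start) - 1, int(end)


# First sample row: chromStart 10180, chromEnd 10330.
print(to_browser_position("chr1", 10180, 10330))  # chr1:10181-10330
```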

Uniform DNaseI HS (wgEncodeAwgDnaseUniform) Track Description
 

Description

The ENCODE Analysis Working Group (AWG) has performed uniform processing on datasets produced by multiple data production groups in the ENCODE Consortium. This track represents a uniform set of open chromatin elements (DNaseI hypersensitive sites) in 125 ENCODE cell types, based on DNase-seq data produced by the "Open Chromatin" (Duke/UNC/UT-A) and University of Washington (UW) ENCODE groups from the project inception in 2007 through the ENCODE January 2011 data freeze. The AWG uniform datasets are used in downstream analysis pipelines by members of the ENCODE Consortium and are one of the primary sources of data referenced in the 2012 ENCODE integrative analysis paper (ENCODE Project Consortium 2012). More information about the ENCODE integrative analysis is here.

The primary and lab-processed data (along with methods descriptions, credits and references) on which this track is based are available in the ENCODE Duke DNaseI HS and ENCODE UW DnaseI HS tracks.

Display Conventions and Configuration

The display for this track shows site location and signal value as grayscale-colored items, where higher signal values correspond to darker-colored blocks. The display can be restricted to higher-valued items using the 'Minimum signal' configuration item.
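
The same kind of filtering can be applied to downloaded rows. The sketch below assumes the threshold is compared against the signalValue column and that the comparison is inclusive; neither detail is stated above, and the function name is hypothetical.

```python
from typing import List, Tuple

# (chrom, chromStart, chromEnd, signalValue) taken from the sample rows.
Row = Tuple[str, int, int, float]


def filter_min_signal(rows: List[Row], min_signal: float) -> List[Row]:
    """Keep rows whose signalValue is at least min_signal (inclusive by assumption)."""
    return [row for row in rows if row[3] >= min_signal]


rows = [
    ("chr1", 10180, 10330, 25.0),
    ("chr1", 237720, 237870, 80.0),
    ("chr1", 521480, 521630, 16.0),
]
print(filter_min_signal(rows, 20.0))
```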

This track is a composite annotation track containing multiple subtracks, one for each cell type. The display mode and filtering of each subtrack can be individually controlled. For more information about track configuration, see Configuring Multi-View Tracks.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks. The UCSC Accession listed in the metadata can be used with the File Search tool to retrieve primary data files underlying datasets of interest.

In the subtrack selection list, the ENCODE tier (priority) is listed for each cell type. Tier 1 and Tier 2 are categories of cell types designated for intensive study by the ENCODE investigators. After the January 2011 data freeze, an additional set of cell types was promoted from Tier 3 to Tier 2 to broaden the list of intensively studied cell types. These cell types are listed as Tier 2* in the subtrack list here (and are described as 'newly promoted to tier 2: not in 2011 analysis' on the ENCODE Common Cell Types page).

Methods

The DNase-seq aligned sequence reads (BAM files) from the primary data tracks listed above were processed using the UW HotSpot pipeline (as described in the UW DnaseI HS track description above). First, "hotspots" (i.e. broad, variable-sized regions of generalized chromatin accessibility) were identified using a relaxed threshold. Then more stringent "narrowPeaks" (False Discovery Rate 1% peaks) were generated by first thresholding hotspots (using random simulation) at FDR 1%, and then (essentially) locating local maxima of the tag density (150 bp window, sliding every 20 bp) within the hotspots. FDR 1% peaks were set to a fixed width of 150 bp.
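
The peak-localization step can be pictured with a simplified sketch. This is not the HotSpot implementation: it omits the FDR thresholding and random simulation entirely, and only illustrates counting tags in 150 bp windows placed every 20 bp within one hotspot, then keeping windows that are local maxima. Taking the first window of a plateau as the peak is an assumption made here, and all names are illustrative.

```python
from bisect import bisect_left
from typing import List, Tuple

WINDOW = 150  # bp, window width given in the text
STEP = 20     # bp, sliding step given in the text


def window_densities(tags: List[int], start: int, end: int) -> List[Tuple[int, int]]:
    """Count tags in each WINDOW-bp window placed every STEP bp across [start, end)."""
    tags = sorted(tags)
    last_start = max(start, end - WINDOW)
    densities = []
    for w in range(start, last_start + 1, STEP):
        count = bisect_left(tags, w + WINDOW) - bisect_left(tags, w)
        densities.append((w, count))
    return densities


def local_maxima(densities: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return fixed-width (start, end, count) peaks at local maxima of the density."""
    peaks = []
    for i, (w, count) in enumerate(densities):
        left = densities[i - 1][1] if i > 0 else -1
        right = densities[i + 1][1] if i + 1 < len(densities) else -1
        # Strictly above the left neighbour so a plateau yields a single peak.
        if count > 0 and count > left and count >= right:
            peaks.append((w, w + WINDOW, count))
    return peaks


# Hypothetical tag positions inside a hypothetical hotspot [1000, 1400).
tags = [1100, 1105, 1110, 1120, 1130]
print(local_maxima(window_densities(tags, 1000, 1400)))
```

Every peak returned has the same 150 bp width, which matches the fixed-width items seen in the sample rows.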

The Duke DNase primary data were pre-processed to reduce variability by combining all replicates for a given cell type and subsampling to 30 million tags. For the UW data, the replicate 1 calls from the primary UW DNaseI HS data track were used. For the 14 cell types where both groups have data, a collapsed set of FDR 1% peaks was generated by taking a non-overlapping selection of the calls from both centers, giving preference to the peak with the higher z-score when calls overlapped. A collapsed set of hotspots for these cell types was generated by merging the calls from both centers (taking the union interval of overlapping intervals).
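
The two collapsing rules can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: peaks are resolved greedily in descending z-score order, z-scores from the two centers are treated as directly comparable, and intervals are half-open so that merely adjacent intervals are not considered overlapping. The function names and example values are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]     # (start, end), half-open
Peak = Tuple[int, int, float]  # (start, end, z_score)


def merge_union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Merge hotspot calls from two centers into union intervals."""
    merged: List[Interval] = []
    for start, end in sorted(a + b):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def collapse_peaks(a: List[Peak], b: List[Peak]) -> List[Peak]:
    """Select non-overlapping peaks, preferring the higher z-score on overlap."""
    kept: List[Peak] = []
    for peak in sorted(a + b, key=lambda p: p[2], reverse=True):
        # Quadratic scan for clarity; an interval index would scale better.
        if all(peak[1] <= k[0] or peak[0] >= k[1] for k in kept):
            kept.append(peak)
    return sorted(kept)


duke = [(100, 250, 5.0), (400, 550, 2.0)]
uw = [(180, 330, 8.0), (600, 750, 3.0)]
print(collapse_peaks(duke, uw))
print(merge_union([(100, 250), (400, 550)], [(180, 330), (600, 750)]))
```

In the example, the overlapping calls at 100-250 and 180-330 collapse to the higher-scoring 180-330 peak, while the same two intervals merge into a single 100-330 hotspot.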

Credits

The processed data for this track were generated by the University of Washington ENCODE group on behalf of the ENCODE Analysis Working Group. Credits for the primary data underlying this track are included in track description pages listed in the Description section above.

Contact: Robert Thurman (University of Washington)

References

ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82.

See also the references in the related ENCODE Duke DNaseI HS and ENCODE UW DnaseI HS tracks.

Data Release Policy

While primary ENCODE data is subject to a restriction period as described in the ENCODE data release policy, this restriction does not apply to the integrative analysis results. The data in this track are freely available.