Schema for Uniform DNaseI HS - DNaseI Hypersensitivity Uniform Peaks from ENCODE/Analysis

| field | example | SQL type | info | description |
| --- | --- | --- | --- | --- |
| bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
| chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
| chromStart | 10180 | int(10) unsigned | range | Start position in chromosome |
| chromEnd | 10330 | int(10) unsigned | range | End position in chromosome |
| name | | varchar(255) | values | Name given to a region (preferably unique). Use . if no name is assigned |
| score | 101 | int(10) unsigned | range | Indicates how dark the peak will be displayed in the browser (0-1000) |
| strand | | char(1) | values | + or - or . for unknown |
| signalValue | | float | range | Measurement of average enrichment for the region |
| pValue | -1 | float | range | Statistical significance of signal value (-log10). Set to -1 if not used. |
| qValue | -1 | float | range | Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used. |
| peak | -1 | int(11) | range | Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called. |
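Several fields in the schema use -1 as a "not used" marker, and pValue and qValue are stored as -log10 values. The following is a minimal Python sketch of how a reader might decode them. The function names are hypothetical (they are not part of any UCSC tool), and the non-sentinel inputs at the end are made-up values used only for illustration.

```python
from typing import Optional


def neg_log10_to_probability(value: float) -> Optional[float]:
    """Convert a -log10 encoded pValue or qValue to a probability.

    Returns None for the -1 sentinel, which the schema defines as "not used".
    """
    if value == -1:
        return None
    return 10.0 ** (-value)


def summit_position(chrom_start: int, peak: int) -> Optional[int]:
    """Absolute 0-based summit coordinate (chromStart + peak offset).

    Returns None for the -1 sentinel, meaning no point-source was called.
    """
    if peak == -1:
        return None
    return chrom_start + peak


# Values taken from the example column of the schema.
print(neg_log10_to_probability(-1))  # None
print(summit_position(10180, -1))    # None

# Hypothetical values, for illustration only.
print(neg_log10_to_probability(2.0))  # approximately 0.01
print(summit_position(10180, 75))     # 10255
```

Returning None rather than a number keeps the sentinel from being mistaken for a real probability or coordinate in later arithmetic.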

bin  chrom  chromStart  chromEnd  name  score  strand  signalValue  pValue  qValue  peak

585  chr1  10180  10330  101  -1
586  chr1  237720  237870  103  -1
588  chr1  521480  521630  100  -1
589  chr1  565560  565710  101  -1
589  chr1  565860  566010  101  -1
589  chr1  566440  566590  101  -1
589  chr1  566760  566910  101  -1
589  chr1  566980  567130  101  -1
589  chr1  567440  567590  720  15355  -1
589  chr1  567740  567890  102  -1
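The bin values in the sample rows are consistent with the standard UCSC binning scheme, in which the smallest bins span 128 kb and each higher level is eight times larger. The sketch below assumes that standard (non-extended) scheme, which covers coordinates up to 512 Mb. The function name is chosen for illustration, and the code is not taken from the browser source.

```python
# Offset of the first bin at each level, from the finest level (128 kb bins)
# up to the single top-level bin (512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 131072 bp, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles two bins at this level; try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard binning scheme")


# chromStart/chromEnd pairs from the sample rows.
for start, end in [(10180, 10330), (237720, 237870), (521480, 521630), (565560, 565710)]:
    print(start, end, bin_from_range(start, end))
```

Because every 150 bp peak here falls inside a single 128 kb bin, each bin value is simply 585 plus chromStart divided by 131072 (integer division).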

Description

The ENCODE Analysis Working Group (AWG) has performed uniform processing on datasets produced by multiple data production groups in the ENCODE Consortium. This track represents a uniform set of open chromatin elements (DNaseI hypersensitive sites) in 125 ENCODE cell types, based on DNase-seq data produced by the "Open Chromatin" (Duke/UNC/UT-A) and University of Washington (UW) ENCODE groups from the project inception in 2007 through the ENCODE January 2011 data freeze. The AWG uniform datasets are used in downstream analysis pipelines by members of the ENCODE Consortium and are one of the primary sources of data referenced in the 2012 ENCODE integrative analysis paper (ENCODE Project Consortium 2012). More information about the ENCODE integrative analysis is here.

The primary and lab-processed data (along with methods descriptions, credits and references) on which this track is based are available in the ENCODE Duke DNaseI HS and ENCODE UW DnaseI HS tracks.

Display Conventions and Configuration

The display for this track shows site location and signal value as grayscale-colored items, where higher signal values correspond to darker blocks. The display can be restricted to higher-valued items using the 'Minimum signal' configuration item.

This track is a composite annotation track containing multiple subtracks, one for each cell type. The display mode and filtering of each subtrack can be individually controlled. For more information about track configuration, see Configuring Multi-View Tracks.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks. The UCSC Accession listed in the metadata can be used with the File Search tool to retrieve primary data files underlying datasets of interest.

In the subtrack selection list, the ENCODE tier (priority) is listed for each cell type. Tier 1 and Tier 2 represent categories with cell types designated for intensive study by the ENCODE investigators. After the January 2011 data freeze, an additional set of cell types was promoted from Tier 3 to Tier 2 to broaden the list of intensively studied cell types. These cell types are listed as Tier 2* in the subtrack list here (and are described as 'newly promoted to tier 2: not in 2011 analysis' on the ENCODE Common Cell Types page).

Methods

The DNase-seq aligned sequence reads (BAM files) from the primary data tracks listed above were processed using the UW HotSpot pipeline (as described in the UW DnaseI HS track description above). First, "hotspots" (i.e. broad, variable-sized regions of generalized chromatin accessibility) were identified using a relaxed threshold. Then more stringent "narrowPeaks" (False Discovery Rate 1% peaks) were generated by first thresholding hotspots (using random simulation) at FDR 1%, and then (essentially) locating local maxima of the tag density (150 bp window, sliding every 20 bp) within the hotspots. FDR 1% peaks were set to a fixed width of 150 bp.
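The local-maximum step described above can be illustrated with a short Python sketch. This is not the HotSpot pipeline's implementation: the FDR thresholding is omitted, the neighbour comparison used to pick maxima is a simplifying assumption, and the function names, tag positions and hotspot interval are all hypothetical. Only the 150 bp window and 20 bp step come from the text.

```python
from bisect import bisect_left
from typing import List, Tuple

WINDOW = 150  # bp, window width given in the Methods text
STEP = 20     # bp, slide increment given in the Methods text


def window_density(tags: List[int], start: int, end: int) -> List[Tuple[int, int]]:
    """Count tags in each WINDOW-bp window, sliding every STEP bp across [start, end)."""
    tags = sorted(tags)
    densities = []
    for w_start in range(start, max(start + 1, end - WINDOW + 1), STEP):
        lo = bisect_left(tags, w_start)
        hi = bisect_left(tags, w_start + WINDOW)
        densities.append((w_start, hi - lo))
    return densities


def local_maxima(densities: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for each non-empty window whose count is
    greater than its left neighbour and at least its right neighbour,
    so the leftmost window of a plateau is reported."""
    peaks = []
    for i, (w_start, count) in enumerate(densities):
        left = densities[i - 1][1] if i > 0 else -1
        right = densities[i + 1][1] if i + 1 < len(densities) else -1
        if count > 0 and count > left and count >= right:
            peaks.append((w_start, w_start + WINDOW, count))  # fixed 150 bp width
    return peaks


# Hypothetical tag positions inside a hypothetical hotspot [800, 1900).
tags = [1000, 1010, 1020, 1030, 1040, 1500, 1510, 1520]
print(local_maxima(window_density(tags, 800, 1900)))
# [(900, 1050, 5), (1380, 1530, 3)]
```

Each reported interval is exactly 150 bp wide, matching the fixed peak width in the track.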

The Duke DNase primary data were pre-processed to reduce variability by combining all replicates for a given cell type and subsampling at a level of 30 million tags. For the UW data, the replicate 1 calls from the primary UW DNaseI HS data track were used. For the 14 cell types where both groups have data, a collapsed set of FDR 1% peaks was generated by taking a non-overlapping selection of the calls from both centers and giving preference to the peak with the higher z-score when calls overlapped. A collapsed set of hotspots on these cell types was generated by merging the calls from both centers (taking the union interval of overlapping intervals).
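The two collapsing rules for cell types with data from both centers can be sketched in Python as follows. The greedy selection by descending z-score is one plausible reading of "giving preference to the peak with the higher z-score" and is an assumption, not the documented procedure. The Peak type, function names and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    start: int
    end: int
    zscore: float
    center: str  # originating group, e.g. "Duke" or "UW"


def collapse_peaks(peaks: List[Peak]) -> List[Peak]:
    """Greedy non-overlapping selection: visit peaks by descending z-score and
    keep a peak only if it overlaps none of the peaks already kept."""
    kept: List[Peak] = []
    for p in sorted(peaks, key=lambda p: p.zscore, reverse=True):
        if all(p.end <= k.start or p.start >= k.end for k in kept):
            kept.append(p)
    return sorted(kept, key=lambda p: p.start)


def merge_hotspots(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Union of overlapping half-open intervals; intervals that only touch stay separate."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical calls from the two centers.
peaks = [
    Peak(100, 250, 5.0, "Duke"),
    Peak(200, 350, 8.0, "UW"),
    Peak(400, 550, 3.0, "Duke"),
]
print(collapse_peaks(peaks))
# keeps the UW peak at 200-350 (higher z-score) and the Duke peak at 400-550

print(merge_hotspots([(100, 300), (250, 400), (500, 600)]))
# [(100, 400), (500, 600)]
```

The peak rule discards the weaker of two overlapping calls so that the 150 bp width is preserved, whereas the hotspot rule widens intervals to cover both calls.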

Credits

The processed data for this track were generated by the University of Washington ENCODE group on behalf of the ENCODE Analysis Working Group. Credits for the primary data underlying this track are included in track description pages listed in the Description section above.

Contact: Robert Thurman (University of Washington)

References

ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82.

See also the references in the related ENCODE Duke DNaseI HS and ENCODE UW DnaseI HS tracks.

Data Release Policy

While primary ENCODE data are subject to a restriction period as described in the ENCODE data release policy, this restriction does not apply to the integrative analysis results. The data in this track are freely available.