ENC TF Binding UTA TFBS Track Settings
 
Open Chromatin TFBS by ChIP-seq from ENCODE/Open Chrom(UT Austin)

Track collection: ENCODE Transcription Factor Binding

Views: Peaks, Density Signal, Overlap Signal
Factors: CTCF, c-Myc, Pol2, Input Control
Cell lines:
GM12878 (Tier 1) 
H1-hESC (Tier 1) 
K562 (Tier 1) 
A549 (Tier 2) 
HeLa-S3 (Tier 2) 
HepG2 (Tier 2) 
HUVEC (Tier 2) 
MCF-7 (Tier 2) 
Monocytes CD14+ RO01746 (Tier 2) 
Colon OC 
Fibrobl 
Gliobla 
GM10248 
GM10266 
GM12891 
GM12892 
GM13976 
GM13977 
GM19238 
GM19239 
GM19240 
GM20000 
Heart OC 
Kidney OC 
LNCaP 
Lung OC 
Medullo 
NHEK 
Pancreas OC 
ProgFib 
Spleen OC 
Subtracks (the Treatment column is empty for every subtrack):

Cell Line  Factor         View            Restricted Until  Track Name
GM12878    c-Myc          Density Signal  2010-06-08        GM12878 cMyc TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
GM12878    c-Myc          Overlap Signal  2010-06-08        GM12878 cMyc TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
GM12878    c-Myc          Peaks           2010-06-08        GM12878 cMyc TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
GM12878    CTCF           Density Signal  2011-07-01        GM12878 CTCF TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
GM12878    CTCF           Overlap Signal  2011-07-01        GM12878 CTCF TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
GM12878    CTCF           Peaks           2011-07-01        GM12878 CTCF TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
GM12878    Input Control  Density Signal  2009-07-07        GM12878 Input TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
GM12878    Pol2           Density Signal  2010-09-22        GM12878 Pol2 TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
GM12878    Pol2           Overlap Signal  2010-09-22        GM12878 Pol2 TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
GM12878    Pol2           Peaks           2010-09-22        GM12878 Pol2 TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
H1-hESC    c-Myc          Density Signal  2010-09-27        H1-hESC cMyc TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
H1-hESC    c-Myc          Overlap Signal  2010-09-27        H1-hESC cMyc TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
H1-hESC    c-Myc          Peaks           2010-09-27        H1-hESC cMyc TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
H1-hESC    CTCF           Density Signal  2010-07-01        H1-hESC CTCF TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
H1-hESC    CTCF           Overlap Signal  2010-07-01        H1-hESC CTCF TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
H1-hESC    CTCF           Peaks           2010-07-01        H1-hESC CTCF TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
H1-hESC    Pol2           Density Signal  2010-07-02        H1-hESC Pol2 TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
H1-hESC    Pol2           Overlap Signal  2010-07-02        H1-hESC Pol2 TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
H1-hESC    Pol2           Peaks           2010-07-02        H1-hESC Pol2 TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
K562       c-Myc          Density Signal  2009-11-27        K562 cMyc TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
K562       c-Myc          Overlap Signal  2009-11-27        K562 cMyc TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
K562       c-Myc          Peaks           2009-12-20        K562 cMyc TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
K562       CTCF           Density Signal  2009-11-27        K562 CTCF TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
K562       CTCF           Overlap Signal  2009-11-27        K562 CTCF TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
K562       CTCF           Peaks           2009-12-20        K562 CTCF TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA
K562       Input Control  Density Signal  2009-08-05        K562 Input TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
K562       Pol2           Density Signal  2010-06-29        K562 Pol2 TFBS ChIP-seq Density Signal ENCODE/OpenChrom-UTA
K562       Pol2           Overlap Signal  2010-06-29        K562 Pol2 TFBS ChIP-seq Overlap Signal ENCODE/OpenChrom-UTA
K562       Pol2           Peaks           2010-06-29        K562 Pol2 TFBS ChIP-seq Peaks from ENCODE/OpenChrom-UTA

Source data version: ENCODE July 2011 Freeze
Assembly: Human Feb. 2009 (GRCh37/hg19)

Description

These tracks display chromatin immunoprecipitation (ChIP-seq) evidence as part of the four Open Chromatin track sets (see below). ChIP-seq is a method to identify the specific location of proteins that are directly or indirectly bound to genomic DNA. By identifying the binding location of sequence-specific transcription factors, general transcription machinery components, and chromatin factors, ChIP can help in the functional annotation of the open chromatin regions identified by DNaseI HS mapping and FAIRE.

Together with DNaseI HS and FAIRE experiments, these tracks display the locations of active regulatory elements identified as open chromatin in multiple cell types from the Duke, UNC-Chapel Hill, UT-Austin, and EBI ENCODE group. Within this project, open chromatin was identified using two independent and complementary methods: DNaseI hypersensitivity (HS) and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE), combined with these ChIP-seq assays for select regulatory factors. DNaseI HS and FAIRE provide assay cross-validation with commonly identified regions delineating the highest confidence areas of open chromatin. These ChIP assays provide functional validation and preliminary annotation of a subset of open chromatin sites. Each method employed Illumina (formerly Solexa) sequencing by synthesis as the detection platform. The Tier 1 and Tier 2 cell types were additionally verified by a second platform, high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen.

As a background control experiment, the input genomic DNA sample that was used for ChIP was sequenced. Crosslinked chromatin was sheared and the crosslinks were reversed without carrying out the immunoprecipitation step. This sample was otherwise processed in a manner identical to the ChIP sample as described below. The input track is useful in revealing potential artifacts arising from the sequence alignment process such as copy number differences between the reference genome and the sequenced samples, as well as regions of poor sequence alignability. For cell lines with no input experiment available, the peaks were generated using a generic_male or generic_female control, which is an attempt to create a general background based on input data from several cell types. These control files are in "iff" format, which is used when calling peaks with the F-Seq software, and can be downloaded directly from the production lab, under the section titled "Copy number / karyotype correction."

Other Open Chromatin track sets:

  • Data for the DNaseI HS experiments can be found in Duke DNaseI HS.
  • Data for the FAIRE experiments can be found in UNC FAIRE.
  • A synthesis of all the open chromatin assays for select cell lines can be found in Open Chrom Synth.

Display Conventions and Configuration

This track is a multi-view composite track that contains a single data type with multiple levels of annotation (views). For each view, there are multiple subtracks representing different cell types that display individually on the browser. Instructions for configuring multi-view tracks are here.

ChIP data displayed here represents a continuum of signal intensities. The Iyer lab recommends setting the "Data view scaling: auto-scale" option when viewing signal data in full mode to see the full dynamic range of the data. Note that in regions that do not have open chromatin sites, autoscale will rescale the data and inflate the background signal, making the regions appear noisy. Changing back to fixed scale will alleviate this issue.

In general, for each experiment in each of the cell types, the UTA TFBS tracks contain the following views:

Peaks
Regions of enriched signal in ChIP experiments. Peaks were called based on signals created using F-Seq, a software program developed at Duke (Boyle et al., 2008b). Significant regions were determined by fitting the data to a gamma distribution to calculate p-values. Contiguous regions where p-values were below a 0.05/0.01 threshold were considered significant. The solid vertical line in the peak represents the point with highest signal.
F-Seq Density Signal
Density graph (wiggle) of signal enrichment calculated using F-Seq for the combined set of sequences from all replicates. F-Seq employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). This method does not look at fixed-length windows but rather weights contributions of nearby sequences in proportion to their distance from that base. It only considers sequences aligned 4 or fewer times in the genome and uses an alignability background model to try to correct for regions where sequences cannot be aligned. For each cell type, a model based on control input data was also used to try to correct for amplifications and deletions; this is especially important for cell types with an abnormal karyotype.
Base Overlap Signal
An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end. The score at each base pair represents the number of extended fragments that overlap the base pair.
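
The Base Overlap Signal computation described above can be sketched in a few lines. This is only an illustration, not the production pipeline: the function name, the 0-based coordinates, and the inclusive interval [pos - 5, pos + 5] are assumptions made for the example.

```python
from collections import Counter

def base_overlap_signal(five_prime_ends, extend=5):
    """Count, per base, how many extended fragments cover it.

    five_prime_ends: 0-based positions of each read's 5' aligned end
                     (coordinate convention is an assumption).
    extend: bases added on each side of the 5' end (5 bp in the view description).
    """
    coverage = Counter()
    for pos in five_prime_ends:
        # Assumed inclusive span: pos - extend .. pos + extend.
        for base in range(max(0, pos - extend), pos + extend + 1):
            coverage[base] += 1
    return coverage

# Two nearby reads overlap each other; the third is isolated.
signal = base_overlap_signal([100, 103, 200])
```

Bases covered by both nearby reads get a score of 2, and any base not covered by a fragment has a score of 0.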

Peaks and signals displayed in this track are the results of pooled replicates. The raw sequence and alignment files for each replicate are available for download.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

Cells were grown according to the approved ENCODE cell culture protocols.

To perform ChIP, proteins were cross-linked to DNA in vivo using 1% formaldehyde solution (Bhinge et al., 2007; ENCODE Project Consortium, 2007). Cross-linked chromatin was sheared by sonication and immunoprecipitated using a specific antibody against the protein of interest. After reversal of the cross-links, the immunoprecipitated DNA was used to identify the genomic location of transcription factor binding. This was accomplished by sequencing of the ends of the immunoprecipitated DNA (ChIP-seq) using the Illumina (Solexa) sequencing system. ChIP data for Tier 1 and Tier 2 cell lines were verified by comparing multiple independent growths (replicates) and determining the reproducibility of the data. For some cell types, additional verification was performed using the same immunoprecipitated DNA by labeling and hybridizing to NimbleGen Human ENCODE tiling arrays (1% of the genome) along with the input DNA as reference (ChIP-chip). A more detailed protocol is available here.

DNA fragments isolated by ChIP are 100-200 bp in length, with the average length being 134 bp. Sequences from each experiment were aligned to the genome using Burrows-Wheeler Aligner (Li et al., 2010) for the GRCh37 (hg19) assembly.

The command used for these alignments was:
> bwa aln -t 8 genome.fa s_1.sequence.txt.bfq > s_1.sequence.txt.sai
Where genome.fa is the whole genome sequence and s_1.sequence.txt.bfq is one lane of sequences converted into the required bfq format.

Sequences from multiple lanes were combined for a single replicate using the bwa samse command and converted to SAM/BAM format using SAMtools.

Only sequences that aligned to 4 or fewer locations were retained. Other sequences were also filtered based on their alignment to problematic regions (such as satellites and rRNA genes - see supplemental materials). The mappings of these short reads to the genome are available for download.
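
A minimal sketch of the "4 or fewer locations" filter follows. The text does not say how the number of alignment locations was obtained; this example assumes it is read from the X0 optional tag (number of best hits) that bwa samse writes to each SAM record. The function names are hypothetical.

```python
def best_hit_count(sam_line):
    """Return the X0 value (number of best hits) from a SAM record, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:              # optional fields start at column 12
        if tag.startswith("X0:i:"):
            return int(tag[len("X0:i:"):])
    return None

def keep_read(sam_line, max_locations=4):
    """Keep header lines and mapped reads with at most max_locations best hits."""
    if sam_line.startswith("@"):         # SAM header line
        return True
    flag = int(sam_line.split("\t")[1])
    if flag & 0x4:                       # 0x4 marks an unmapped read
        return False
    hits = best_hit_count(sam_line)
    return hits is not None and hits <= max_locations

# Made-up records for illustration only.
unique = "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0\tX0:i:1"
repeat = "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0\tX0:i:7"
kept = [line for line in (unique, repeat) if keep_read(line)]
```

Filtering against problematic regions such as satellites would be a separate interval-overlap step and is not shown here.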

The resulting digital signal was converted to a continuous wiggle track using F-Seq that employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). Input data has been generated for several cell lines. These are used directly to create a control/background model used for F-Seq when generating signal annotations for these cell lines. These models are meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data is not being generated directly for other cell lines. Instead, a general background model was derived from the available input data sets. This should provide corrections for sequencing biases and alignment artifacts, but will not correct for cell type specific copy number changes.
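
To illustrate how Parzen kernel density estimation turns discrete read positions into a continuous per-base score, here is a small sketch. It assumes a Gaussian kernel and a hand-picked bandwidth; it does not reproduce F-Seq's own bandwidth selection or its alignability and input background corrections.

```python
import math

def gaussian_kde_signal(positions, start, end, bandwidth):
    """Parzen (kernel) density estimate at each base in [start, end).

    positions: read positions (must be non-empty)
    bandwidth: Gaussian standard deviation in bp (illustrative value, not F-Seq's)
    """
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))
    signal = []
    for x in range(start, end):
        # Every read contributes, weighted by its distance from base x.
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        signal.append(norm * total)
    return signal

# Three clustered reads and one isolated read.
density = gaussian_kde_signal([50, 52, 54, 150], start=0, end=200, bandwidth=10.0)
peak_base = max(range(len(density)), key=density.__getitem__)
```

Because no fixed-length window is used, the cluster of three reads produces a single smooth peak centered on the cluster, while the isolated read yields a lower bump.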

The exact commands used for this step were:
CTCF:
> fseq -l 300 -v -b <bff files> -p <iff files> alignments.bed
c-Myc:
> fseq -l 600 -v -b <bff files> -p <iff files> alignments.bed
PolII:
> fseq -l 800 -v -b <bff files> -p <iff files> alignments.bed
Where <bff files> are the background files based on alignability, <iff files> are the background files based on the input experiments, and alignments.bed is a BED file of filtered sequence alignments.

Discrete ChIP sites (peaks) were identified from ChIP-seq F-seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values. Contiguous regions where p-values were below a 0.05/0.01 threshold were considered significant.
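
The peak-calling idea (fit a gamma distribution, convert signal to p-values, keep contiguous significant runs) can be sketched as follows. The text does not state how the distribution was fit or how the two thresholds were applied, so the method-of-moments fit, the single threshold, and all function names here are assumptions for illustration.

```python
import math

def fit_gamma_moments(values):
    """Method-of-moments estimates (shape, scale); the fitting method is an assumption."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean * mean / var, var / mean

def gamma_sf(x, shape, scale):
    """Upper-tail probability P(X >= x) via the series for the lower incomplete gamma.

    Adequate for moderate p-values; very small tails lose precision.
    """
    if x <= 0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 10000):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def significant_regions(signal, shape, scale, threshold=0.05):
    """Half-open (start, end) runs of bases whose p-value is below threshold."""
    regions, start = [], None
    for i, value in enumerate(signal):
        if gamma_sf(value, shape, scale) < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions

# Toy signal scored against a gamma with shape 1 and scale 1 (illustrative parameters).
peaks = significant_regions([0.5, 1.0, 4.0, 5.0, 1.0, 3.5, 0.2], 1.0, 1.0)
```

In practice the shape and scale would come from fitting the observed density values, for example with fit_gamma_moments, rather than being set by hand.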

Data from the high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen were normalized using the Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched on size to these peaks that were devoid of any significant signal were also created as a null model. These data were used for additional verification of Tier 1 and Tier 2 cell lines by ROC analysis. Files labeled with the Validation view that contain these data are available for download.
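
As a rough illustration of the ROC analysis, the area under the ROC curve can be computed by treating the array-derived peaks as positives and the size-matched null regions as negatives. How each region was scored is not stated in the text; the sketch assumes one ChIP-seq score per region (for example, its maximum density signal), and the scores below are made up.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen positive region scores
    higher than a randomly chosen negative region (ties count as half).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical per-region ChIP-seq scores.
chip_chip_peaks = [0.9, 0.8, 0.4]   # regions called as peaks on the arrays
null_regions = [0.5, 0.3, 0.1]      # size-matched regions without array signal
auc = roc_auc(chip_chip_peaks, null_regions)
```

An AUC near 1 would indicate that the sequencing signal separates array-confirmed sites from the null regions well, while 0.5 corresponds to no separation.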

Release Notes

Release 2 (August 2011) of this track adds 34 new experiments including 17 new cell lines.

  • Enhancer and Insulator Functional assays: A subset of DNase and FAIRE regions were cloned into functional tissue culture reporter assays to test for enhancer and insulator activity. Coordinates and results from these experiments can be found here.

    Credits

    These data and annotations were created by a collaboration of multiple institutions (contact: Terry Furey).

    We thank NHGRI for ENCODE funding support.

    References

    Bhinge AA, Kim J, Euskirchen GM, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6.

    Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22.

    Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8.

    Buck MJ, Nobel AB, Lieb JD. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 2005;6(11):R97.

    Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods. 2006 Jul;3(7):503-9.

    Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31.

    ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

    Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007 Jun;17(6):877-85.

    Giresi PG, Lieb JD. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods. 2009 Jul;48(3):233-9.

    Lee BK, Bhinge AA, Battenhouse A, McDaniell RM, Liu Z, Song L, Ni Y, Birney E, Lieb JD, Furey TS et al. Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res. 2012 Jan;22(1):9-24.

    Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008 Nov;18(11):1851-8.

    Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010 Feb;2010(2):pdb.prot5384.

    Publications

    Lee BK, Bhinge AA, Battenhouse A, McDaniell RM, Liu Z, Song L, Ni Y, Birney E, Lieb JD, Furey TS et al. Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res. 2012 Jan;22(1):9-24.

    Data Release Policy

    Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.