CpG islands are associated with genes, particularly housekeeping
genes, in vertebrates. CpG islands are common near
transcription start sites and may be associated with promoter
regions. Normally a C (cytosine) base followed immediately by a
G (guanine) base (a CpG) is rare in
vertebrate DNA because the Cs in such an arrangement tend to be
methylated. This methylation helps distinguish the newly synthesized
DNA strand from the parent strand, which aids in the final stages of
DNA proofreading after replication. However, over evolutionary time
methylated Cs tend to turn into Ts because of spontaneous
deamination. The result is that CpGs are relatively rare unless
there is selective pressure to keep them or a region is not methylated
for some reason, perhaps having to do with the regulation of gene
expression. CpG islands are regions where CpGs are present at
significantly higher levels than is typical for the genome as a whole.
The CpG count is the number of CG dinucleotides in the island.
The Percentage CpG is the number of bases in CpG dinucleotides
(twice the CpG count) as a percentage of the island's length.
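The two values defined above can be computed directly from an island's sequence. The following is a minimal Python sketch, assuming the island is given as a plain DNA string. The helper name `cpg_stats` is hypothetical and is not part of the program that generated this track.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, float]:
    """Return (CpG count, Percentage CpG) for an island sequence (hypothetical helper)."""
    s = seq.upper()
    # str.count counts non-overlapping matches, but "CG" can never overlap
    # itself (it ends in G and starts with C), so this count is exact.
    count = s.count("CG")
    # Each CpG contributes two bases; express them as a percentage of the length.
    pct = 100.0 * 2 * count / len(s) if s else 0.0
    return count, pct


print(cpg_stats("ACGTCGCGAT"))  # (3, 60.0)
```

In the 10 bp example, three CG dinucleotides cover 6 bases, which gives 60% CpG.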
The genome sequence was masked using the output of RepeatMasker and
the Tandem Repeats Finder (period ≤ 12). A sliding-window search
was performed on the set of CpG locations in the masked genome
sequence to find the longest spans that met the criteria given in
Gardiner-Garden and Frommer (1987) (see the References section):
- length of 200 bp or more
- GC content of 50% or greater
- ratio of observed to expected CpGs of 0.6 or greater
The ratio of observed to expected CpGs is calculated as follows:
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N is the length of the sequence in bases.
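To show how the criteria listed above fit together, here is a minimal Python sketch. It is not the Andy Law / UCSC program. It assumes the masked sequence is a plain string in which masked bases are `N`, so they count toward length but not toward C or G. The scan strategy is a naive assumption: test every 200 bp window, merge overlapping windows that pass into spans, then keep only merged spans that still pass as a whole. All function names are hypothetical.

```python
from typing import List, Tuple

# Thresholds from the criteria listed above (Gardiner-Garden and Frommer, 1987).
MIN_LENGTH = 200    # bp
MIN_GC = 0.5        # fraction of bases that are G or C
MIN_OBS_EXP = 0.6   # observed/expected CpG ratio


def obs_exp_cpg(seq: str) -> float:
    """Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G), N = len(seq)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # expected count is zero; treat the segment as failing
    # "CG" cannot overlap itself, so the non-overlapping str.count is exact.
    return s.count("CG") * len(s) / (c * g)


def gc_content(seq: str) -> float:
    """Fraction of bases that are C or G (masked N bases count toward length only)."""
    s = seq.upper()
    return (s.count("C") + s.count("G")) / len(s) if s else 0.0


def meets_criteria(seq: str) -> bool:
    """True if a segment satisfies all three island criteria."""
    return (len(seq) >= MIN_LENGTH
            and gc_content(seq) >= MIN_GC
            and obs_exp_cpg(seq) >= MIN_OBS_EXP)


def find_islands(seq: str, window: int = MIN_LENGTH) -> List[Tuple[int, int]]:
    """Naive sliding-window scan (hypothetical, not the UCSC program).

    Merges overlapping passing windows into spans, then keeps merged spans that
    still meet the criteria. Returns half-open (start, end) coordinates.
    """
    merged: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if not meets_criteria(seq[i:i + window]):
            continue
        if merged and i <= merged[-1][1]:
            merged[-1] = (merged[-1][0], i + window)  # extend the current span
        else:
            merged.append((i, i + window))            # start a new span
    return [(s, e) for s, e in merged if meets_criteria(seq[s:e])]


seq = "A" * 300 + "CG" * 150 + "A" * 300
print(obs_exp_cpg("CG" * 150))  # 2.0
print(find_islands(seq))        # [(200, 700)]
```

In this toy example, the reported span reaches 100 bp into each AT-only flank. Windows that are only half CpG-rich still pass the 50% GC threshold, so a window-merging scan can overshoot the CpG-rich core. The actual program may trim or otherwise handle span boundaries differently.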
This track was generated using a program written by Andy Law (Roslin
Institute) with minor modifications by Angie Hinrichs (UCSC).
Gardiner-Garden M, Frommer M.
CpG islands in vertebrate genomes.
J Mol Biol. 1987 Jul 20;196(2):261-82.