This track shows repetitive genomic elements, including transposable element (TE) families, satellites, short tandem repeats,
and low-complexity DNA, as annotated by RepeatMasker. These tracks were generated with RMBlast, the NCBI BLAST-derived search engine, and the Dfam 3.3
database (plus T2T-CHM13-derived entries submitted to the Dfam 3.6 data
release in April 2022, and HG002 chrY-derived entries not yet submitted).
Individual tracks are identified by the three main components of the analysis: the RepeatMasker version,
the search engine used, and the repeat library version.
RepeatMasker Database: Dfam_3.3 (plus T2T-CHM13-derived entries submitted to the Dfam 3.6 data release in April 2022, and HG002 chrY-derived entries not yet submitted)
Display Conventions and Configuration
Context Sensitive Zooming
This track uses a technique that chooses the appropriate visual representation for the data based on the
zoom scale and/or the number of annotations currently in view. The track automatically switches from the
most detailed visualization ('Full' mode) to the denser view ('Pack' mode) when the window size is greater
than 45 kb of sequence. It switches further to the even denser single-line view ('Dense' mode) if more than
500 annotations are present in the current view.
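The zooming rule above can be sketched as a small decision function. This is a minimal illustration, not the browser's implementation: the two thresholds come from the text, but the function name is hypothetical, and we assume the annotation-count check takes precedence regardless of window size.

```python
# Hypothetical sketch of context-sensitive zooming; thresholds are taken from the text.
FULL_MAX_WINDOW_BP = 45_000   # above 45 kb, leave 'Full' mode
DENSE_MIN_ANNOTATIONS = 500   # more than 500 annotations in view -> 'Dense' mode


def choose_display_mode(window_bp: int, n_annotations: int) -> str:
    """Return 'dense', 'pack' or 'full' for the current view.

    Assumption: the annotation-count rule is checked first, whatever the window size.
    """
    if n_annotations > DENSE_MIN_ANNOTATIONS:
        return "dense"
    if window_bp > FULL_MAX_WINDOW_BP:
        return "pack"
    return "full"
```

For example, a 100 kb window with 200 repeats in view would be drawn in 'Pack' mode under this sketch, while the same window with 600 repeats would collapse to 'Dense'.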
Dense Mode Visualization
In dense display mode, a single line is displayed denoting the coverage of repeats using a series
of colored boxes. The boxes are colored based on the classification of the repeat (see below for legend).
Pack Mode Visualization
In pack mode, repeats are represented as sets of joined features. These are color-coded as above based on the
class of the repeat, and further details such as orientation (denoted by chevrons) and a family label are provided.
The family label can optionally be turned off in the track configuration.
The pack display mode can also be configured to resemble the original UCSC repeat track. In this visualization,
repeat features are grouped by class (see below) and displayed on separate track lines. The repeat ranges are
drawn as grayscale boxes whose extent reflects the size of the repeat and whose shading reflects
the combined amount of base mismatch, base deletion, and base insertion associated with the repeat element.
The higher this combined amount, the lighter the shading.
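The shading rule can be illustrated with a simple linear mapping. The inputs follow the UCSC rmsk table convention of storing divergence, deletion, and insertion rates in parts per thousand (milliDiv, milliDel, milliIns); the saturation point and the gray range below are hypothetical values chosen for illustration, not the track's actual parameters.

```python
# Hypothetical linear mapping from combined divergence to a gray level.
# Higher combined divergence -> higher value -> lighter gray, as the text describes.
MAX_COMBINED = 300   # assumed saturation point (parts per thousand)
DARKEST = 40         # assumed gray value for a perfect match (0-255 scale)
LIGHTEST = 220       # assumed gray value at or beyond saturation


def gray_level(milli_div: int, milli_del: int, milli_ins: int) -> int:
    """Return a 0-255 gray value; callers would draw with RGB (g, g, g)."""
    combined = milli_div + milli_del + milli_ins
    combined = min(max(combined, 0), MAX_COMBINED)  # clamp to the assumed range
    return DARKEST + (LIGHTEST - DARKEST) * combined // MAX_COMBINED
```

A clamped linear ramp keeps very old, highly diverged copies from all collapsing to white while still making young, near-identical copies stand out as dark boxes.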
Full Mode Visualization
In the most detailed visualization, repeats are displayed as chevron boxes indicating the size and orientation of
the repeat. The interior grayscale shading represents the divergence of the repeat (see above), while the outline color
represents the class of the repeat. Dotted lines above the repeat, extending left or right,
indicate the length of unaligned repeat model sequence and show where a repeat fragment originates in its
consensus or pHMM model. If the unaligned sequence
is long, an interruption line and its size in bp are shown instead of drawing the extension to scale.
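The extension logic can be sketched from a hit's consensus coordinates. This is an illustrative sketch: the `RepeatHit` fields mirror the aligned consensus start, end, and model length that RepeatMasker reports, but the class, function names, and the 200 bp drawing cutoff are hypothetical.

```python
from dataclasses import dataclass

MAX_DRAWN_EXTENSION_BP = 200  # hypothetical cutoff; beyond it, draw an interruption


@dataclass
class RepeatHit:
    strand: str        # '+' or '-'
    cons_start: int    # first aligned consensus position (1-based)
    cons_end: int      # last aligned consensus position (1-based, inclusive)
    cons_length: int   # full length of the consensus or pHMM model


def unaligned_extensions(hit: RepeatHit) -> tuple:
    """Return (left, right) unaligned consensus lengths in genome orientation."""
    five_prime = hit.cons_start - 1
    three_prime = hit.cons_length - hit.cons_end
    # On the reverse strand, the consensus 5' end lies to the genomic right.
    if hit.strand == "+":
        return five_prime, three_prime
    return three_prime, five_prime


def describe_extension(length_bp: int) -> str:
    """Describe how an unaligned extension of this length would be drawn."""
    if length_bp == 0:
        return "none"
    if length_bp > MAX_DRAWN_EXTENSION_BP:
        return f"interruption ({length_bp}bp)"
    return f"dotted line ({length_bp}bp, to scale)"
```

Usage, with a made-up forward-strand SINE hit that aligns only consensus positions 1 to 100 of a 484 bp model: `unaligned_extensions(RepeatHit("+", 1, 100, 484))` gives `(0, 384)`, and `describe_extension(384)` gives `"interruption (384bp)"`.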
For example, the following repeat is a SINE element in the forward orientation with average
divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome.
The 3' unaligned length (384 bp) is not drawn to scale and is instead displayed using a set of
interruption lines along with the length of the unaligned sequence.
Repeats that have been fragmented by insertions or large internal deletions are now represented
by join lines. In the example below, a LINE element is found as two fragments. The solid
connection lines indicate that there are no unaligned consensus bases between the two fragments.
Also note these fragments form the 3' extremity of the repeat, as there is no unaligned consensus
sequence following the last fragment.
In cases where there is unaligned consensus sequence between the fragments, the repeat will look like
the following. The dotted line indicates the length of the unaligned sequence between the two
fragments. In this case the unaligned consensus is longer than the actual genomic distance between
these two fragments.
If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate
how much of the left fragment is repeated in the right fragment.
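The three join cases (solid, unaligned gap, overlap) follow from the consensus coordinates of consecutive fragments. The sketch below is illustrative: the `Fragment` class and both functions are hypothetical, and the fragments are assumed to be passed in consensus order.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    geno_start: int   # 0-based genomic start
    geno_end: int     # genomic end (half-open)
    cons_start: int   # first aligned consensus position (1-based)
    cons_end: int     # last aligned consensus position (1-based, inclusive)


def classify_join(first: Fragment, second: Fragment) -> tuple:
    """Classify the join between two fragments given in consensus order.

    Returns ("solid", 0), ("unaligned", n) for n unaligned consensus bases,
    or ("overlap", n) when n consensus bases are repeated in the second fragment.
    """
    cons_gap = second.cons_start - first.cons_end - 1
    if cons_gap == 0:
        return "solid", 0
    if cons_gap > 0:
        return "unaligned", cons_gap
    return "overlap", -cons_gap


def genomic_distance(a: Fragment, b: Fragment) -> int:
    """Bases separating two non-overlapping half-open genomic intervals."""
    return max(a.geno_start, b.geno_start) - min(a.geno_end, b.geno_end)
```

Comparing `classify_join(...)[1]` with `genomic_distance(...)` distinguishes the case described above, where the unaligned consensus is longer than the genomic distance between the fragments.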
The following table lists the repeat class colors:
Color
Repeat Class
SINE - Short Interspersed Nuclear Element
LINE - Long Interspersed Nuclear Element
LTR - Long Terminal Repeat
DNA - DNA Transposon
Simple - Single Nucleotide Stretches and Tandem Repeats
Other - Other Repeats (including class RC - Rolling Circle)
Unknown - Unknown Classification
A "?" at the end of the "Family" or "Class" (for example, DNA?)
signifies that the curator was unsure of the classification. At some point in the future,
either the "?" will be removed or the classification will be changed.
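Assigning a hit to a legend color requires splitting RepeatMasker's combined "class/family" label (for example, "LINE/L1") and handling a trailing "?". The sketch below is a hypothetical illustration: the legend categories come from the table above, but the assignment of Low_complexity to Simple and of Satellite and other unlisted classes to Other is an assumption, not something this page states.

```python
# Hypothetical mapping from RepeatMasker class names to the legend categories above.
# Assumption: Low_complexity is grouped with Simple; anything unlisted (RC, Satellite, ...) is Other.
LEGEND_CATEGORY = {
    "SINE": "SINE",
    "LINE": "LINE",
    "LTR": "LTR",
    "DNA": "DNA",
    "Simple_repeat": "Simple",
    "Low_complexity": "Simple",
    "Unknown": "Unknown",
}


def parse_class_family(field: str) -> tuple:
    """Split 'Class/Family' and flag a trailing '?' on either part."""
    cls, _, fam = field.partition("/")
    cls_uncertain = cls.endswith("?")
    fam_uncertain = fam.endswith("?")
    return cls.rstrip("?"), fam.rstrip("?"), cls_uncertain, fam_uncertain


def legend_category(repeat_class: str) -> str:
    """Return the legend category used to pick a display color."""
    return LEGEND_CATEGORY.get(repeat_class.rstrip("?"), "Other")
```

With this sketch, an uncertain label such as "DNA?/hAT-Charlie" is still colored as DNA, while the uncertainty flag remains available for display in the feature details.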
Methods
The RepeatMasker (www.repeatmasker.org) tool was used to generate the datasets found on this track hub.
For the discovery of the additional T2T-CHM13-derived repeats included in this track, as well as the methods (and scripts) for masking the assembly with these T2T-CHM13-derived repeats and previously known repeats: