Schema for RepeatMasker - RepeatMasker Repetitive Elements
  Database: hub_567047_hs1
  Primary Table: hub_567047_t2tRepeatMasker
  Data last updated: 2022-04-27
Big Bed File Download: /gbdb/hs1/t2tRepeatMasker/chm13v2.0_rmsk.bb
Item Count: 4,636,653
The data is stored in the binary BigBed format.

Format description: Repetitive Element Annotation
field | example | description
chrom | chr1 | Reference sequence chromosome or scaffold
chromStart | 165585615 | Start position of visualization on chromosome
chromEnd | 165591770 | End position of visualization on chromosome
name | L1PA4#LINE/L1 | Name of the repeat, including the type/subtype suffix
score | 26 | Divergence score
strand | - | + or - for strand
thickStart | 165585618 | Start position of aligned sequence on chromosome
thickEnd | 165586001 | End position of aligned sequence on chromosome
reserved | 0 | Reserved
blockCount | 3 | Count of sequence blocks
blockSizes | 3,383,5769 | A comma-separated list of the block sizes (+/-)
blockStarts | -1,3,-1 | A comma-separated list of the block starts (+/-)
id | 276864 | A unique identifier for the joined annotations in this record
description | 3265 2.6 0.0 0.0 chr1 165585619 165586001 (82801327) C L1PA4 LINE/L1 (3) 6152 5770 276864 | A comma-separated list of technical annotation descriptions
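The 14 fields above can be read into a small record type. The following is a minimal sketch, assuming the BigBed file has first been converted to tab-separated text (for example with UCSC's bigBedToBed utility); the `RepeatRecord` class and `parse_row` function are hypothetical names introduced here for illustration and are not part of the track hub.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepeatRecord:
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str            # repeat name without the suffix, e.g. "L1PA4"
    repeat_class: str    # type/subtype suffix, e.g. "LINE/L1"
    score: int
    strand: str
    thick_start: int
    thick_end: int
    reserved: int
    block_count: int
    block_sizes: List[int]
    block_starts: List[int]
    id: int
    description: str


def _int_list(text: str) -> List[int]:
    """Parse a comma-separated list of (possibly negative) integers."""
    return [int(item) for item in text.split(",") if item.strip()]


def parse_row(line: str) -> RepeatRecord:
    """Parse one tab-separated row (assumed layout: the 14 fields listed above)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 14:
        raise ValueError(f"expected 14 fields, got {len(f)}")
    # The name field joins the repeat name and its type/subtype with '#'.
    name, _, repeat_class = f[3].partition("#")
    return RepeatRecord(
        chrom=f[0], chrom_start=int(f[1]), chrom_end=int(f[2]),
        name=name, repeat_class=repeat_class,
        score=int(f[4]), strand=f[5],
        thick_start=int(f[6]), thick_end=int(f[7]),
        reserved=int(f[8]), block_count=int(f[9]),
        block_sizes=_int_list(f[10]), block_starts=_int_list(f[11]),
        id=int(f[12]), description=f[13],
    )


# The example values from the schema table, joined with tabs.
EXAMPLE_ROW = "\t".join([
    "chr1", "165585615", "165591770", "L1PA4#LINE/L1", "26", "-",
    "165585618", "165586001", "0", "3", "3,383,5769", "-1,3,-1", "276864",
    "3265 2.6 0.0 0.0 chr1 165585619 165586001 (82801327) C L1PA4 LINE/L1 "
    "(3) 6152 5770 276864",
])
record = parse_row(EXAMPLE_ROW)
```

In the example record, the aligned span `thickEnd - thickStart` is 383 bases, which matches the middle entry of `blockSizes`.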

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | blockStarts | id | description
chr1 | 165585615 | 165591770 | L1PA4#LINE/L1 | 26 | - | 165585618 | 165586001 | 0 | 3 | 3,383,5769 | -1,3,-1 | 276864 | 3265 2.6 0.0 0.0 chr1 165585619 165586001 (82801327) C L1PA4 LINE/L1 (3) 6152 5770 276864
chr1 | 165586014 | 165592164 | L1PA3#LINE/L1 | 36 | - | 165586016 | 165588966 | 0 | 3 | 2,2950,3198 | -1,2,-1 | 276866 | 17182 3.6 0.2 0.0 chr1 165586017 165588966 (82798362) C L1PA3 LINE/L1 (2) 6153 3199 276866
chr1 | 165586064 | 165598270 | L1P1#LINE/L1 | 58 | - | 165588961 | 165598270 | 0 | 5 | 2897,146,-40,3150,0 | -1,2897,-1,9056,-1 | 276867 | 882 6.8 0.0 1.6 chr1 165588962 165589107 (82798221) C L1P1 LINE/L1 (2897) 3249 3106 276867 ,17309 4.8 0.0 0.2 chr1 165595121 165 ...
chr1 | 165589095 | 165595250 | L1HS#LINE/L1 | 17 | - | 165589095 | 165595126 | 0 | 3 | 0,6031,124 | -1,0,-1 | 276868 | 26615 1.7 0.0 0.0 chr1 165589096 165595126 (82792202) C L1HS LINE/L1 (0) 6155 125 276868
chr1 | 165593331 | 165600629 | L1M4b#LINE/L1 | 221 | - | 165598572 | 165600464 | 0 | 9 | 5241,93,172,969,5,94,6,102,165 | -1,5241,-1,5840,-1,6870,-1,7031,-1 | 276870 | 262 16.8 6.5 4.2 chr1 165598573 165598665 (82788663) C L1M4b LINE/L1 (5241) 1626 1532 276870 ,3928 23.4 3.6 1.6 chr1 165599172 1 ...
chr1 | 165597031 | 165605151 | L1M2a#LINE/L1 | 245 | - | 165601234 | 165604747 | 0 | 7 | 4203,209,-280,2387,-38,731,404 | -1,4203,-1,4468,-1,6985,-1 | 276878 | 2065 25.3 5.0 0.9 chr1 165601235 165601443 (82785885) C L1M2a LINE/L1 (4203) 3364 3126 276878 ,5662 23.7 1.9 3.1 chr1 165601500 ...
chr1 | 165597779 | 165601203 | L2a#LINE/L2 | 321 | + | 165600795 | 165600942 | 0 | 3 | 3015,147,261 | -1,3016,-1 | 276876 | 404 32.1 4.8 2.7 chr1 165600796 165600942 (82786386) + L2a LINE/L2 3016 3165 (261) 276876
chr1 | 165598263 | 165598571 | AluSx#SINE/Alu | 128 | - | 165598272 | 165598570 | 0 | 3 | 9,298,1 | -1,9,-1 | 276869 | 2074 12.8 1.3 0.0 chr1 165598273 165598570 (82788758) C AluSx SINE/Alu (9) 303 2 276869
chr1 | 165598674 | 165598708 | (AT)n#Simple_repeat | 221 | + | 165598675 | 165598708 | 0 | 3 | 0,33,0 | -1,1,-1 | 276871 | 19 22.1 0.0 0.0 chr1 165598676 165598708 (82788620) + (AT)n Simple_repeat 1 33 (0) 276871
chr1 | 165598707 | 165599072 | MER47A#DNA/TcMar-Tigger | 163 | + | 165598708 | 165599069 | 0 | 3 | 0,361,3 | -1,1,-1 | 276872 | 2111 16.3 0.6 0.0 chr1 165598709 165599069 (82788259) + MER47A DNA/TcMar-Tigger 1 363 (3) 276872
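Each entry in the description field looks like one line of RepeatMasker's standard .out annotation (score, percent divergence, deletion and insertion, query coordinates, strand, repeat name and class, consensus coordinates, ID). The sketch below parses it on that assumption; the column meanings are taken from the RepeatMasker .out layout, not from this page, and `parse_description` is a hypothetical helper. Entries that are cut short (shown as "..." in the sample rows) are skipped, not guessed.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def _unparen(token: str) -> int:
    """Convert a parenthesized count such as '(261)' to an integer."""
    return int(token.strip("()"))


def parse_description(description: str) -> List[Dict[str, Value]]:
    """Split a description into per-fragment records (assumed .out columns)."""
    hits: List[Dict[str, Value]] = []
    for chunk in description.split(","):
        t = chunk.split()
        if len(t) != 15:
            continue  # empty or truncated entry
        if t[8] == "+":
            # Forward strand: consensus begin, end, (left)
            begin, end, left = int(t[11]), int(t[12]), _unparen(t[13])
        else:
            # Complement ('C'): (left), end, begin
            left, end, begin = _unparen(t[11]), int(t[12]), int(t[13])
        hits.append({
            "sw_score": int(t[0]),
            "perc_div": float(t[1]),
            "perc_del": float(t[2]),
            "perc_ins": float(t[3]),
            "query": t[4],
            "query_begin": int(t[5]),
            "query_end": int(t[6]),
            "query_left": _unparen(t[7]),
            "strand": t[8],
            "repeat": t[9],
            "repeat_class": t[10],
            "cons_begin": begin,
            "cons_end": end,
            "cons_left": left,
            "id": int(t[14]),
        })
    return hits


# Description of the L2a sample row (forward strand).
l2a = parse_description(
    "404 32.1 4.8 2.7 chr1 165600796 165600942 (82786386) + "
    "L2a LINE/L2 3016 3165 (261) 276876"
)[0]
unaligned_5prime = l2a["cons_begin"] - 1   # consensus bases before the match
unaligned_3prime = l2a["cons_left"]        # consensus bases after the match
```

For the L2a row these two values (3015 and 261) agree with the first and last entries of its blockSizes field (3015,147,261), which suggests that blocks with a start of -1 hold the unaligned consensus lengths; that reading is an inference from the sample rows, not a documented rule.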

RepeatMasker (hub_567047_t2tRepeatMasker) Track Description
 

Description

Repetitive genomic elements including Transposable Element (TE) families, Satellite, Short Tandem Repeats, and low complexity DNA as annotated by RepeatMasker. These tracks were constructed with the NCBI BLAST-derived search engine RMBlast and Dfam 3.3 database (plus T2T-CHM13-derived entries submitted to the Dfam 3.6 data release in April 2022, and HG002 chrY-derived entries not yet submitted).

Individual tracks are identified by the three main components of the analysis: the version of RepeatMasker, the search engine used, and the repeat library version.

  • RepeatMasker version April 01 2021 open-4.1.2-p1
  • Search Engine: RMBlast (-e ncbi) [ 2.10.0+ (March 2020) ]
  • RepeatMasker Database: Dfam_3.3 (plus T2T-CHM13-derived entries submitted to the Dfam 3.6 data release in April 2022, and HG002 chrY-derived entries not yet submitted)

Display Conventions and Configuration

Context Sensitive Zooming

This track chooses the visual representation of the data based on the zoom scale and/or the number of annotations currently in view. The track automatically switches from the most detailed visualization ('Full' mode) to the denser view ('Pack' mode) when the window size is greater than 45 kb of sequence. It further switches to the even denser single-line view ('Dense' mode) if more than 500 annotations are present in the current view.
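The switching rule can be sketched as a small function. This is an illustration only, not the browser's actual code: the function and constant names are hypothetical, 45 kb is taken to mean 45,000 bases, and the annotation-count check is assumed to apply regardless of window size.

```python
# Thresholds quoted in the text; the exact units are an assumption.
PACK_WINDOW_THRESHOLD_BP = 45_000
DENSE_ANNOTATION_THRESHOLD = 500


def choose_display_mode(window_bp: int, annotation_count: int) -> str:
    """Return 'full', 'pack', or 'dense' for the current view."""
    # Assumed: too many annotations forces the single-line view.
    if annotation_count > DENSE_ANNOTATION_THRESHOLD:
        return "dense"
    # Windows wider than 45 kb drop from full to pack.
    if window_bp > PACK_WINDOW_THRESHOLD_BP:
        return "pack"
    return "full"
```

Under these assumptions, a 10 kb window with 20 annotations stays in full mode, while a 100 kb window with 600 annotations is drawn in dense mode.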

Dense Mode Visualization

In dense display mode, a single line is displayed denoting the coverage of repeats using a series of colored boxes. The boxes are colored based on the classification of the repeat (see below for legend).

Pack Mode Visualization

In pack mode, repeats are represented as sets of joined features. These are color coded as above based on the class of the repeat, and further details such as orientation (denoted by chevrons) and a family label are provided. The family label can optionally be turned off in the track configuration.



The pack display mode may also be configured to resemble the original UCSC repeat track. In this visualization, repeat features are grouped by class (see below) and displayed on separate track lines. The repeat ranges are denoted as grayscale boxes, reflecting both the size of the repeat and the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading.

Full Mode Visualization

In the most detailed visualization, repeats are displayed as chevron boxes, indicating the size and orientation of the repeat. The interior grayscale shading represents the divergence of the repeat (see above), while the outline color represents the class of the repeat. Dotted lines above the repeat and extending left or right indicate the length of unaligned repeat model sequence and provide context for where a repeat fragment originates in its consensus or pHMM model. If the length of the unaligned sequence is large, an interruption line and the size in bp are shown instead of drawing the extension to scale.

For example, the following repeat is a SINE element in the forward orientation with average divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome. The 3' unaligned length (384bp) is not drawn to scale and is instead displayed using a set of interruption lines along with the length of the unaligned sequence.

Repeats that have been fragmented by insertions or large internal deletions are now represented by join lines. In the example below, a LINE element is found as two fragments. The solid connection lines indicate that there are no unaligned consensus bases between the two fragments. Also note these fragments form the 3' extremity of the repeat, as there is no unaligned consensus sequence following the last fragment.

In cases where there is unaligned consensus sequence between the fragments, the repeat will look like the following. The dotted line indicates the length of the unaligned sequence between the two fragments. In this case the unaligned consensus is longer than the actual genomic distance between these two fragments.

If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate how much of the left fragment is repeated in the right fragment.
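The three join styles described above depend on how the consensus coordinates of two neighboring fragments relate to each other. A minimal sketch follows, assuming both fragments are given in consensus order with 1-based inclusive coordinates; `classify_join` and its return labels are hypothetical and only illustrate the rule.

```python
from typing import Tuple


def classify_join(left_cons_end: int, right_cons_begin: int) -> Tuple[str, int]:
    """Classify the join between two fragments of one repeat.

    Returns a (style, bases) pair:
      ('solid', 0)    no unaligned consensus bases between the fragments
      ('dotted', n)   n unaligned consensus bases between the fragments
      ('overlap', n)  n consensus bases of the left fragment repeated in the right
    """
    gap = right_cons_begin - left_cons_end - 1
    if gap == 0:
        return ("solid", 0)
    if gap > 0:
        return ("dotted", gap)
    return ("overlap", -gap)


# Illustrative coordinates only, not taken from the track data.
adjacent = classify_join(100, 101)
separated = classify_join(100, 151)
overlapping = classify_join(100, 91)
```

With these illustrative inputs, fragments ending at consensus position 100 and starting at 151 leave 50 unaligned consensus bases, while a right fragment starting at position 91 repeats 10 bases of the left one.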

The following table lists the repeat class colors:

Color Repeat Class
SINE - Short Interspersed Nuclear Element
LINE - Long Interspersed Nuclear Element
LTR - Long Terminal Repeat
DNA - DNA Transposon
Simple - Single Nucleotide Stretches and Tandem Repeats
Low_complexity - Low Complexity DNA
Satellite - Satellite Repeats
RNA - RNA Repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
Other - Other Repeats (including class RC - Rolling Circle)
Unknown - Unknown Classification

A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed.

Methods

The RepeatMasker (www.repeatmasker.org) tool was used to generate the datasets found on this track hub.

References

Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010.

For the discovery of the additional T2T-CHM13-derived repeats included in this track, as well as the methods (and scripts) for masking the assembly with these T2T-CHM13-derived repeats and previously known repeats:

Hoyt SJ, et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. bioRxiv. 2022 Apr 1.

Hoyt SJ, et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements analysis code: T2T-CHM13. bioRxiv. 2022 Apr 1.

Dfam is described in:

Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AF, Finn RD. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 2013 Jan;41(Database issue):D70-82. PMID: 23203985; PMC: PMC3531169

Repbase Update is described in:

Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072

For a discussion of repeats in mammalian genomes, see:

Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616

Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846