Schema for CHM13 unique - CHM13 unique in comparison to GRCh38/hg38 and GRCh37/hg19
  Database: hub_567047_hs1
  Primary Table: hub_567047_hgUniqueHg38
  Data last updated: 2022-04-09
Big Bed File Download: /gbdb/hs1/hgUnique/hgUnique.hg38.bb
Item Count: 615
The data is stored in the binary BigBed format.

Format description: Browser Extensible Data
field       example     description
chrom       chr1        Reference sequence chromosome or scaffold
chromStart  103546781   Start position in chromosome
chromEnd    103735057   End position in chromosome

Sample Rows

chrom  chromStart  chromEnd
chr1   103546781   103735057
chr1   108417231   108482805
chr1   108482833   108516786
chr1   120842894   120845404
chr1   121724719   121735918
chr1   121799856   126005703
chr1   126047009   126301366
chr1   126333003   126337646
chr1   126339702   126353477
chr1   126384724   126390897
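
Each record is a plain BED3 interval with 0-based, half-open coordinates, so the length of a unique region is simply chromEnd - chromStart, with no +1 correction. The minimal shell sketch below applies this to the first three sample rows above. The helper name `bed_lengths` is hypothetical; for the full track, the records would first have to be extracted from the BigBed file, for example with the UCSC `bigBedToBed` utility.

```shell
#!/usr/bin/env bash
set -euo pipefail

# bed_lengths (hypothetical helper): print each BED3 interval with its length,
# then the summed length. BED coordinates are 0-based and half-open, so
# length = chromEnd - chromStart.
bed_lengths() {
  awk -v OFS='\t' '
    { len = $3 - $2; total += len; print $1, $2, $3, len }
    END { print "total", total }
  ' "$@"
}

# Input: the first three sample rows shown on this page.
printf 'chr1\t103546781\t103735057\nchr1\t108417231\t108482805\nchr1\t108482833\t108516786\n' \
  | bed_lengths
# prints (tab-separated):
# chr1 103546781 103735057 188276
# chr1 108417231 108482805 65574
# chr1 108482833 108516786 33953
# total 287803
```

With no file argument the helper reads standard input, so it can sit at the end of a pipeline.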

CHM13 unique for hg38 (hub_567047_hgUniqueHg38) Track Description

Description

These tracks show the regions of the T2T-CHM13 v2.0 assembly that are unique relative to the GRCh38/hg38 and GRCh37/hg19 reference assemblies, that is, the CHM13 regions not covered by any alignment to the respective reference.

Methods

    Converting a chain file to the PAF format

    We used the `to_paf.py` script from chaintools (https://doi.org/10.5281/zenodo.6342391, v0.1) to convert the v1_nfLO chains to the PAF format.
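
    PAF stores one alignment per line in 12 mandatory tab-separated columns; columns 1, 3 and 4 are the query sequence name, query start and query end, in 0-based, end-exclusive coordinates like BED. The sketch below shows what the `cut -f 1,3,4` step of the commands that follow extracts from a single record. It assumes CHM13 is the query in the converted PAF, which is inferred from the pipeline complementing against chm13v2.0.fasta.fai rather than from the chaintools documentation; the helper name `paf_query_bed` and the toy record are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# paf_query_bed (hypothetical helper): keep PAF columns 1, 3, 4
# (query name, query start, query end). PAF query coordinates are 0-based
# and end-exclusive, like BED, so the output is valid BED3 with no shift.
# Assumes CHM13 is the PAF query, as the pipeline on this page implies.
paf_query_bed() {
  cut -f 1,3,4 "$@"
}

# Hypothetical toy PAF record (12 mandatory columns); coordinates are made up.
printf 'chr1\t248387328\t1000\t5000\t+\tchr1\t248956422\t1200\t5200\t3990\t4000\t60\n' \
  | paf_query_bed
# prints (tab-separated): chr1 1000 5000
```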

    Obtaining unique regions

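    We used the following commands to obtain, in BED format, the regions of T2T-CHM13 v2.0 that are not covered by any alignment to GRCh38/hg38 or GRCh37/hg19, i.e., the regions unique to T2T-CHM13 v2.0 relative to each reference.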

    
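    # hg38: PAF columns 1, 3 and 4 are the query name, start and end, i.e. the
    # CHM13v2.0 intervals covered by an alignment. Sort and merge them, take
    # the complement against the CHM13v2.0 genome (.fai), and merge again.
    cut -f 1,3,4 grch38-chm13v2.paf \
      | bedtools sort -i - -g chm13v2.0.fasta.fai \
      | bedtools merge \
      | bedtools complement -g chm13v2.0.fasta.fai -i - \
      | bedtools merge \
      > T2T-CHM13v2.0_unique_regions_hg38.bed
    
    # hg19: the same pipeline on the hg19-to-CHM13v2.0 alignments.
    cut -f 1,3,4 hg19-chm13v2.paf \
      | bedtools sort -i - -g chm13v2.0.fasta.fai \
      | bedtools merge \
      | bedtools complement -g chm13v2.0.fasta.fai -i - \
      | bedtools merge \
      > T2T-CHM13v2.0_unique__regions_hg19.bed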
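
    To make the merge and complement steps concrete, here is a minimal awk-based sketch of what `bedtools sort | bedtools merge` followed by `bedtools complement -g` compute. It is an illustration under simplifying assumptions, not a replacement for bedtools: the function names `merge_intervals` and `complement_bed`, the toy genome and the toy intervals are all hypothetical. In this sketch, overlapping and book-ended intervals are merged, and a chromosome listed in the genome file with no aligned intervals at all is reported in full as unique.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_intervals (hypothetical): sort BED3 by chrom and start, then merge
# overlapping or book-ended intervals (approximates bedtools sort | merge).
merge_intervals() {
  sort -k1,1 -k2,2n "$@" | awk -v OFS='\t' '
    # New chromosome, or a gap before this interval: emit the current run.
    $1 != chrom || $2 > end {
      if (chrom != "") print chrom, start, end
      chrom = $1; start = $2; end = $3
      next
    }
    # Overlapping or touching interval: extend the current run.
    $3 > end { end = $3 }
    END { if (chrom != "") print chrom, start, end }
  '
}

# complement_bed GENOME (hypothetical): print the parts of each chromosome in
# GENOME (chrom<TAB>length, e.g. a .fai file) not covered by the merged,
# sorted BED3 read from stdin (approximates bedtools complement -g).
complement_bed() {
  local genome=$1
  awk -v OFS='\t' '
    NR == FNR { order[++n] = $1; len[$1] = $2; next }  # genome file
    { k = ++cnt[$1]; s[$1, k] = $2; e[$1, k] = $3 }    # merged intervals
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]; prev = 0
        for (k = 1; k <= cnt[c]; k++) {
          if (s[c, k] > prev) print c, prev, s[c, k]  # gap before interval
          prev = e[c, k]
        }
        if (len[c] > prev) print c, prev, len[c]      # uncovered chromosome end
      }
    }
  ' "$genome" -
}

# Toy genome: chrA (100 bp) and chrB (50 bp); only chrA has alignments.
genome=$(mktemp)
trap 'rm -f "$genome"' EXIT
printf 'chrA\t100\nchrB\t50\n' > "$genome"

printf 'chrA\t10\t30\nchrA\t25\t40\nchrA\t40\t60\nchrA\t80\t90\n' \
  | merge_intervals \
  | complement_bed "$genome"
# prints (tab-separated):
# chrA 0 10
# chrA 60 80
# chrA 90 100
# chrB 0 50
```

    The complement is computed chromosome by chromosome in genome-file order, so each reported interval is a stretch of CHM13 sequence that no alignment touches.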
    

Credits

The unique region annotations were generated by Nae-Chyun Chen <naechyun.chen@gmail.com> and Mitchell Vollger <mvollger@uw.edu>.

References

Nurk S, Koren S, Rhie A, Rautiainen M, et al. The complete sequence of a human genome. bioRxiv, 2021.