Schema for CAT/Liftoff Genes - CAT + Liftoff Gene Annotations
Database: hub_567047_hs1
Primary Table: hub_567047_catLiftOffGenesV1
Data last updated: 2022-03-16
Big Bed File Download: /gbdb/hs1/catLiftOffGenesV1/catLiftOffGenesV1.bb
Item Count: 234,903
The data is stored in the binary BigBed format.

Format description: bigCat gene models
field                         example              description
chrom                         chr1                 Reference sequence chromosome or scaffold
chromStart                    165511939            Start position in chromosome
chromEnd                      165685138            End position in chromosome
name                          AL596087.2-202       Name
score                         0                    Score (0-1000)
strand                        +                    + or - for strand
thickStart                    165685138            Start of where display should be thick (start codon)
thickEnd                      165685138            End of where display should be thick (stop codon)
reserved                      85,212,76            RGB value (use R,G,B string in input file)
blockCount                    3                    Number of blocks
blockSizes                    144,138,191          Comma separated list of block sizes
chromStarts                   0,169436,173008      Start positions relative to chromStart
name2                         AL596087.2           Gene name
cdsStartStat                  none                 Status of CDS start annotation
cdsEndStat                    none                 Status of CDS end annotation
exonFrames                    -1,-1,-1             Exon frame {0,1,2}, or -1 if no frame for exon
txId                          CHM13_T0014705       Transcript ID
type                          lncRNA               Transcript type
geneName                      CHM13_G0003761       Gene ID
geneType                      lncRNA               Gene type
sourceGene                    ENSG00000229588.2    Source gene ID
sourceTranscript              ENST00000653824.1    Source transcript ID
alignmentId                   ENST00000653824.1-0  Alignment ID
alternativeSourceTranscripts  N/A                  Alternative source transcripts
Paralogy                      nan                  Paralogous alignment IDs
UnfilteredParalogy            nan                  Unfiltered paralogous alignment IDs
collapsedGeneIds              N/A                  Collapsed Gene IDs
collapsedGeneNames            N/A                  Collapsed Gene Names
frameshift                    nan                  Frameshifted relative to source?
exonAnnotationSupport         1,1,1                Exon support in reference annotation
intronAnnotationSupport       1,1                  Intron support in reference annotation
transcriptClass               ortholog             Transcript class
transcriptModes               transMap             Transcript mode(s)
validStart                    True                 Valid start codon
validStop                     True                 Valid stop codon
properOrf                     True                 Proper multiple of 3 ORF
extra_paralog                 False                Extra paralog of gene?
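Because the schema above fixes the column order, a bigCat record exported as tab-separated text (for example with the UCSC `bigBedToBed` utility) can be turned back into named fields. The sketch below assumes exactly the 37 columns listed above, in that order; the helper names `parse_bigcat_line` and `exon_intervals` are illustrative and not part of any UCSC tool. It also shows how absolute exon coordinates follow from `chromStart`, `chromStarts` and `blockSizes` under the usual 0-based, half-open BED convention.

```python
from typing import Any, Dict, List, Tuple

# Column order of the bigCat schema above (37 fields).
BIGCAT_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "txId", "type", "geneName", "geneType", "sourceGene",
    "sourceTranscript", "alignmentId", "alternativeSourceTranscripts",
    "Paralogy", "UnfilteredParalogy", "collapsedGeneIds",
    "collapsedGeneNames", "frameshift", "exonAnnotationSupport",
    "intronAnnotationSupport", "transcriptClass", "transcriptModes",
    "validStart", "validStop", "properOrf", "extra_paralog",
]

def int_list(value: str) -> List[int]:
    """Parse a comma-separated integer list, tolerating a trailing comma or 'nan'."""
    return [int(v) for v in value.split(",") if v not in ("", "nan")]

def parse_bigcat_line(line: str) -> Dict[str, Any]:
    """Split one tab-separated bigCat record into a dict keyed by schema field."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BIGCAT_FIELDS):
        raise ValueError(f"expected {len(BIGCAT_FIELDS)} columns, got {len(values)}")
    rec: Dict[str, Any] = dict(zip(BIGCAT_FIELDS, values))
    for key in ("chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount"):
        rec[key] = int(rec[key])
    for key in ("blockSizes", "chromStarts", "exonFrames"):
        rec[key] = int_list(rec[key])
    return rec

def exon_intervals(rec: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Absolute 0-based, half-open exon coordinates built from the relative block starts."""
    start = rec["chromStart"]
    return [(start + rel, start + rel + size)
            for rel, size in zip(rec["chromStarts"], rec["blockSizes"])]

# First sample row from this page, re-joined with tabs.
SAMPLE_ROW = "\t".join([
    "chr1", "165511939", "165685138", "AL596087.2-202", "0", "+",
    "165685138", "165685138", "85,212,76", "3", "144,138,191",
    "0,169436,173008", "AL596087.2", "none", "none", "-1,-1,-1",
    "CHM13_T0014705", "lncRNA", "CHM13_G0003761", "lncRNA",
    "ENSG00000229588.2", "ENST00000653824.1", "ENST00000653824.1-0",
    "N/A", "nan", "nan", "N/A", "N/A", "nan", "1,1,1", "1,1",
    "ortholog", "transMap", "True", "True", "True", "False",
])

rec = parse_bigcat_line(SAMPLE_ROW)
print(exon_intervals(rec))
# [(165511939, 165512083), (165681375, 165681513), (165684947, 165685138)]
```

For this row the last block ends exactly at `chromEnd`, which is a cheap sanity check on any parsed record.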

Sample Rows
 
chromchromStartchromEndnamescorestrandthickStartthickEndreservedblockCountblockSizeschromStartsname2cdsStartStatcdsEndStatexonFramestxIdtypegeneNamegeneTypesourceGenesourceTranscriptalignmentIdalternativeSourceTranscriptsParalogyUnfilteredParalogycollapsedGeneIdscollapsedGeneNamesframeshiftexonAnnotationSupportintronAnnotationSupporttranscriptClasstranscriptModesvalidStartvalidStopproperOrfextra_paralog
chr1165511939165685138AL596087.2-2020+16568513816568513885,212,763144,138,1910,169436,173008AL596087.2nonenone-1,-1,-1CHM13_T0014705lncRNACHM13_G0003761lncRNAENSG00000229588.2ENST00000653824.1ENST00000653824.1-0N/AnannanN/AN/Anan1,1,11,1orthologtransMapTrueTrueTrueFalse
chr1165621672165623641AL596087.1-2010-165623641165623641255,51,255119690AL596087.1nonenone-1CHM13_T0014706processed_pseudogeneCHM13_G0003762processed_pseudogeneENSG00000215835.2ENST00000400979.2ENST00000400979.2-0N/AnannanN/AN/Anan1nanorthologtransMapTrueTrueTrueFalse
chr1165680931165681810AL596087.2-2010+16568181016568181085,212,76380,138,2050,444,674AL596087.2nonenone-1,-1,-1CHM13_T0014707lncRNACHM13_G0003761lncRNAENSG00000229588.2ENST00000425271.1ENST00000425271.1-0N/AnannanN/AN/Anan1,1,11,1orthologtransMapTrueTrueTrueFalse
chr1165733771165798688AL583804.1-2010-16579868816579868885,212,763634,105,710,64588,64846AL583804.1nonenone-1,-1,-1CHM13_T0014708lncRNACHM13_G0003763lncRNAENSG00000225325.1ENST00000448643.1ENST00000448643.1-0N/AnannanN/AN/Anan1,1,11,1orthologtransMapTrueTrueTrueFalse
chr1165820713165827534FMO7P-2010+165827534165827534255,51,2554138,148,156,3260,1391,3463,6495FMO7Pnonenone-1,-1,-1,-1CHM13_T0014709unprocessed_pseudogeneCHM13_G0003764unprocessed_pseudogeneENSG00000230231.1ENST00000436045.1ENST00000436045.1-0N/AnannanN/AN/Anan1,1,1,11,1,1orthologtransMapTrueTrueTrueFalse
chr1165820847165836063LINC01675-2020-16583606316583606385,212,7631326,89,2360,2002,14980LINC01675nonenone-1,-1,-1CHM13_T0014710lncRNACHM13_G0003765lncRNAENSG00000234142.2ENST00000662326.1ENST00000662326.1-0N/AnannanN/AN/Anan1,1,11,1orthologtransMapTrueTrueTrueFalse
chr1165821842165836274LINC01675-2010-16583627416583627485,212,764331,89,166,4470,1007,8415,13985LINC01675nonenone-1,-1,-1,-1CHM13_T0014711lncRNACHM13_G0003765lncRNAENSG00000234142.2ENST00000426519.2ENST00000426519.2-0N/AnannanN/AN/Anan1,1,1,11,1,1orthologtransMapTrueTrueTrueFalse
chr1165912094165926621FMO8P-2010+165926621165926621255,51,2558135,182,163,143,200,351,79,3470,3775,4399,7456,8421,11119,12943,14180FMO8Pnonenone-1,-1,-1,-1,-1,-1,-1,-1CHM13_T0014712unprocessed_pseudogeneCHM13_G0003766unprocessed_pseudogeneENSG00000238087.3ENST00000434461.1ENST00000434461.1-0N/AnannanN/AN/Anan1,1,1,1,1,1,1,11,1,1,1,1,1,1orthologtransMapTrueTrueTrueFalse
chr1165949824165971136FMO9P-2010+16597113616597113685,212,76769,98,144,189,163,143,5580,843,8533,16872,17943,19717,20754FMO9Pnonenone-1,-1,-1,-1,-1,-1,-1CHM13_T0014713processed_transcriptCHM13_G0003767transcribed_unprocessed_pseudogeneENSG00000215834.10ENST00000477875.6ENST00000477875.6-0N/AnannanN/AN/Anan1,1,1,1,1,1,11,1,1,1,1,1orthologtransMapTrueTrueTrueFalse
chr1165958367165977277FMO9P-2020+165977277165977277255,51,2558134,189,163,143,200,353,73,2800,8329,9400,11174,12818,15991,16489,18630FMO9Pnonenone-1,-1,-1,-1,-1,-1,-1,-1CHM13_T0014714transcribed_unprocessed_pseudogeneCHM13_G0003767transcribed_unprocessed_pseudogeneENSG00000215834.10ENST00000488458.1ENST00000488458.1-0N/AnannanN/AN/Anan1,1,1,1,1,1,1,11,1,1,1,1,1,1orthologtransMapTrueTrueTrueFalse
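Each row is one transcript (`txId`), and several transcripts can share a `geneName`: in the sample rows, CHM13_G0003761, CHM13_G0003765 and CHM13_G0003767 each have two. The sketch below collapses transcripts into gene spans using values copied from those rows. `gene_spans` is a hypothetical helper, and it assumes all transcripts of a gene lie on the same chromosome, which holds here but is not checked.

```python
from typing import Dict, List, Tuple

# (geneName, txId, chrom, chromStart, chromEnd) copied from the sample rows above.
SAMPLE = [
    ("CHM13_G0003761", "CHM13_T0014705", "chr1", 165511939, 165685138),
    ("CHM13_G0003762", "CHM13_T0014706", "chr1", 165621672, 165623641),
    ("CHM13_G0003761", "CHM13_T0014707", "chr1", 165680931, 165681810),
    ("CHM13_G0003763", "CHM13_T0014708", "chr1", 165733771, 165798688),
    ("CHM13_G0003764", "CHM13_T0014709", "chr1", 165820713, 165827534),
    ("CHM13_G0003765", "CHM13_T0014710", "chr1", 165820847, 165836063),
    ("CHM13_G0003765", "CHM13_T0014711", "chr1", 165821842, 165836274),
    ("CHM13_G0003766", "CHM13_T0014712", "chr1", 165912094, 165926621),
    ("CHM13_G0003767", "CHM13_T0014713", "chr1", 165949824, 165971136),
    ("CHM13_G0003767", "CHM13_T0014714", "chr1", 165958367, 165977277),
]

Span = Tuple[str, int, int, List[str]]  # (chrom, start, end, transcript IDs)

def gene_spans(rows) -> Dict[str, Span]:
    """Merge transcript intervals into one span per gene (min start, max end)."""
    spans: Dict[str, Span] = {}
    for gene, tx, chrom, start, end in rows:
        if gene in spans:
            c, s, e, txs = spans[gene]
            spans[gene] = (c, min(s, start), max(e, end), txs + [tx])
        else:
            spans[gene] = (chrom, start, end, [tx])
    return spans

for gene, (chrom, start, end, txs) in gene_spans(SAMPLE).items():
    print(gene, chrom, start, end, ",".join(txs))
```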

CAT/Liftoff Genes (hub_567047_catLiftOffGenesV1) Track Description
 

Description

This track represents the gene models for the T2T CHM13 assembly generated with the CAT (Comparative Annotation Toolkit) software. Genes that CAT could not map, as well as novel paralogs, were filled in from the Liftoff mappings. The reference annotations are from GENCODE V35.

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. Gene names are displayed in 'pack' or 'full' mode. More information about each gene can be found by clicking on the specific gene/transcript model.
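The thick/thin distinction comes from `thickStart` and `thickEnd`: the part of each exon inside that range is drawn thick (coding) and the rest thin. In every sample row `thickStart` equals `thickEnd`, so all blocks are thin. The sketch below splits exons accordingly and also computes the coding length, assuming (our reading of the `properOrf` description, not something the track states) that a proper ORF means a coding length divisible by 3. The function names and the coding example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open

def split_thick_thin(exons: List[Interval], thick_start: int, thick_end: int):
    """Split exons into thick (coding) and thin (UTR or non-coding) pieces."""
    thick: List[Interval] = []
    thin: List[Interval] = []
    for start, end in exons:
        # Coding part = intersection of the exon with [thick_start, thick_end).
        cs, ce = max(start, thick_start), min(end, thick_end)
        if cs < ce:
            thick.append((cs, ce))
            if start < cs:
                thin.append((start, cs))  # 5' of the thick region
            if ce < end:
                thin.append((ce, end))    # 3' of the thick region
        else:
            thin.append((start, end))     # exon lies entirely outside it
    return thick, thin

def cds_length(exons: List[Interval], thick_start: int, thick_end: int) -> int:
    """Total length of the thick (coding) pieces."""
    thick, _ = split_thick_thin(exons, thick_start, thick_end)
    return sum(e - s for s, e in thick)

exons = [(100, 200), (300, 400)]  # hypothetical coding transcript
thick, thin = split_thick_thin(exons, 150, 349)
print(thick, thin, cds_length(exons, 150, 349) % 3 == 0)
```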

The following color key is used:

  • Blue: protein coding
  • Green: non-coding
  • Pink: pseudogenes
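The color is stored in the `reserved` (itemRgb) column. The two values below are taken from the sample rows; the RGB value used for blue (protein coding) does not appear there, so the sketch leaves it out rather than guess. In those rows the color appears to follow the transcript `type` rather than `geneType`: FMO9P-201 (type processed_transcript, geneType transcribed_unprocessed_pseudogene) is green. `COLOR_KEY` and `color_category` are illustrative names.

```python
from typing import Dict

# itemRgb strings observed in the sample rows; protein-coding blue is omitted
# because its exact RGB value is not shown there.
COLOR_KEY: Dict[str, str] = {
    "85,212,76": "non-coding",
    "255,51,255": "pseudogene",
}

def color_category(reserved: str) -> str:
    """Map a 'reserved' (itemRgb) string to a color-key category."""
    return COLOR_KEY.get(reserved.strip(), "unrecognized")

for rgb in ("85,212,76", "255,51,255", "0,0,255"):  # last value is hypothetical
    print(rgb, color_category(rgb))
```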

Methods

This track combines gene annotations generated by two methods. First, the Comparative Annotation Toolkit (CAT) was used to lift over the GENCODE V35 annotations onto the T2T CHM13 assembly. Liftoff was then used as a second annotation method to map genes missed by CAT, as well as additional gene paralogs.

Comparative Annotation Toolkit

Genome annotation for the T2T CHM13 assembly was performed using the Comparative Annotation Toolkit (CAT). CAT leverages whole-genome alignments generated by Cactus to transfer annotations from one source genome to one or more target genomes. For this annotation set, CAT lifted over the reference GENCODE V35 annotations onto the T2T CHM13 genome. CAT also incorporated Iso-Seq data, first assembled into transcripts with StringTie2, to build the final consensus annotation set.

Liftoff

Liftoff uses Minimap2 to align reference gene DNA sequences to the target genome and selects the alignment(s) that are concordant with the exon/intron structure and have the highest sequence identity. A minimum sequence identity of 95% was required to annotate gene paralogs. After running Liftoff, we used bedtools intersect to identify Liftoff genes that did not overlap any CAT annotation. These were combined with the CAT annotation to create the final annotation.
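The merge step can be sketched as an interval-overlap filter. The text names bedtools intersect but not its options, so this sketch assumes the equivalent of `bedtools intersect -v` (keep Liftoff genes with no overlap of any CAT feature, ignoring strand, on half-open BED coordinates) together with the 95% identity cutoff for extra paralog copies. The tuple layout and the names `non_overlapping` and `keep_copy` are hypothetical; the text does not state a threshold for primary mappings, so none is applied to them. The per-gene linear scan is fine for small inputs but not for a genome-wide run.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open

MIN_PARALOG_IDENTITY = 0.95  # threshold stated in the text

def non_overlapping(liftoff: List[Feature], cat: List[Feature]) -> List[Feature]:
    """Keep Liftoff features that overlap no CAT feature (strand ignored)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end, _ in cat:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end, name in liftoff:
        # Half-open intervals overlap iff each starts before the other ends,
        # so book-ended features (end == start) do not count as overlapping.
        if not any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end, name))
    return kept

def keep_copy(identity: float, is_extra_copy: bool) -> bool:
    """Extra paralog copies must reach the identity cutoff; primary copies pass."""
    return not is_extra_copy or identity >= MIN_PARALOG_IDENTITY

cat = [("chr1", 100, 200, "catA"), ("chr2", 50, 80, "catB")]
lift = [("chr1", 150, 250, "g1"), ("chr1", 200, 300, "g2"),
        ("chr2", 10, 50, "g3"), ("chr3", 0, 10, "g4")]
print([f[3] for f in non_overlapping(lift, cat)])
```

Here g1 is dropped because it overlaps catA, while g2 and g3 only touch CAT features end to start and are kept.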

Credits

This track was provided by Marina Haukness <mhauknes@ucsc.edu> of UC Santa Cruz and Alaina Shumate <ashumat2@jhmi.edu> of Johns Hopkins University.

References

Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE et al. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res. 2018 Jul;28(7):1029-1038. PMID: 29884752; PMC: PMC6028123

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. PMID: 15215400; PMC: PMC441517

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649

Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2020 Dec 15. PMID: 33320174; PMC: PMC8289374