Schema for TOGA vs. hg38 - TOGA annotations using human/hg38 as reference
  Database: mm10    Primary Table: HLTOGAannotvHg38v1    Data last updated: 2022-06-20
Big Bed File Download: /gbdb/mm10/TOGAvHg38v1/HLTOGAannotVsHg38v1.bb
Item Count: 54,570
The data is stored in the binary BigBed format.

Format description: TOGA predicted gene model
field	example	description
chrom	chr1	Reference sequence chromosome or scaffold
chromStart	130391493	Start position in chromosome
chromEnd	130462669	End position in chromosome
name	ENST00000367064.CD55.8	Name or ID of item, ideally both human readable and unique
score	1000	Score (0-1000)
strand	-	+ or - for strand
thickStart	130391493	Start of where display should be thick (start codon)
thickEnd	130462669	End of where display should be thick (stop codon)
itemRgb	255,160,120	RGB value (use R,G,B string in input file)
blockCount	9	Number of blocks
blockSizes	68,81,171,192,86,100,192,186,100,	Comma separated list of block sizes
chromStarts	0,56834,57593,60890,65327,66593,68088,70328,71076,	Start positions relative to chromStart
ref_trans_id	ENST00000367064.CD55	Reference transcript ID
ref_region	chr1:207321677-207360966	Transcript region in the reference
query_region	chr1:130391493-130462669	Region in the query
chain_score	0.9961345195770264	Chain orthology probability score
chain_synteny	379	Chain synteny log10 value
chain_flank	0.1631	Chain flank feature
chain_gl_cds_fract	0.033950094004687945	Chain global CDS fraction value
chain_loc_cds_fract	0.1017703132092601	Chain local CDS fraction value
chain_exon_cov	0.9781849912739965	Chain exon coverage value
chain_intron_cov	0.2695986266655767	Chain intron coverage value
status	Uncertain Loss	Gene loss classification
perc_intact_ign_M	0.8350785340314136	% intact ignoring missing
perc_intact_int_M	0.8350785340314136	% intact considering missing as intact
intact_codon_prop	0.9607329842931938	% intact codons
ouf_prop	0.0	% out of chain
mid_intact	0	Is middle 80% intact
mid_pres	1	Is middle 80% fully present
prot_alignment
ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG
     |        ||| |  |     || || | || | | | |||| ||| ||| | |     | |     | |   |   | 
que: MIRGRAPRTRPSPPPPL-----LP-LLSLSLLLLSPTVRGDCGPPPDIPNARPILGRHSKFAEQSKVAYSCNNGFKQVPD

ref: EKDSVICLKGSQWSDIEEFCNRSCEVPTRLNSASLKQPYITQNYFPVGTVVEYECRPGYRREPSLSPKLTCLQNLKWSTA
         | ||   |||  | ||  ||  | ||  ||||  |   | ||||| |||||||| |  | |  | |||  | ||  
que: KSNIVVCLENGQWSSHETFCEKSCVAPERLSFASLKKEYLNMNFFPVGTIVEYECRPGFRKQPPLPGKATCLEDLVWSPV

ref: VEFCKKKSCPNPGEIRNGQIDVPGGILFGATISFSCNTGYKLFGSTSSFCLISGSSVQWSDPLPECREIYCPAPPQIDNG
       ||||||||||    || |  | |||||  | |||| || | |  | ||   |  | | |  | | || || || | ||
que: AQFCKKKSCPNPKDLDNGHINIPTGILFGSEINFSCNPGYRLVGVSSTFCSVTGNTVDWDDEFPVCTEIHCPEPPKINNG

ref: IIQGERDHYGYRQSVTYACNKGFTMIGEHSIYCTVNN-DEGEWSGPPPECRG-----------KSLTSKVPPTVQKPTTV
     |  || | | | | ||| | |||   |  ||||||   | | || ||| |                    ||| ||  | 
que: IMRGESDSYTYSQVVTYSCDKGFILVGNASIYCTVSKSDVGQWSSPPPRCIAAPPKSQKPTKANNPSTAAPPTPQKTNTA

ref: NVPTTEVSPTSQKTTTK----TTTPNAQATRSTPVSRTT-KHFHETTPNKGSGTTSG---TTRLL--SGHTCF-TLTGLL
      ||  |  || ||| |     | ||  | |   ||  ||  |   |   ||   | |    | ||  |||||  ||| | 
que: DVPAAEIPPTPQKTNTADVPATETPTSQTTQHVPVTKTTVRHPIRTSTDKGEPNT-GPGASTQLLTLSGHTCLITLTVLH

ref: GTLVTMGLLT*
       |   | |||
que: AMLSLIGYLT*

HTML-formatted protein alignment
svg_line	[SVG image; text labels: none SP ENST00000367064.CD55.8 aa]	SVG inactivating mutations visualization
ref_link	ENST00000367064	Reference transcript link
inact_mut_html_table
7	1	SSM	('gt', 'gc')->aa	YES	SSM_1
9	0	Deleted exon	-	NO	DEL_1
HTML-formatted inactivating mutations table
exon_ali_html
Exon number: 1

Exon region: chr1:130462669-130462569
Nucleotide percent identity: 41.53 | BLOSUM: 30.12
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130462561-130462672): YES

Sequence alignment between reference and query exon:
ref: ATG---------------ACCGTCGCGCGGCCGAGCGTGCCCGCGGCGCTGCCCCTCCTCGGGGAGCTGCCCCGGCTGCT
     |||                |     | |||||      |||  |  |||||               |||||     ||||
que: ATGATCCGTGGGCGGGCGCCTAGGACTCGGCCATCACCGCCGCCTCCGCTG---------------CTGCCG---TTGCT

ref: GCTGCTGGTGCTGTTGTGCCTG---CCGGCCGTGTGGG
     |  ||||   ||||||   |||   ||  | ||  | |
que: GTCGCTGTCTCTGTTGCTGCTGTCCCCAACTGTACGCG


Exon number: 2

Exon region: chr1:130462007-130461821
Nucleotide percent identity: 56.45 | BLOSUM: 48.96
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130461806-130462025): YES

Sequence alignment between reference and query exon:
ref: GTGACTGTGGCCTTCCCCCAGATGTACCTAATGCCCAGCCAGCTTTGGAAGGCCGTACAAGTTTTCCCGAGGATACTGTA
     | ||||| ||||  || |||||  | |||||||||  ||||   ||||   | |   | |  ||| | ||| | |    |
que: GAGACTGCGGCCCACCTCCAGACATTCCTAATGCCAGGCCAATCTTGGGCAGACACTCCAAGTTTGCTGAGCAAAGCAAA

ref: ATAACGTACAAATGTGAAGAAAGCTTTGTGAAAATTCCTGGCGAGAAGGACTCAGTGATCTGCCTTAAGGGCAGTCAATG
      |  | |||   ||| |  |  |||||    || |||| | | ||    ||  |||  |||| ||| |     | |||||
que: GTGGCATACTCGTGTAATAACGGCTTTAAACAAGTTCCAGACAAGTCAAACATAGTTGTCTGTCTTGAAAATGGCCAATG

ref: GTCAGATATTGAAGAGTTCTGCAATC
     |||       |||   |||||  |  
que: GTCGAGCCACGAAACATTCTGTGAGA


Exon number: 3

Exon region: chr1:130459773-130459581
Nucleotide percent identity: 69.27 | BLOSUM: 64.24
Intersects assembly gaps: NO
Exon alignment class: A+
Detected within expected region (exp:130459571-130459788): YES

Sequence alignment between reference and query exon:
ref: GTAGCTGCGAGGTGCCAACAAGGCTAAATTCTGCATCCCTCAAACAGCCTTATATCACTCAGAATTATTTTCCAGTCGGT
          || |  |  |||  ||| || | || ||||||||||||| |    ||  |||    ||||| ||| ||||| |||
que: AATCATGTGTTGCTCCAGAAAGACTGAGTTTTGCATCCCTCAAAAAAGAGTACCTCAACATGAATTTTTTCCCAGTTGGT

ref: ACTGTTGTGGAATATGAGTGCCGTCCAGGTTACAGAAGAGAACCTTCTCTATCACCAAAACTAACTTGCCTTCAGAATTT
     ||| |||||||||||||||| || ||||| |   ||| | ||||| | ||  ||  ||||  |||||||||| || ||||
que: ACTATTGTGGAATATGAGTGTCGGCCAGGATTTCGAAAACAACCTCCACTCCCAGGAAAAGCAACTTGCCTTGAGGATTT

ref: AAAATGGTCCACAGCAGTCGAATTTTGTAAAA
     |  ||||||  |||  |   | ||||||||||
que: AGTATGGTCTCCAGTTGCTCAGTTTTGTAAAA


Exon number: 4

Exon region: chr1:130458186-130458086
Nucleotide percent identity: 72.00 | BLOSUM: 67.43
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130458044-130458198): YES

Sequence alignment between reference and query exon:
ref: AGAAATCATGCCCTAATCCGGGAGAAATACGAAATGGTCAGATTGATGTACCAGGTGGCATATTATTTGGTGCAACCATC
     | |||||||||||||||||   |||  |    |||||||| ||  |  |||||   ||||||||||| ||| ||   || 
que: AAAAATCATGCCCTAATCCTAAAGATCTGGATAATGGTCACATCAACATACCAACCGGCATATTATTCGGTTCAGAAATA

ref: TCCTTCTCATGTAACACAGG
       ||||||||| ||| ||||
que: AACTTCTCATGCAACCCAGG


Exon number: 5

Exon region: chr1:130456906-130456820
Nucleotide percent identity: 60.47 | BLOSUM: 52.29
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130456815-130456912): YES

Sequence alignment between reference and query exon:
ref: GTACAAATTATTTGGCTCGACTTCTAGTTTTTGTCTTATTTCAGGCAGCTCTGTCCAGTGGAGTGACCCGTTGCCAGAGT
     |||||   || | ||     | |||| ||| |||  | |  |||| |   ||||  | |||   ||   ||| |||| ||
que: GTACAGGCTAGTCGGTGTCTCCTCTACTTTCTGTTCTGTCACAGGAAATACTGTTGATTGGGACGATGAGTTTCCAGTGT

ref: GCAGAG
     ||| ||
que: GCACAG


Exon number: 6

Exon region: chr1:130452575-130452383
Nucleotide percent identity: 72.40 | BLOSUM: 62.71
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130452375-130452584): YES

Sequence alignment between reference and query exon:
ref: AAATTTATTGTCCAGCACCACCACAAATTGACAATGGAATAATTCAAGGGGAACGTGACCATTATGGATATAGACAGTCT
     ||||  |||||||||  |||||| ||||  ||||||||||||| | ||||||| |||||  ||||   ||||| |||   
que: AAATACATTGTCCAGAGCCACCAAAAATCAACAATGGAATAATGCGAGGGGAAAGTGACTCTTATACGTATAGCCAGGTG

ref: GTAACGTATGCATGTAATAAAGGATTCACCATGATTGGAGAGCACTCTATTTATTGTACTGTGAATAAT---GATGAAGG
     || || ||| ||||| | ||||| |||| | || ||||| |       ||||||||||||||||  ||    |||| |||
que: GTCACCTATTCATGTGACAAAGGCTTCATCCTGGTTGGAAATGCTAGCATTTATTGTACTGTGAGCAAGTCTGATGTAGG

ref: AGAGTGGAGTGGCCCACCACCTGAATGCAGAG
     | | |||||  | ||||||||    |||| ||
que: ACAATGGAGCAGTCCACCACCCCGGTGCATAG


Exon number: 7

Exon region: chr1:130449257-130449086
Nucleotide percent identity: 50.88 | BLOSUM: 41.35
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130449093-130449574): YES

Sequence alignment between reference and query exon:
ref: GA---------------------------------AAATCTCTAACTTCCAAGGTCCCACCAACAGTTCAGAAACCTACC
      |                                 ||   || | || |    | ||||||||||  ||||||| | | |
que: CAGCCCCACCAAAATCTCAGAAACCTACCAAAGCAAATAATCCATCTACAGCAGCCCCACCAACACCTCAGAAAACCAAC

ref: ACAGTAAATGTTCCAACTACAGAAGTCTCACCAACTTCTCAGAAAACCACCACAAAA------------ACCACCACACC
     |||| | |||| ||| || | ||| || |||||||  |||||||||||| ||||  |            ||    || ||
que: ACAGCAGATGTCCCAGCTGCCGAAATCCCACCAACACCTCAGAAAACCAACACAGCAGATGTCCCAGCTACAGAAACCCC

ref: AAATGCTCAAG
     ||   ||||| 
que: AACATCTCAAA


Exon number: 8

Exon region: chr1:130448408-130448327
Nucleotide percent identity: 53.57 | BLOSUM: 35.97
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130448286-130448441): YES

Sequence alignment between reference and query exon:
ref: CAACACGGAGTACACCTGTTTCCAGGACAACC---AAGCATTTTCATGAAACAACCCCAAATAAAGGAAGTGGAACCACT
     |||| | |  |  ||||||| ||| ||||||       |||        |||| |  || | ||||||      | ||| 
que: CAACCCAGCATGTACCTGTTACCAAGACAACAGTACGTCATCCAATAAGAACATCTACAGACAAAGGAGAGCCTAACACA

ref: TCAG
        |
que: ---G


Exon number: 9

Exon region: chr1:130428802-130428766
Nucleotide percent identity: 38.89 | BLOSUM: 51.85
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130447534-130447641): NO

Sequence alignment between reference and query exon:
ref: GT---------ACTACCCGTCTTCTA------TCTG
     |          | ||| |  || ||       ||||
que: GCCCTGGTGCCAGTACACAGCTGCTGACCTTGTCTG


Exon number: 10

Exon region: chr1:130391561-130391493
Nucleotide percent identity: 67.65 | BLOSUM: 46.67
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:130391474-130391583): YES

Sequence alignment between reference and query exon:
ref: GGCACACGTGTTTC---ACGTTGACAGGTTTGCTTGGGACGCTAGTAACCATGGGCTTGCTGACTTAG
     | || || |||||    || ||||||| ||||| || || ||||  |   || ||||   |||| |||
que: GACATACATGTTTAATAACCTTGACAGTTTTGCATGCGATGCTATCACTTATTGGCTACTTGACATAG


HTML-formatted exon alignment
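
The blockCount, blockSizes and chromStarts fields listed above follow the BED12 convention: each block start is an offset from chromStart. As a minimal sketch in plain Python (the helper name `blocks_to_exons` is made up for illustration and is not part of any UCSC or TOGA tool), the example values from the table can be converted into absolute, 0-based half-open block intervals:

```python
def blocks_to_exons(chrom_start, block_sizes, chrom_starts):
    """Convert BED12 block fields into absolute (start, end) intervals.

    block_sizes and chrom_starts are the comma-separated strings from the
    table; the trailing comma seen in the examples is tolerated.
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in chrom_starts.split(",") if x]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and chromStarts differ in length")
    # Each block start is relative to chromStart; end = start + size.
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


# Example values copied from the field table (ENST00000367064.CD55.8).
exons = blocks_to_exons(
    130391493,
    "68,81,171,192,86,100,192,186,100,",
    "0,56834,57593,60890,65327,66593,68088,70328,71076,",
)
for start, end in exons:
    print(f"chr1:{start}-{end}")
```

For the example row, the first block starts at chromStart (130391493) and the last block ends at chromEnd (130462669), the same outermost coordinates that appear in the exon regions of the exon alignment listing.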

Sample Rows
 
chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	itemRgb	blockCount	blockSizes	chromStarts	ref_trans_id	ref_region	query_region	chain_score	chain_synteny	chain_flank	chain_gl_cds_fract	chain_loc_cds_fract	chain_exon_cov	chain_intron_cov	status	perc_intact_ign_M	perc_intact_int_M	intact_codon_prop	ouf_prop	mid_intact	mid_pres	prot_alignment	svg_line	ref_link	inact_mut_html_table	exon_ali_html
chr1	130391493	130462669	ENST00000367064.CD55.8	1000	-	130391493	130462669	255,160,120	9	68,81,171,192,86,100,192,186,100,	0,56834,57593,60890,65327,66593,68088,70328,71076,	ENST00000367064.CD55	chr1:207321677-207360966	chr1:130391493-130462669	0.9961345195770264	379	0.1631	0.033950094004687945	0.1017703132092601	0.9781849912739965	0.2695986266655767	Uncertain Loss	0.8350785340314136	0.8350785340314136	0.9607329842931938	0.0	0	1	ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ...	...	ENST00000367064	7 1 SSM ('gt', 'gc')->aa YES SSM_1; 9 0 Deleted ...	Exon number: 1; Exon region: chr1:130462669-130462569; Nucleotide percent identity: 41.53 | BL ...
chr1	130391493	130462669	ENST00000367067.CD55.8	1000	-	130391493	130462669	0,0,200	9	68,475,278,192,86,100,192,186,100,	0,56834,57645,60890,65327,66593,68088,70328,71076,	ENST00000367067.CD55	chr1:207321732-207359767	chr1:130391493-130462669	0.9963686466217041	379	0.18375	0.033950094004687945	0.14616432137993646	0.9739866908650938	0.2598640583554377	Intact	0.9655172413793104	0.9655172413793104	0.9655172413793104	0.0	1	1	ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ...	...	ENST00000367067	7 1 SSM ('gt', 'gc')->ca NO SSM_1; 8 0 SSM ...	Exon number: 1; Exon region: chr1:130462669-130462569; Nucleotide percent identity: 41.53 | BL ...
chr1	130391493	130462669	ENST00000391921.CD55.8	1000	-	130391493	130462669	255,160,120	8	68,81,171,192,86,100,186,100,	0,56834,57593,60890,65327,66593,70328,71076,	ENST00000391921.CD55	chr1:207321642-207359713	chr1:130391493-130462669	0.9963686466217041	379	0.1839	0.033950094004687945	0.0843395369950068	0.9737945492662474	0.27340001626412946	Uncertain Loss	0.8018867924528302	0.8018867924528302	0.9528301886792453	0.0	0	1	ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ...	...	ENST00000391921	6 1 SSM ('gt', 'gc')->aa YES SSM_1; 8 0 Deleted ...	Exon number: 1; Exon region: chr1:130462669-130462569; Nucleotide percent identity: 41.53 | BL ...
chr1	130416838	130418207	ENST00000644836.CD55.254876	1000	-	130416838	130418207	130,130,130	2	86,100,	0,1269,	ENST00000644836.CD55	chr1:207321747-207359876	chr1:130416838-130418207	0.6891490817070007	1	0.0	0.10508474576271186	0.10508474576271186	0.16103896103896104	0.043271594820521224	Partial missing	0.16103896103896104	1.0	1.0	0.8389610389610389	1	0	ref: MTVAR-PSVPAALPLLGELPRLLLLVLLCLPAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDSV ...	...	ENST00000644836	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130422958-130422864; Nucleotide percent identity: 53.40 | BL ...
chr1	130419600	130419792	ENST00000314754.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000314754.CD55	chr1:207321700-207360501	chr1:130419600-130419792	0.639787495136261	1	0.0	0.1453444360333081	0.1453444360333081	0.14512471655328799	0.030863016319947513	Partial missing	0.14285714285714285	0.782312925170068	0.984375	0.854875283446712	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000314754	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
chr1	130419600	130419792	ENST00000367063.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000367063.CD55	chr1:207321531-207340766	chr1:130419600-130419792	0.6739023327827454	1	0.0	0.1453444360333081	0.1453444360333081	0.14382022471910114	0.06435248518011856	Partial missing	0.14157303370786517	0.7842696629213484	0.984375	0.8561797752808988	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000367063	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
chr1	130419600	130419792	ENST00000367064.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000367064.CD55	chr1:207321677-207360966	chr1:130419600-130419792	0.639787495136261	1	0.0	0.1453444360333081	0.1453444360333081	0.16753926701570682	0.030763781029455844	Partial missing	0.1649214659685864	0.7486910994764397	0.984375	0.8324607329842932	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000367064	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
chr1	130419600	130419792	ENST00000367067.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000367067.CD55	chr1:207321732-207359767	chr1:130419600-130419792	0.639787495136261	1	0.0	0.1453444360333081	0.1453444360333081	0.1161524500907441	0.0311947391688771	Partial missing	0.11433756805807622	0.8257713248638838	0.984375	0.8838475499092558	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000367067	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
chr1	130419600	130419792	ENST00000644836.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000644836.CD55	chr1:207321747-207359876	chr1:130419600-130419792	0.639787495136261	1	0.0	0.1453444360333081	0.1453444360333081	0.16623376623376623	0.030841938480030594	Partial missing	0.16363636363636364	0.7506493506493507	0.984375	0.8337662337662337	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000644836	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
chr1	130419600	130419792	ENST00000645323.CD55.330981	1000	-	130419600	130419792	130,130,130	1	192,	0,	ENST00000645323.CD55	chr1:207321642-207360336	chr1:130419600-130419792	0.639787495136261	1	0.0	0.1453444360333081	0.1453444360333081	0.14545454545454545	0.03086048545812377	Partial missing	0.1431818181818182	0.7818181818181819	0.984375	0.8545454545454545	0	0	ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ...	...	ENST00000645323	1 0 Missing exon - YES MIS_1; 2 0 Missing exon ...	Exon number: 1; Exon region: chr1:130421335-130421298; Nucleotide percent identity: 18.00 | BL ...
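
Rows like the samples above can be mapped onto the field names from the format description. The following is a minimal sketch, assuming the rows have been exported as tab-separated text with all 34 fields in table order (for example with UCSC's bigBedToBed utility). The names `FIELDS`, `INT_FIELDS`, `FLOAT_FIELDS` and `parse_row` are illustrative, the numeric types are guessed from the example values, and the long HTML/SVG fields are replaced by `...` placeholders as in the sample display.

```python
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "chromStarts", "ref_trans_id", "ref_region", "query_region",
    "chain_score", "chain_synteny", "chain_flank", "chain_gl_cds_fract",
    "chain_loc_cds_fract", "chain_exon_cov", "chain_intron_cov", "status",
    "perc_intact_ign_M", "perc_intact_int_M", "intact_codon_prop",
    "ouf_prop", "mid_intact", "mid_pres", "prot_alignment", "svg_line",
    "ref_link", "inact_mut_html_table", "exon_ali_html",
]
# Types below are assumptions based on the example values, not a schema.
INT_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd",
              "blockCount", "chain_synteny", "mid_intact", "mid_pres"}
FLOAT_FIELDS = {"chain_score", "chain_flank", "chain_gl_cds_fract",
                "chain_loc_cds_fract", "chain_exon_cov", "chain_intron_cov",
                "perc_intact_ign_M", "perc_intact_int_M",
                "intact_codon_prop", "ouf_prop"}


def parse_row(line):
    """Split one tab-separated row and convert the numeric fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    for key in FLOAT_FIELDS:
        record[key] = float(record[key])
    return record


# First sample row; long text fields shortened to "..." placeholders.
sample_line = "\t".join([
    "chr1", "130391493", "130462669", "ENST00000367064.CD55.8", "1000", "-",
    "130391493", "130462669", "255,160,120", "9",
    "68,81,171,192,86,100,192,186,100,",
    "0,56834,57593,60890,65327,66593,68088,70328,71076,",
    "ENST00000367064.CD55", "chr1:207321677-207360966",
    "chr1:130391493-130462669", "0.9961345195770264", "379", "0.1631",
    "0.033950094004687945", "0.1017703132092601", "0.9781849912739965",
    "0.2695986266655767", "Uncertain Loss", "0.8350785340314136",
    "0.8350785340314136", "0.9607329842931938", "0.0", "0", "1",
    "...", "...", "ENST00000367064", "...", "...",
])
record = parse_row(sample_line)
print(record["name"], record["status"], record["chain_score"])
```

Once rows are parsed this way, filtering by `status` or by `chain_score` becomes an ordinary dictionary lookup.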

TOGA vs. hg38 (HLTOGAannotvHg38v1) Track Description
 

Description

TOGA (Tool to infer Orthologs from Genome Alignments) is a homology-based method that integrates gene annotation, orthology inference, and the classification of genes as intact or lost.

Methods

As input, TOGA uses a gene annotation of a reference species (human/hg38 for mammals, chicken/galGal6 for birds) and a whole genome alignment between the reference and query genome.

TOGA implements a novel paradigm that relies on alignments of intronic and intergenic regions and uses machine learning to accurately distinguish orthologs from paralogs or processed pseudogenes.

To annotate genes, CESAR 2.0 is used to determine the positions and boundaries of coding exons of a reference transcript in the orthologous genomic locus in the query species.

Display Conventions and Configuration

Each annotated transcript is color-coded according to one of the following classifications:

  •   "intact": middle 80% of the CDS (coding sequence) is present and exhibits no gene-inactivating mutation. These transcripts likely encode functional proteins.
  •   "partially intact": at least 50% of the CDS is present in the query and the middle 80% of the CDS exhibits no inactivating mutation. These transcripts may also encode functional proteins, but the evidence is weaker as parts of the CDS are missing, often due to assembly gaps.
  •   "missing": <50% of the CDS is present in the query and the middle 80% of the CDS exhibits no inactivating mutation.
  •   "uncertain loss": there is 1 inactivating mutation in the middle 80% of the CDS, but evidence is not strong enough to classify the transcript as lost. These transcripts may or may not encode a functional protein.
  •   "lost": typically several inactivating mutations are present, thus there is strong evidence that the transcript is unlikely to encode a functional protein.
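
The rules in the list above can be read as a small decision procedure. The sketch below only restates that wording in Python and is not TOGA's actual classifier: the function name, its parameters and the boolean `strong_loss_evidence` are hypothetical stand-ins, and the 50% cutoff separating "partially intact" from "missing" is treated as inclusive by assumption.

```python
def classify_transcript(cds_fraction_present,
                        middle_80_fully_present,
                        inactivating_mutations_in_middle_80,
                        strong_loss_evidence=False):
    """Return a display class following the wording of the list above.

    All inputs are hypothetical summaries of a transcript:
    - cds_fraction_present: fraction (0..1) of the CDS found in the query
    - middle_80_fully_present: True if the middle 80% of the CDS is present
    - inactivating_mutations_in_middle_80: count of such mutations
    - strong_loss_evidence: True if the evidence for loss is strong
    """
    if inactivating_mutations_in_middle_80 == 0:
        if middle_80_fully_present:
            return "intact"
        if cds_fraction_present >= 0.5:  # inclusive cutoff is an assumption
            return "partially intact"
        return "missing"
    # At least one inactivating mutation in the middle 80% of the CDS.
    if strong_loss_evidence:
        return "lost"
    return "uncertain loss"


print(classify_transcript(1.0, True, 0))
print(classify_transcript(0.6, False, 0))
print(classify_transcript(0.9, True, 1))
```

How strong the evidence for loss is gets passed in as a flag here because the description does not spell out how it is measured.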

Clicking on a transcript provides additional information about the orthology classification, inactivating mutations, the protein sequence and protein/exon alignments.

Credits

These data were prepared by the Michael Hiller Lab.

References

The TOGA software is available from github.com/hillerlab/TOGA

Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A, Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M. TOGA integrates gene annotation with orthology inference at scale. bioRxiv preprint, September 2022.