TOGA annotations using chicken/galGal6 as reference (SP4_rna-XM_004939343.3.1)
 

TOGA gene annotation

Projection SP4_rna-XM_004939343.3.1


Reference transcript: SP4_rna-XM_004939343.3
Genomic locus in reference: chr2:30405445-30430871
Genomic locus in query: JABBVY010000002:13253664-13276979
Projection classification: Intact
Probability that query locus is orthologous: 0.9963686466217041
Show features used for ortholog probability
  • Synteny (log10 value): 398
  • Global CDS fraction: 0.02385595521683938
  • Local CDS fraction: 0.12816666666666668
  • Local intron fraction: 0.8041506533435818
  • Local CDS coverage: 0.9961139896373057
  • Flank fraction: 0.734

Feature description: For each projection (one reference transcript and one overlapping chain), TOGA computes the following features by intersecting the reference coordinates of aligning blocks in the chain with different gene parts (coding exons, UTR (untranslated region) exons, introns) and the respective intergenic regions.
We define the following variables:
  • c: number of reference bases in the intersection between chain blocks and coding exons of the gene under consideration.
  • C: number of reference bases in the intersection between chain blocks and coding exons of all genes.
  • a: number of reference bases in the intersection between chain blocks and coding exons and introns of the gene under consideration.
  • A: number of reference bases in the intersection between chain blocks and coding exons and introns of all genes and the intersection between chain blocks and intergenic regions (excludes UTRs).
  • f: number of reference bases in chain blocks overlapping the 10 kb flanks of the gene under consideration. Alignment blocks overlapping exons of another gene that is located in these 10 kb flanks are ignored.
  • i: number of reference bases in the intersection between chain blocks and introns of the gene under consideration.
  • CDS (coding sequence): length of the coding region of the gene under consideration.
  • I: sum of all intron lengths of the gene under consideration.
Using these variables, TOGA computes the following features:
  • "global CDS fraction" as C / A. Chains with a high value have alignments that largely overlap coding exons, which is a hallmark of paralogous or processed pseudogene chains. In contrast, chains with a low value also align many intronic and intergenic regions, which is a hallmark of orthologous chains.
  • "local CDS fraction" as c / a. Orthologous chains tend to have a lower value, as intronic regions partially align. This feature is not computed for single-exon genes.
  • "local intron fraction" as i / I. Orthologous chains tend to have a higher value. This feature is not computed for single-exon genes.
  • "flank fraction" as f / 20,000. Orthologous chains tend to have higher values, as flanking intergenic regions partially align. This feature is important to detect orthologous loci of single-exon genes.
  • "synteny" as log10 of the number of genes whose coding exons overlap aligning blocks of this chain by at least one base. Orthologous chains tend to cover several genes located in a conserved order, resulting in higher synteny values.
  • "local CDS coverage" as c / CDS, which is only used for single-exon genes.
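
The feature definitions above can be written out as a short Python sketch. This is only an illustration of the formulas as stated here, not TOGA's own code: the names ChainOverlap and projection_features, the field names, and the example numbers are all hypothetical, and the sketch assumes non-zero denominators and at least one overlapped gene.

```python
import math
from dataclasses import dataclass

# Two 10 kb flanks, matching the "f / 20,000" definition above.
FLANK_TOTAL = 20_000


@dataclass
class ChainOverlap:
    """Overlap counts for one projection (hypothetical container)."""
    c: int            # chain bases in coding exons of this gene
    C: int            # chain bases in coding exons of all genes
    a: int            # chain bases in coding exons + introns of this gene
    A: int            # chain bases in exons, introns, intergenic (no UTRs)
    f: int            # chain bases in the 10 kb flanks of this gene
    i: int            # chain bases in introns of this gene
    cds_len: int      # CDS: length of the coding region
    intron_len: int   # I: summed intron length
    n_genes: int      # genes whose coding exons overlap the chain
    single_exon: bool = False


def projection_features(o: ChainOverlap) -> dict:
    """Compute the six features; local fractions are skipped for single-exon genes."""
    multi_exon = not o.single_exon
    return {
        "global_cds_fraction": o.C / o.A,
        "local_cds_fraction": o.c / o.a if multi_exon else None,
        "local_intron_fraction": o.i / o.intron_len if multi_exon else None,
        "flank_fraction": o.f / FLANK_TOTAL,
        "synteny": math.log10(o.n_genes),
        # Computed for every gene here; per the text it is only used
        # as a classification feature for single-exon genes.
        "local_cds_coverage": o.c / o.cds_len,
    }


# Made-up counts, chosen only to exercise the formulas.
example = ChainOverlap(c=900, C=1000, a=6000, A=40000, f=14680,
                       i=5100, cds_len=1000, intron_len=6000, n_genes=100)
for name, value in projection_features(example).items():
    print(f"{name}: {value}")
```

With these made-up counts the chain has a low global CDS fraction together with high local intron and flank fractions, which is the pattern the text describes for orthologous chains.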


Visualization of inactivating mutations on exon-intron structure

[Figure: exon-intron structure of projection SP4_rna-XM_004939343.3.1]

Exons shown in grey are missing (they often overlap assembly gaps). Exons shown in red or blue are deleted or do not align at all. Red indicates that the exon deletion shifts the reading frame, while blue indicates that the exon deletion(s) are frame-preserving.
Show features used for transcript classification
  • Percent intact, ignoring missing sequence: 0.9961139896373057
  • Percent intact, treating missing as intact sequence: 0.9961139896373057
  • Proportion of intact codons: 0.9961139896373057
  • Percent of CDS not covered by this chain (0 unless the chain covers only a part of the gene): 0.0
  • Middle 80 percent of CDS intact: Yes
  • Middle 80 percent of CDS present: Yes


Predicted protein sequence


Show protein sequence of query
MPTEGEKSPEAENNNNNNKKGKTGGSQDSQPSPLALLAATCSKIGTPGENQGTGQQQIIIDPNQGLVQLQNQPQQLELVT
TQLAGNAWQLVAAAPSASKDNNVAQQGSSVASSAASPSSSNNGSASPSKTKSGNSSTTTPGQFQVIQVQNPSGSVQYQVI
PQIQTTEGQINPSNATGLQDIQGQIQLIPAGNNQAILTTANRTASGNVIAQNLANQTVPVQIRPGVSIPLQLQTIPGTQA
QVVTTLPINIGGVTLALPVINNVATGGSSGQVGQSTESGVSNGNQLASTPVTSASGSTMPESPSSSSTATTTASTSLTSS
DTLVSSAETGQYTSTPGSSSEQASEEPQTTATDSEAQSSSQLQSNGLQNVQDQSGSLQQVQIVGQPILQQIQIQQPQQQI
IQAIPPQSFQLQSGQTIQTIQQQPLQNVQLQAVSPTQVLIRAPTLTPSGQISWQTVQVQNLQSLSNLQVQNAGLPQQLTI
TPVSSSGGTTIAQIAPVAVAGTPITLNAAQLASVPNLQTVSVANLSAAGVQVQGVPVTITSVAGQQQGQDGVKVQQATIA
PVTVAVGGIANAAIGAVSPDQITQVQLQQAQQASDQEVQPGKRLRRVACSCPNCREGEGRGSNEPGKKKQHICHIEGCGK
VYGKTSHLRAHLRWHTGERPFVCNWIFCGKRFTRSDELQRHRRTHTGEKRFECPECSKRFMRSDHLSKHVKTHQNKKGGG
TALAIVTSGELDSSVTEVLGSPRIVTVAAISQDSNPATPNVSTNMEEF*

Protein sequence alignment


Show alignment between reference and query
ref: MPTEGEKSPEAENNNNNNKKGKTGGSQDSQPSPLALLAATCSKIGTPGENQATGQQQIIIDPNQGLVQLQNQPQQLELVT
     ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
que: MPTEGEKSPEAENNNNNNKKGKTGGSQDSQPSPLALLAATCSKIGTPGENQGTGQQQIIIDPNQGLVQLQNQPQQLELVT

ref: TQLAGNSWQLVAASPSASKDNNVAQQGSSVASSAPSPSSSNNGSASPTKTKSGNSSATTPGQFQVIQVQNPSGSVQYQVI
     |||||| |||||| |||||||||||||||||||| |||||||||||| |||||||| |||||||||||||||||||||||
que: TQLAGNAWQLVAAAPSASKDNNVAQQGSSVASSAASPSSSNNGSASPSKTKSGNSSTTTPGQFQVIQVQNPSGSVQYQVI

ref: PQIQTTEGQQIQINPANATGLQDIQGQIQLIPAGNNQAILTASNRTASGNIIAQNLANQTVPVQIRPGVSIPLQLQTIPG
     ||||||||   |||| |||||||||||||||||||||||||  ||||||| |||||||||||||||||||||||||||||
que: PQIQTTEG---QINPSNATGLQDIQGQIQLIPAGNNQAILTTANRTASGNVIAQNLANQTVPVQIRPGVSIPLQLQTIPG

ref: TQAQVVTTLPINIGGVTLALPVINNMAAGGGSGQVGQSTEGGVSNGSQLASTPVTSASVSSMPDSPSSSSTSTTTASTSL
     ||||||||||||||||||||||||| | || ||||||||| ||||| ||||||||||| | || ||||||| ||||||||
que: TQAQVVTTLPINIGGVTLALPVINNVATGGSSGQVGQSTESGVSNGNQLASTPVTSASGSTMPESPSSSSTATTTASTSL

ref: TSSDTLVSSAETGQYTSTAGSSSEQPTEESQTTATDSEAQSSSQLQSNGLQNVQDQSGSLQQVQIVGQPILQQIQIQQPQ
     |||||||||||||||||| ||||||  || ||||||||||||||||||||||||||||||||||||||||||||||||||
que: TSSDTLVSSAETGQYTSTPGSSSEQASEEPQTTATDSEAQSSSQLQSNGLQNVQDQSGSLQQVQIVGQPILQQIQIQQPQ

ref: QQIIQAIPPQSFQLQSGQTIQTIQQQSLQNVQLQAVSPTQVLIRAPTLTPSGQISWQTVQVQNLQSLSNLQVQNAGLPQQ
     |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
que: QQIIQAIPPQSFQLQSGQTIQTIQQQPLQNVQLQAVSPTQVLIRAPTLTPSGQISWQTVQVQNLQSLSNLQVQNAGLPQQ

ref: LTITPVSSSGGTTIAQIAPVAVAGTPITLNAAQLASVPNLQTVSVANLGAAGVQVQGVPVTITSVAGQQQGQDGVKVQQA
     |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
que: LTITPVSSSGGTTIAQIAPVAVAGTPITLNAAQLASVPNLQTVSVANLSAAGVQVQGVPVTITSVAGQQQGQDGVKVQQA

ref: TIAPVTVAVGGIANAGIGAVSPDQITQVQLQQAQQASDQEVQPGKRTRRVACSCPNCREGEGRSSNEPGKKKQHICHIEG
     ||||||||||||||| |||||||||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||
que: TIAPVTVAVGGIANAAIGAVSPDQITQVQLQQAQQASDQEVQPGKRLRRVACSCPNCREGEGRGSNEPGKKKQHICHIEG

ref: CGKVYGKTSHLRAHLRWHTGERPFICNWVFCGKRFTRSDELQRHRRTHTGEKRFECPECSKRFMRSDHLSKHVKTHQNKK
     |||||||||||||||||||||||| ||| |||||||||||||||||||||||||||||||||||||||||||||||||||
que: CGKVYGKTSHLRAHLRWHTGERPFVCNWIFCGKRFTRSDELQRHRRTHTGEKRFECPECSKRFMRSDHLSKHVKTHQNKK

ref: GGGTALAIVTSGELDSSVTEVLGSPRIVTVAAISQDSNPATPNVSTNMEEF*
     ||||||||||||||||||||||||||||||||||||||||||||||||||||
que: GGGTALAIVTSGELDSSVTEVLGSPRIVTVAAISQDSNPATPNVSTNMEEF*




List of inactivating mutations


Show inactivating mutations

Exon number | Codon number | Mutation class | Mutation | Treated as inactivating | Mutation ID


Exon alignments


Show exon sequences and features

Exon number: 1

Exon region: JABBVY010000002:13276979-13276898
Nucleotide percent identity: 100.00 | BLOSUM: 100.00
Intersects assembly gaps: NO
Exon alignment class: A+
Detected within expected region (exp:13276744-13277143): YES

Sequence alignment between reference and query exon:
ref: ATGCCCACAGAGGGAGAGAAATCTCCCGAGGCAGAGAATAACAACAATAATAATAAAAAAGGGAAAACTGGAGGCTCACA
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
que: ATGCCCACAGAGGGAGAGAAATCTCCCGAGGCAGAGAATAACAACAATAATAATAAAAAAGGGAAAACTGGAGGCTCACA

ref: G
     |
que: G


Exon number: 2

Exon region: JABBVY010000002:13276393-13274844
Nucleotide percent identity: 90.05 | BLOSUM: 94.99
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:13274819-13276476): YES

Sequence alignment between reference and query exon:
ref: GATTCTCAGCCTTCACCTCTCGCTTTGCTAGCAGCCACTTGCAGCAAAATAGGAACTCCTGGTGAGAACCAAGCAACTGG
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ||||||
que: GATTCTCAGCCTTCACCTCTCGCTTTGCTAGCAGCCACTTGCAGCAAAATAGGAACTCCTGGTGAGAATCAAGGAACTGG

ref: ACAGCAGCAGATTATTATAGATCCAAATCAAGGTTTGGTGCAGCTTCAAAATCAGCCACAACAGTTAGAATTAGTAACAA
     ||||||||||||||||||||| |||||||||||||||||||||||||| |||||||| || ||||| ||||| |||||||
que: ACAGCAGCAGATTATTATAGACCCAAATCAAGGTTTGGTGCAGCTTCAGAATCAGCCTCAGCAGTTGGAATTGGTAACAA

ref: CTCAGCTTGCTGGAAACTCTTGGCAGCTTGTTGCTGCATCTCCTTCTGCTTCAAAGGACAATAATGTTGCTCAACAGGGA
     ||||||| ||||||||| ||||||| || ||||| ||  |||||||||||||||| ||||||||||||||||| || |||
que: CTCAGCTAGCTGGAAACGCTTGGCAACTGGTTGCCGCTGCTCCTTCTGCTTCAAAAGACAATAATGTTGCTCAGCAAGGA

ref: TCTTCCGTTGCCTCAAGCGCACCAAGTCCCTCTAGCAGTAACAATGGAAGTGCATCTCCTACAAAAACCAAATCGGGCAA
     ||||| ||||||||||| ||| |||||||||| ||||||||||||||||||||||||||| ||||||||||||| |||||
que: TCTTCTGTTGCCTCAAGTGCAGCAAGTCCCTCCAGCAGTAACAATGGAAGTGCATCTCCTTCAAAAACCAAATCAGGCAA

ref: TTCTTCTGCAACAACCCCTGGACAATTTCAGGTCATTCAAGTACAAAATCCAAGTGGTAGTGTCCAATATCAAGTGATAC
      |||||| |||| |||||||||||||| || |||||||||||||| |||||||| ||||| ||||| || |||||||| |
que: CTCTTCTACAACGACCCCTGGACAATTCCAAGTCATTCAAGTACAGAATCCAAGCGGTAGCGTCCAGTACCAAGTGATTC

ref: CACAGATTCAGACAACAGAAGGTCAACAAATTCAAATTAATCCTGCAAATGCTACTGGTCTACAAGATATACAGGGTCAA
     ||||||||||||| || ||||||         ||||| |||||  | || ||||||||||||||||||||||||||||||
que: CACAGATTCAGACGACGGAAGGT---------CAAATCAATCCATCCAACGCTACTGGTCTACAAGATATACAGGGTCAA

ref: ATTCAGCTTATTCCTGCGGGAAATAATCAAGCTATCCTCACGGCTTCAAATAGGACAGCTTCGGGGAATATTATTGCTCA
     ||||||||||||||||| || ||||||||||||||||||||  || |||| |||||||||||||| ||| ||||||||||
que: ATTCAGCTTATTCCTGCAGGGAATAATCAAGCTATCCTCACAACTGCAAACAGGACAGCTTCGGGAAATGTTATTGCTCA

ref: AAACCTAGCAAATCAGACGGTCCCAGTCCAAATCAGGCCTGGTGTCTCCATACCACTGCAGCTGCAAACCATTCCTGGTA
     |||||||||||||||||| || ||||||||||| ||||| ||||| |||||||| ||||| |||||||| ||||||||||
que: AAACCTAGCAAATCAGACAGTTCCAGTCCAAATTAGGCCCGGTGTTTCCATACCGCTGCAACTGCAAACTATTCCTGGTA

ref: CTCAGGCACAGGTTGTGACAACTTTGCCTATAAACATTGGTGGAGTAACCCTAGCATTGCCTGTGATAAACAACATGGCA
     ||||||| || ||||| |||||  ||||||||||||||||||| |||||||| || ||||||||||| |||||| | || 
que: CTCAGGCGCAAGTTGTAACAACGCTGCCTATAAACATTGGTGGGGTAACCCTGGCTTTGCCTGTGATTAACAACGTTGCC

ref: GCTGGAGGAGGTTCAGGTCAAGTTGGCCAGTCCACAGAGGGTGGAGTTTCCAATGGAAGTCAGTTGGCATCTACACCGGT
      |||||||| |||| || |||||||||||||| || ||| |||||||||||||||||| |||| ||||||||||||| ||
que: ACTGGAGGAAGTTCCGGGCAAGTTGGCCAGTCTACGGAGAGTGGAGTTTCCAATGGAAATCAGCTGGCATCTACACCTGT

ref: CACTTCTGCCTCTGTTAGCTCAATGCCGGACTCTCCTTCTTCATCGTCCACTTCTACGACCACTGCTTCAACGTCTCTAA
     |||||||||||||| |||  | ||||| || |||||||| ||||| |||||  |||| |||||||| ||||||||||| |
que: CACTTCTGCCTCTGGTAGTACCATGCCAGAGTCTCCTTCCTCATCTTCCACCGCTACAACCACTGCCTCAACGTCTCTGA

ref: CTAGCAGTGACACCTTAGTAAGCTCTGCAGAAACAGGCCAGTACACAAGCACAGCAGGCAGCAGTTCAGAGCAGCCAACG
     |||||||||||||  | ||||||||||||||||||||||| |||||||||||| |||||||||| ||||||||| |  | 
que: CTAGCAGTGACACACTGGTAAGCTCTGCAGAAACAGGCCAATACACAAGCACACCAGGCAGCAGCTCAGAGCAGGCGTCT

ref: GAAGAATCTCAAACAACTGCAACAGATTCTGAAGCCCAAAGCTCCAGTCAGCTTCAATCAAACGGACTACAGAATGTTCA
     |||||| ||||||||||||| ||||| || |||||||| ||||||||||||||||| || || ||||||||||| || ||
que: GAAGAACCTCAAACAACTGCTACAGACTCCGAAGCCCAGAGCTCCAGTCAGCTTCAGTCCAATGGACTACAGAACGTCCA

ref: GGATCAGTCAGGTTCCCTTCAGCAGGTACAGATTGTAGGTCAACCTATTCTACAGCAGATACAGATCCAGCAGCCTCAGC
     ||||||||||||||||||||| ||||| ||||| |||||||| |||||||| |||||||||||||||||||||||||| |
que: GGATCAGTCAGGTTCCCTTCAACAGGTCCAGATCGTAGGTCAGCCTATTCTGCAGCAGATACAGATCCAGCAGCCTCAAC

ref: AGCAAATTATACAGGCCATTCCTCCACAGTCATTTCAGCTCCAGTCAGGGCAAACTATACAGACCATTCAGCAGCAGTCT
     |||| ||||| |||||||||||||| |||||||||||||||||||||||||| |||||||||||||| ||||||||| ||
que: AGCAGATTATTCAGGCCATTCCTCCGCAGTCATTTCAGCTCCAGTCAGGGCAGACTATACAGACCATCCAGCAGCAGCCT

ref: TTGCAGAATGTTCAGCTGCAGGCAGTAAGTCCAACTCAGGTGCTCATCCGGGCTCCAACTTTAACACCATCAGGGCAGAT
     |||||||||||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||||||| |||||
que: TTGCAGAATGTTCAGCTGCAGGCAGTGAGCCCAACTCAGGTGCTCATCAGGGCTCCAACTTTAACACCATCAGGACAGAT

ref: CAGTTGGCAGACTGTACAGGTTCAGAATCTGCAAAGCCTTTCAAATCTTCAAGTTCAGAATGCTGGGTTACCCCAGCAAC
     ||| ||||| ||||| |||||||||||||||||||||||||| ||||| |||||||| ||||||||||||||||| ||||
que: CAGCTGGCAAACTGTGCAGGTTCAGAATCTGCAAAGCCTTTCCAATCTGCAAGTTCAAAATGCTGGGTTACCCCAACAAC

ref: TCACCATTACCCCTGTGTCTTCAAGTGGTGGCACAACCATTGCCCAGATTGCTCCAGTGGCTGTTGCTGGTACCCCAATC
     | |||||||| |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||| |||
que: TAACCATTACTCCTGTGTCTTCAAGTGGTGGCACAACCATTGCCCAGATTGCTCCGGTGGCTGTTGCTGGTACCCCCATC

ref: ACCCTGAATGCTGCCCAGCTTGCTTCAGTACCTAATCTTCAAACAGTAAGTGTTGCCAACCTGGGTGCTGCAGGTGTTCA
     || ||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||| ||||||||||||||||
que: ACTCTGAATGCTGCCCAGCTTGCTTCAGTACCTAATCTTCAAACTGTAAGTGTTGCCAACCTGAGTGCTGCAGGTGTTCA

ref: AGTTCAAGGAGTTCCAGTTACCATTACCAGTGTTGCAG
     ||||||||||||||| |||||||||||||| ||||| |
que: AGTTCAAGGAGTTCCTGTTACCATTACCAGCGTTGCGG


Exon number: 3

Exon region: JABBVY010000002:13255656-13255427
Nucleotide percent identity: 92.58 | BLOSUM: 96.87
Intersects assembly gaps: NO
Exon alignment class: A
Detected within expected region (exp:13255405-13255667): YES

Sequence alignment between reference and query exon:
ref: GTCAACAGCAAGGACAGGATGGAGTAAAAGTGCAGCAGGCTACCATAGCTCCTGTTACTGTAGCGGTAGGTGGCATTGCT
     ||||||||||||||||||||||||||||||| ||||| || || ||||| ||||| |||||||| |||||||||||||||
que: GTCAACAGCAAGGACAGGATGGAGTAAAAGTACAGCAAGCCACAATAGCGCCTGTAACTGTAGCAGTAGGTGGCATTGCT

ref: AATGCAGGAATTGGTGCTGTTAGTCCTGATCAGATAACACAAGTGCAGCTGCAACAAGCTCAACAAGCTTCTGACCAGGA
     ||||||| |||||||||||||||||| ||||||||||||||||||||||| ||||||||||||||||||||||| |||||
que: AATGCAGCAATTGGTGCTGTTAGTCCCGATCAGATAACACAAGTGCAGCTTCAACAAGCTCAACAAGCTTCTGATCAGGA

ref: AGTGCAACCTGGCAAGAGAACAAGAAGAGTTGCCTGCTCGTGTCCTAATTGCAGAGAGGGAGAAGGAAG
     |||||| ||||||||||||   ||||||||||||||||| |||||||| ||||||||||||||||||||
que: AGTGCAGCCTGGCAAGAGACTGAGAAGAGTTGCCTGCTCATGTCCTAACTGCAGAGAGGGAGAAGGAAG


Exon number: 4

Exon region: JABBVY010000002:13255193-13254993
Nucleotide percent identity: 94.00 | BLOSUM: 98.44
Intersects assembly gaps: NO
Exon alignment class: A+
Detected within expected region (exp:13254981-13255252): YES

Sequence alignment between reference and query exon:
ref: AAGCAGCAATGAACCAGGAAAAAAGAAGCAACATATCTGCCATATTGAAGGATGTGGTAAAGTTTATGGCAAGACATCTC
     | |||||||||| |||||||||||||| || |||||||||||||||||||||||||| ||||||||||||||||||||||
que: AGGCAGCAATGAGCCAGGAAAAAAGAAACAGCATATCTGCCATATTGAAGGATGTGGAAAAGTTTATGGCAAGACATCTC

ref: ATCTTCGGGCGCATCTGCGCTGGCATACTGGTGAAAGACCATTTATATGCAACTGGGTTTTTTGTGGCAAGCGATTTACA
     |||||||||| ||||||||||||||||||||||||||||||||| | ||||||||| ||||||||||||||||||| |||
que: ATCTTCGGGCACATCTGCGCTGGCATACTGGTGAAAGACCATTTGTGTGCAACTGGATTTTTTGTGGCAAGCGATTCACA

ref: AGGAGTGATGAGTTGCAAAGACATAGAAGAACCCACACAG
     |||||||||||||| ||||| |||||||||||||||||||
que: AGGAGTGATGAGTTACAAAGGCATAGAAGAACCCACACAG


Exon number: 5

Exon region: JABBVY010000002:13253912-13253664
Nucleotide percent identity: 96.37 | BLOSUM: 100.00
Intersects assembly gaps: NO
Exon alignment class: A+
Detected within expected region (exp:13253485-13253965): YES

Sequence alignment between reference and query exon:
ref: GTGAAAAGAGATTTGAGTGCCCAGAATGCTCTAAAAGGTTTATGCGAAGTGATCATCTATCGAAACATGTCAAAACTCAT
     |||||||||||||||| ||||||||||| ||||||||||||||||||||||| ||||| || ||||||||||||||||||
que: GTGAAAAGAGATTTGAATGCCCAGAATGTTCTAAAAGGTTTATGCGAAGTGACCATCTCTCAAAACATGTCAAAACTCAT

ref: CAGAACAAGAAGGGTGGTGGAACAGCCCTTGCTATTGTTACCTCAGGAGAACTGGACTCTTCAGTTACTGAGGTTCTTGG
     || |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
que: CAAAACAAGAAGGGTGGTGGAACAGCCCTCGCTATTGTTACCTCAGGAGAACTGGACTCTTCAGTTACTGAGGTTCTTGG

ref: TTCTCCAAGAATTGTCACTGTTGCTGCCATTTCTCAAGATTCAAACCCAGCAACCCCTAATGTTTCAACAAACATGGAAG
     |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
que: TTCTCCAAGAATTGTCACAGTTGCTGCCATTTCTCAAGATTCAAACCCAGCAACCCCTAACGTTTCAACAAACATGGAAG

ref: AATTCTGA
     ||||||||
que: AATTCTGA
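
As a rough guide to reading the "ref:" / "que:" blocks above, the following Python sketch rebuilds the match line and computes a percent identity from two aligned strings. It assumes identity is counted as matching columns over all alignment columns, with gap columns ("-") counted as non-matching; how TOGA itself treats gaps is not stated on this page, so that choice is an assumption. The function names are hypothetical.

```python
def match_line(ref: str, que: str) -> str:
    """Return the '|' / ' ' line shown between aligned ref and que rows."""
    if len(ref) != len(que):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if r == q and r != "-" else " " for r, q in zip(ref, que))


def percent_identity(ref: str, que: str) -> float:
    """Matching columns as a percentage of all alignment columns."""
    line = match_line(ref, que)
    if not line:
        raise ValueError("empty alignment")
    return 100.0 * line.count("|") / len(line)


# Last alignment block of exon 4, copied from the listing above.
ref_block = "AGGAGTGATGAGTTGCAAAGACATAGAAGAACCCACACAG"
que_block = "AGGAGTGATGAGTTACAAAGGCATAGAAGAACCCACACAG"
print(match_line(ref_block, que_block))
print(f"{percent_identity(ref_block, que_block):.2f}")
```

For this 40-column block there are two mismatches, so the sketch reports 95.00. Applying the function to a whole exon requires concatenating all of its blocks first.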





Description

TOGA (Tool to infer Orthologs from Genome Alignments) is a homology-based method that integrates gene annotation with ortholog inference and the classification of genes as intact or lost.

This track has 43,972 items covering 443,389,332 bases, which is 42.63% of the total sequence of 1,039,966,254 nucleotides.

Methods

As input, TOGA uses a gene annotation of a reference species (human/hg38 for mammals, chicken/galGal6 for birds) and a whole genome alignment between the reference and query genome.

TOGA implements a novel paradigm that relies on alignments of intronic and intergenic regions and uses machine learning to accurately distinguish orthologs from paralogs or processed pseudogenes.

To annotate genes, CESAR 2.0 is used to determine the positions and boundaries of coding exons of a reference transcript in the orthologous genomic locus in the query species.

Display Conventions and Configuration

Each annotated transcript is shown color-coded according to its classification:

  •   "intact": middle 80% of the CDS (coding sequence) is present and exhibits no gene-inactivating mutation. These transcripts likely encode functional proteins.
  •   "partially intact": at least 50% of the CDS is present in the query and the middle 80% of the CDS exhibits no inactivating mutation. These transcripts may also encode functional proteins, but the evidence is weaker as parts of the CDS are missing, often due to assembly gaps.
  •   "missing": <50% of the CDS is present in the query and the middle 80% of the CDS exhibits no inactivating mutation.
  •   "uncertain loss": there is 1 inactivating mutation in the middle 80% of the CDS, but evidence is not strong enough to classify the transcript as lost. These transcripts may or may not encode a functional protein.
  •   "lost": typically several inactivating mutations are present, thus there is strong evidence that the transcript is unlikely to encode a functional protein.
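
The display classes above can be summarized as a small decision function. This Python sketch only restates the wording of the list and is not TOGA's classification logic, which uses more evidence than shown here. The function name, its parameters, and in particular the lost_threshold of 2 (standing in for "typically several" inactivating mutations) are assumptions made for illustration.

```python
def classify_transcript(cds_present_fraction: float,
                        middle80_present: bool,
                        inactivating_in_middle80: int,
                        lost_threshold: int = 2) -> str:
    """Map simplified evidence to one of the five display classes.

    lost_threshold is an assumed cut-off for "typically several"
    inactivating mutations; the page does not give an exact number.
    """
    if inactivating_in_middle80 >= lost_threshold:
        return "lost"
    if inactivating_in_middle80 == 1:
        return "uncertain loss"
    # From here on there is no inactivating mutation in the middle 80%.
    if middle80_present:
        return "intact"
    if cds_present_fraction >= 0.5:
        return "partially intact"
    return "missing"


# Evidence resembling this projection: CDS almost fully present,
# middle 80% present, no inactivating mutation listed.
print(classify_transcript(0.996, True, 0))
```

With evidence like that reported for this projection, the sketch returns "intact", in line with the projection classification shown at the top of the page.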

Clicking on a transcript provides additional information about the orthology classification, inactivating mutations, the protein sequence and protein/exon alignments.

Data Access

The data for this track are stored in bigBed format and can be extracted with the command-line tool bigBedToBed, available from the utilities download directory hgdownload.soe.ucsc.edu/admin/exe/linux_x86_64.

To extract from the bigBed file:

  bigBedToBed "https://hgdownload.soe.ucsc.edu/hubs/GCF/014/805/655/GCF_014805655.1/bbi/HLTOGAannotVsGalGal6v1.bb" togaData.bed

The result is written to the togaData.bed file.
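
Once togaData.bed exists, it can be read as tab-separated text. The Python sketch below assumes only the standard leading BED columns (chrom, start, end, name, with 0-based half-open coordinates); the additional TOGA-specific columns of this track are not interpreted, because their schema is not described on this page. The function names and the sample line are hypothetical.

```python
import io
from typing import Iterable, Iterator, Tuple

BedItem = Tuple[str, int, int, str]


def read_bed_items(lines: Iterable[str]) -> Iterator[BedItem]:
    """Yield (chrom, start, end, name) from tab-separated BED lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name


def summarize(items: Iterable[BedItem]) -> Tuple[int, int]:
    """Return (item count, summed item length); overlaps are not merged."""
    count = 0
    total = 0
    for _chrom, start, end, _name in items:
        count += 1
        total += end - start
    return count, total


# Made-up line modeled on this projection's query locus.
sample = io.StringIO(
    "JABBVY010000002\t13253664\t13276979\tSP4_rna-XM_004939343.3.1\n"
)
print(summarize(read_bed_items(sample)))

# For the real file (assuming it is in the working directory):
# with open("togaData.bed") as handle:
#     print(summarize(read_bed_items(handle)))
```

Because overlapping items are summed rather than merged, the total from this sketch need not equal the number of bases covered that is reported in the Description section.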

Credits

These data were prepared by the Michael Hiller Lab.

References

The TOGA software is available from github.com/hillerlab/TOGA

Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A, Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M. TOGA integrates gene annotation with orthology inference at scale. bioRxiv preprint September 2022