CAT Gene Annotations (PAX9-204)
 
Item: PAX9-204
Score: 896
Position: chr10:67796496-67810280
Genomic Size: 13785
Strand: +

Gene name: PAX9
Status of CDS start annotation: none
Status of CDS end annotation: none
Exon frame {0,1,2}, or -1 if no frame for exon: -1,-1,-1
Transcript ID: Marmoset_T0025802
Transcript type: retained_intron
Gene ID: Marmoset_G0006968
Gene type: protein_coding
Source gene ID: ENSG00000198807.13
Source transcript ID: ENST00000554201.1
Alignment ID: ENST00000554201.1-0
Alternative source transcripts: N/A
Paralogous alignment IDs:
Unfiltered paralogous alignment IDs:
Collapsed Gene IDs: N/A
Collapsed Gene Names: N/A
Frameshifted relative to source?: nan
Exon support in reference annotation: 1,1,1
Intron support in reference annotation: 2,1
Transcript class: ortholog
Transcript mode(s): transMap
Valid start codon: True
Valid stop codon: True
Proper multiple of 3 ORF: True
RNA intron support: 0,0
RNA exon support: 0,0,0
Is this transcript supported by IsoSeq?: False
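
The list-valued fields in this record can be cross-checked against each other and against the position. The following is a minimal Python sketch, assuming the browser position is 1-based and inclusive (the assumption under which the reported Genomic Size of 13785 is reproduced) and that each comma-separated list holds one value per exon or per intron. The function names are illustrative and are not part of CAT or the UCSC tools.

```python
def parse_position(position):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    chrom, span = position.split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def parse_int_list(value):
    """Turn a comma-separated string such as '-1,-1,-1' into a list of ints."""
    return [int(x) for x in value.split(",") if x != ""]


# Values copied from the PAX9-204 record above.
chrom, start, end = parse_position("chr10:67796496-67810280")
genomic_size = end - start + 1  # assumes 1-based, inclusive coordinates

exon_frames = parse_int_list("-1,-1,-1")
exon_support = parse_int_list("1,1,1")
intron_support = parse_int_list("2,1")
rna_exon_support = parse_int_list("0,0,0")
rna_intron_support = parse_int_list("0,0")

n_exons = len(exon_frames)
# A transcript with n exons has n - 1 introns.
consistent = (
    genomic_size == 13785
    and len(exon_support) == n_exons
    and len(rna_exon_support) == n_exons
    and len(intron_support) == n_exons - 1
    and len(rna_intron_support) == n_exons - 1
)
print(genomic_size, n_exons, consistent)  # prints: 13785 3 True
```

For this record the three exon lists each have three entries and the two intron lists each have two, which matches a three-exon transcript.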

Data last updated at UCSC: 2020-06-25 20:40:31

Description

This track represents the gene models for the Marmoset assembly, calJac4, generated using Comparative Annotation Toolkit (CAT).

CAT can be found on GitHub: https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. Gene names are displayed in 'pack' or 'full' mode. More information about each gene can be found by clicking on the specific gene/transcript model.

The following color key is used:

  • Blue: protein coding
  • Green: non-coding
  • Purple: novel predictions from augCGP/augPB

The following metadata is available by clicking on the transcript.

  • Gene name: gene symbolic name
  • Status of CDS start annotation: is the start of the CDS fully annotated
  • Status of CDS end annotation: is the end of the CDS fully annotated
  • Exon frame: comma-separated list of per-exon frames, 0,1,2, or -1 if no frame for an exon
  • Transcript ID: CAT assigned unique transcript id
  • Transcript type: type of transcript
  • Gene ID: CAT assigned unique gene id
  • Gene type: type of gene
  • Source gene ID: source GENCODE gene id
  • Source transcript ID: source GENCODE transcript id
  • Alignment ID: unique alignment id for TransMap-based transcripts
  • Alternative source transcripts: ???
  • Paralogous alignment IDs: comma-separated list of alignments identified as possible paralogs for this transcript
  • Unfiltered paralogous alignment IDs: paralogous alignments before filtering
  • Collapsed Gene IDs: if this gene was part of a gene family collapse, this field reports the identifiers of the genes collapsed together
  • Collapsed Gene Names: common names of the collapsed genes
  • Frameshifted relative to source?: is the transcript frameshifted relative to the source transcript?
  • Exon support in reference annotation: was this exon supported by the reference annotation?
  • Intron support in reference annotation: was this intron supported by the reference annotation?
  • Transcript class: for projection transcripts, this is ortholog; for de novo transcripts, it is one of poor_alignment, possible_paralog, putative_novel_isoform, or putative_novel
  • Transcript mode(s): comma-separated list of transcript modes
  • Valid start codon: has a valid start-codon
  • Valid stop codon: has a valid stop-codon
  • Proper multiple of 3 ORF: is there a full valid st
  • RNA intron support: RNA-Seq support for each intron
  • RNA exon support: RNA-Seq support for each exon
  • Is this transcript supported by IsoSeq?: True if the transcript is supported by Iso-Seq
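
When these fields are read programmatically, every value arrives as a string. The following is a minimal Python sketch of one way to convert them to typed values, using a few values from the PAX9-204 record at the top of this page. The grouping of fields into integer lists, string lists, and booleans, and the choice to treat an empty value, "N/A", and "nan" as missing, are assumptions made for illustration; `typed_value` is a hypothetical helper and not part of CAT.

```python
# Assumed field groupings, based on the descriptions in the list above.
INT_LIST_FIELDS = {
    "Exon frame",
    "Exon support in reference annotation",
    "Intron support in reference annotation",
    "RNA intron support",
    "RNA exon support",
}
STR_LIST_FIELDS = {
    "Paralogous alignment IDs",
    "Unfiltered paralogous alignment IDs",
    "Transcript mode(s)",
}
BOOL_FIELDS = {
    "Frameshifted relative to source?",
    "Valid start codon",
    "Valid stop codon",
    "Proper multiple of 3 ORF",
    "Is this transcript supported by IsoSeq?",
}
MISSING = {"", "N/A", "nan"}  # assumed markers for "no value"


def typed_value(field, raw):
    """Convert one metadata string to a list, bool, None, or plain string."""
    text = raw.strip()
    if field in INT_LIST_FIELDS:
        return [] if text in MISSING else [int(x) for x in text.split(",")]
    if field in STR_LIST_FIELDS:
        return [] if text in MISSING else text.split(",")
    if text in MISSING:
        return None
    if field in BOOL_FIELDS:
        return text == "True"
    return text


record = {
    "Gene name": "PAX9",
    "Exon frame": "-1,-1,-1",
    "Intron support in reference annotation": "2,1",
    "Paralogous alignment IDs": "",
    "Transcript mode(s)": "transMap",
    "Frameshifted relative to source?": "nan",
    "Valid start codon": "True",
    "Is this transcript supported by IsoSeq?": "False",
}
parsed = {field: typed_value(field, raw) for field, raw in record.items()}
for field, value in parsed.items():
    print(f"{field}: {value!r}")
```

Keeping the list fields as lists even when they are empty means that code counting exons or paralogs does not need a separate check for a missing value.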

Methods

Genome annotation for the calJac4 assembly was performed using Comparative Annotation Toolkit (CAT). CAT leverages whole-genome alignments generated by Cactus to transfer annotations from one source genome to one or more target genomes. CAT also runs AUGUSTUS in both the comparative gene prediction mode and in a single-genome mode that utilizes Iso-Seq data to predict alternative isoforms. CAT then combines all of these annotation methods into a final consensus annotation set that represents orthology relationships as well as species-specific information.

A Cactus alignment was produced with nine primate assemblies and mouse as an outgroup. The GENCODE v33 annotation for GRCh38 was used as the reference annotation and mapped onto the calJac4 assembly. Twenty Iso-Seq libraries were included as evidence for the CAT annotation pipeline.

Assemblies used in Cactus alignment

Common name   Species                   Assembly                        Accession
Bonobo        Pan paniscus              Mhudiblu_PPA_v0                 GCA_013052645.1
Chimp         Pan troglodytes           Clint_PTRv2                     GCF_002880755.1
Gibbon        Nomascus leucogenys       Asia_NLE_v1                     GCF_006542625.1
Gorilla       Gorilla gorilla gorilla   Kamilah_GGO_v0                  GCF_008122165.1
Human         Homo sapiens              GRCh38                          GCA_000001405
Marmoset      Callithrix jacchus        Callithrix_jacchus_cj1700_1.0   GCA_009663435.1
Orangutan     Pongo abelii              Susie_PABv2                     GCF_002880775.1
Rhesus        Macaca mulatta            Mmul_10                         GCF_003339765.1
Owl_monkey                                                              unsubmitted
Mouse         Mus musculus              GRCm38                          GCA_000001635.2

Data Access

For automated analysis, the genome annotation is stored in a bigGenePred format file that can be downloaded from the download server at cat-consensus-v2.bb. Annotations can be converted to ASCII text by our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/calJac4/cat/cat-consensus-v2.bb -chrom=chr6 -start=0 -end=1000000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
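
The text that bigBedToBed writes is tab-separated, and a bigGenePred file begins with the twelve standard BED columns. The following is a minimal Python sketch that turns one such line into exon intervals, assuming 0-based, half-open BED coordinates. The sample line, including its name, block sizes, and offsets, is invented for illustration and is not a real record from this track, and `exon_intervals` is a hypothetical helper rather than part of the UCSC tools.

```python
def exon_intervals(bed_line):
    """Return (chrom, start, end) for each block of a BED12-style line."""
    cols = bed_line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected at least 12 tab-separated BED columns")
    chrom = cols[0]
    chrom_start = int(cols[1])
    block_count = int(cols[9])
    # blockSizes and chromStarts may end with a trailing comma, so skip empty items.
    sizes = [int(x) for x in cols[10].split(",") if x]
    offsets = [int(x) for x in cols[11].split(",") if x]
    if not (len(sizes) == len(offsets) == block_count):
        raise ValueError("blockCount does not match blockSizes/chromStarts")
    # Block offsets are relative to chromStart.
    return [
        (chrom, chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(offsets, sizes)
    ]


# Invented example line: three blocks on chr6 spanning 1000-2000.
sample = "\t".join([
    "chr6", "1000", "2000", "example_tx", "0", "+",
    "1000", "1000", "0", "3", "100,200,300,", "0,400,700,",
])
for chrom, start, end in exon_intervals(sample):
    print(chrom, start, end)
```

Lines read from the command's standard output can be passed to the function one at a time; any columns after the twelfth are ignored by this sketch.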

Credits

The alignments were generated by Marina Haukness, Mark Diekhans, and Ian Fiddes.

References

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649

Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE et al. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res. 2018 Jul;28(7):1029-1038. PMID: 29884752; PMC: PMC6028123

Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. PMID: 15215400; PMC: PMC441517