CAT Gene Annotations (PAX9-206)
Item: PAX9-206
Score: 871
Position: chr10:67799856-67810229
Genomic Size: 10374
Strand: +

Gene name: PAX9
Status of CDS start annotation: none
Status of CDS end annotation: none
Exon frame ({0,1,2}, or -1 if no frame for exon): -1,-1,-1,-1
Transcript ID: Marmoset_T0025803
Transcript type: retained_intron
Gene ID: Marmoset_G0006968
Gene type: protein_coding
Source gene ID: ENSG00000198807.13
Source transcript ID: ENST00000557107.1
Alignment ID: ENST00000557107.1-0
Alternative source transcripts: N/A
Paralogous alignment IDs:
Unfiltered paralogous alignment IDs:
Collapsed Gene IDs: N/A
Collapsed Gene Names: N/A
Frameshifted relative to source?: nan
Exon support in reference annotation: 1,1,1,1
Intron support in reference annotation: 1,1,1
Transcript class: ortholog
Transcript mode(s): transMap
Valid start codon: True
Valid stop codon: True
Proper multiple of 3 ORF: True
RNA intron support: 0,0,0
RNA exon support: 0,0,0,0
Is this transcript supported by IsoSeq?: False

Data last updated at UCSC: 2020-06-25 20:40:31


This track shows gene models for the marmoset assembly (calJac4) that were generated with the Comparative Annotation Toolkit (CAT).

CAT can be found on GitHub:

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. Gene names are displayed in 'pack' or 'full' mode. More information about each gene can be found by clicking on the specific gene/transcript model.
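The thick/thin distinction comes from the BED12 thickStart and thickEnd columns (7 and 8), which bigGenePred files inherit. The following is a rough sketch that assumes tab-separated rows as printed by bigBedToBed and the usual UCSC convention that a transcript whose thickStart equals its thickEnd has no coding region. Under those assumptions, the hypothetical shell function `coding_status` reports which rows would be drawn with thick blocks.

```shell
#!/bin/sh
# coding_status (hypothetical helper): read tab-separated BED12/bigGenePred rows
# on stdin and report, for each item, whether it has a thick (coding) region.
# Assumed columns: 4 = name, 7 = thickStart, 8 = thickEnd.
coding_status() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    if ($8 > $7) {
      # Thick blocks cover [thickStart, thickEnd); the span printed here is
      # genomic, so it includes any introns inside the coding region.
      print $4, "coding", "thick_span=" ($8 - $7)
    } else {
      # thickStart == thickEnd: no coding region, drawn with thin blocks only.
      print $4, "non-coding"
    }
  }'
}

# Two made-up rows: one with a thick region, one without.
printf 'chr10\t0\t1000\ttxA\t0\t+\t200\t800\n' | coding_status
printf 'chr10\t0\t1000\ttxB\t0\t+\t1000\t1000\n' | coding_status
```

The rows above are invented for illustration. Only the first eight columns are read, so the function works the same on full bigGenePred rows with extra fields.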

The following color key is used:

  • Blue: protein coding
  • Green: non-coding
  • Purple: novel predictions from augCGP/augPB

The following metadata is available by clicking on the transcript.

  • Gene name: gene symbol
  • Status of CDS start annotation: is the start of the CDS fully annotated?
  • Status of CDS end annotation: is the end of the CDS fully annotated?
  • Exon frame: comma-separated list of per-exon frames (0, 1, or 2), or -1 if an exon has no frame
  • Transcript ID: CAT-assigned unique transcript ID
  • Transcript type: type of transcript
  • Gene ID: CAT-assigned unique gene ID
  • Gene type: type of gene
  • Source gene ID: source GENCODE gene ID
  • Source transcript ID: source GENCODE transcript ID
  • Alignment ID: unique alignment ID for TransMap-based transcripts
  • Alternative source transcripts: ???
  • Paralogous alignment IDs: comma-separated list of alignments identified as possible paralogs of this transcript
  • Unfiltered paralogous alignment IDs: paralogous alignments before filtering
  • Collapsed Gene IDs: if this gene was part of a gene family collapse, the identifiers of the genes that were collapsed together
  • Collapsed Gene Names: common names of the collapsed genes
  • Frameshifted relative to source?: is the transcript frameshifted relative to the source transcript?
  • Exon support in reference annotation: for each exon, whether it is supported by the reference annotation
  • Intron support in reference annotation: for each intron, whether it is supported by the reference annotation
  • Transcript class: for projection transcripts, this is ortholog; for de novo transcripts, it is one of poor_alignment, possible_paralog, putative_novel_isoform, or putative_novel
  • Transcript mode(s): comma-separated list of transcript modes
  • Valid start codon: does the transcript have a valid start codon?
  • Valid stop codon: does the transcript have a valid stop codon?
  • Proper multiple of 3 ORF: is the ORF length a multiple of 3?
  • RNA intron support: per-intron RNA-Seq support
  • RNA exon support: per-exon RNA-Seq support
  • Is this transcript supported by IsoSeq?: True if the transcript is supported by Iso-Seq
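Several of these fields (Exon frame and the reference and RNA support fields) are comma-separated lists with one entry per exon or per intron. A transcript with four exons therefore has four exon entries and three intron entries, as in the PAX9-206 record. The hypothetical helper `count_not_equal` is a minimal sketch for summarizing such a list. It relies on two assumptions: that -1 in Exon frame marks an exon with no frame, as described above, and that 0 in a support field means no support. The RNA fields may hold counts rather than flags, which is why the helper counts entries that differ from a given value rather than entries equal to 1.

```shell
#!/bin/sh
# count_not_equal LIST VALUE (hypothetical helper)
# Prints "k/n": how many of the n entries in a comma-separated per-exon or
# per-intron list differ from VALUE. Empty entries (e.g. from a trailing
# comma) are ignored.
count_not_equal() {
  awk -v list="$1" -v value="$2" 'BEGIN {
    n = split(list, a, ",")
    total = 0; k = 0
    for (i = 1; i <= n; i++) {
      if (a[i] == "") continue   # skip empty field after a trailing comma
      total++
      if (a[i] != value) k++     # numeric comparison for numeric-looking entries
    }
    printf "%d/%d\n", k, total
  }'
}

# Values copied from the PAX9-206 record above.
echo "exons supported by reference:   $(count_not_equal 1,1,1,1 0)"
echo "introns supported by reference: $(count_not_equal 1,1,1 0)"
echo "exons with an assigned frame:   $(count_not_equal -1,-1,-1,-1 -1)"
echo "exons with RNA-Seq support:     $(count_not_equal 0,0,0,0 0)"
```

For this record the four lines report 4/4, 3/3, 0/4, and 0/4.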


Genome annotation for the calJac4 assembly was performed using the Comparative Annotation Toolkit (CAT). CAT uses whole-genome alignments generated by Cactus to transfer annotations from one source genome to one or more target genomes. CAT also runs AUGUSTUS in two modes: comparative gene prediction (augCGP) and a single-genome mode that uses Iso-Seq data to predict alternative isoforms (augPB). CAT then combines the results of all of these methods into a final consensus annotation set that captures both orthology relationships and species-specific features.

A Cactus alignment was produced from nine primate assemblies, with mouse as an outgroup. The GENCODE v33 annotation of GRCh38 was used as the reference annotation and mapped onto the calJac4 assembly. Twenty Iso-Seq libraries were included as evidence for the CAT annotation pipeline.

Assemblies used in Cactus alignment

Common name  Species                  Assembly                       Accession
Bonobo       Pan paniscus             Mhudiblu_PPA_v0                GCA_013052645.1
Chimp        Pan troglodytes          Clint_PTRv2                    GCF_002880755.1
Gibbon       Nomascus leucogenys      Asia_NLE_v1                    GCF_006542625.1
Gorilla      Gorilla gorilla gorilla  Kamilah_GGO_v0                 GCF_008122165.1
Human        Homo sapiens             GRCh38                         GCA_000001405
Marmoset     Callithrix jacchus       Callithrix_jacchus_cj1700_1.0  GCA_009663435.1
Orangutan    Pongo abelii             Susie_PABv2                    GCF_002880775.1
Rhesus       Macaca mulatta           Mmul_10                        GCF_003339765.1
Owl monkey   -                        -                              unsubmitted
Mouse        Mus musculus             GRCm38                         GCA_000001635.2

Data Access

For automated analysis, the genome annotation is stored in a bigGenePred format file that can be downloaded from the download server. Annotations can be converted to ASCII text with our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed -chrom=chr6 -start=0 -end=1000000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
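The details page shows Position in 1-based, fully closed coordinates, while bigBedToBed emits BED rows with 0-based, half-open coordinates. This is why the Genomic Size of PAX9-206 is 67810229 - 67799856 + 1 = 10374. The following minimal sketch converts such rows back to the displayed form. It assumes tab-separated BED output with the item name in column 4. `bed_to_position` is a hypothetical helper, not a UCSC tool.

```shell
#!/bin/sh
# bed_to_position (hypothetical helper): convert tab-separated BED rows
# (0-based chromStart, exclusive chromEnd) into the 1-based, inclusive
# "chrom:start-end" form shown on the details page, plus the genomic size.
bed_to_position() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Shift only the start by one; a half-open end already equals the
    # inclusive 1-based end. Size is simply chromEnd - chromStart.
    print $4, $1 ":" ($2 + 1) "-" $3, $3 - $2
  }'
}

# Hypothetical row reconstructed from the PAX9-206 details page
# (chromStart = displayed start minus one).
printf 'chr10\t67799855\t67810229\tPAX9-206\n' | bed_to_position
```

For that row the function prints PAX9-206, chr10:67799856-67810229, and 10374 as three tab-separated fields, matching the Position and Genomic Size shown at the top of the page. Output from bigBedToBed can be piped into the function in the same way.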


The alignments were generated by Marina Haukness, Mark Diekhans, and Ian Fiddes.


Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649

Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE et al. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res. 2018 Jul;28(7):1029-1038. PMID: 29884752; PMC: PMC6028123

Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. PMID: 15215400; PMC: PMC441517