Schema for CAT Genes - CAT Gene Annotations
  Database: calJac4    Primary Table: catV2 Data last updated: 2020-06-25
Big Bed File Download: /gbdb/calJac4/catV2/
Item Count: 254,021
The data is stored in the binary BigBed format.

Format description: bigCat gene models
field                         example                description
chrom                         Super-Scaffold_100045  Reference sequence chromosome or scaffold
chromStart                    31912                  Start position in chromosome
chromEnd                      32465                  End position in chromosome
score                         0                      Score (0-1000)
strand                        +                      + or - for strand
thickStart                    31912                  Start of where display should be thick (start codon)
thickEnd                      32465                  End of where display should be thick (stop codon)
reserved                      135,76,212             RGB value (use R,G,B string in input file)
blockCount                    2                      Number of blocks
blockSizes                    29,283                 Comma separated list of block sizes
chromStarts                   0,270                  Start positions relative to chromStart
name2                         N/A                    Gene name
cdsStartStat                  cmpl                   Status of CDS start annotation
cdsEndStat                    cmpl                   Status of CDS end annotation
exonFrames                    0,2                    Exon frame {0,1,2}, or -1 if no frame for exon
txId                          Marmoset_T0000001      Transcript ID
type                          unknown_likely_coding  Transcript type
geneName                      Marmoset_G0000001      Gene ID
geneType                      unknown_likely_coding  Gene type
sourceGene                    N/A                    Source gene ID
sourceTranscript              N/A                    Source transcript ID
alignmentId                   augCGP-38.t1           Alignment ID
alternativeSourceTranscripts  N/A                    Alternative source transcripts
Paralogy                      N/A                    Paralogous alignment IDs
UnfilteredParalogy            N/A                    Unfiltered paralogous alignment IDs
collapsedGeneIds              N/A                    Collapsed Gene IDs
collapsedGeneNames            N/A                    Collapsed Gene Names
frameshift                    N/A                    Frameshifted relative to source?
exonAnnotationSupport         0,0                    Exon support in reference annotation
intronAnnotationSupport       0                      Intron support in reference annotation
transcriptClass               possible_paralog       Transcript class
transcriptModes               augCGP                 Transcript mode(s)
validStart                    N/A                    Valid start codon
validStop                     N/A                    Valid stop codon
properOrf                     N/A                    Proper multiple of 3 ORF
intronRnaSupport              0                      RNA intron support
exonRnaSupport                0,0                    RNA exon support
pbIsoformSupported            False                  Is this transcript supported by IsoSeq?
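
The block fields in the schema (chromStart, blockSizes, chromStarts) encode exon positions relative to the start of the item. As a minimal sketch, assuming the usual BED convention of 0-based, half-open coordinates, the absolute exon intervals for the example row could be recovered as follows. The function name exon_intervals is illustrative and not part of any UCSC tool.

```python
def exon_intervals(chrom_start, block_sizes, chrom_starts):
    """Return absolute (start, end) pairs, one per block.

    chrom_start  -- integer chromStart of the item
    block_sizes  -- comma-separated blockSizes string, e.g. "29,283"
    chrom_starts -- comma-separated chromStarts string, e.g. "0,270"
    Coordinates are assumed to be 0-based, half-open (BED convention).
    """
    # Skip empty pieces so a trailing comma ("29,283,") is tolerated.
    sizes = [int(s) for s in block_sizes.split(",") if s]
    offsets = [int(s) for s in chrom_starts.split(",") if s]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


# Values taken from the example column of the schema listing.
exons = exon_intervals(31912, "29,283", "0,270")
print(exons)  # [(31912, 31941), (32182, 32465)]
```

The end of the last interval equals the example chromEnd (32465), which is a quick consistency check on any parsed row.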

CAT Genes (catV2) Track Description


This track represents the gene models for the Marmoset assembly, calJac4, generated using Comparative Annotation Toolkit (CAT).

CAT can be found on GitHub:

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. Gene names are displayed in 'pack' or 'full' mode. More information about each gene can be found by clicking on the specific gene/transcript model.

The following color key is used:

  • Blue: protein coding
  • Green: non-coding
  • Purple: novel predictions from augCGP/augPB

The following metadata is available by clicking on the transcript.

  • Gene name: gene symbolic name
  • Status of CDS start annotation: is the start of the CDS fully annotated
  • Status of CDS end annotation: is the end of the CDS fully annotated
  • Exon frame: comma-separated list of per-exon frames, 0,1,2, or -1 if no frame for an exon
  • Transcript ID: CAT assigned unique transcript id
  • Transcript type: type of transcript
  • Gene ID: CAT assigned unique gene id
  • Gene type: type of gene
  • Source gene ID: source GENCODE gene id
  • Source transcript ID: source GENCODE transcript id
  • Alignment ID: unique alignment id for TransMap based transcripts
  • Alternative source transcripts: ???
  • Paralogous alignment IDs: comma separated list of alignments identified as possible paralogs for this transcript
  • Unfiltered paralogous alignment IDs: paralogous alignments before filtering
  • Collapsed Gene IDs: if this gene was part of a gene family collapse, this field reports the identifiers of the genes collapsed together
  • Collapsed Gene Names: common names of the collapsed genes
  • Frameshifted relative to source?: is the transcript frameshifted relative to the source transcript?
  • Exon support in reference annotation: was this exon supported by the reference annotation?
  • Intron support in reference annotation: was this intron supported by the reference annotation?
  • Transcript class: for projection transcripts, the class is ortholog; for de-novo transcripts, it is one of poor_alignment, possible_paralog, putative_novel_isoform, or putative_novel
  • Transcript mode(s): Comma separated list of transcript modes
  • Valid start codon: has a valid start-codon
  • Valid stop codon: has a valid stop-codon
  • Proper multiple of 3 ORF: is there a full valid st
  • RNA intron support: RNA-Seq support for introns
  • RNA exon support: RNA-Seq support for exons
  • Is this transcript supported by IsoSeq?: True if the transcript is supported by Iso-Seq
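
The metadata fields above lend themselves to simple filtering once a row has been read into a dictionary keyed by field name. The following is a rough sketch only: it assumes that the support fields hold comma-separated integers in which a non-zero value means "supported", that missing values appear as N/A, and that pbIsoformSupported is stored as the text True or False, as in the schema example. The helper names are hypothetical.

```python
def parse_flags(value):
    """Turn a comma-separated field such as "0,0" into a list of ints.

    "N/A" or an empty string yields an empty list.
    """
    if value in ("", "N/A"):
        return []
    return [int(v) for v in value.split(",") if v]


def supported_fraction(value):
    """Fraction of entries that are non-zero, or None if there are none.

    Assumption: a non-zero entry means the exon or intron is supported.
    """
    flags = parse_flags(value)
    if not flags:
        return None
    return sum(1 for f in flags if f != 0) / len(flags)


def is_isoseq_supported_novel(record):
    """True for de-novo novel predictions that Iso-Seq supports."""
    novel = {"putative_novel", "putative_novel_isoform"}
    return (record.get("transcriptClass") in novel
            and record.get("pbIsoformSupported") == "True")


# Record built from the example column of the schema listing.
example = {
    "transcriptClass": "possible_paralog",
    "pbIsoformSupported": "False",
    "exonRnaSupport": "0,0",
    "intronRnaSupport": "0",
}
print(is_isoseq_supported_novel(example))
print(supported_fraction(example["exonRnaSupport"]))
```

For the example row, neither exon has RNA-Seq support and the transcript is classed as a possible paralog, so it would not pass this particular filter.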


Genome annotation for the calJac4 assembly was performed using Comparative Annotation Toolkit (CAT). CAT leverages whole-genome alignments generated by Cactus to transfer annotations from one source genome to one or more target genomes. CAT also runs AUGUSTUS in both the comparative gene prediction mode and in a single-genome mode that utilizes Iso-Seq data to predict alternative isoforms. CAT then combines all of these annotation methods into a final consensus annotation set that represents orthology relationships as well as species-specific information.

A Cactus alignment was produced with nine primate assemblies and mouse as an outgroup. The GENCODE v33 annotation for GRCh38 was used as a reference to map onto the calJac4 assembly. Twenty Iso-Seq libraries were included as evidence for the CAT annotation pipeline.

Assemblies used in Cactus alignment
Common name  Species                  Assembly                       Accession
Bonobo       Pan paniscus             Mhudiblu_PPA_v0                GCA_013052645.1
Chimp        Pan troglodytes          Clint_PTRv2                    GCF_002880755.1
Gibbon       Nomascus leucogenys      Asia_NLE_v1                    GCF_006542625.1
Gorilla      Gorilla gorilla gorilla  Kamilah_GGO_v0                 GCF_008122165.1
Human        Homo sapiens             GRCh38                         GCA_000001405
Marmoset     Callithrix jacchus       Callithrix_jacchus_cj1700_1.0  GCA_009663435.1
Orangutan    Pongo abelii             Susie_PABv2                    GCF_002880775.1
Rhesus       Macaca mulatta           Mmul_10                        GCF_003339765.1
Owl_monkey                                                           unsubmitted
Mouse        Mus musculus             GRCm38                         GCA_000001635.2

Data Access

For automated analysis, the genome annotation is stored in a bigGenePred format file that can be downloaded from the download server. Annotations can be converted to ASCII text with our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed -chrom=chr6 -start=0 -end=1000000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
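
For scripted access, the bigBedToBed call can be wrapped in a small helper. This is a sketch under stated assumptions: bigBedToBed is installed and on the PATH, and catV2.bb is a placeholder for the actual input file path or URL, which is not given here. The -chrom, -start and -end options are the ones shown in the example command; the tool takes the input file followed by the output file, and the output name stdout writes to standard output. The function names are illustrative.

```python
import shutil
import subprocess


def bigbedtobed_cmd(bb_path, chrom=None, start=None, end=None, out="stdout"):
    """Build the bigBedToBed argument list for an optional region query."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    # Positional arguments: input bigBed, then output ("stdout" streams it).
    cmd += [bb_path, out]
    return cmd


def fetch_region(bb_path, chrom, start, end):
    """Run bigBedToBed and return rows as lists of tab-separated fields."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed not found on PATH")
    result = subprocess.run(
        bigbedtobed_cmd(bb_path, chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


# "catV2.bb" is a placeholder path, not the real file name.
print(" ".join(bigbedtobed_cmd("catV2.bb", "chr6", 0, 1000000)))
```

Building the argument list separately from running it keeps the command easy to inspect or log before any data is fetched.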


The alignments were generated by Marina Haukness, Mark Diekhans, and Ian Fiddes.


Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649

Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE et al. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res. 2018 Jul;28(7):1029-1038. PMID: 29884752; PMC: PMC6028123

Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. PMID: 15215400; PMC: PMC441517