Description
This track shows the gene models for the marmoset assembly, calJac4, generated using the Comparative Annotation Toolkit (CAT).
CAT can be found on GitHub: https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit
Display Conventions and Configuration
This track follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. Gene names are displayed in 'pack' or 'full' mode. More information about each gene can be found by clicking on the specific gene/transcript model.
The following color key is used:
- Blue: protein coding
- Green: non-coding
- Purple: novel predictions from augCGP/augPB
The following metadata is available by clicking on the transcript:
- Gene name: gene symbolic name
- Status of CDS start annotation: is the start of the CDS fully annotated
- Status of CDS end annotation: is the end of the CDS fully annotated
- Exon frame: comma-separated list of per-exon frames, 0, 1, or 2, or -1 if there is no frame for an exon
- Transcript ID: CAT assigned unique transcript id
- Transcript type: type of transcript
- Gene ID: CAT assigned unique gene id
- Gene type: type of gene
- Source gene ID: source GENCODE gene id
- Source transcript ID: source GENCODE transcript id
- Alignment ID: unique alignment id for TransMap based transcripts
- Alternative source transcripts: ???
- Paralogous alignment IDs: comma-separated list of alignments identified as possible paralogs for this transcript
- Unfiltered paralogous alignment IDs: paralogous alignments before filtering
- Collapsed Gene IDs: if this gene was part of a gene family collapse, this field reports the identifiers of the genes collapsed together
- Collapsed Gene Names: common names of the collapsed genes
- Frameshifted relative to source?: is the transcript frameshifted relative to the source transcript?
- Exon support in reference annotation: was this exon supported by the reference annotation?
- Intron support in reference annotation: was this intron supported by the reference annotation?
- Transcript class: for projection transcripts, it is maybe ortholog; for de-novo transcripts, will be one of poor_alignment, possible_paralog, putative_novel_isoform, or putative_novel
- Transcript mode(s): comma-separated list of transcript modes
- Valid start codon: has a valid start-codon
- Valid stop codon: has a valid stop-codon
- Proper multiple of 3 ORF: is there a full valid st
- RNA intron support: RNA-Seq support for introns
- RNA exon support: RNA-Seq support for exons
- Is this transcript supported by IsoSeq?: True if the transcript is supported by Iso-Seq
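Several of the fields above (for example Exon frame and Paralogous alignment IDs) are comma-separated lists. As a minimal sketch of how such values might be decoded in Python, the helpers below split them into typed lists. The function names are hypothetical, and the code assumes that a trailing comma may be present and should be ignored.

```python
from typing import List, Optional


def parse_exon_frames(field: str) -> List[Optional[int]]:
    """Decode a comma-separated exon frame list.

    Valid frames are 0, 1, or 2; -1 (no frame for that exon) becomes None.
    """
    frames: List[Optional[int]] = []
    for token in field.split(","):
        token = token.strip()
        if not token:  # assumption: tolerate a trailing comma
            continue
        value = int(token)
        if value not in (-1, 0, 1, 2):
            raise ValueError(f"unexpected exon frame: {value}")
        frames.append(None if value == -1 else value)
    return frames


def parse_id_list(field: str) -> List[str]:
    """Split a comma-separated ID list, dropping empty entries."""
    return [item.strip() for item in field.split(",") if item.strip()]


if __name__ == "__main__":
    # Made-up example values, not taken from the track.
    print(parse_exon_frames("-1,0,2,1,"))
    print(parse_id_list("aln-1,aln-2"))
```

Mapping -1 to None keeps non-coding exons from being mistaken for a real reading frame in downstream arithmetic.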
Methods
Genome annotation for the calJac4 assembly was performed using the Comparative Annotation Toolkit (CAT). CAT leverages whole-genome alignments generated by Cactus to transfer annotations from one source genome to one or more target genomes. CAT also runs AUGUSTUS in both the comparative gene prediction mode and in a single-genome mode that utilizes Iso-Seq data to predict alternative isoforms. CAT then combines all of these annotation methods into a final consensus annotation set that represents orthology relationships as well as species-specific information.
A Cactus alignment was produced with nine primate assemblies and mouse as an outgroup.
The GENCODE v33 annotation for GRCh38 was used as a reference to map onto the calJac4 assembly. Twenty Iso-Seq libraries were included as evidence for the CAT annotation pipeline.
Assemblies used in Cactus alignment
Common name | Species | Assembly | Accession
Bonobo | Pan paniscus | Mhudiblu_PPA_v0 | GCA_013052645.1
Chimp | Pan troglodytes | Clint_PTRv2 | GCF_002880755.1
Gibbon | Nomascus leucogenys | Asia_NLE_v1 | GCF_006542625.1
Gorilla | Gorilla gorilla gorilla | Kamilah_GGO_v0 | GCF_008122165.1
Human | Homo sapiens | GRCh38 | GCA_000001405
Marmoset | Callithrix jacchus | Callithrix_jacchus_cj1700_1.0 | GCA_009663435.1
Orangutan | Pongo abelii | Susie_PABv2 | GCF_002880775.1
Rhesus | Macaca mulatta | Mmul_10 | GCF_003339765.1
Owl_monkey | | | unsubmitted
Mouse | Mus musculus | GRCm38 | GCA_000001635.2
Data Access
For automated analysis, the genome annotation is stored in a bigGenePred format file that can be downloaded from the download server at cat-consensus-v2.bb.
Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
The tool can also be used to obtain only features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/calJac4/cat/cat-consensus-v2.bb -chrom=chr6 -start=0 -end=1000000 stdout
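The text produced by bigBedToBed is tab-separated, one transcript per line. The Python sketch below shows one way such a line might be turned into exon coordinates. It assumes only that the first 12 columns follow the standard BED12 layout that bigGenePred extends, and it ignores the additional columns. The names Transcript and parse_bed12 are hypothetical, and the example record is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    chrom: str
    start: int       # 0-based transcript start
    end: int         # exclusive transcript end
    name: str
    strand: str
    cds_start: int   # thickStart in BED terms
    cds_end: int     # thickEnd in BED terms
    exons: List[Tuple[int, int]]  # absolute (start, end) per exon


def _int_list(field: str) -> List[int]:
    """Parse a comma-separated integer list, ignoring a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def parse_bed12(line: str) -> Transcript:
    """Parse the first 12 BED columns of one tab-separated line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    chrom_start = int(cols[1])
    block_count = int(cols[9])
    sizes = _int_list(cols[10])
    starts = _int_list(cols[11])
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("block count does not match block lists")
    # Block starts are relative to chromStart; convert to absolute positions.
    exons = [(chrom_start + s, chrom_start + s + n)
             for s, n in zip(starts, sizes)]
    return Transcript(cols[0], chrom_start, int(cols[2]), cols[3],
                      cols[5], int(cols[6]), int(cols[7]), exons)


if __name__ == "__main__":
    # Made-up record; real lines carry further bigGenePred columns.
    example = "chr6\t1000\t2000\tExampleTx\t0\t+\t1100\t1900\t0\t2\t300,400,\t0,600,\textra"
    print(parse_bed12(example))
```

To process real data, the standard output of the bigBedToBed command above could be piped into a script that calls parse_bed12 on each line read from standard input.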
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
Credits
The alignments were generated by Marina Haukness, Mark Diekhans, and Ian Fiddes.
References
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al.
Progressive Cactus is a multiple-genome aligner for the thousand-genome era.
Nature. 2020 Nov;587(7833):246-251.
PMID: 33177663; PMC: PMC7673649
Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE et al.
Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation.
Genome Res. 2018 Jul;28(7):1029-1038.
PMID: 29884752; PMC: PMC6028123
Stanke M, Steinkamp R, Waack S, Morgenstern B.
AUGUSTUS: a web server for gene finding in eukaryotes.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12.
PMID: 15215400; PMC: PMC441517