Schema for GENCODE VM33 - GENCODE VM33
  Database: mm39
  Primary Table: knownGene
  Data last updated: 2023-08-16
Big Bed File Download: /gbdb/mm39/gencode/gencodeVM33.bb
Item Count: 149,547
The data is stored in the binary BigBed format.

Format description: GENCODE bigGenePred
field | example | description
chrom | chr1 | Reference sequence chromosome or scaffold
chromStart | 129201080 | Start position in chromosome
chromEnd | 130147015 | End position in chromosome
name | ENSMUST00000073527.13 | Ensembl ID
score | 0 | Score (0-1000)
strand | + | + or - for strand
thickStart | 129358567 | Start of where display should be thick (start codon)
thickEnd | 130145917 | End of where display should be thick (stop codon)
reserved | 789624 | RGB value (use R,G,B string in input file)
blockCount | 28 | Number of blocks
blockSizes | 257,173,811,249,170,156,198,192,235,116,130,104,195,267,179,134,151,142,116,118,146,160,174,96,83,193,88,1180, | Comma separated list of block sizes
chromStarts | 0,157453,322026,339766,355533,394470,404705,485102,486925,502949,543299,549248,642209,647904,678046,829727,843216,886230,889476,891705,907008,915116,916512,916787,919210,921763,936897,944755, | Start positions relative to chromStart
name2 | uc007clv.4 | UCSC Genes ID
cdsStartStat | none | Status of CDS start annotation (none, unknown, incomplete, or complete)
cdsEndStat | none | Status of CDS end annotation (none, unknown, incomplete, or complete)
exonFrames | -1,0,1,2,2,1,1,1,1,2,1,2,1,1,1,0,2,0,1,0,1,0,1,1,1,0,1,2, | Exon frame {0,1,2}, or -1 if no frame for exon
type | none | Transcript type
geneName | Thsd7b | Gene Symbol
geneName2 | Q6P4U0 | UniProt display ID
geneType | protein_coding | Gene type
transcriptClass | coding | Transcript Class
source | ensembl_havana_transcript_mus_musculus | Source of transcript (from gencodeTranscriptSource)
transcriptType | protein_coding | BioType of transcript (from gencodeAttrs)
tag | CCDS,Ensembl_canonical,appris_principal_1,basic,overlapping_locus | symbolic tags (from gencodeTags)
level | 2 | support level, tsl1 is strongest support, tsl5 weakest, NA means not analyzed (from gencodeTranscriptionSupportLevel)
tier | canonical,basic,all | Transcript Tier
rank | 1 | Transcript Rank
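
The blockSizes and chromStarts fields together encode the exon structure: each chromStarts entry is an offset from chromStart, and the matching blockSizes entry is that exon's length. As a minimal sketch (the function name exon_intervals is hypothetical and not part of any UCSC tool), the following converts the two comma-separated lists into absolute, 0-based half-open intervals, using values taken from the Cd55os row in the sample rows below:

```python
def exon_intervals(chrom_start, block_sizes, chrom_starts):
    """Return absolute (start, end) exon intervals, 0-based and half-open.

    block_sizes and chrom_starts are the comma-separated strings stored in
    the bigGenePred record; both may carry a trailing comma.
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    offsets = [int(x) for x in chrom_starts.split(",") if x]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


# Values copied from the ENSMUST00000143266.2 (Cd55os) sample row.
exons = exon_intervals(130350560, "239,68,144,1987,", "0,3090,12523,13104,")
for start, end in exons:
    print(start, end)
```

A useful sanity check is that the end of the last exon equals chromEnd (130365651 for this row).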

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | chromStarts | name2 | cdsStartStat | cdsEndStat | exonFrames | type | geneName | geneName2 | geneType | transcriptClass | source | transcriptType | tag | level | tier | rank
chr1 | 129201080 | 130147015 | ENSMUST00000073527.13 | 0 | + | 129358567 | 130145917 | 789624 | 28 | 257,173,811,249,170,156,198,192,235,116,130,104,195,267,179,134,151,142,116,118,146,160,174,96,83,193,88,1180, | 0,157453,322026,339766,355533,394470,404705,485102,486925,502949,543299,549248,642209,647904,678046,829727,843216,886230,889476, ... | uc007clv.4 | none | none | -1,0,1,2,2,1,1,1,1,2,1,2,1,1,1,0,2,0,1,0,1,0,1,1,1,0,1,2, | none | Thsd7b | Q6P4U0 | protein_coding | coding | ensembl_havana_transcript_mus_musculus | protein_coding | CCDS,Ensembl_canonical,appris_principal_1,basic,overlapping_locus | 2 | canonical,basic,all | 1
chr1 | 129843275 | 130110918 | ENSMUST00000151700.8 | 0 | + | 129843275 | 129843275 | 789624 | 10 | 209,267,179,134,151,142,116,118,146,1860, | 0,5709,35851,187532,201021,244035,247281,249510,264813,265783, | uc007clw.3 | none | none | -1,-1,-1,-1,-1,-1,-1,-1,-1,-1, | none | Thsd7b | none | protein_coding | coding | havana_mus_musculus | protein_coding_CDS_not_defined | overlapping_locus | 2 | all | 5
chr1 | 130091567 | 130147013 | ENSMUST00000140834.2 | 0 | + | 130091567 | 130091567 | 789624 | 10 | 200,118,146,160,174,96,83,193,88,1178, | 0,1218,16521,24629,26025,26300,28723,31276,46410,54268, | uc287lmi.2 | none | none | -1,-1,-1,-1,-1,-1,-1,-1,-1,-1, | none | Thsd7b | none | protein_coding | coding | havana_mus_musculus | protein_coding_CDS_not_defined | overlapping_locus | 2 | all | 7
chr1 | 130123716 | 130124125 | ENSMUST00000190086.2 | 0 | + | 130123716 | 130123716 | 16724991 | 1 | 409, | 0, | uc287lmk.2 | none | none | -1, | none | Gm29428 | none | processed_pseudogene | pseudo | havana_mus_musculus | processed_pseudogene | Ensembl_canonical,basic,overlapping_locus | 2 | canonical,basic,all | 1
chr1 | 130197686 | 130208995 | ENSMUST00000155992.2 | 0 | - | 130197686 | 130197686 | 25600 | 2 | 604,80, | 0,11229, | uc287lml.2 | none | none | -1,-1, | none | Gm16081 | none | lncRNA | nonCoding | havana_mus_musculus | lncRNA | Ensembl_canonical,basic | 2 | canonical,basic,all | 1
chr1 | 130316273 | 130350746 | ENSMUST00000112488.9 | 0 | - | 130316344 | 130350695 | 789624 | 10 | 193,21,63,147,192,86,100,192,186,145, | 0,21335,22092,22856,25512,28302,29571,31064,33578,34328, | uc007clx.4 | none | none | 1,1,1,1,1,2,1,1,1,0, | none | Cd55b | E9QAP4 | protein_coding | coding | ensembl_havana_transcript_mus_musculus | protein_coding | CCDS,Ensembl_canonical,appris_principal_1,basic | 2 | canonical,basic,all | 1
chr1 | 130316273 | 130350746 | ENSMUST00000119432.2 | 0 | - | 130316344 | 130350695 | 789624 | 8 | 193,21,63,86,100,192,186,145, | 0,21335,22092,28302,29571,31064,33578,34328, | uc011wre.4 | none | none | 1,1,1,2,1,1,1,0, | none | Cd55b | E9Q731 | protein_coding | coding | havana_mus_musculus | protein_coding | CCDS,basic | 2 | basic,all | 2
chr1 | 130347337 | 130350477 | ENSMUST00000136497.2 | 0 | - | 130347337 | 130347337 | 789624 | 3 | 192,186,350, | 0,2514,2790, | uc287lmm.2 | none | none | -1,-1,-1, | none | Cd55b | none | protein_coding | coding | havana_mus_musculus | protein_coding_CDS_not_defined | none | 2 | all | 3
chr1 | 130350560 | 130365651 | ENSMUST00000143266.2 | 0 | + | 130350560 | 130350560 | 25600 | 4 | 239,68,144,1987, | 0,3090,12523,13104, | uc287lmn.2 | none | none | -1,-1,-1,-1, | none | Cd55os | none | lncRNA | nonCoding | havana_mus_musculus | lncRNA | Ensembl_canonical,basic | 2 | canonical,basic,all | 1
chr1 | 130357517 | 130364928 | ENSMUST00000130486.2 | 0 | - | 130357517 | 130357517 | 25600 | 3 | 445,149,1087, | 0,5891,6324, | uc287lmo.2 | none | none | -1,-1,-1, | none | Gm15675 | none | lncRNA | nonCoding | havana_mus_musculus | lncRNA | Ensembl_canonical,basic | 2 | canonical,basic,all | 1

GENCODE VM33 (knownGene) Track Description
 

Description

The GENCODE Genes track (version M33, Jul 2023) shows high-quality manual annotations merged with evidence-based automated annotations across the entire mouse genome generated by the GENCODE project. By default, only the basic gene set is displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts that GENCODE believes will be useful to the majority of users.

The track includes protein-coding genes, non-coding RNA genes, and pseudo-genes, though pseudo-genes are not displayed by default. It contains annotations on the reference chromosomes as well as assembly patches and alternative loci (haplotypes).

The following table provides statistics for the VM33 release, derived from the GTF file that contains annotations only on the main chromosomes. More information on how the statistics were generated can be found on the GENCODE site.

GENCODE VM33 Release Stats
Genes | Observed | Transcripts | Observed
Protein-coding genes | 21,403 | Protein-coding transcripts | 58,750
Long non-coding RNA genes | 14,842 | - full length protein-coding | 45,112
Small non-coding RNA genes | 6,105 | - partial length protein-coding | 13,638
Pseudogenes | 13,809 | Nonsense mediated decay transcripts | 7,218
Immunoglobulin/T-cell receptor gene segments | 701 | Long non-coding RNA loci transcripts | 26,564
Total number of distinct translations | 44,993 | Genes that have more than one distinct translation | 10,893

For more information on the different gene tracks, see our Genes FAQ.

Display Conventions and Configuration

By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes. It includes options to display the entire GENCODE set and pseudogenes. To customize these options, the respective boxes can be checked or unchecked at the top of this description page.

This track also includes a variety of labels which identify the transcripts when visibility is set to "full" or "pack". Gene symbols (e.g. NIPA1) are displayed by default, but additional options include GENCODE Transcript ID (ENSMUST00000052204.6), UCSC Known Gene ID (uc009hdu.3), and UniProt Display ID (Q8BHK1). Additional information about gene and transcript names can be found in our FAQ.

This track, in general, follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker.
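
The thick/thin distinction corresponds to the thickStart and thickEnd fields: the part of each exon that falls inside [thickStart, thickEnd) is drawn thick, and in the sample rows transcripts without a CDS have thickStart equal to thickEnd. The following is a minimal sketch of that clipping, not the browser's own drawing code; the function name thick_segments is hypothetical, and the coordinates come from the ENSMUST00000119432.2 (Cd55b) sample row.

```python
def thick_segments(exons, thick_start, thick_end):
    """Clip each exon to the thick (coding) region.

    exons is a list of 0-based half-open (start, end) tuples. An empty
    list is returned when thickStart == thickEnd (no CDS annotated).
    """
    if thick_start >= thick_end:
        return []
    segments = []
    for start, end in exons:
        lo, hi = max(start, thick_start), min(end, thick_end)
        if lo < hi:  # keep only exons that overlap the thick region
            segments.append((lo, hi))
    return segments


# Exons rebuilt from chromStart, blockSizes and chromStarts of the sample row.
chrom_start = 130316273
sizes = [193, 21, 63, 86, 100, 192, 186, 145]
offsets = [0, 21335, 22092, 28302, 29571, 31064, 33578, 34328]
exons = [(chrom_start + o, chrom_start + o + s) for o, s in zip(offsets, sizes)]

coding = thick_segments(exons, 130316344, 130350695)
cds_length = sum(end - start for start, end in coding)
print(len(coding), cds_length)
```

For this row every exon overlaps the thick region, and the summed thick length is a multiple of three, as expected for a complete coding sequence.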

Coloring for the gene annotations is based on the annotation type:

  • coding
  • non-coding
  • pseudogene
  • problem
  • all 2-way pseudogenes
  • all polyA annotations
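
The item color is stored in the reserved field as a single packed integer. As a sketch, assuming the usual packing of red, green, and blue into one 24-bit value, the three reserved values that occur in the sample rows can be unpacked as follows. The pairing of each value with a transcriptClass is read directly off those sample rows and is not a complete color key.

```python
def unpack_rgb(value):
    """Split a packed 24-bit integer into an (R, G, B) tuple."""
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


# reserved values seen in the sample rows, keyed by their transcriptClass
sample_colors = {
    "coding": 789624,
    "nonCoding": 25600,
    "pseudo": 16724991,
}

for transcript_class, packed in sample_colors.items():
    print(transcript_class, unpack_rgb(packed))
```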

This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. There is also an option to display the data as a density graph, which can be helpful for visualizing the distribution of items over a region.

Methods

The GENCODE VM33 track was built from the GENCODE downloads comprehensive gene annotation (all regions) file gencode.vM33.chr_patch_hapl_scaff.annotation.gff3.gz. Data from other sources were correlated with the GENCODE data to build association tables.

Related Data

The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a downloadable file.

One can see a full list of the associated tables in the Table Browser by selecting GENCODE Genes from the track menu; this list is then available on the table menu.

Data access

GENCODE Genes and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. The genePred format files for mm39 are available from our downloads directory or in our GTF download directory. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.
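
As an illustration of REST API access, the sketch below only builds a query URL for the knownGene track over the Cd55b region from the sample rows; it does not send a request. The /getData/track endpoint and the genome, track, chrom, start, and end parameter names are written from memory of the UCSC REST API and should be checked against the REST API help page before use. The helper name track_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public UCSC REST API host; verify against the REST API help page.
API_BASE = "https://api.genome.ucsc.edu"


def track_query_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one region (0-based, half-open)."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}/getData/track?{urlencode(params)}"


url = track_query_url("mm39", "knownGene", "chr1", 130316273, 130350746)
print(url)
```

The resulting URL can be passed to any HTTP client; the response is JSON.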

Credits

The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a computational pipeline developed by Jim Kent and Brian Raney. This version of the track was generated by Jonathan Casper.

References

Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland JE, Mudge JM, Sisu C, Wright JC, Arnan C, Barnes I et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D942-D949. PMID: 36420896; PMC: PMC9825462

A full list of GENCODE publications is available at The GENCODE Project web site.

Data Release Policy

GENCODE data are available for use without restrictions.