Note: Updated Feb. 24, 2023
The GENCODE Genes track (version 43, February 2023) shows high-quality manual
annotations merged with evidence-based automated annotations across the entire
human genome, generated by the GENCODE project.
By default, only the basic gene set is
displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts
that GENCODE believes will be useful to the majority of users.
The track includes protein-coding genes, non-coding RNA genes, and pseudogenes, though pseudogenes
are not displayed by default. It contains annotations on the reference chromosomes as well as
assembly patches and alternative loci (haplotypes).
The following table provides statistics for the v43 release derived from the GTF file that contains
annotations only on the main chromosomes. More information on how the statistics were generated can be
found on the GENCODE site.
GENCODE v43 Release Stats
|Category|Count|Category|Count|
|---|---|---|---|
|Protein-coding genes|19,393|Protein-coding transcripts|89,411|
|Long non-coding RNA genes|19,928|- full length protein-coding|64,004|
|Small non-coding RNA genes|7,566|- partial length protein-coding|25,407|
|Pseudogenes|14,737|Nonsense mediated decay transcripts|21,354|
|Immunoglobulin/T-cell receptor gene segments|410|Long non-coding RNA loci transcripts|58,023|
|Total number of distinct translations|65,519|Genes that have more than one distinct translation|13,618|
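Since the statistics above are derived from a GTF file, a rough per-biotype gene count can be sketched as follows. This is a minimal illustration, not the procedure GENCODE uses: it assumes the standard GENCODE GTF layout (nine tab-separated columns, with a `gene_type` attribute on `gene` features), and it counts raw `gene_type` values rather than the grouped categories shown in the table. The function names are hypothetical.

```python
from collections import Counter
import gzip
import re

# Quoted GTF attributes look like: gene_type "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_gene_types(lines):
    """Count 'gene' features per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are counted
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts[attrs.get("gene_type", "unknown")] += 1
    return counts


def count_gene_types_in_file(path):
    """Same as count_gene_types, reading a gzip-compressed GTF file."""
    with gzip.open(path, "rt") as handle:
        return count_gene_types(handle)
```

Calling `count_gene_types_in_file` on a locally downloaded main-chromosome GTF (the path is up to the user) returns a `Counter` keyed by biotype. Reproducing the table's categories would additionally require grouping biotypes, which this sketch does not attempt.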
For more information on the different gene tracks, see our Genes FAQ.
Display Conventions and Configuration
By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes.
It includes options to display the entire GENCODE set and pseudogenes. To customize these
options, the respective boxes can be checked or unchecked at the top of this description page.
This track also includes a variety of labels which identify the transcripts when visibility is set
to "full" or "pack". Gene symbols (e.g. NIPA1) are displayed by default, but
additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID
(uc001yve.4), and UniProt Display ID (Q7RTP0). Additional information about gene
and transcript names can be found in our
This track, in general, follows the display conventions for gene prediction tracks. The exons for
putative non-coding genes and untranslated regions are represented by relatively thin blocks, while
those for coding open reading frames are thicker.
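The thin/thick convention can be illustrated by splitting a transcript's exons at the coding-region boundaries. The sketch below assumes genePred-style coordinates (0-based, half-open `exonStarts`/`exonEnds` with `cdsStart`/`cdsEnd`) and assumes that a non-coding transcript is represented with `cdsStart` equal to `cdsEnd`. The function name is hypothetical and this is not the browser's own drawing code.

```python
def split_exon_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Return (thin, thick) lists of (start, end) intervals.

    Thin intervals are UTR or non-coding exon sequence; thick intervals
    are the parts of exons that overlap the coding region.
    """
    thin, thick = [], []
    for start, end in zip(exon_starts, exon_ends):
        if cds_start >= cds_end:
            # Assumed convention: empty CDS means a non-coding transcript.
            thin.append((start, end))
            continue
        thick_start = max(start, cds_start)
        thick_end = min(end, cds_end)
        if thick_start >= thick_end:
            # Exon lies entirely outside the coding region.
            thin.append((start, end))
            continue
        if start < thick_start:
            thin.append((start, thick_start))  # UTR before the CDS
        thick.append((thick_start, thick_end))
        if thick_end < end:
            thin.append((thick_end, end))  # UTR after the CDS
    return thin, thick
```

For example, exons at 100-200, 300-400 and 500-600 with a coding region of 150-550 give thin blocks 100-150 and 550-600, and thick blocks 150-200, 300-400 and 500-550.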
Coloring for the gene annotations is based on the annotation type:
- coding: protein coding transcripts, including polymorphic pseudogenes
- non-coding: non-protein coding transcripts
- pseudogene: pseudogene transcript annotations
- problem: problem transcripts (biotypes of retained_intron, TEC, or disrupted_domain)
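The four annotation types above can be approximated by a small classifier over transcript biotype strings. This is a simplified sketch under stated assumptions, not the rule set the track actually uses: it assumes that "polymorphic" in the coding entry refers to the GENCODE `polymorphic_pseudogene` biotype, that every other biotype ending in `pseudogene` belongs to the pseudogene class, and that anything else is non-coding. The function name is hypothetical.

```python
# Biotypes named in the "problem" entry of the list above.
PROBLEM_BIOTYPES = {"retained_intron", "TEC", "disrupted_domain"}

# Assumption: these are the biotypes treated as coding in this sketch.
CODING_BIOTYPES = {"protein_coding", "polymorphic_pseudogene"}


def annotation_class(biotype):
    """Map a transcript biotype to one of the four display classes."""
    if biotype in PROBLEM_BIOTYPES:
        return "problem"
    if biotype in CODING_BIOTYPES:
        return "coding"
    if biotype.endswith("pseudogene"):
        return "pseudogene"
    return "non-coding"
```

The order of the checks matters: the coding test runs before the pseudogene test so that `polymorphic_pseudogene` is not caught by the suffix match.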
This track contains an optional codon coloring feature that allows users to
quickly validate and compare gene predictions. There is also an option to display the data as
a density graph, which
can be helpful for visualizing the distribution of items over a region.
In pack display mode, transcripts within a gene that fall below a specified rank are
condensed into a view similar to squish mode. The transcript ranking approach is
preliminary and will change in future releases. The transcript rankings are defined by the
following criteria for protein-coding and non-coding genes:
Protein-coding genes:
- MANE or Ensembl canonical
  - 1st: MANE Select / Ensembl canonical
  - 2nd: MANE Plus Clinical
- Coding biotypes
  - 1st: protein_coding and protein_coding_LoF
  - 2nd: NMDs and NSDs
  - 3rd: retained intron and protein_coding_CDS_not_defined
- Completeness
  - 1st: full length
  - 2nd: CDS start/end not found
- CARS score (only for coding transcripts)
- Transcript genomic span and length (only for non-coding transcripts)
Non-coding genes:
- Transcript biotype
  - 1st: transcript biotype identical to gene biotype
- Ensembl canonical
- GENCODE basic
- Transcript genomic span
- Transcript length
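A tiered ranking of this kind can be expressed as a lexicographic sort key, where earlier criteria take precedence over later ones. The sketch below covers only the criteria listed from "MANE or Ensembl canonical" through "CARS score". The `Transcript` class and its field names are hypothetical. The sketch also assumes that a higher CARS score ranks first, and that NMD and NSD correspond to the `nonsense_mediated_decay` and `non_stop_decay` biotypes. None of this is confirmed as the track's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    name: str
    biotype: str
    mane_select: bool = False
    ensembl_canonical: bool = False
    mane_plus_clinical: bool = False
    full_length: bool = True
    cars_score: float = 0.0


# Lower tier number means higher rank.
BIOTYPE_TIER = {
    "protein_coding": 0,
    "protein_coding_LoF": 0,
    "nonsense_mediated_decay": 1,
    "non_stop_decay": 1,
    "retained_intron": 2,
    "protein_coding_CDS_not_defined": 2,
}


def coding_gene_rank_key(t):
    """Sort key: tuples compare element by element, so earlier criteria win."""
    if t.mane_select or t.ensembl_canonical:
        canonical_tier = 0
    elif t.mane_plus_clinical:
        canonical_tier = 1
    else:
        canonical_tier = 2
    return (
        canonical_tier,
        BIOTYPE_TIER.get(t.biotype, 3),  # unlisted biotypes rank last
        0 if t.full_length else 1,
        -t.cars_score,  # assumption: higher score ranks first
    )
```

Sorting with `sorted(transcripts, key=coding_gene_rank_key)` places the highest-ranked transcript first. The remaining criteria (genomic span and length) could be appended to the tuple in the same way.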
The GENCODE v43 track was built from the GENCODE downloads file
gencode.v43.chr_patch_hapl_scaff.annotation.gff3.gz. Data from other sources
were correlated with the GENCODE data to build association tables.
The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a
A full list of the associated tables is shown in the Table Browser: select GENCODE Genes from the
track menu, and the list then appears in the table menu.
GENCODE Genes and its associated tables can be explored interactively using the
REST API, the
Table Browser or the
The genePred format files for hg38 are available from our
downloads directory or in our
GTF download directory.
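As an illustration of the REST API access mentioned above, the following sketch builds a request URL for the annotations in a genomic region and fetches the JSON response. Several details are assumptions not stated on this page and should be checked against the REST API help: the base URL `https://api.genome.ucsc.edu`, the `/getData/track` endpoint with semicolon-separated parameters, and `knownGene` as the track name for GENCODE Genes on hg38. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"  # assumed public endpoint


def track_data_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one genomic region."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Assumption: parameters are joined with semicolons.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


def fetch_track_data(genome, track, chrom, start, end):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(track_data_url(genome, track, chrom, start, end)) as response:
        return json.load(response)
```

A call such as `fetch_track_data("hg38", "knownGene", "chr15", 22700000, 22800000)` would request the items in that window. The coordinates here are arbitrary example values.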
All the tables can also be queried directly from our public MySQL
servers, with more information available on our
help page as well as on
The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a
computational pipeline developed by Jim Kent and Brian Raney.
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa
A, Searle S et al.
GENCODE: the reference human genome annotation for The ENCODE Project.
Genome Res. 2012 Sep;22(9):1760-74.
PMID: 22955987; PMC: PMC3431492
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R,
Swarbreck D et al.
GENCODE: producing a reference annotation for ENCODE.
Genome Biol. 2006;7 Suppl 1:S4.1-9.
PMID: 16925838; PMC: PMC1810553
A full list of GENCODE publications is available
at The GENCODE
Project web site.
Data Release Policy
GENCODE data are available for use without restrictions.