The GENCODE track is composed of all the gene models in the GENCODE v32 release. By default, only the basic gene set is
displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts
that GENCODE believes will be useful to the majority of users.
The track includes protein-coding genes, non-coding RNA genes, and pseudogenes, though pseudogenes
are not displayed by default. It contains annotations on the reference chromosomes as well as
assembly patches and alternative loci (haplotypes).
The following table provides statistics for the v32 release derived from the GTF file that contains
annotations only on the main chromosomes. More information on how these statistics were generated
can be found on the GENCODE site.
GENCODE v32 Release Stats
Genes:
- Long non-coding RNA genes
- Small non-coding RNA genes
- Immunoglobulin/T-cell receptor gene segments
Transcripts:
- Full length protein-coding
- Partial length protein-coding
- Nonsense mediated decay transcripts
- Long non-coding RNA loci transcripts
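
As a rough illustration of how per-category gene counts can be derived from a GTF file, the sketch below tallies `gene_type` values over `gene` records. It assumes the standard nine-column GTF layout and the `gene_type` attribute key found in GENCODE GTF files. The function name and the sample records are made up for illustration, and the grouping of biotypes into the categories above is not reproduced here.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def count_gene_types(lines):
    """Count gene records per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only well-formed "gene" records contribute to the tally.
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts[attrs.get("gene_type", "unknown")] += 1
    return counts

# Made-up sample records, for illustration only (not real annotations).
SAMPLE_GTF = [
    "##description: example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1.1"; gene_type "protein_coding"; gene_name "A";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1.1"; transcript_id "T1.1"; gene_type "protein_coding";\n',
    'chr2\tHAVANA\tgene\t50\t400\t.\t-\t.\tgene_id "G2.1"; gene_type "lncRNA"; gene_name "B";\n',
    'chr3\tENSEMBL\tgene\t10\t90\t.\t+\t.\tgene_id "G3.1"; gene_type "protein_coding"; gene_name "C";\n',
]

gene_type_counts = count_gene_types(SAMPLE_GTF)
```

For a real file, the same function could be fed the lines of the GTF opened in text mode with `gzip.open(path, "rt")`.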
For more information on the different gene tracks, see our Genes FAQ.
This track also includes a variety of labels which identify the transcripts when visibility is set
to "full" or "pack". Gene symbols (e.g. NIPA1) are displayed by default, but
additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID
(uc001yve.4), UniProt Display ID (Q7RTP0) and OMIM ID (608145). Additional information about gene
and transcript names can be found in our Genes FAQ.
This track, in general, follows the display conventions for gene prediction tracks. The exons for
putative non-coding genes and untranslated regions are represented by relatively thin blocks, while
those for coding open reading frames are thicker. The following color key is used:
Black -- feature has a corresponding entry in the Protein Data Bank (PDB)
Dark blue -- transcript has been reviewed or validated by either the RefSeq or SwissProt staff
Medium blue -- other RefSeq
Light blue -- non-RefSeq
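
To make the thin/thick block convention concrete, here is a minimal sketch that splits a transcript's exons into thin (non-coding or untranslated) and thick (coding) segments. It assumes genePred-style, 0-based half-open coordinates with coding bounds `cds_start` and `cds_end`. The function name and the sample coordinates are hypothetical, and this is not the browser's drawing code.

```python
def split_thin_thick(exons, cds_start, cds_end):
    """Split exon intervals into (thin, thick) segment lists.

    exons: list of (start, end) tuples, 0-based half-open, sorted by start.
    cds_start, cds_end: coding region bounds; equal values mean non-coding.
    """
    thin, thick = [], []
    for start, end in exons:
        # Portion of this exon that falls inside the coding region, if any.
        t_start, t_end = max(start, cds_start), min(end, cds_end)
        if t_start >= t_end:
            thin.append((start, end))  # exon is entirely non-coding
            continue
        if start < t_start:
            thin.append((start, t_start))  # untranslated part before the CDS
        thick.append((t_start, t_end))
        if t_end < end:
            thin.append((t_end, end))  # untranslated part after the CDS
    return thin, thick

# Hypothetical three-exon transcript whose CDS starts and ends mid-exon.
thin, thick = split_thin_thick([(100, 200), (300, 400), (500, 600)], 150, 550)
```

A transcript with `cds_start == cds_end` yields only thin segments, which matches how putative non-coding genes are described above.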
This track contains an optional codon coloring feature that allows users to
quickly validate and compare gene predictions. There is also an option to display the data as
a density graph, which
can be helpful for visualizing the distribution of items over a region.
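
A density display can be thought of as a per-bin count of the items in the viewed region. The sketch below is one simple way to compute such counts, assuming 0-based half-open item coordinates and fixed-width bins. The function name, bin size, and coordinates are illustrative, and the browser's own computation may differ.

```python
def item_density(items, region_start, region_end, bin_size):
    """Count how many items overlap each fixed-width bin of a region."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in items:
        # Clip the item to the region; skip items entirely outside it.
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue
        first = (start - region_start) // bin_size
        last = (end - 1 - region_start) // bin_size
        for b in range(first, last + 1):
            counts[b] += 1
    return counts

# Three hypothetical items in a 40-base region, counted in 10-base bins.
density = item_density([(0, 10), (5, 25), (30, 35)], 0, 40, 10)
```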
The GENCODE v32 track was built from the GENCODE downloads file
gencode.v32.chr_patch_hapl_scaff.annotation.gff3.gz. Data from other sources
were correlated with the GENCODE data to build the knownTo tables.
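
When working with a GFF3 file such as the one named above, the basic subset can be picked out from the attributes column. The sketch below assumes that basic transcripts carry `basic` among the comma-separated values of the `tag` attribute and that transcript records have a `transcript_id` attribute. The function names and sample records are made up for illustration.

```python
def parse_gff3_attributes(column):
    """Parse a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs

def basic_transcript_ids(lines):
    """Yield the transcript_id of each transcript record tagged 'basic'."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = parse_gff3_attributes(fields[8])
        # The tag attribute holds a comma-separated list of tags.
        if "basic" in attrs.get("tag", "").split(","):
            yield attrs.get("transcript_id", attrs.get("ID"))

# Made-up sample records, for illustration only (not real annotations).
SAMPLE_GFF3 = [
    "##gff-version 3\n",
    "chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tID=T1.1;Parent=G1.1;transcript_id=T1.1;tag=basic,CCDS\n",
    "chr1\tHAVANA\ttranscript\t100\t700\t.\t+\t.\tID=T2.1;Parent=G1.1;transcript_id=T2.1;tag=mRNA_start_NF\n",
    "chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tID=exon:T1.1:1;Parent=T1.1;tag=basic\n",
]

basic_ids = list(basic_transcript_ids(SAMPLE_GFF3))
```

The real file is gzip-compressed, so its lines could be read with `gzip.open(path, "rt")` from the standard library before being passed to `basic_transcript_ids`.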
The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a
file. These include tables that link GENCODE Genes transcripts to external datasets (such as
knownToLocusLink, which maps GENCODE Genes transcripts to Entrez identifiers, previously
known as Locus Link identifiers), and tables that detail some property of GENCODE Genes transcript
sequences (such as knownToPfam, which identifies any Pfam domains found in the GENCODE Genes
protein sequences).
A full list of the associated tables is available in the Table Browser: select GENCODE Genes from
the track menu, and the list then appears in the table menu. Note that some of these tables refer
to GENCODE Genes by its former name of Known Genes, sometimes abbreviated as known or kg. While
the complete set of annotation tables is too long to describe, some of the more important tables are
described below:
kgXref identifies the RefSeq, SwissProt, Rfam, or tRNA sequences (if any) which are
associated with each transcript.
knownToRefSeq identifies the RefSeq transcript that each GENCODE Genes transcript is
most closely associated with. That RefSeq transcript is the RefSeq transcript that the GENCODE
Genes transcript overlaps at the most bases.
knownGeneMrna contains the genomic sequence for each of the GENCODE Genes models.
This may not be the same as the actual mRNA used to validate the gene model.
knownGenePep contains the protein sequences derived from the knownGeneMrna transcript
sequences. Any protein-level annotations, such as the contents of the knownToPfam table, are based
on these sequences.
knownIsoforms maps each transcript to a cluster ID, where each cluster groups the isoforms of
the same gene.
knownCanonical identifies the canonical isoform of each cluster ID or gene using the
ENSEMBL gene IDs to define each cluster. The canonical transcript is chosen using the APPRIS
principal transcript when available. If no APPRIS tag exists for any transcript associated with
the cluster, then a transcript in the BASIC set is chosen. If no BASIC transcript exists, then the
longest isoform is used.
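
The selection order described for knownCanonical can be sketched as follows. This is a simplified illustration, not the code used to build the table. The `Transcript` record, its field names, the `appris_principal` tag prefix, and the rule of taking the longest transcript when a tier holds several candidates are all assumptions made here.

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    transcript_id: str  # hypothetical identifier
    length: int         # transcript length in bases
    tags: set = field(default_factory=set)

def pick_canonical(cluster):
    """Pick one transcript from a non-empty cluster of isoforms."""
    appris = [
        t for t in cluster
        if any(tag.startswith("appris_principal") for tag in t.tags)
    ]
    basic = [t for t in cluster if "basic" in t.tags]
    # Fall through the tiers in order: APPRIS principal, BASIC, any isoform.
    candidates = appris or basic or list(cluster)
    # Assumption: several candidates within a tier are resolved by length.
    return max(candidates, key=lambda t: t.length)

# Hypothetical cluster: the APPRIS transcript wins despite being shorter.
cluster = [
    Transcript("T1", 3000, {"basic"}),
    Transcript("T2", 2500, {"basic", "appris_principal_1"}),
    Transcript("T3", 4000),
]
canonical = pick_canonical(cluster)
```

Removing the APPRIS tag from `T2` would make the longer BASIC transcript `T1` the choice, and a cluster with no tags at all falls back to its longest isoform.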