Frequently Asked Questions: Assembly Releases and Versions

Topics

List of UCSC genome releases
Initial assembly release dates
Data sources - UCSC assemblies
Comparison of UCSC and NCBI human assemblies
Differences between UCSC and NCBI mouse assemblies
Accessing older assembly versions
Frequency of GenBank data updates
Coordinate changes between assemblies
Converting positions between assembly versions
Missing annotation tracks
What next with the human genome?
Mouse strain used for mouse genome sequence
UniProt (Swiss-Prot/TrEMBL) display changes

List of UCSC genome releases

How do UCSC's release numbers correspond to those of other organizations, such as NCBI?

The first release of an assembly is named using the first three characters of the organism's genus and species names in the format gggSss#; subsequent assemblies increment the number. Assemblies that predate the 2003 introduction of the six-letter naming system were given two-letter names in a similar gs# format, and human assemblies are named hg# for "human genome".
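
As a rough illustration of the gggSss# convention, the following Python sketch builds a name from a genus, a species, and a version number. The function name is hypothetical and not part of any UCSC tool, and actual database names are assigned by UCSC and can deviate from this rule (for example, hg# for human and the older two-letter names).

```python
def ucsc_style_name(genus: str, species: str, version: int) -> str:
    """Build a gggSss# style name (illustrative helper, not a UCSC API)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    # First three letters of the genus in lower case, then the first
    # three letters of the species with an initial capital.
    prefix = genus[:3].lower() + species[:3].capitalize()
    return f"{prefix}{version}"


if __name__ == "__main__":
    print(ucsc_style_name("Pan", "troglodytes", 5))
    print(ucsc_style_name("Felis", "catus", 8))
```

Under these assumptions the two calls reproduce the chimp and cat names that appear in the release table (panTro5 and felCat8).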

SPECIES | UCSC VERSION | RELEASE DATE | RELEASE NAME | STATUS
MAMMALS
Human | hg38 | Dec. 2013 | Genome Reference Consortium GRCh38 | Available
| hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available
| hg18 | Mar. 2006 | NCBI Build 36.1 | Available
| hg17 | May 2004 | NCBI Build 35 | Available
| hg16 | Jul. 2003 | NCBI Build 34 | Available
| hg15 | Apr. 2003 | NCBI Build 33 | Archived
| hg13 | Nov. 2002 | NCBI Build 31 | Archived
| hg12 | Jun. 2002 | NCBI Build 30 | Archived
| hg11 | Apr. 2002 | NCBI Build 29 | Archived (data only)
| hg10 | Dec. 2001 | NCBI Build 28 | Archived (data only)
| hg8 | Aug. 2001 | UCSC-assembled | Archived (data only)
| hg7 | Apr. 2001 | UCSC-assembled | Archived (data only)
| hg6 | Dec. 2000 | UCSC-assembled | Archived (data only)
| hg5 | Oct. 2000 | UCSC-assembled | Archived (data only)
| hg4 | Sep. 2000 | UCSC-assembled | Archived (data only)
| hg3 | Jul. 2000 | UCSC-assembled | Archived (data only)
| hg2 | Jun. 2000 | UCSC-assembled | Archived (data only)
| hg1 | May 2000 | UCSC-assembled | Archived (data only)
Alpaca | vicPac2 | Mar. 2013 | Broad Institute Vicugna_pacos-2.0.1 | Available
| vicPac1 | Jul. 2008 | Broad Institute VicPac1.0 | Available
Armadillo | dasNov3 | Dec. 2011 | Broad Institute DasNov3 | Available
Baboon | papHam1 | Nov. 2008 | Baylor College of Medicine HGSC Pham_1.0 | Available
| papAnu2 | Mar. 2012 | Baylor College of Medicine Panu_2.0 | Available
Bonobo | panPan1 | May 2012 | Max-Planck Institute panpan1 | Available
Brown kiwi | aptMan1 | Jun. 2015 | Max-Planck Institute for Evolutionary Anthropology AptMant0 | Available
Bushbaby | otoGar3 | Mar. 2011 | Broad Institute OtoGar3 | Available
Cat | felCat8 | Nov. 2014 | ICGSC Felis_catus_8.0 | Available
| felCat5 | Sep. 2011 | ICGSC Felis_catus-6.2 | Available
| felCat4 | Dec. 2008 | NHGRI catChrV17e | Available
| felCat3 | Mar. 2006 | Broad Institute Release 3 | Available
Chimp | panTro5 | May 2016 | CGSC Build 3.0 | Available
| panTro4 | Feb. 2011 | CGSC Build 2.1.4 | Available
| panTro3 | Oct. 2010 | CGSC Build 2.1.3 | Available
| panTro2 | Mar. 2006 | CGSC Build 2.1 | Available
| panTro1 | Nov. 2003 | CGSC Build 1.1 | Available
Chinese hamster | criGri1 | Jul. 2013 | Beijing Genomics Institute-Shenzhen C_griseus_v1.0 | Available
Cow | bosTau8 | Jun. 2014 | University of Maryland v3.1.1 | Available
| bosTau7 | Oct. 2011 | Baylor College of Medicine HGSC Btau_4.6.1 | Available
| bosTau6 | Nov. 2009 | University of Maryland v3.1 | Available
| bosTau4 | Oct. 2007 | Baylor College of Medicine HGSC Btau_4.0 | Available
| bosTau3 | Aug. 2006 | Baylor College of Medicine HGSC Btau_3.1 | Available
| bosTau2 | Mar. 2005 | Baylor College of Medicine HGSC Btau_2.0 | Available
| bosTau1 | Sep. 2004 | Baylor College of Medicine HGSC Btau_1.0 | Archived
Crab-eating macaque | macFas5 | Jun. 2013 | Washington University Macaca_fascicularis_5.0 | Available
Dog | canFam3 | Sep. 2011 | Broad Institute v3.1 | Available
| canFam2 | May 2005 | Broad Institute v2.0 | Available
| canFam1 | Jul. 2004 | Broad Institute v1.0 | Available
Dolphin | turTru2 | Oct. 2011 | Baylor College of Medicine Ttru_1.4 | Available
Elephant | loxAfr3 | Jul. 2009 | Broad Institute LoxAfr3 | Available
Ferret | musFur1 | Apr. 2011 | Ferret Genome Sequencing Consortium MusPutFur1.0 | Available
Gibbon | nomLeu3 | Oct. 2012 | Gibbon Genome Sequencing Consortium Nleu3.0 | Available
| nomLeu2 | Jun. 2011 | Gibbon Genome Sequencing Consortium Nleu1.1 | Available
| nomLeu1 | Jan. 2010 | Gibbon Genome Sequencing Consortium Nleu1.0 | Available
Gorilla | gorGor5 | Mar. 2016 | University of Washington GSMRT3 | Available
| gorGor4 | Dec. 2014 | Wellcome Trust Sanger Institute gorGor4 | Available
| gorGor3 | May 2011 | Wellcome Trust Sanger Institute gorGor3.1 | Available
Green Monkey | chlSab2 | Mar. 2014 | Vervet Genomics Consortium 1.1 | Available
Guinea pig | cavPor3 | Feb. 2008 | Broad Institute cavPor3 | Available
Hedgehog | eriEur2 | May 2012 | Broad Institute EriEur2.0 | Available
| eriEur1 | Jun. 2006 | Broad Institute Draft_v1 | Available
Horse | equCab2 | Sep. 2007 | Broad Institute EquCab2 | Available
| equCab1 | Jan. 2007 | Broad Institute EquCab1 | Available
Kangaroo rat | dipOrd1 | Jul. 2008 | Baylor/Broad Institute DipOrd1.0 | Available
Malayan flying lemur | galVar1 | Jul. 2014 | WashU G_variegatus-3.0.2 | Available
Manatee | triMan1 | Oct. 2011 | Broad Institute TriManLat1.0 | Available
Marmoset | calJac3 | Mar. 2009 | WUSTL Callithrix_jacchus-v3.2 | Available
| calJac1 | Jun. 2007 | WUSTL Callithrix_jacchus-v2.0.2 | Available
Megabat | pteVam1 | Jul. 2008 | Broad Institute Ptevap1.0 | Available
Microbat | myoLuc2 | Jul. 2010 | Broad Institute MyoLuc2.0 | Available
Minke whale | balAcu1 | Oct. 2013 | KORDI BalAcu1.0 | Available
Mouse | mm10 | Dec. 2011 | Genome Reference Consortium GRCm38 | Available
| mm9 | Jul. 2007 | NCBI Build 37 | Available
| mm8 | Feb. 2006 | NCBI Build 36 | Available
| mm7 | Aug. 2005 | NCBI Build 35 | Available
| mm6 | Mar. 2005 | NCBI Build 34 | Archived
| mm5 | May 2004 | NCBI Build 33 | Archived
| mm4 | Oct. 2003 | NCBI Build 32 | Archived
| mm3 | Feb. 2003 | NCBI Build 30 | Archived
| mm2 | Feb. 2002 | MGSCv3 | Archived
| mm1 | Nov. 2001 | MGSCv2 | Archived (data only)
Mouse lemur | micMur2 | May 2015 | Baylor/Broad Institute Mmur_2.0 | Available
| micMur1 | Jul. 2007 | Broad Institute MicMur1.0 | Available
Naked mole-rat | hetGla2 | Jan. 2012 | Broad Institute HetGla_female_1.0 | Available
| hetGla1 | Jul. 2011 | Beijing Genomics Institute HetGla_1.0 | Available
Opossum | monDom5 | Oct. 2006 | Broad Institute release MonDom5 | Available
| monDom4 | Jan. 2006 | Broad Institute release MonDom4 | Available
| monDom1 | Oct. 2004 | Broad Institute release MonDom1 | Available
Orangutan | ponAbe2 | Jul. 2007 | WUSTL Pongo_albelii-2.0.2 | Available
Panda | ailMel1 | Dec. 2009 | BGI-Shenzhen AilMel 1.0 | Available
Pig | susScr3 | Aug. 2011 | Swine Genome Sequencing Consortium Sscrofa10.2 | Available
| susScr2 | Nov. 2009 | Swine Genome Sequencing Consortium Sscrofa9.2 | Available
Pika | ochPri3 | May 2012 | Broad Institute OchPri3.0 | Available
| ochPri2 | Jul. 2008 | Broad Institute OchPri2 | Available
Platypus | ornAna2 | Feb. 2007 | WUSTL v5.0.1 | Available
| ornAna1 | Mar. 2007 | WUSTL v5.0.1 | Available
Rabbit | oryCun2 | Apr. 2009 | Broad Institute release OryCun2 | Available
Rat | rn6 | Jul. 2014 | RGSC Rnor_6.0 | Available
| rn5 | Mar. 2012 | RGSC Rnor_5.0 | Available
| rn4 | Nov. 2004 | Baylor College of Medicine HGSC v3.4 | Available
| rn3 | Jun. 2003 | Baylor College of Medicine HGSC v3.1 | Available
| rn2 | Jan. 2003 | Baylor College of Medicine HGSC v2.1 | Archived
| rn1 | Nov. 2002 | Baylor College of Medicine HGSC v1.0 | Archived
Rhesus | rheMac8 | Nov. 2015 | Baylor College of Medicine HGSC Mmul_8.0.1 | Available
| rheMac3 | Oct. 2010 | Beijing Genomics Institute CR_1.0 | Available
| rheMac2 | Jan. 2006 | Baylor College of Medicine HGSC v1.0 Mmul_051212 | Available
| rheMac1 | Jan. 2005 | Baylor College of Medicine HGSC Mmul_0.1 | Archived
Rock hyrax | proCap1 | Jul. 2008 | Baylor College of Medicine HGSC Procap1.0 | Available
Sheep | oviAri3 | Aug. 2012 | ISGC Oar_v3.1 | Available
| oviAri1 | Feb. 2010 | ISGC Ovis aries 1.0 | Available
Shrew | sorAra2 | Aug. 2008 | Broad Institute SorAra2.0 | Available
| sorAra1 | Jun. 2006 | Broad Institute SorAra1.0 | Available
Sloth | choHof1 | Jul. 2008 | Broad Institute ChoHof1.0 | Available
Squirrel | speTri2 | Nov. 2011 | Broad Institute SpeTri2.0 | Available
Squirrel monkey | saiBol1 | Oct. 2011 | Broad Institute SaiBol1.0 | Available
Tarsier | tarSyr2 | Sep. 2013 | WashU Tarsius_syrichta-2.0.1 | Available
| tarSyr1 | Aug. 2008 | WUSTL/Broad Institute Tarsyr1.0 | Available
Tasmanian devil | sarHar1 | Feb. 2011 | Wellcome Trust Sanger Institute Devil_refv7.0 | Available
Tenrec | echTel2 | Nov. 2012 | Broad Institute EchTel2.0 | Available
| echTel1 | Jul. 2005 | Broad Institute echTel1 | Available
Tree shrew | tupBel1 | Dec. 2006 | Broad Institute Tupbel1.0 | Available
Wallaby | macEug2 | Sep. 2009 | Tammar Wallaby Genome Sequencing Consortium Meug_1.1 | Available
White rhinoceros | cerSim1 | May 2012 | Broad Institute CerSimSim1.0 | Available
VERTEBRATES
American alligator | allMis1 | Aug. 2012 | Int. Crocodilian Genomes Working Group allMis0.2 | Available
Atlantic cod | gadMor1 | May 2010 | Genofisk GadMor_May2010 | Available
Budgerigar | melUnd1 | Sep. 2011 | WUSTL v6.3 | Available
Chicken | galGal5 | Dec. 2015 | ICGC Gallus-gallus-5.0 | Available
| galGal4 | Nov. 2011 | ICGC Gallus-gallus-4.0 | Available
| galGal3 | May 2006 | WUSTL Gallus-gallus-2.1 | Available
| galGal2 | Feb. 2004 | WUSTL Gallus-gallus-1.0 | Available
Coelacanth | latCha1 | Aug. 2011 | Broad Institute LatCha1 | Available
Elephant shark | calMil1 | Dec. 2013 | IMCB Callorhinchus_milli_6.1.3 | Available
Fugu | fr3 | Oct. 2011 | JGI v5.0 | Available
| fr2 | Oct. 2004 | JGI v4.0 | Available
| fr1 | Aug. 2002 | JGI v3.0 | Available
Lamprey | petMar2 | Sep. 2010 | WUGSC 7.0 | Available
| petMar1 | Mar. 2007 | WUSTL v3.0 | Available
Lizard | anoCar2 | May 2010 | Broad Institute AnoCar2 | Available
| anoCar1 | Feb. 2007 | Broad Institute AnoCar1 | Available
Medaka | oryLat2 | Oct. 2005 | NIG v1.0 | Available
Medium ground finch | geoFor1 | Apr. 2012 | BGI GeoFor_1.0 / NCBI 13302 | Available
Nile tilapia | oreNil2 | Jan. 2011 | Broad Institute Release OreNil1.1 | Available
Painted turtle | chrPic1 | Dec. 2011 | IPTGSC Chrysemys_picta_bellii-3.0.1 | Available
Stickleback | gasAcu1 | Feb. 2006 | Broad Institute Release 1.0 | Available
Tetraodon | tetNig2 | Mar. 2007 | Genoscope v7 | Available
| tetNig1 | Feb. 2004 | Genoscope v7 | Available
Turkey | melGal1 | Dec. 2009 | Turkey Genome Consortium v2.01 | Available
X. tropicalis | xenTro7 | Sep. 2012 | JGI v.7.0 | Available
| xenTro3 | Nov. 2009 | JGI v.4.2 | Available
| xenTro2 | Aug. 2005 | JGI v.4.1 | Available
| xenTro1 | Oct. 2004 | JGI v.3.0 | Available
Zebra finch | taeGut2 | Feb. 2013 | WashU taeGut324 | Available
| taeGut1 | Jul. 2008 | WUSTL v3.2.4 | Available
Zebrafish | danRer10 | Sep. 2014 | Genome Reference Consortium GRCz10 | Available
| danRer7 | Jul. 2010 | Sanger Institute Zv9 | Available
| danRer6 | Dec. 2008 | Sanger Institute Zv8 | Available
| danRer5 | Jul. 2007 | Sanger Institute Zv7 | Available
| danRer4 | Mar. 2006 | Sanger Institute Zv6 | Available
| danRer3 | May 2005 | Sanger Institute Zv5 | Available
| danRer2 | Jun. 2004 | Sanger Institute Zv4 | Archived
| danRer1 | Nov. 2003 | Sanger Institute Zv3 | Archived
DEUTEROSTOMES
C. intestinalis | ci2 | Mar. 2005 | JGI v2.0 | Available
| ci1 | Dec. 2002 | JGI v1.0 | Available
Lancelet | braFlo1 | Mar. 2006 | JGI v1.0 | Available
S. purpuratus | strPur2 | Sep. 2006 | Baylor College of Medicine HGSC v. Spur 2.1 | Available
| strPur1 | Apr. 2005 | Baylor College of Medicine HGSC v. Spur_0.5 | Available
INSECTS
A. mellifera | apiMel2 | Jan. 2005 | Baylor College of Medicine HGSC v.Amel_2.0 | Available
| apiMel1 | Jul. 2004 | Baylor College of Medicine HGSC v.Amel_1.2 | Available
A. gambiae | anoGam1 | Feb. 2003 | IAGP v.MOZ2 | Available
D. ananassae | droAna2 | Aug. 2005 | Agencourt Arachne release | Available
| droAna1 | Jul. 2004 | TIGR Celera release | Available
D. erecta | droEre1 | Aug. 2005 | Agencourt Arachne release | Available
D. grimshawi | droGri1 | Aug. 2005 | Agencourt Arachne release | Available
D. melanogaster | dm6 | Aug. 2014 | BDGP Release 6 + ISO1 MT | Available
| dm3 | Apr. 2006 | BDGP Release 5 | Available
| dm2 | Apr. 2004 | BDGP Release 4 | Available
| dm1 | Jan. 2003 | BDGP Release 3 | Available
D. mojavensis | droMoj2 | Aug. 2005 | Agencourt Arachne release | Available
| droMoj1 | Aug. 2004 | Agencourt Arachne release | Available
D. persimilis | droPer1 | Oct. 2005 | Broad Institute release | Available
D. pseudoobscura | dp3 | Nov. 2004 | FlyBase Release 1.0 | Available
| dp2 | Aug. 2003 | Baylor College of Medicine HGSC Freeze 1 | Available
D. sechellia | droSec1 | Oct. 2005 | Broad Institute Release 1.0 | Available
D. simulans | droSim1 | Apr. 2005 | WUSTL Release 1.0 | Available
D. virilis | droVir2 | Aug. 2005 | Agencourt Arachne release | Available
| droVir1 | Jul. 2004 | Agencourt Arachne release | Available
D. yakuba | droYak2 | Nov. 2005 | WUSTL Release 2.0 | Available
| droYak1 | Apr. 2004 | WUSTL Release 1.0 | Available
NEMATODES
C. brenneri | caePb2 | Feb. 2008 | WUSTL 6.0.1 | Available
| caePb1 | Jan. 2007 | WUSTL 4.0 | Available
C. briggsae | cb3 | Jan. 2007 | WUSTL Cb3 | Available
| cb1 | Jul. 2002 | WormBase v. cb25.agp8 | Available
C. elegans | ce11 | Feb. 2013 | C. elegans Sequencing Consortium WBcel235 | Available
| ce10 | Oct. 2010 | WormBase v. WS220 | Available
| ce6 | May 2008 | WormBase v. WS190 | Available
| ce4 | Jan. 2007 | WormBase v. WS170 | Available
| ce2 | Mar. 2004 | WormBase v. WS120 | Available
| ce1 | May 2003 | WormBase v. WS100 | Archived
C. japonica | caeJap1 | Mar. 2008 | WUSTL 3.0.2 | Available
C. remanei | caeRem3 | May 2007 | WUSTL 15.0.1 | Available
| caeRem2 | Mar. 2006 | WUSTL 1.0 | Available
P. pacificus | priPac1 | Feb. 2007 | WUSTL 5.0 | Available
OTHER
Sea Hare | aplCal1 | Sep. 2008 | Broad Release Aplcal2.0 | Available
Yeast | sacCer3 | Apr. 2011 | SGD April 2011 sequence | Available
| sacCer2 | Jun. 2008 | SGD June 2008 sequence | Available
| sacCer1 | Oct. 2003 | SGD 1 Oct 2003 sequence | Available
VIRUSES
Ebola Virus | eboVir3 | Jun. 2014 | Sierra Leone 2014 (G3683/KM034562.1) | Available

Initial assembly release dates

When will the next assembly be out?

UCSC does not produce its own genome assemblies, but instead obtains them from standard sources. For example, the human assembly is obtained from NCBI. Because of this, you can expect us to release a new version of a genome soon after the assembling organization has released the version. A new assembly release initially consists of the genome sequence and a small set of aligned annotation tracks. Additional annotation tracks are added as they are obtained or generated. Bulk downloads of the data are typically available in the first week after the assembly is released in the browser.

Data sources - UCSC assemblies

Where does UCSC obtain the assembly and annotation data displayed in the Genome Browser?

All the assembly data displayed in the UCSC Genome Browser are obtained from external sequencing centers. To determine the data source and version for a given assembly, see the assembly's description on the Genome Browser Gateway page or the List of UCSC Genome Releases.

The annotations accompanying an assembly are obtained from a variety of sources. The UCSC Genome Bioinformatics Group generates several of the tracks; the remainder are contributed by collaborators at other sites. Each track has an associated description page that credits the authors of the annotation.

For detailed information about the individuals and organizations who contributed to a specific assembly, see the Credits page.

Comparison of UCSC and NCBI human assemblies

How do the human assemblies displayed in the UCSC Genome Browser differ from the NCBI human assemblies?

Recent human assemblies displayed in the Genome Browser (hg10 and higher) are identical to the NCBI assemblies.

Differences between UCSC and NCBI mouse assemblies

Is the mouse genome assembly displayed in the UCSC Genome Browser the same as the one on the NCBI website?

The mouse genome assemblies featured in the UCSC Genome Browser are the same as those on the NCBI website with one difference: the UCSC versions contain only the reference strain data (C57BL/6J). NCBI provides data for several additional strains in their builds.

Accessing older assembly versions

I need to access an older version of a genome assembly that's no longer listed in the Genome Browser menu. What should I do?

In addition to the assembly versions currently available in the Genome Browser, data for older assemblies can be accessed through our Downloads page.

Frequency of GenBank data updates

How frequently does UCSC update its databases with new data from GenBank?

Daily and weekly incremental updates of mRNA, RefSeq, and EST data are in place for several of the more recent Genome Browser assemblies. Assemblies that are not on an incremental update schedule are updated whenever we load a new assembly or make a major revision to a table.

Data are updated on the following schedule:

Mirror sites are not required to use an incremental update process, and should not experience problems as a result of these updates.

Coordinate changes between assemblies

I noticed that the chromosomal coordinates for a particular gene that I'm looking at have changed since the last time I used your browser. What happened?

A common source of confusion for users arises from mixing up different assemblies. It is very important to be aware of which assembly you are looking at. Within the Genome Browser display, assemblies are labeled by organism and date. To look up the corresponding UCSC database name or NCBI build number, use the release table.

UCSC database labels are of the form hg#, panTro#, etc. The letters designate the organism, e.g. hg for human genome or panTro for Pan troglodytes. The number denotes the UCSC assembly version for that organism. For example, ce1 refers to the first UCSC assembly of the C. elegans genome.
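
To make the label structure concrete, here is a minimal Python sketch that splits a database label into its organism letters and its version number. The function name and the regular expression are assumptions made for illustration; they are not taken from UCSC software.

```python
import re

# Letters that designate the organism, followed by the assembly version number.
_DB_LABEL = re.compile(r"([A-Za-z]+)(\d+)")


def split_db_label(label: str) -> tuple[str, int]:
    """Split a label such as 'panTro5' into ('panTro', 5). Illustrative only."""
    match = _DB_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a recognizable database label: {label!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for label in ("hg38", "panTro5", "ce1"):
        organism, version = split_db_label(label)
        print(label, "->", organism, version)
```

Comparing the organism part of two labels is a quick way to check whether two sets of coordinates refer to the same organism but different assembly versions.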

The coordinates of your favorite gene in one assembly may not be the same as those in the next release of the assembly unless the gene happens to lie on a completely sequenced and unrevised chromosome. For information on integrating data from one assembly into another, see the Converting positions between assembly versions section.

Converting positions between assembly versions

I've been researching a specific area of the human genome on the current assembly, and now you've just released a new version. Is there an easy way to locate my area of interest on the new assembly?

See the section on converting coordinates for information on assembly migration tools.

Missing annotation tracks

Why is my favorite annotation track missing from your latest release?

The initial release of a new genome assembly typically contains a small subset of core annotation tracks. New tracks are added as they are generated. In many cases, our annotation tracks are contributed by scientists not affiliated with UCSC, who must first obtain the sequence, repeatmasked data, etc. before they can produce their tracks. If you need an annotation that has not appeared on an assembly within a month or so of its release, feel free to send an inquiry to genome@soe.ucsc.edu. Messages sent to this address will be posted to the moderated genome mailing list, which is archived on a SEARCHABLE, PUBLIC Google Groups forum.

What next with the human genome?

Now that the human genome is "finished", will there be any more releases?

Rest assured that work will continue. There will be updates to the assembly over the next several years. This has been the case for all other finished (i.e. essentially complete) genome assemblies as gaps are closed. For example, the C. elegans genome has been "finished" for several years, but small bits of sequence are still being added and corrections are being made. NCBI will continue to coordinate the human genome assemblies in collaboration with the individual chromosome coordinators, and UCSC will continue to QC the assembly in conjunction with NCBI (and, to a lesser extent, Ensembl). UCSC, NCBI, Ensembl, and others will display the new releases on their sites as they become available.

Mouse strain used for mouse genome sequence

What strain of mouse was used for the Mus musculus genome?

C57BL/6J.

UniProt (Swiss-Prot/TrEMBL) display changes

What has UCSC done to accommodate the changes to display IDs recently introduced by UniProt (aka Swiss-Prot/TrEMBL)?

Here is a detailed description of the database changes we have made to accommodate the UniProt changes. If you are using the proteinID field in our knownGene table or the Swiss-Prot/TrEMBL display ID for indexing or cross-referencing other data, we strongly suggest you transition to the UniProt accession number. These changes will also affect anyone who is mirroring our site.

  1. The latest UniProt Knowledgebase (Release 46.0, Feb. 1st, 2005) was parsed and the results were stored in a newly created database sp050201.
  2. A corresponding database, proteins050201, was constructed based on data in sp050201 and other protein data sources.
  3. Two new symbolic database pointers, uniProt and proteome, have been created to point to the two new databases mentioned above. Some parts of our programs use the data in these two DBs.
    uniProt  ---> sp050201
    proteome ---> proteins050201
  4. The existing protein symbolic database pointers, swissProt and proteins, remain unchanged. Some parts of our programs still use these two pointers and the data in their associated protein databases.
    swissProt ---> sp041115
    proteins  ---> proteins041115
  5. Two new tables, spOldNew and uniProtAlias, have been added to the proteome database.

    The spOldNew table contains three columns:
    • acc -- primary accession number
    • oldDisplayId -- old display ID
    • newDisplayId -- new display ID

    The uniProtAlias table contains four columns:
    • acc -- UniProt accession number
    • alias -- alias (could be acc, old and new display IDs, etc.)
    • aliasSrc -- source of the alias type
    • aliasSrcDate -- date of the source data

    The aliases include primary accessions, secondary accessions, new display IDs, old display IDs, and old display IDs corresponding to new secondary accessions.

  6. Three new functions have been added to kent/src/hg/spDb.c:
    char *oldSpDisplayId(char *newSpDisplayId);
    /* Convert from new Swiss-Prot display ID to old display ID */
          
    char *newSpDisplayId(char *oldSpDisplayId); 
    /* Convert from old Swiss-Prot display ID to new display ID */
    
    char *uniProtFindPrimAcc(char *id);
    /* Return primary accession given an alias. */
    The uniProtFindPrimAcc() function is enabled by the new uniProtAlias table.
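
The lookups that the two new tables make possible can be sketched as follows. This is a minimal Python illustration, not the kent source code: it uses an in-memory SQLite database as a stand-in for the real server, assumes TEXT columns, and loads placeholder rows (X00001, OLD_EXAMPLE, NEW_EXAMPLE, and the aliasSrc labels are all made up). Only the table and column names come from the description above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create stand-in spOldNew and uniProtAlias tables with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE spOldNew (acc TEXT, oldDisplayId TEXT, newDisplayId TEXT);
        CREATE TABLE uniProtAlias (acc TEXT, alias TEXT, aliasSrc TEXT, aliasSrcDate TEXT);
        """
    )
    # Placeholder data only; these are not real UniProt identifiers.
    conn.execute(
        "INSERT INTO spOldNew VALUES (?, ?, ?)",
        ("X00001", "OLD_EXAMPLE", "NEW_EXAMPLE"),
    )
    conn.executemany(
        "INSERT INTO uniProtAlias VALUES (?, ?, ?, ?)",
        [
            ("X00001", "X00001", "acc", "2005-02-01"),
            ("X00001", "OLD_EXAMPLE", "oldDisplayId", "2005-02-01"),
            ("X00001", "NEW_EXAMPLE", "newDisplayId", "2005-02-01"),
        ],
    )
    return conn


def _first_value(conn: sqlite3.Connection, sql: str, arg: str):
    """Return the first column of the first matching row, or None."""
    row = conn.execute(sql, (arg,)).fetchone()
    return row[0] if row else None


def old_display_id(conn: sqlite3.Connection, new_id: str):
    """New display ID -> old display ID, via spOldNew."""
    return _first_value(
        conn, "SELECT oldDisplayId FROM spOldNew WHERE newDisplayId = ?", new_id
    )


def new_display_id(conn: sqlite3.Connection, old_id: str):
    """Old display ID -> new display ID, via spOldNew."""
    return _first_value(
        conn, "SELECT newDisplayId FROM spOldNew WHERE oldDisplayId = ?", old_id
    )


def find_primary_acc(conn: sqlite3.Connection, alias: str):
    """Any alias -> primary accession, via uniProtAlias."""
    return _first_value(conn, "SELECT acc FROM uniProtAlias WHERE alias = ?", alias)


if __name__ == "__main__":
    db = build_demo_db()
    print(find_primary_acc(db, "OLD_EXAMPLE"))
    print(new_display_id(db, "OLD_EXAMPLE"))
```

Resolving every identifier to its accession first, as find_primary_acc does here, matches the suggestion to index and cross-reference data by UniProt accession rather than by display ID.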

We anticipate additional changes down the road and may eventually merge the two sets of protein DB pointers into one set.

Currently, the proteinID field of the knownGene table for existing genome releases (hg15, hg16, hg17, mm3, mm4, mm5, rn2, and rn3) uses old Swiss-Prot/TrEMBL display IDs (pre-1 Feb. '05). In the future, we may change this field to show the UniProt accession number. Should we choose not to change the content of the proteinID field, we may consider adding a new field, uniProtAcc.

If you have any questions about these changes and their impact on your work, please email us at genome@soe.ucsc.edu. Mirror sites may send questions to genome-mirror@soe.ucsc.edu. Messages sent to these addresses will be posted to the moderated mailing lists, which are archived on a SEARCHABLE, PUBLIC Google Groups forum.