Omicron VOC (B.1.1.529 SA Nov-2021) nucleotide mutations identified from GISAID sequences (Nov 2021) (C25584T)
 
B.1.1.529 Situation Report at outbreak.info
Mutation: C25584T
Position: NC_045512v2:25584-25584
Genomic Size: 1

Source data version: Sequences downloaded September 10, 2021, updated on Dec 2, 2021 and May 4, 2022
Data last updated at UCSC: 2021-12-03 00:27:21

Description

This track displays amino acid and nucleotide mutations in SARS-CoV-2 variants as defined in December 2021 by the World Health Organization (WHO). Note that the Centers for Disease Control and Prevention (CDC) classification of SARS-CoV-2 variants differs slightly from the WHO's.

Mutations in this track were identified from viral sequences from GISAID. Variant incidence and geographic distribution information is available from links to the Outbreak.info web resource on the mutation details pages.

  • Variants of Concern (VOC) have evidence for increased transmissibility, virulence, and/or decreased diagnostic, therapeutic, or vaccine efficacy.
  • Variants of Interest (VOI) contain mutations suspected or confirmed to cause a change in transmissibility, virulence, or diagnostic / therapeutic / vaccine efficacy, plus evidence of significant community transmission, a cluster of cases, or detection in multiple countries.
  • Variants under Monitoring (VUM) include variants with unclear epidemiological impact. This track includes only the four VUMs which were previously identified as Variants of Interest, now reclassified at this lower level of concern.

The related track B.1.1.7 in USA displays a phylogenetic tree of the first B.1.1.7 (Alpha) variant sequences collected in the United States.

BV-BRC has a similar list of variants of concern and their mutations, but has added representative sequences.

Display Conventions

Track colors are based on Nextstrain.org clade coloring:

The Greek-letter names assigned by the World Health Organization (WHO) are listed in this table, along with lineage and clade designations:

WHO label         Pangolin lineage  Nextstrain clade  GISAID clade  First detected            Date VOC/VOI  Type
Alpha             B.1.1.7           20I (V1)          GRY           Sep 2020, United Kingdom  18-Dec-2020   VOC
Beta              B.1.351           20H (V2)          GH/501Y.V2    May 2020, South Africa    18-Dec-2020   VOC
Gamma             P.1               20J (V3)          GR/501Y.V3    Nov 2020, Brazil          11-Jan-2021   VOC
Delta             B.1.617.2         21A               GK/478K.V1    Oct 2020, India           11-May-2021   VOC
Omicron           B.1.1.529         21K               GR/484A       Nov 2021, South Africa    26-Nov-2021   VOC
Lambda            C.37              21G               GR/452Q.V1    Dec 2020, Peru            14-Jun-2021   VOI
Mu                B.1.621           21H               GH            Jan 2021, Colombia        30-Aug-2021   VOI
Epsilon (former)  B.1.429           21C               GH/452R.V1    Mar 2020, USA             06-Jul-2021   VUM
Eta (former)      B.1.525           21D               G/484K.V3     Dec 2020                  20-Sep-2021   VUM
Iota (former)     B.1.526           21F               GH/253G.V1    Nov 2020, USA             20-Sep-2021   VUM
Kappa (former)    B.1.617.1         21B               G/452R.V3     Oct 2020, India           20-Sep-2021   VUM

Mutations in the amino acid track are named with the format:

        [Reference amino acid][1-based coordinate in peptide][Alternate amino acid]. E.g., L452R

Mutations in the nucleotide track are named with the format:

        [Reference nucleotide][1-based coordinate in genome][Alternate nucleotide]. E.g., T22918G

Insertions and deletions in both tracks are named with the format:

        [del/ins]_[1-based genomic coordinate of first affected nucleotide]. E.g., del_21991
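
The naming formats above are regular enough to parse mechanically. The Python sketch below shows one possible way to do that. The `Mutation` tuple, the `parse_mutation_name` function and both regular expressions are illustrative names introduced here, not part of the track or its pipeline, and the substitution pattern assumes single uppercase letters for the reference and alternate residues.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    """Hypothetical container for a parsed mutation name."""
    kind: str            # "substitution", "del" or "ins"
    position: int        # 1-based coordinate, as in the track's names
    ref: Optional[str]   # reference residue (None for indels)
    alt: Optional[str]   # alternate residue (None for indels)


# Assumed patterns derived from the naming formats described in the text.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")
_INDEL = re.compile(r"^(del|ins)_(\d+)$")


def parse_mutation_name(name: str) -> Mutation:
    """Split a name such as 'L452R', 'T22918G' or 'del_21991' into its parts."""
    match = _SUBSTITUTION.match(name)
    if match:
        ref, position, alt = match.groups()
        return Mutation("substitution", int(position), ref, alt)
    match = _INDEL.match(name)
    if match:
        kind, position = match.groups()
        return Mutation(kind, int(position), None, None)
    raise ValueError(f"unrecognized mutation name: {name!r}")


print(parse_mutation_name("C25584T"))
print(parse_mutation_name("del_21991"))
```

The name alone does not say whether a substitution is an amino acid or a nucleotide change (T22918G matches both formats), so a caller would need to know which track the name came from.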

Methods

For each virus variant, SARS-CoV-2 genome sequences containing all characteristic mutations of the lineage were downloaded from GISAID using the lineage search feature (restricting to complete, high-coverage genomes, and restricting to earliest sample collection dates when there were too many results for the download limit of 10,000 sequences per query).

Sequences were aligned to the SARS-CoV-2 reference genome using the global_profile_alignment.sh script from the sarscov2phylo repository. Single-nucleotide substitutions were extracted from the alignment using the UCSC tool faToVcf (available on the UCSC download server or from bioconda; also requires the SARS-CoV-2 reference sequence). Single-nucleotide substitutions present at a frequency of at least 0.95 (0.70 for Delta, 0.80 for Omicron) were retained; all others were discarded.

For indel detection, the Minimap2 suite of tools was used as follows:

        minimap2 --cs [Reference Sequence] [Set of Unaligned Sequences] | paftools.js call -L 10000 -

Indels present at a frequency of at least 0.85 (0.50 for Delta, 0.70 for Omicron) were retained. Less stringent cutoffs were applied to Delta and Omicron variant sequences because of the low quality of early sequences.
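
The frequency cutoffs described above can be summarized in a short Python sketch. It is not the lineageVariants code: the function name, the input layout (one set of mutation names per sequence) and the choice of denominator (all sequences downloaded for the lineage) are assumptions made for illustration. Only the cutoff values come from the text, and the example calls are made up.

```python
from collections import Counter

# Cutoffs as stated in the Methods text; other lineages use the defaults.
DEFAULT_CUTOFF = {"snv": 0.95, "indel": 0.85}
LINEAGE_CUTOFF = {
    "Delta": {"snv": 0.70, "indel": 0.50},
    "Omicron": {"snv": 0.80, "indel": 0.70},
}


def retained_mutations(calls_per_sequence, who_label, kind):
    """Return mutation names whose frequency meets the lineage cutoff.

    calls_per_sequence: list with one set of mutation names per sequence.
    who_label: WHO label of the lineage, e.g. "Omicron".
    kind: "snv" or "indel", selecting which cutoff applies.
    """
    cutoff = LINEAGE_CUTOFF.get(who_label, DEFAULT_CUTOFF)[kind]
    total = len(calls_per_sequence)
    if total == 0:
        return []
    # Count each mutation at most once per sequence.
    counts = Counter(name for calls in calls_per_sequence for name in set(calls))
    return sorted(name for name, n in counts.items() if n / total >= cutoff)


# Made-up calls for 10 sequences: C25584T in 9 of them, T22918G in 6.
example_calls = [{"C25584T", "T22918G"}] * 6 + [{"C25584T"}] * 3 + [set()]

print(retained_mutations(example_calls, "Omicron", "snv"))  # -> ['C25584T']
print(retained_mutations(example_calls, "Alpha", "snv"))    # -> []
```

With the default cutoff of 0.95 the same calls retain nothing, which shows how much the relaxed Delta and Omicron cutoffs matter when early sequences are incomplete.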

The results were then combined and formatted by lineageVariants.py. The entire pipeline was run using lineageVariants.sh.

Data Access

You can download the bigBed data files for this track from the UCSC Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API. For complete genome FASTA sequences of variants of concern, please visit the following third-party page:

Release Notes

Version 2 of this track adds one new Variant of Concern (Delta), two new Variants of Interest (Lambda, Mu), and three named variants previously VOI, now designated as less concerning Variants under Monitoring (Eta, Iota, Kappa). The track labels of all variants have been updated to include WHO labels. Track colors reflect Nextstrain conventions at the time of track data update (September 10, 2021).

Omicron was added December 2, 2021.

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. We thank Rob Lanfear at the Australian National University for developing and maintaining the sarscov2phylo web resource. We also thank the Su, Wu, and Andersen labs at Scripps Research for creating the Outbreak.info resource. The lineageVariants scripts were developed and run at UCSC by Nick Keener, Kate Rosenbloom and Angie Hinrichs.

References

Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Rambaut A, Loman N, Pybus O, Barclay W, Barrett J, Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al. Preliminary genomic characterization of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Virological. 2020 Dec 18.

Volz E, Mishra S, Chand M, Barrett JC, Johnson E, Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. Virological. 2020 Dec 31.

Tegally et al, December 21, 2020. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv preprint.

Zhang et al, January 20, 2021. Emergence of a novel SARS-CoV-2 strain in Southern California, USA. medRxiv preprint.

Voloch et al, December 26, 2020. Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil. medRxiv preprint.

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. PMID: 29750242; PMC: PMC6137996

Gangavarapu, Karthik; Alkuzweny, Manar; Cano, Marco; Haag, Emily; Latif, Alaa Abdel; Mullen, Julia L.; Rush, Benjamin; Tsueng, Ginger; Zhou, Jerry; Andersen, Kristian G.; Wu, Chunlei; Su, Andrew I.; Hughes, Laura D. Outbreak.info