Schema for Intrahost SNPs - Intrahost SNP patient data from Todd Treangen's group
Database: wuhCor1    Primary Table: treangen    Data last updated: 2020-10-26
Big Bed File Download: /gbdb/wuhCor1/treangen/Treangen_iSNP.bb
Item Count: 278
The data is stored in the binary BigBed format.

Format description: Treangen track table of contents
field             example       description
chrom             NC_045512v2   Reference sequence chromosome or scaffold
chromStart        19964         Start position in chromosome
chromEnd          19965         End position in chromosome
name              3.1           Name or ID of item, ideally both human readable and unique
score             1000          Score (0-1000)
strand            +             + or - for strand
refPosition       19965_T       The reference base and position
referenceBase     T             The base observed in the SARS-CoV-2 reference sequence
freqAltA          0.0           Frequency of an alternate A base observed
freqAltC          1.0           Frequency of an alternate C base observed
freqAltG          0.0           Frequency of an alternate G base observed
freqAltT          0.0           Frequency of an alternate T base observed
patientMutations  2             Number of patients observed with mutations from reference
changedAA         Yes           Was there a change in Amino Acid from mutation
resultingAA       T             The resulting Amino Acid from mutation

Sample Rows
 
chrom        chromStart  chromEnd  name  score  strand  refPosition  referenceBase  freqAltA  freqAltC  freqAltG  freqAltT  patientMutations  changedAA  resultingAA
NC_045512v2  19964       19965     3.1   1000   +       19965_T      T              0.0       1.0       0.0       0.0       2                 Yes        T
NC_045512v2  20004       20005     99.7  1000   +       20005_G      G              1.0       0.0       0.0       0.0       2                 Yes        E
NC_045512v2  20135       20136     4.8   1000   +       20136_A      A              0.0       0.33      0.67      0.0       3                 Yes        G
NC_045512v2  20321       20322     6.5   1000   +       20322_T      T              0.0       1.0       0.0       0.0       2                 Yes        P
NC_045512v2  20406       20407     7.5   1000   +       20407_T      T              0.0       1.0       0.0       0.0       2                 No         P
NC_045512v2  20450       20451     6.3   1000   +       20451_C      C              0.0       0.0       0.0       1.0       2                 Yes        I
NC_045512v2  20456       20457     4.0   1000   +       20457_C      C              0.0       0.0       0.0       1.0       2                 No         F
NC_045512v2  20552       20553     99.4  1000   +       20553_A      A              0.0       0.0       1.0       0.0       2                 Yes        STOP
NC_045512v2  20754       20755     99.5  1000   +       20755_A      A              0.0       1.0       0.0       0.0       8                 No         D
NC_045512v2  20801       20802     3.4   1000   +       20802_T      T              0.0       1.0       0.0       0.0       2                 No         T
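
For readers who want to work with these rows programmatically, the following is a minimal sketch of parsing one row into a typed record. It assumes the rows have been exported as tab-separated text with the fields in the order listed in the schema (for example with the UCSC bigBedToBed utility). The names IsnpRecord and parse_row are illustrative and are not part of the track.

```python
from dataclasses import dataclass
from typing import Dict

FIELD_COUNT = 15  # number of columns listed in the schema


@dataclass
class IsnpRecord:
    chrom: str
    chrom_start: int            # BED start (0-based)
    chrom_end: int              # BED end (exclusive)
    name: str
    score: int
    strand: str
    ref_position: str           # e.g. "19965_T"
    reference_base: str
    freq_alt: Dict[str, float]  # observed frequency per base A, C, G, T
    patient_mutations: int
    changed_aa: bool
    resulting_aa: str


def parse_row(line: str) -> IsnpRecord:
    """Parse one tab-separated row laid out in the schema's field order."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return IsnpRecord(
        chrom=fields[0],
        chrom_start=int(fields[1]),
        chrom_end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        ref_position=fields[6],
        reference_base=fields[7],
        freq_alt=dict(zip("ACGT", map(float, fields[8:12]))),
        patient_mutations=int(fields[12]),
        changed_aa=(fields[13] == "Yes"),
        resulting_aa=fields[14],
    )


# Third sample row, written here as tab-separated text (an assumed export format).
row = "NC_045512v2\t20135\t20136\t4.8\t1000\t+\t20136_A\tA\t0.0\t0.33\t0.67\t0.0\t3\tYes\tG"
record = parse_row(row)
print(record.ref_position, record.freq_alt)
```

Keeping the four freqAlt columns in one dictionary keyed by base makes it easy to look up the dominant alternate base for a row.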

Intrahost SNPs (treangen) Track Description
 

Description

This track shows intrahost SNPs (iSNPs): SNPs with evidence of variation within a single host. A single patient can carry variation among the SARS-CoV-2 viruses infecting their cells, and this variation is lost when only one consensus genome sequence is reported for that patient. The data were published in Sapoval et al., 2020, "Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission".

The iSNPs shown in this track come from human patients in New York City and Houston.

Display Conventions and Configuration

The track lists iSNPs found in patient data from New York City and Houston, with their nucleotide and amino acid changes, one feature per variant. The name field holds the median observed allele frequency across patients meeting the inclusion criteria in the VCFs provided by Sapoval et al. The BigBed track was created with bedToBigBed.
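
As an illustration of how a per-variant median allele frequency could be computed, the sketch below takes the allele frequencies observed for one variant across patients and returns their median. The input values and the function name are hypothetical, and the inclusion criteria and rounding actually used for the track are not described here.

```python
from statistics import median


def median_allele_frequency(per_patient_freqs):
    """Return the median of the allele frequencies observed across patients."""
    if not per_patient_freqs:
        raise ValueError("at least one observation is required")
    return median(per_patient_freqs)


# Hypothetical allele frequencies for one variant in three patients.
print(median_allele_frequency([2.4, 3.1, 7.9]))  # 3.1
```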

This track shows a condensed summary of all the VCFs (see Methods); interested users may wish to inspect the individual VCFs.

Methods

VCF files were downloaded from the Rice University data repository. SARS-CoV-2 iSNPs from the New York City and Houston patient data were parsed, and a base position was included if it was modified in more than one sample. For each included position, the track records the frequency of observing each base (A, C, G, T) when a change was recorded, and the dominant base change was used to determine whether the modification would also result in an amino acid change.

The original data files are available from a shared box.com folder with VCF files.
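
The aggregation described in this section can be sketched roughly as follows. This is one interpretation, not the pipeline used to build the track: it assumes each VCF has already been reduced to (sample, position, reference base, alternate base) tuples, that frequencies are the fraction of mutated samples carrying each alternate base, and that the codon and in-codon offset of the position are known. All function names and the example calls are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "ACGT"

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in "TCAG" for b in "TCAG" for c in "TCAG"]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def summarize_isnps(calls):
    """Summarize (sample_id, position, ref_base, alt_base) calls per position."""
    by_position = defaultdict(dict)
    for sample_id, position, ref_base, alt_base in calls:
        # One call per sample and position (an assumption of this sketch).
        by_position[(position, ref_base)][sample_id] = alt_base

    summary = {}
    for position, ref_base in sorted(by_position):
        alts = by_position[(position, ref_base)]
        n_samples = len(alts)
        if n_samples < 2:
            continue  # keep only positions modified in more than one sample
        counts = Counter(alts.values())
        summary[position] = {
            "reference_base": ref_base,
            "freq_alt": {b: round(counts[b] / n_samples, 2) for b in BASES},
            "patient_mutations": n_samples,
            # Ties are broken by first occurrence in the input.
            "dominant_alt": counts.most_common(1)[0][0],
        }
    return summary


def amino_acid_change(ref_codon, offset, alt_base):
    """Return (changed, resulting_aa) after substituting alt_base at offset."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return ref_aa != alt_aa, ("STOP" if alt_aa == "*" else alt_aa)


# Hypothetical calls, chosen so the output resembles the sample row at 20136.
calls = [
    ("s1", 20136, "A", "G"),
    ("s2", 20136, "A", "G"),
    ("s3", 20136, "A", "C"),
    ("s4", 100, "T", "C"),  # seen in one sample only, so it is dropped
]
print(summarize_isnps(calls))
print(amino_acid_change("TAT", 2, "A"))  # hypothetical codon: TAT -> TAA
```

The sketch ignores strand and reading-frame lookup; a real implementation would need the gene annotation to find the codon that contains each position.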

References

Sapoval N, Mahmoud M, Jochum MD, Liu Y, Leo Elworth RA, Wang Q, Albin D, Ogilvie H, Lee MD, Villapol S et al. Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission. bioRxiv. 2020 Jul 2. PMID: 32637955; PMC: PMC7337385