Schema for EVS Variants - NHLBI GO Exome Sequencing Project (ESP) - Variants from 6,503 Exomes
Database: hg19 Primary Table: evsEsp6500
VCF File: /gbdb/hg19/evs/ESP6500SI-V2-SSA137.updatedProteinHgvs.chr1.snps_indels.vcf.gz
Format description: The fields of a Variant Call Format data line
See the Variant Call Format specification for more details
|chrom||An identifier from the reference genome|
|pos||The reference position, with the 1st base having position 1|
|id||Semicolon-separated list of unique identifiers where available|
|ref||Reference base(s)|
|alt||Comma-separated list of alternate non-reference alleles called on at least one of the samples|
|qual||Phred-scaled quality score for the assertion made in ALT, i.e., -10 log_10 Prob(call in ALT is wrong)|
|filter||PASS if this position has passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail|
|info||Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values|
|format||If genotype columns are specified in header, a colon-separated list of short keys starting with GT|
|genotypes||If genotype columns are specified in header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column|
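The field layout above can be illustrated with a minimal Python sketch that splits one tab-separated VCF data line into typed fields. It also reads the REF column, which the VCF specification places between ID and ALT. The names `VcfRecord`, `parse_info`, `parse_vcf_line` and `phred_to_error_prob` are hypothetical helpers introduced for illustration only; they are not part of any UCSC or ESP tool, and real files are better read with a dedicated VCF library.

```python
from typing import Dict, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """One parsed VCF data line (hypothetical container for this sketch)."""
    chrom: str
    pos: int                         # 1-based reference position
    ids: List[str]                   # semicolon-separated in the file
    ref: str
    alts: List[str]                  # comma-separated in the file
    qual: Optional[float]            # None when the file has "."
    failed_filters: List[str]        # empty when FILTER is PASS or "."
    info: Dict[str, List[str]]       # key -> list of values ([] for flags)
    format_keys: List[str]           # colon-separated keys, e.g. GT
    genotypes: List[Dict[str, str]]  # one dict per sample column


def parse_info(text: str) -> Dict[str, List[str]]:
    """Split 'K1=v1,v2;FLAG' into {'K1': ['v1', 'v2'], 'FLAG': []}."""
    info: Dict[str, List[str]] = {}
    if text == ".":
        return info
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value.split(",") if sep else []
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse a single tab-separated VCF data line (not a header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, id_, ref, alt, qual, filt, info = cols[:8]
    # FORMAT and sample columns are optional (absent in sites-only files).
    format_keys = cols[8].split(":") if len(cols) > 8 else []
    genotypes = [dict(zip(format_keys, g.split(":"))) for g in cols[9:]]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ids=[] if id_ == "." else id_.split(";"),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        failed_filters=[] if filt in (".", "PASS") else filt.split(";"),
        info=parse_info(info),
        format_keys=format_keys,
        genotypes=genotypes,
    )


def phred_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(P(call in ALT is wrong))."""
    return 10 ** (-qual / 10)
```

Calling `parse_vcf_line` on a data line yields a record whose `info` dictionary maps each key to its list of comma-separated values, and `phred_to_error_prob(rec.qual)` converts the quality score back to an error probability when `qual` is present. Note that this sketch does not distinguish a FILTER value of PASS from a missing value (".").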
EVS Variants (evsEsp6500) Track Description
The goal of the
NHLBI GO Exome Sequencing Project (ESP)
is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by
pioneering the application of next-generation sequencing of the protein coding regions of the
human genome across diverse, richly-phenotyped populations and to share these datasets and
findings with the scientific community to extend and enrich the diagnosis, management and
treatment of heart, lung and blood disorders. The current data release (ESP6500SI-V2-SSA137)
is taken from 6,503 samples drawn from multiple ESP cohorts
and represents all of the ESP exome variant data.
Data in this track were obtained from the Exome Variant Server,
EVS Release Version: v.0.0.25 (Feb. 7, 2014).
In "dense" mode, a vertical line is drawn at the position of each variant.
In "pack" and "full" modes, in addition to the vertical line, a label to
the left shows the reference allele first and variant alleles below
(A = red, C = blue, G = green, T = magenta, Indels = black).
Hovering the pointer over any variant displays the occurrence counts for each
allele in the Exome Sequencing Project's database. Clicking on any variant displays
full details of that variant, along with links to ESP and dbSNP where available.
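Per-allele occurrence counts such as those shown on hover are often easier to compare as frequencies. The following is a minimal sketch of that conversion; the function name `allele_frequencies` and the example counts are hypothetical, and how the counts are stored in the underlying file is not described here.

```python
from typing import Dict, Sequence


def allele_frequencies(alleles: Sequence[str],
                       counts: Sequence[int]) -> Dict[str, float]:
    """Convert per-allele occurrence counts into frequencies.

    `alleles` and `counts` must be parallel sequences, e.g. the reference
    allele followed by each variant allele, with one count per allele.
    """
    if len(alleles) != len(counts):
        raise ValueError("need exactly one count per allele")
    total = sum(counts)
    if total == 0:
        # No observations at this site: report zero for every allele.
        return {allele: 0.0 for allele in alleles}
    return {allele: count / total for allele, count in zip(alleles, counts)}


# Hypothetical counts for a biallelic site (not taken from the ESP data).
freqs = allele_frequencies(["A", "G"], [3, 1])
print(freqs)
```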
Sequences were aligned to NCBI build 37 human genome reference using BWA. PCR duplicates
were removed using Picard. Alignments were recalibrated using GATK. Lane-level indel realignments
and base alignment quality (BAQ) adjustments were applied.
All data were simultaneously analyzed for exome SNP variants at the University of Michigan
(by the Abecasis Laboratory). SNPs were called using a two-step approach. First, genotype
likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next,
we used glfMultiples, a multi-sample variant caller, to generate initial SNP calls. Details of
the likelihood model implemented in glfMultiples are given in Li, et al., 2011
(in the section entitled "Identifying Potential Polymorphic Sites"). The Michigan SNP calling pipeline
is available at:
This pipeline makes diploid calls for pseudo-autosomal regions of male samples and haploid
calls for the rest of the chromosome. Female samples have diploid calls for all regions on
the X chromosome. SNPs were filtered by a machine-learning technique called support
vector machine (SVM) classification (for a detailed description, see
Small INDEL variants were analyzed at the Broad Institute (by the Genome Sequencing and
Analysis group) using the
variation discovery pipeline following the guidelines in the
GATK best practices v4.
More specifically, each BAM was reduced to create a Reduced BAM, and then INDELs were
discovered by analyzing all samples simultaneously with the GATK
and subsequently filtered by the GATK Variant Quality Score Recalibration (VQSR) filtering
model, again following the V4 best practices. The INDEL genotypes for X and Y chromosomes
were adjusted to be consistent with the samples' genders. Female samples have diploid calls
for all regions on the X chromosome. Male samples have diploid calls for pseudo-autosomal
regions on the X chromosome and haploid calls for the rest of the X chromosome and on the
Y chromosome as well. However, the INDEL calls for the ESP data are preliminary and not as
robust as the SNP calls at this point. Users are advised to keep this difference in mind
when applying the ESP data to research studies.
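The sex-chromosome rules described above can be summarized as a small lookup. This is a sketch only, not the code used by either calling pipeline. The pseudo-autosomal region (PAR) coordinates are the commonly cited GRCh37/hg19 chrX boundaries and are an assumption, as they are not stated on this page. The function name `expected_ploidy` and the handling of chrY in female samples (returned as 0) are likewise assumptions.

```python
# Assumed GRCh37/hg19 pseudo-autosomal regions on chrX (1-based, inclusive).
PAR_X_HG19 = [
    (60001, 2699520),         # PAR1
    (154931044, 155260560),   # PAR2
]


def expected_ploidy(chrom: str, pos: int, sex: str) -> int:
    """Return the expected number of allele copies called at a position.

    Follows the rules in the text: females are diploid everywhere on X;
    males are diploid in the X pseudo-autosomal regions and haploid on
    the rest of X and on Y.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if name not in ("X", "Y"):
        # Autosomes; other contigs are not treated specially in this sketch.
        return 2
    if sex == "female":
        # Assumption: no calls are expected on Y for female samples.
        return 2 if name == "X" else 0
    if name == "Y":
        return 1
    in_par = any(start <= pos <= end for start, end in PAR_X_HG19)
    return 2 if in_par else 1
```

For example, `expected_ploidy("X", 50000000, "male")` falls outside both assumed PAR intervals and is therefore treated as haploid, whereas the same position in a female sample is diploid.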
All SNPs and INDELs were further annotated by
and the variant annotations at the coding-DNA and protein levels mostly follow
The SNP calls are included in the release of dbSNP build 138. The full dataset is described in
Fu, et al., 2013, and a subset of the data (i.e., 2,500 exomes) was published by the ESP Population
Genetics and Statistical Analysis Working Group in Tennessen, et al., 2012.
The authors would like to thank the
NHLBI GO Exome Sequencing Project
and its ongoing studies which produced and provided exome variant calls for comparison: the
Lung GO Sequencing Project (HL-102923),
WHI Sequencing Project (HL-102924),
Broad GO Sequencing Project (HL-102925),
Seattle GO Sequencing Project (HL-102926),
and the Heart GO Sequencing Project (HL-103010).
Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Rieder MJ, Altshuler D, Shendure J et al.
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.
Nature. 2013 Jan 10;493(7431):216-20.
PMID: 23201682; PMC: PMC3676746
Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR.
Low-coverage sequencing: implications for design of complex trait association studies.
Genome Res. 2011 Jun;21(6):940-51.
PMID: 21460063; PMC: PMC3106327
Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G et al.
Evolution and functional impact of rare coding variation from deep sequencing of human exomes.
Science. 2012 Jul 6;337(6090):64-9.
PMID: 22604720; PMC: PMC3708544