Schema for gnomAD v3 - Genome Aggregation Database (gnomAD) Genome Variants v3
  Database: hg38    Primary Table: gnomadGenomesVariantsV3
VCF File: /gbdb/hg38/gnomAD/vcf/gnomad.genomes.r3.0.sites.vcf.gz
Format description: The fields of a Variant Call Format data line
See the Variant Call Format specification for more details
field       description
chrom       An identifier from the reference genome
pos         The reference position, with the 1st base having position 1
id          Semicolon-separated list of unique identifiers where available
ref         Reference base(s)
alt         Comma-separated list of alternate non-reference alleles called on at least one of the samples
qual        Phred-scaled quality score for the assertion made in ALT, i.e. -10 log10 prob(call in ALT is wrong)
filter      PASS if this position has passed all filters; otherwise, a semicolon-separated list of codes for filters that fail
info        Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values
format      If genotype columns are specified in the header, a colon-separated list of short keys starting with GT
genotypes   If genotype columns are specified in the header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column
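As a rough illustration of the columns listed above, the following Python sketch splits one sites-only VCF data line (no format or genotype columns, as in the sample rows) into named fields and recovers the error probability from qual by inverting -10 log10 p. The helper names parse_vcf_line and phred_to_error_prob are hypothetical and written only for this example; for production work, a dedicated VCF library is the safer choice.

```python
# Minimal sketch (hypothetical helpers): split one sites-only VCF data line
# into the columns described in the schema table. Assumes tab-delimited
# input with exactly the first eight columns (no format/genotype columns).

def parse_vcf_line(line: str) -> dict:
    """Split a tab-delimited VCF data line into named fields."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info_dict[key] = value.split(",")  # values may be comma-separated
        else:
            info_dict[entry] = True  # flag keys such as lcr carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based reference position
        "id": [] if vid == "." else vid.split(";"),
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt.split(";"),  # ["PASS"] or the codes of failing filters
        "info": info_dict,
    }


def phred_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(p) to recover p, the probability the ALT call is wrong."""
    return 10 ** (-qual / 10)


if __name__ == "__main__":
    example = ("chr1\t10031\t.\tT\tC\t77.00\tAC0;AS_VQSR\t"
               "AC=0;AN=53780;AF=0.00000e+00;lcr;variant_type=snv")
    rec = parse_vcf_line(example)
    print(rec["filter"], rec["info"]["AN"], phred_to_error_prob(rec["qual"]))
```

Flag keys are stored as True and keyed values as lists so that multi-valued keys do not need special handling.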

Sample Rows
 
chrom	pos	id	ref	alt	qual	filter	info
chr1	10031	.	T	C	77.00	AC0;AS_VQSR	AC=0;AN=53780;AF=0.00000e+00;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-1.38000e+00;MQRankSum=-5.72000e-01;RAW_MQ=6.39 ...
chr1	10037	.	T	C	180.00	AS_VQSR	AC=2;AN=72762;AF=2.74869e-05;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-4.80000e-01;MQRankSum=1.37100e+00;RAW_MQ=2.025 ...
chr1	10043	.	T	C	97.00	AS_VQSR	AC=1;AN=81114;AF=1.23283e-05;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-8.96000e-01;MQRankSum=1.23100e+00;RAW_MQ=8.174 ...
chr1	10055	.	T	C	75.00	AS_VQSR	AC=1;AN=89638;AF=1.11560e-05;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-1.10600e+00;MQRankSum=7.15000e-01;RAW_MQ=1.141 ...
chr1	10057	.	A	C	264.00	AS_VQSR	AC=3;AN=107374;AF=2.79397e-05;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-6.84000e-01;MQRankSum=7.88000e-01;RAW_MQ=2.27 ...
chr1	10061	.	T	C	72.00	AC0	AC=0;AN=103816;AF=0.00000e+00;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=0.00000e+00;MQRankSum=1.05000e-01;RAW_MQ=1.410 ...
chr1	10061	.	T	TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC	1142.00	AC0;AS_VQSR	AC=0;AN=103816;AF=0.00000e+00;lcr;variant_type=indel;n_alt_alleles=1;ReadPosRankSum=-1.02600e+00;MQRankSum=7.36000e-01;RAW_MQ=8. ...
chr110064.CCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAA71.00AC0AC=0;AN=140930;AF=0.00000e+00;lcr;variant_type=indel;n_alt_alleles=1;ReadPosRankSum=-7.27000e-01;MQRankSum=7.27000e-01;RAW_MQ=1. ...
chr1	10067	rs1489251879	T	TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC	952.00	PASS	AC=2;AN=114200;AF=1.75131e-05;lcr;variant_type=indel;n_alt_alleles=1;ReadPosRankSum=0.00000e+00;MQRankSum=6.76000e-01;RAW_MQ=9.6 ...
chr1	10108	.	C	CA	89.00	AS_VQSR	AC=1;AN=9128;AF=1.09553e-04;lcr;variant_type=indel;n_alt_alleles=1;ReadPosRankSum=2.10000e+00;MQRankSum=-1.55200e+00;RAW_MQ=7.87 ...
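In these sample rows, AF in the info column equals the alternate allele count (AC) divided by the allele number (AN), the total number of called alleles. The sketch below copies AC, AN, AF and filter values from the rows and checks that relationship. The helper names are hypothetical, and the comparison uses a relative tolerance because AF is printed with six significant digits.

```python
import math

# (pos, AC, AN, AF as printed in info, filter), copied from the sample rows
SAMPLE_ROWS = [
    (10031, 0, 53780, 0.00000e+00, "AC0;AS_VQSR"),
    (10037, 2, 72762, 2.74869e-05, "AS_VQSR"),
    (10043, 1, 81114, 1.23283e-05, "AS_VQSR"),
    (10055, 1, 89638, 1.11560e-05, "AS_VQSR"),
    (10057, 3, 107374, 2.79397e-05, "AS_VQSR"),
    (10061, 0, 103816, 0.00000e+00, "AC0"),
    (10067, 2, 114200, 1.75131e-05, "PASS"),
    (10108, 1, 9128, 1.09553e-04, "AS_VQSR"),
]


def allele_frequency(ac: int, an: int) -> float:
    """Alternate allele frequency: allele count over allele number."""
    return ac / an if an > 0 else 0.0


def check_rows(rows):
    """Return (pos, computed AF matches printed AF, row passes all filters) per row."""
    results = []
    for pos, ac, an, af_printed, filt in rows:
        # AF is printed to 6 significant digits, so allow a small relative error
        matches = math.isclose(allele_frequency(ac, an), af_printed,
                               rel_tol=1e-5, abs_tol=1e-12)
        results.append((pos, matches, filt == "PASS"))
    return results


if __name__ == "__main__":
    for pos, matches, passed in check_rows(SAMPLE_ROWS):
        print(pos, matches, passed)
```

Of the rows shown, only the variant at position 10067 passes all filters; the others carry AS_VQSR and/or AC0 codes, and every row with AC=0 carries the AC0 code.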

gnomAD v3 (gnomadGenomesVariantsV3) Track Description
 

Description

The gnomAD v3 track shows variants and derived information from 71,702 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. Most of the genomes from v2 are included in v3. For more detailed information on gnomAD v3, see the related blog post.

The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README.

gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site.

On hg38 only, a subtrack "Gnomad mutational constraint" captures the depletion of disruptive variation caused by purifying natural selection. This is similar to the loss-of-function (LoF) constraint measured for genes, but it can also be calculated for non-coding regions. Briefly, for each 1 kb window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations; this expected number is then compared to the observed number of mutations using a Z-score (see the preprint in the References section for details). The chrX scores were added as received from the authors; because no mutations are available for chrX, these scores are more speculative than those on the autosomes.
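The window-level comparison can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' method. It bins 1-based positions into non-overlapping 1 kb windows and assumes a Poisson-style score Z = (expected - observed) / sqrt(expected), signed so that a depletion of observed mutations gives a positive Z. The expected counts in the example are made-up placeholders; in the real track they come from the mutation model described above, and the exact score definition is given in the preprint.

```python
import math
from collections import Counter

WINDOW = 1000  # 1 kb; non-overlapping windows are a simplifying assumption here


def count_per_window(positions, window=WINDOW):
    """Count observed variants per window.

    Window index k covers 1-based positions k*window+1 .. (k+1)*window.
    """
    return Counter((pos - 1) // window for pos in positions)


def constraint_z(observed, expected):
    """Assumed convention: positive Z means fewer observed than expected mutations."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


if __name__ == "__main__":
    observed = count_per_window([10031, 10037, 10999, 11000, 11001])
    # Hypothetical expected counts per window index, for illustration only
    expected = {10: 9.0, 11: 1.0}
    for k in sorted(expected):
        obs = observed.get(k, 0)
        print(k, obs, round(constraint_z(obs, expected[k]), 2))
```

Dividing by sqrt(expected) treats the observed count as roughly Poisson-distributed around the model's expectation, so windows with different expected counts become comparable on one scale.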

For questions on the gnomAD data, also see the gnomAD FAQ.

Display Conventions

  • In dense mode, a vertical line is drawn at the position of each variant.
  • In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts.
Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Coverage values and constraint scores for the genome are in bigWig files in the coverage/ subdirectory. Variant VCFs can be found in the vcf/ subdirectory.
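For offline analysis of a downloaded variant VCF, one simple approach is to stream the bgzip-compressed file with Python's gzip module (bgzip output is readable as ordinary multi-member gzip) and keep the PASS records in a region. The sketch below assumes a local copy of a sites VCF laid out like the schema above; the function names are hypothetical. A full scan of a genome-wide file is slow, so if a tabix index is available, tabix-based random access is preferable for repeated region queries.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# (chrom, pos, ref, alt, AF values)
Record = Tuple[str, int, str, str, List[float]]


def filter_pass_records(lines: Iterable[str], chrom: str,
                        start: int, end: int) -> Iterator[Record]:
    """Yield PASS records with start <= pos <= end (1-based, inclusive)."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header line
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom or fields[6] != "PASS":
            continue
        pos = int(fields[1])
        if not start <= pos <= end:
            continue
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        # AF may hold one comma-separated value per alternate allele
        af = [float(x) for x in info["AF"].split(",")] if "AF" in info else []
        yield fields[0], pos, fields[3], fields[4], af


def read_pass_records(path: str, chrom: str, start: int, end: int) -> List[Record]:
    """Stream a (b)gzip-compressed VCF from disk and collect PASS records in the region."""
    with gzip.open(path, "rt") as handle:
        return list(filter_pass_records(handle, chrom, start, end))
```

Separating the line filter from the file reader keeps the parsing logic testable on in-memory lines; read_pass_records("gnomad.genomes.r3.0.sites.vcf.gz", "chr1", 10000, 10200) would apply it to a local copy of the file named in the schema header.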

The data can also be found directly on the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL).

References

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. doi: 10.1101/531210.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Chen S, Francioli L, Goodrich J, Collins R, Wang Q, Alfoldi J, Watts N, Vittal C, Gauthier L, Poterba T, Wilson M et al. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv. 2022.