Database: hg38
Primary Table: gnomadGenomesVariantsV3_1
Data last updated: 2021-01-13
Big Bed File Download: /gbdb/hg38/gnomAD/v3.1/variants/genomes.bb
Item Count: 798,965,229
The data is stored in the binary BigBed format.
Format description: Browser extensible data (9 fields), plus gnomAD related fields.
| field | example | description |
|---|---|---|
| chrom | chr1 | Chromosome (or contig, scaffold, etc.) |
| chromStart | 165970949 | Start position in chromosome |
| chromEnd | 165970950 | End position in chromosome |
| name | chr1:165970949-165970950 (G/C) | Name of item |
| score | 0 | Score from 0-1000 |
| strand | . | + or - |
| thickStart | 165970949 | Start of where display should be thick (start codon) |
| thickEnd | 165970950 | End of where display should be thick (stop codon) |
| reserved | 95,95,95 | Used as itemRgb as of 2004-11-22 |
| ref | G | Reference Sequence |
| alt | C | Alternate Sequence |
| FILTER | AC0,AS_VQSR | FILTER tags from VCF |
| AC | 0 | Allele Count |
| AN | 133970 | Allele Number |
| AF | 0.00000 | Allele Frequency |
| faf95 | 0.00000 | Filtering allele frequency (using Poisson 95% CI) for samples |
| nhomalt | 0 | Count of homozygous individuals in samples |
| rsId | | dbSNP rsID |
| genes | | List of genes affected by variant |
| annot | other | Annotation type: pLoF, missense, synonymous, or other |
| variation_type | intergenic_variant | Variant type(s) |
| hgvsc | | HGVS c. terms |
| hgvsp | | HGVS p. terms |
| popmax | N/A | Population with maximum AF |
| AC_popmax | N/A | Allele count in the population with the maximum AF |
| AN_popmax | N/A | Total number of alleles in the population with the maximum AF |
| AF_popmax | N/A | Maximum allele frequency across populations (excluding samples of Ashkenazi, Finnish, and indeterminate ancestry) |
| AC_afr | 0 | Alternate allele count for samples of African-American/African ancestry |
| AN_afr | 35480 | Total number of alleles in samples of African-American/African ancestry |
| AF_afr | 0.00000 | Alternate allele frequency in samples of African-American/African ancestry |
| nhomalt_afr | 0 | Count of homozygous individuals in samples of African-American/African ancestry |
| AC_ami | 0 | Alternate allele count for samples of Amish ancestry |
| AN_ami | 862 | Total number of alleles in samples of Amish ancestry |
| AF_ami | 0.00000 | Alternate allele frequency in samples of Amish ancestry |
| nhomalt_ami | 0 | Count of homozygous individuals in samples of Amish ancestry |
| AC_amr | 0 | Alternate allele count for samples of Latino/Admixed American ancestry |
| AN_amr | 13226 | Total number of alleles in samples of Latino/Admixed American ancestry |
| AF_amr | 0.00000 | Alternate allele frequency in samples of Latino/Admixed American ancestry |
| nhomalt_amr | 0 | Count of homozygous individuals in samples of Latino/Admixed American ancestry |
| AC_asj | 0 | Alternate allele count for samples of Ashkenazi Jewish ancestry |
| AN_asj | 3262 | Total number of alleles in samples of Ashkenazi Jewish ancestry |
| AF_asj | 0.00000 | Alternate allele frequency in samples of Ashkenazi Jewish ancestry |
| nhomalt_asj | 0 | Count of homozygous individuals in samples of Ashkenazi Jewish ancestry |
| AC_eas | 0 | Alternate allele count for samples of East Asian ancestry |
| AN_eas | 4626 | Total number of alleles in samples of East Asian ancestry |
| AF_eas | 0.00000 | Alternate allele frequency in samples of East Asian ancestry |
| nhomalt_eas | 0 | Count of homozygous individuals in samples of East Asian ancestry |
| AC_fin | 0 | Alternate allele count for samples of Finnish ancestry |
| AN_fin | 7526 | Total number of alleles in samples of Finnish ancestry |
| AF_fin | 0.00000 | Alternate allele frequency in samples of Finnish ancestry |
| nhomalt_fin | 0 | Count of homozygous individuals in samples of Finnish ancestry |
| AC_mid | 0 | Alternate allele count for samples of Middle Eastern ancestry |
| AN_mid | 292 | Total number of alleles in samples of Middle Eastern ancestry |
| AF_mid | 0.00000 | Alternate allele frequency in samples of Middle Eastern ancestry |
| nhomalt_mid | 0 | Count of homozygous individuals in samples of Middle Eastern ancestry |
| AC_nfe | 0 | Alternate allele count for samples of Non-Finnish European ancestry |
| AN_nfe | 63000 | Total number of alleles in samples of Non-Finnish European ancestry |
| AF_nfe | 0.00000 | Alternate allele frequency in samples of Non-Finnish European ancestry |
| nhomalt_nfe | 0 | Count of homozygous individuals in samples of Non-Finnish European ancestry |
| AC_sas | 0 | Alternate allele count for samples of South Asian ancestry |
| AN_sas | 3896 | Total number of alleles in samples of South Asian ancestry |
| AF_sas | 0.00000 | Alternate allele frequency in samples of South Asian ancestry |
| nhomalt_sas | 0 | Count of homozygous individuals in samples of South Asian ancestry |
| AC_oth | 0 | Alternate allele count for samples of Other ancestry |
| AN_oth | 1800 | Total number of alleles in samples of Other ancestry |
| AF_oth | 0.00000 | Alternate allele frequency in samples of Other ancestry |
| nhomalt_oth | 0 | Count of homozygous individuals in samples of Other ancestry |
| cadd_phred | 2.99500 | CADD Phred-like scores ('scaled C-scores') ranging from 1 to 99, based on the rank of each variant relative to all possible 8.6 billion substitutions in the human reference genome. Larger values are more deleterious |
| revel_score | N/A | dbNSFP's REVEL score from 0 to 1. Variants with higher scores are predicted to be more likely to be deleterious |
| splice_ai_max_ds | N/A | Illumina's SpliceAI max delta score. Interpreted as the probability of the variant being splice-altering |
| splice_ai_consequence | N/A | The consequence term associated with the max delta score in 'splice_ai_max_ds' |
| primate_ai_score | N/A | PrimateAI's deleteriousness score from 0 (less deleterious) to 1 (more deleterious) |
| _startPos | 165970950 | Unshifted chromStart position from VCF for link outs |
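The AC, AN, and AF fields in the field list above are related by the usual definition AF = AC / AN, both overall and per population. The following is a minimal sketch of that relationship; the function name is hypothetical, the record holds only the example values shown above, and returning 0.0 when AN is 0 is a choice made for this illustration rather than something the track specifies.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Return AC / AN, or 0.0 when no alleles were called (AN == 0).

    The AN == 0 fallback is an assumption made for this sketch.
    """
    return ac / an if an else 0.0


# Example values copied from the field list (AC_afr = 0, AN_afr = 35480, ...).
record = {"AC_afr": 0, "AN_afr": 35480, "AC_nfe": 0, "AN_nfe": 63000}

for pop in ("afr", "nfe"):
    af = allele_frequency(record[f"AC_{pop}"], record[f"AN_{pop}"])
    # Format with five decimals, like the AF_* example values.
    print(f"AF_{pop} = {af:.5f}")
```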
The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the
GRCh38/hg38 reference sequence. This release adds 4,454 genomes to those in the previous
v3 release. For more detailed information on gnomAD v3.1, see the related blog post.
The gnomAD v3.1.1 track contains the same underlying data as v3.1, but
with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now
included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after
the UCSC version of the track was released). For more information about gnomAD v3.1.1, please
see the related
changelog.
The gnomAD Genome Mutational Constraint track is based on v3.1.2 and is available only on hg38.
It shows the reduced variation caused by purifying
natural selection. This is similar to negative selection on loss-of-function
(LoF) for genes, but can be calculated for non-coding regions too.
Positive values are red and reflect stronger mutation constraint (and less variation), indicating
higher natural selection pressure in a region. Negative values are green and
reflect lower mutation constraint
(and more variation), indicating less selection pressure and less functional effect.
Briefly, for any 1kbp window in
the genome, a model based on trinucleotide sequence context, base-level
methylation, and regional genomic features predicts the expected number of mutations
and compares this number to the observed number of mutations using a Z-score (see preprint
in the Reference section for details). The chrX scores were added as received from the authors.
Because no de novo mutation data are available on chrX (for estimating the effects of regional
genomic features on mutation rates), they are more speculative than the ones on the autosomes.
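The comparison of observed and expected mutation counts can be illustrated with a small sketch. The formula used here, (expected - observed) / sqrt(expected), is an assumption chosen only because it reproduces the sign convention described above (positive when fewer mutations are observed than expected); the actual model and normalization are described in the preprint. The function name and the counts are hypothetical.

```python
import math


def constraint_z(observed: float, expected: float) -> float:
    """Illustrative Z-score for one window (assumed formula, not the
    published model): positive when observed < expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical counts for two 1 kbp windows.
constrained = constraint_z(observed=80, expected=100)
unconstrained = constraint_z(observed=121, expected=100)

print(f"constrained window:   z = {constrained:.2f}")
print(f"unconstrained window: z = {unconstrained:.2f}")
```

Under this assumed formula, a window with fewer observed than expected mutations gets a positive score (drawn in red), and a window with more observed than expected gets a negative score (drawn in green).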
The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as
predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various
classes of mutation. This includes data on both the gene and transcript level.
The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to
the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate
from 141,456 unrelated individuals sequenced as part of various population-genetic and
disease-specific studies
collected by the Genome Aggregation Database (gnomAD), release 2.1.1.
Raw data from all studies have been reprocessed through a unified pipeline and jointly
variant-called to increase consistency across projects. For more information on the processing
pipeline and population annotations, see the following blog post
and the 2.1.1 README.
gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the
GRCh38/hg38 lift-over provided by gnomAD on their downloads site.
For questions on the gnomAD data, also see the gnomAD FAQ.
The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track,
except as noted below.
There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only).
Where possible, variants overlapping multiple transcripts/genes have been collapsed into one
variant, with additional information available on the details page, which has roughly halved the
number of items in the bigBed.
The bigBed has been split into two files, one with the information necessary for the track
display, and one with the information necessary for the details page. For more information on
this data format, please see the Data Access section below.
The VEP annotation is shown as a table instead of spread across multiple fields.
Intergenic variants have not been pre-filtered.
gnomAD v3.1
By default, a maximum of 50,000 variants can be displayed at a time (counted before applying the
filters described below); beyond that, the track switches to dense display mode.
Hovering the mouse over an item displays details about the variant, including the affected gene(s),
the variant type, and the annotation (missense, synonymous, etc.).
Clicking on an item will display additional details on the variant, including a population frequency
table showing allele count in each sub-population.
Following the conventions on the gnomAD browser, items are shaded according to their Annotation
type:
pLoF
Missense
Synonymous
Other
Label Options
To maintain consistency with the gnomAD website, variants are by default labeled according
to their chromosomal start position followed by the reference and alternate alleles,
for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional
label, if the variant is present in dbSNP.
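As a minimal sketch, the default label can be assembled from four values. The function name is hypothetical, and it is an assumption that the position in the label is the 1-based VCF position (the value kept in the `_startPos` field of the v3.1 schema).

```python
def variant_label(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a gnomAD-style label: chrom-position-ref-alt.

    `pos` is assumed to be the 1-based VCF position.
    """
    return f"{chrom}-{pos}-{ref}-{alt}"


# Reproduces the example label from the text.
print(variant_label("chr1", 1234, "T", "CAG"))
```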
Filtering Options
Three filters are available for these tracks:
FILTER: Used to exclude/include variants that failed Random Forest
(RF), Inbreeding Coefficient (InbreedingCoeff), or Allele Count (AC0) filters. The
PASS option is used to include/exclude variants that pass all of the RF,
InbreedingCoeff, and AC0 filters, as denoted in the original VCF.
Annotation type: Used to exclude/include variants that are annotated as
predicted Loss of Function (pLoF), Missense, Synonymous, or Other, as
annotated by VEP version 85 (GENCODE v19).
Variant Type: Used to exclude/include variants according to the type of
variation, as annotated by VEP v85.
There is one additional configurable filter on the minimum minor allele frequency.
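The same kind of filtering can be applied to downloaded records outside the browser. The sketch below is not the browser's implementation: the function name, the record dictionaries, and their values are hypothetical, the field names follow the v3.1 schema (FILTER, annot, AF), and two choices are assumptions made for illustration: a variant is kept if any of its FILTER tags is in the selected set, and the AF field stands in for the minor allele frequency.

```python
from typing import Dict, FrozenSet, Iterable, Iterator


def filter_variants(
    records: Iterable[Dict[str, str]],
    keep_filters: FrozenSet[str] = frozenset({"PASS"}),
    keep_annot: FrozenSet[str] = frozenset({"pLoF", "missense", "synonymous", "other"}),
    min_af: float = 0.0,
) -> Iterator[Dict[str, str]]:
    """Yield records matching the FILTER, annotation, and frequency criteria."""
    for rec in records:
        # FILTER holds comma-separated tags, e.g. "AC0,AS_VQSR".
        tags = set(rec["FILTER"].split(","))
        if not tags & keep_filters:  # assumption: any matching tag keeps the item
            continue
        if rec["annot"] not in keep_annot:
            continue
        if float(rec["AF"]) < min_af:  # assumption: AF used as the frequency
            continue
        yield rec


# Hypothetical records for illustration only.
records = [
    {"name": "a", "FILTER": "PASS", "annot": "missense", "AF": "0.01200"},
    {"name": "b", "FILTER": "AC0,AS_VQSR", "annot": "other", "AF": "0.00000"},
    {"name": "c", "FILTER": "PASS", "annot": "synonymous", "AF": "0.20000"},
]

kept = filter_variants(
    records, keep_annot=frozenset({"pLoF", "missense"}), min_af=0.001
)
print([rec["name"] for rec in kept])
```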
gnomAD v2.1.1
The gnomAD v2.1.1 track follows the standard display and configuration options available for
VCF tracks, briefly explained below.
In dense mode, a vertical line is drawn at the position of
each variant.
In pack mode, "ref" and "alt" alleles are
displayed to the left of a vertical line with colored portions corresponding to allele counts.
Hovering the mouse pointer over a variant pops up a display of alleles and counts.
Filtering Options
Four filters are available for these tracks, the same as the underlying VCF:
AC0: Allele Count 0 after filtering out low confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls)
InbreedingCoeff: Inbreeding Coefficient < -0.3
RF: Used to exclude/include variants that failed Random Forest filtering thresholds of 0.055272738028512555 for SNPs and 0.20641025579497013 for indels (probabilities of being a true positive variant)
Pass: Variant passes all 3 filters
There are two additional filters available, one for the minimum minor allele frequency, and a configurable filter on the QUAL score.
The raw data can be explored interactively with the
Table Browser, or the Data Integrator. For
automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that
can be downloaded from our download server, subject
to the conditions set forth by the gnomAD consortium (see below). Variant VCFs can be found in the
vcf/ subdirectory. The
v3.1 and
v3.1.1 variants can
be found in a special directory as they have been transformed from the underlying VCF.
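For automated access, a REST API request can be built as a URL. The sketch below only constructs the URL and does not contact the server. It assumes the `/getData/track` endpoint of `https://api.genome.ucsc.edu` with `genome`, `track`, `chrom`, `start`, and `end` parameters separated by semicolons, and it assumes that the track name accepted by the API matches the primary table name (`gnomadGenomesVariantsV3_1`); check both against the API documentation before relying on them. The function name is hypothetical.

```python
API_ROOT = "https://api.genome.ucsc.edu"


def track_data_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL for a region (0-based start, as in BED)."""
    params = {
        "genome": genome,
        "track": track,  # assumed to equal the primary table name
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{API_ROOT}/getData/track?{query}"


url = track_data_url("hg38", "gnomadGenomesVariantsV3_1", "chr1", 165970949, 165970950)
print(url)
```

The response could then be fetched with `urllib.request.urlopen(url)` and decoded with `json.load`, subject to the conditions set forth by the gnomAD consortium.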
For the v3.1.1 variants in particular, the underlying bigBed contains only the information
necessary to use the track in the browser. The extra data, such as VEP annotations and CADD scores, are
available in the same directory
as the bigBed but in the files gnomad.v3.1.1.details.tab.gz and
gnomad.v3.1.1.details.tab.gz.gzi. The gnomad.v3.1.1.details.tab.gz file contains the gzip-compressed
extra data in JSON format, and the .gzi file is available to speed up searching of
this data. Each variant has an associated md5sum in the name field of the bigBed, which can be
used along with the _dataOffset and _dataLen fields to get the associated external data, as shown
below:
# find item of interest:
bigBedToBed genomes.bb stdout | head -4 | tail -1
chr1 12416 12417 854246d79dc5d02dcdbd5f5438542b6e [..omitted for brevity..] chr1-12417-G-A 67293 902
# use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
854246d79dc5d02dcdbd5f5438542b6e {"DDX11L1": {"cons": ["non_coding_transcript_variant", [..omitted for brevity..]
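The lookup above can be scripted. The following is a minimal sketch that derives the bgzip arguments from the last two fields of a bigBedToBed output line, adding one to _dataLen to include the newline as in the example. It only builds the command and does not run it. The function name is hypothetical, and the sample line is abbreviated to the fields shown above, so it is not a complete record.

```python
from typing import List


def details_command(
    bed_line: str, details_file: str = "gnomad.v3.1.1.details.tab.gz"
) -> List[str]:
    """Return the bgzip argument list for one tab-separated bigBedToBed line.

    The final two fields are taken to be _dataOffset and _dataLen.
    """
    _, offset, length = bed_line.rstrip("\n").rsplit("\t", 2)
    size = int(length) + 1  # include the trailing newline
    return ["bgzip", "-b", str(int(offset)), "-s", str(size), details_file]


# Abbreviated line: the fields omitted in the example above are left out here too.
line = "\t".join(
    ["chr1", "12416", "12417", "854246d79dc5d02dcdbd5f5438542b6e",
     "chr1-12417-G-A", "67293", "902"]
)
print(" ".join(details_command(line)))
```

The resulting list can be passed to `subprocess.run(..., capture_output=True)` on a system where bgzip and the details files are available.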
The mutational constraints score was updated in October 2022 from a previous,
now deprecated, pre-publication version. The old version can be found in our
archive
directory on the download server. It can be loaded by copying the URL into
our "Custom tracks" input box.