Platinum Genomes Track Settings
 
Platinum genome variants   (All Variation tracks)

Subtracks:
 hybrid  Platinum genome hybrid
 NA12877  Platinum genome variant NA12877
 NA12878  Platinum genome variant NA12878
Source data version: Release 2017-1.0

Description

These tracks show high-confidence "Platinum Genome" variant calls for two individuals, NA12877 and NA12878, who belong to a sequenced 17-member pedigree, family number 1463, from the Centre d'Etude du Polymorphisme Humain (CEPH). The hybrid track merges the NA12878 results with variant calls produced by Genome in a Bottle, as discussed further below. CEPH is an international genetic research center that provides a resource of immortalized cell cultures used to map genetic markers. Pedigree 1463 is a family lineage from Utah comprising four grandparents, two parents, and 11 children. The whole pedigree was sequenced to 50x depth on an Illumina HiSeq 2000 system. The resulting data are considered a platinum standard, where "platinum" refers to the quality and completeness of the resulting assembly, such as full chromosome scaffolds with phasing and haplotypes resolved across the entire genome.

This figure depicts the pedigree of the family sequenced for this study. The ID for each sample is formed by adding the prefix NA128 to each individual's number, so that 77 = NA12877 and 78 = NA12878, the two samples with VCF tracks in this track set. Individuals shown in dark orange were used in the analysis methods, whereas those shown in blue are the founder generation (grandparents), who were also sequenced and used in validation steps. The genomes of the parent-child trio on the top right side, 91-92-78, were also sequenced during Phase I of the 1000 Genomes Project.

These tracks represent a comprehensive genome-wide set of phased small variants that have been validated to high confidence. Sequencing and phasing a larger pedigree, beyond the two parents and one child of a standard trio analysis, increases the ability to detect errors and to assess the accuracy of more of the variants. The genetic inheritance data make it possible to build a more comprehensive catalog of "platinum variants" that reflects both high accuracy and completeness. A comprehensive set of valid single-nucleotide variants (SNVs) and insertions and deletions (indels), in both the easy and the difficult parts of the genome, is a vital resource for software developers creating the next generation of variant callers, because those are the areas where current variant callers most need training data to improve. Because every variant in this catalog is phased, the data set also provides a resource for assessing emerging technologies designed to generate valid phasing information.

To generate the calls, six analysis pipelines that call SNVs and indels were run and their results merged into one catalog. The sensitivity of the genetic inheritance check helped detect genotyping errors and maximize the chance of including only true variants, including variants that might otherwise be removed by suboptimal filtering. The referenced paper describes the methods in detail, along with this variant catalog of 4.7 million SNVs plus 0.7 million small (1-50 bp) indels, all of which are consistent with the pattern of inheritance in the parents and 11 children of this pedigree.
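The idea of validating calls against the pattern of inheritance can be illustrated with a much simpler per-site check than the one in the referenced paper. The sketch below is not the authors' pipeline. It only tests whether each child's genotype could be formed from one allele of each parent, and it assumes diploid VCF-style GT strings with no missing alleles. All function names are hypothetical.

```python
from itertools import product


def parse_gt(gt):
    """Split a VCF GT string such as '0|1' or '0/1' into allele indices.

    Assumption: no missing alleles ('.'); those would raise ValueError.
    """
    return tuple(int(allele) for allele in gt.replace("|", "/").split("/"))


def is_mendelian_consistent(father_gt, mother_gt, child_gt):
    """True if the child's genotype is one paternal plus one maternal allele."""
    father = parse_gt(father_gt)
    mother = parse_gt(mother_gt)
    child = sorted(parse_gt(child_gt))
    # Try every (paternal allele, maternal allele) combination.
    return any(sorted(pair) == child for pair in product(father, mother))


def site_is_consistent(father_gt, mother_gt, children_gts):
    """True only if every child at this site passes the per-child check."""
    return all(
        is_mendelian_consistent(father_gt, mother_gt, child_gt)
        for child_gt in children_gts
    )


# Illustrative genotypes, not taken from the Platinum Genomes data.
print(site_is_consistent("0/1", "0/1", ["0/0", "0/1", "1/1"]))
print(site_is_consistent("0/0", "0/1", ["0/1", "1/1"]))
```

With many children, a genotyping error in a parent is more likely to produce at least one inconsistent child, which is one intuition for why a large pedigree detects more errors than a single trio.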

The hybrid track in this set extends the characterization of NA12878 by incorporating high-confidence calls produced by the Genome in a Bottle analysis. The resulting merged files contain more comprehensive coverage of variation than either set independently. For instance, the hg19 version contains over 80,000 more indels than either input set. Read more about the hybrid methods at the following link: https://github.com/Illumina/PlatinumGenomes/wiki/Hybrid-truthset

Data Access

The VCF files for this track can be obtained from the download server: https://hgdownload.soe.ucsc.edu/gbdb/hg38/platinumGenomes/.
These files were obtained from the Platinum genomes source archive: https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt.
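
Once a VCF file has been downloaded, its records can be tallied with a few lines of code. The following is a minimal sketch, assuming standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, ...). The function names, the min_qual parameter, and the sample records are hypothetical, and the QUAL cutoff is only meant to resemble the kind of score filter a viewer might offer.

```python
def classify_variant(ref, alt):
    """Label one REF/ALT pair as 'SNV', 'indel', or 'other'."""
    # Symbolic or spanning alleles such as '<DEL>' or '*' are not plain bases.
    if not set(alt.upper()) <= set("ACGTN"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # same-length substitutions longer than one base


def summarize_vcf(lines, min_qual=0.0):
    """Count variant types in VCF text lines, skipping records below min_qual."""
    counts = {"SNV": 0, "indel": 0, "other": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        ref, alts, qual = fields[3], fields[4], fields[5]
        # A QUAL of '.' means "missing"; such records are kept here.
        if qual != "." and float(qual) < min_qual:
            continue
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            counts[classify_variant(ref, alt)] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t10\tPASS\t.",
    "chr1\t300\t.\tC\tCTT,T\t.\tPASS\t.",
]
print(summarize_vcf(example))
print(summarize_vcf(example, min_qual=20))
```

For a compressed file, the same function should work on a handle opened with Python's gzip.open(path, "rt"), where path is the local copy of the download.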

Reference

Eberle MA, Fritzilas E, Krusche P, Källberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY, Humphray SJ, Halpern AL et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 2017 Jan;27(1):157-164. PMID: 27903644; PMC: PMC5204340