Comparison of phasing strategies for whole human genomes

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)
Publication date: 01/04/2018

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors.
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
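
The switch error rate named among the metrics above can be illustrated with a minimal sketch. It assumes the commonly used definition (the fraction of adjacent heterozygous site pairs whose relative phase disagrees with the gold standard) and a simplified input format: each haplotype is a list of 0/1 alleles for haplotype 1 at the same heterozygous sites, all within one phase block. The function name and data layout are illustrative assumptions, not the authors' implementation.

```python
def switch_error_rate(truth, pred):
    """Fraction of adjacent heterozygous site pairs phased inconsistently.

    truth, pred: equal-length sequences of 0/1 giving the allele carried on
    haplotype 1 at each heterozygous site (hypothetical simplified format).
    Which haplotype is labeled "1" is arbitrary, so only the relative phase
    between neighboring sites is compared.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must cover the same sites")
    if len(truth) < 2:
        return 0.0  # no adjacent pairs, so no switch can be observed

    # 0 where the prediction agrees with the truth, 1 where it is flipped.
    flipped = [t ^ p for t, p in zip(truth, pred)]

    # A switch error occurs wherever the flipped state changes between
    # two consecutive sites.
    switches = sum(a != b for a, b in zip(flipped, flipped[1:]))
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    gold = [0, 1, 0, 1, 1, 0]
    predicted = [0, 1, 1, 0, 0, 0]
    print(switch_error_rate(gold, predicted))
```

A prediction that is the exact complement of the gold standard scores zero under this sketch, because the haplotype labels are interchangeable.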

Directory of Open Access Journals

FigShare

Comparing the genomic location of switch errors across phasing approaches.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

FigShare

Switch error rates across phasing strategies as a function of minor allele frequency.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

(A) Laboratory-based phasing, (B) Read-based and majority voting, (C) Population-based phasing, (D) Hybrid population and read-based, and (E) Hybrid population and familial data from parental genotype.

FigShare

Phasing accuracy of SHAPEIT approaches and the choice of reference panels used.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

(a) Effect of population supergroups on phasing accuracy. Five supergroups of the same size (n = 347) were collected from the 1000 Genomes Project (1000GP) and used as the reference panel for SHAPEIT phasing without reads, or together with Illumina or PacBio reads for the NA12878 individual. The best switch error rate (SER) was achieved with EUR, the supergroup to which the NA12878 individual belongs. (b) Effect of population subgroups on phasing accuracy. Population subgroups of the same size (n = 85) were collected from the 1000GP, EUR, and each of five subpopulations in EUR and used as the reference panel for SHAPEIT phasing of the NA12878 individual. No major improvement in SER was observed among EUR and its five subgroups, including EUR/CEU, to which the NA12878 individual belongs. (c) Phasing accuracy, measured as SER, as a function of reference panel size, compared with the inclusion of WGS reads or familial information from parental genotype. Reference panels containing up to 502 individuals from the 1000GP EUR group or 23k individuals from the Haplotype Reference Consortium (HRC) were used as the population background for SHAPEIT phasing of the NA12878 individual.

FigShare

Performance summary of population-based phasing approaches supplemented with sequence reads and/or parental genotype information.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

FigShare

Phasing accuracy and haplotype diversity.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

Switch error rates for various strategies are shown as a function of haplotype diversity of a reference population based on the 1000GP reference panel.

FigShare

Phasing accuracy of disease-associated genes for the reference individual NA12878.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

FigShare

Phasing performance comparison based on pairwise SNV haplotype assignment.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

(A) Phasing accuracy. Probability that a pair of SNVs in the same phasing block is correctly phased with respect to each other, as a function of the distance between the pair. (B) Phasing yield. Probability that a pair of SNVs is phased in the same phasing block, as a function of the distance between the pair.
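
The two pairwise metrics in this caption can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' code: `positions` holds SNV coordinates, `truth` and `pred` hold the 0/1 allele on haplotype 1 at each site, and `blocks` holds a phase block identifier per site (`None` for unphased sites). All names and the distance-window parameters are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy_and_yield(positions, truth, blocks, pred,
                                min_dist=0, max_dist=float("inf")):
    """Return (accuracy, yield) over SNV pairs with min_dist <= distance < max_dist.

    yield    = share of pairs placed in the same predicted phase block
    accuracy = share of same-block pairs whose relative phase matches the truth
    """
    total = same_block = correct = 0
    for i, j in combinations(range(len(positions)), 2):
        dist = abs(positions[j] - positions[i])
        if not (min_dist <= dist < max_dist):
            continue
        total += 1
        # A pair counts toward yield only if both sites share a phase block.
        if blocks[i] is None or blocks[i] != blocks[j]:
            continue
        same_block += 1
        # Relative phase: do the two sites carry the same allele label on
        # haplotype 1? Compare that relation between truth and prediction.
        if (truth[i] == truth[j]) == (pred[i] == pred[j]):
            correct += 1
    phasing_yield = same_block / total if total else float("nan")
    accuracy = correct / same_block if same_block else float("nan")
    return accuracy, phasing_yield


if __name__ == "__main__":
    acc, yld = pairwise_accuracy_and_yield(
        positions=[100, 200, 300, 1000],
        truth=[0, 1, 1, 0],
        blocks=[1, 1, 1, 2],
        pred=[0, 1, 0, 0],
    )
    print(acc, yld)
```

Calling the function repeatedly with successive distance windows would give one accuracy and one yield value per distance bin, which is the kind of curve the caption describes.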

FigShare

Phasing accuracy and SNV density.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

(A) Basic phasing approaches. (B) SHAPEIT phasing supplemented with reference panel, sequence read, or parental genotype information. Switch error rates for various phasing strategies are shown as a function of distance between a heterozygous site and its upstream phased site.

FigShare

Performance summary across experimental-, population-, read-based, and majority vote phasing approaches.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

FigShare