Search CORE


Comparison of phasing strategies for whole human genomes

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)
Publication date: 01/04/2018

<div><p>Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes, using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing, as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs, or from population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, and phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors.
We also considered a majority voting scheme for the construction of a consensus haplotype that combines multiple predictions for enhanced performance and site coverage. Finally, we identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.</p></div>
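The abstract mentions a majority voting scheme for a consensus haplotype but does not spell out the voting rule. The Python sketch below is one plausible variant, not the authors' method: each input prediction votes on the relative phase (same or flipped orientation) of adjacent heterozygous sites that it places in a single block, and the consensus links the pair only on a strict majority. Voting on relative rather than absolute phase avoids the problem that each method's haplotype labels are arbitrary within a block. The `(allele, block_id)` encoding and the name `majority_vote_consensus` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical encoding per site: (allele on haplotype 1 as 0/1, method-specific block ID),
# or None if that method left the site unphased. All predictions cover the same sorted sites.
Call = Optional[Tuple[int, int]]


def majority_vote_consensus(predictions: List[List[Call]]) -> List[Tuple[int, int]]:
    """Illustrative consensus phasing by majority vote on adjacent-site relative phase."""
    n_sites = len(predictions[0]) if predictions else 0
    if n_sites == 0:
        return []
    consensus = [(0, 0)]  # first site anchors block 0 in an arbitrary orientation
    block = 0
    for i in range(1, n_sites):
        votes: Counter = Counter()
        for pred in predictions:
            prev, cur = pred[i - 1], pred[i]
            # A method votes only if it phased both sites within one of its own blocks.
            if prev is not None and cur is not None and prev[1] == cur[1]:
                votes[prev[0] ^ cur[0]] += 1  # 0 = same orientation, 1 = flipped
        total = sum(votes.values())
        rel, count = votes.most_common(1)[0] if votes else (0, 0)
        if count * 2 > total:
            # Strict majority: extend the current consensus block.
            consensus.append((consensus[-1][0] ^ rel, block))
        else:
            # No votes or a tie: do not assert phase across this pair.
            block += 1
            consensus.append((0, block))
    return consensus
```

A tie breaks the consensus block rather than guessing, trading some block length for accuracy; other tie rules are equally defensible.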

Directory of Open Access Journals

The Francis Crick Institute

Scripps Genome ADVISER: Annotation and Distributed Variant Interpretation SERver

Author: Ali Torkamani (237060)
Galina A. Erikson (694595)
Nicholas J. Schork (176346)
Phillip H. Pham (694593)
William J. Shipman (694594)
Publication date: 23/02/2015

<div><p>Interpretation of human genomes is a major challenge. We present the Scripps Genome ADVISER (SG-ADVISER) suite, which aims to fill the gap between data generation and genome interpretation by performing holistic, in-depth annotations and functional predictions on all variant types and effects. The SG-ADVISER suite includes a de-identification tool, a variant annotation web server, and a user interface for inheritance- and annotation-based filtration. SG-ADVISER allows users with no bioinformatics expertise to manipulate large volumes of variant data with ease, without the need to download large reference databases, install software, or use a command line interface. SG-ADVISER is freely available at genomics.scripps.edu/ADVISER.</p></div>

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Phasing accuracy of SHAPEIT approaches and the choice of reference panels used.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

<p>(a) Effect of population supergroups on phasing accuracy. Five supergroups of equal size (n = 347) were drawn from the 1000GP and used as the reference panel for SHAPEIT phasing of the NA12878 individual, either without reads or together with Illumina or PacBio reads. The lowest switch error rate (SER) was achieved with EUR, the supergroup to which NA12878 belongs. (b) Effect of population subgroups on phasing accuracy. Reference panels of equal size (n = 85) were drawn from the whole 1000GP, from EUR, and from each of the five subpopulations in EUR, and used for SHAPEIT phasing of NA12878. No major improvement in SER was observed among EUR and its five subgroups, including EUR/CEU, to which NA12878 belongs. (c) Phasing accuracy (SER) as a function of reference panel size, compared with the inclusion of WGS reads or familial information from parental genotypes. Reference panels containing up to 502 individuals from the 1000GP EUR group or 23k individuals from the HRC were used as the population background for SHAPEIT phasing of NA12878.</p>
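The caption reports SER, the switch error rate, without giving a formula. A common definition, assumed here, counts a switch whenever the predicted relative phase of two consecutive heterozygous sites in the same predicted block disagrees with the gold standard (here the GIAB NA12878 haplotypes), divided by the number of such adjacent pairs assessed. The paper's exact normalization may differ. In this minimal sketch, the 0/1 encoding and the name `switch_error_rate` are hypothetical.

```python
from typing import Sequence


def switch_error_rate(pred: Sequence[int], truth: Sequence[int], blocks: Sequence[int]) -> float:
    """Switch error rate over consecutive heterozygous sites (one common definition).

    pred[i], truth[i]: 0 if haplotype 1 carries REF at site i (GT 0|1), 1 if it carries ALT (1|0).
    blocks[i]: predicted phase-block ID of site i.
    Assumes sites are sorted by position and phased in both the prediction and the truth.
    """
    switches = 0
    opportunities = 0
    for i in range(1, len(pred)):
        if blocks[i] != blocks[i - 1]:
            continue  # no phase is asserted across a block boundary
        opportunities += 1
        # Relative phase of the pair: 0 = same orientation, 1 = flipped.
        if (pred[i] ^ pred[i - 1]) != (truth[i] ^ truth[i - 1]):
            switches += 1
    return switches / opportunities if opportunities else 0.0
```

Because only relative phase is compared, a block that is entirely flipped relative to the truth scores zero switches, while a single mis-phased site in the middle of a block counts as two.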

The Francis Crick Institute

Comparing the genomic location of switch errors across phasing approaches.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)


The Francis Crick Institute

Summary of phasing performance as shown by switch error rates, Quality Adjusted N50 (QAN50), and the percentage of fully phased genes.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

<p>(A) All phasing methods tested, and (B) SHAPEIT phasing making use of different combinations of reference panels, WGS/RNA read data, and parental genotypes.</p>

The Francis Crick Institute

Switch error rates across phasing strategies as a function of minor allele frequency.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

<p>(A) Laboratory-based phasing, (B) Read-based and majority voting, (C) Population-based phasing, (D) Hybrid population- and read-based phasing, and (E) Hybrid population-based phasing with familial data from parental genotypes.</p>
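Stratifying switch errors by minor allele frequency (MAF) only requires assigning each assessed adjacent pair to a MAF bin before counting. The sketch below assumes MAF values come from a reference panel, assigns each pair to the bin of its downstream site, and uses illustrative bin edges; none of these choices is taken from the paper, and `ser_by_maf` is a hypothetical name.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def ser_by_maf(
    pred: Sequence[int],
    truth: Sequence[int],
    blocks: Sequence[int],
    maf: Sequence[float],
    edges: Tuple[float, ...] = (0.01, 0.05, 0.1, 0.2),  # illustrative bin edges
) -> Dict[int, float]:
    """Switch error rate per MAF bin.

    Bin 0 holds MAF < edges[0]; bin k holds edges[k-1] <= MAF < edges[k]; the last bin is open-ended.
    Same 0/1 phase encoding as a standard switch error count; sites sorted by position.
    """
    switches: Dict[int, int] = defaultdict(int)
    opportunities: Dict[int, int] = defaultdict(int)
    for i in range(1, len(pred)):
        if blocks[i] != blocks[i - 1]:
            continue  # no phase is asserted across a block boundary
        b = bisect_right(edges, maf[i])  # bin of the downstream site (an assumption)
        opportunities[b] += 1
        if (pred[i] ^ pred[i - 1]) != (truth[i] ^ truth[i - 1]):
            switches[b] += 1
    return {b: switches[b] / opportunities[b] for b in sorted(opportunities)}
```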

The Francis Crick Institute

Phasing accuracy and SNV density.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

<p>(A) Basic phasing approaches. (B) SHAPEIT phasing supplemented with reference panel, sequence read, or parental genotype information. Switch error rates for each phasing strategy are shown as a function of the distance between a heterozygous site and its upstream phased site.</p>
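The caption plots switch error rate against the distance from each heterozygous site to its upstream phased site. A minimal sketch of that binning, assuming all listed sites are phased in both call sets and using order-of-magnitude (decade) bins chosen here for illustration; the name `ser_by_gap` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def ser_by_gap(
    positions: Sequence[int],
    pred: Sequence[int],
    truth: Sequence[int],
    blocks: Sequence[int],
) -> Dict[int, float]:
    """Switch error rate binned by decade of the gap to the upstream phased site.

    Decade d covers gaps of 10**d to 10**(d+1) - 1 bp. Positions must be sorted and strictly
    increasing; pred/truth use a 0/1 encoding of which allele sits on haplotype 1.
    """
    switches: Dict[int, int] = defaultdict(int)
    opportunities: Dict[int, int] = defaultdict(int)
    for i in range(1, len(pred)):
        if blocks[i] != blocks[i - 1]:
            continue  # first site of a block has no upstream phased site in that block
        gap = positions[i] - positions[i - 1]
        decade = len(str(gap)) - 1  # exact integer log10: 1-9 bp -> 0, 10-99 bp -> 1, ...
        opportunities[decade] += 1
        if (pred[i] ^ pred[i - 1]) != (truth[i] ^ truth[i - 1]):
            switches[decade] += 1
    return {d: switches[d] / opportunities[d] for d in sorted(opportunities)}
```

Using the digit count of the integer gap avoids floating-point edge cases that `math.log10` can introduce near exact powers of ten.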

The Francis Crick Institute

Performance summary across experimental-, population-, read-based, and majority vote phasing approaches.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)


The Francis Crick Institute

Phasing performance comparison based on pairwise SNV haplotype assignment.

Author: Agnes P. Chan (61280)
Amalio Telenti (25361)
Ewen Kirkness (2619)
Nicholas J. Schork (176346)
Yongwook Choi (128530)

<p>(A) Phasing accuracy: the probability that a pair of SNVs in the same phasing block is correctly phased with respect to each other, as a function of the distance between the pair. (B) Phasing yield: the probability that a pair of SNVs is phased in the same phasing block, as a function of the distance between the pair.</p>
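The two pairwise metrics in this caption can be sketched as counts over SNV pairs in distance bins: yield is the fraction of pairs that share a predicted block, and accuracy is the fraction of those same-block pairs whose relative phase matches the gold standard. This sketch assumes the pair set is all gold-standard heterozygous SNV pairs up to a maximum distance, with fixed-width bins; the parameters `bin_size` and `max_dist` and the name `pairwise_accuracy_yield` are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def pairwise_accuracy_yield(
    positions: Sequence[int],
    truth: Sequence[int],
    pred: Sequence[Optional[int]],
    blocks: Sequence[Optional[int]],
    bin_size: int = 10_000,
    max_dist: int = 100_000,
) -> Dict[int, Tuple[Optional[float], float]]:
    """Return {distance_bin: (accuracy, yield)} over all SNV pairs at most max_dist apart.

    accuracy = correctly phased pairs / pairs in the same predicted block (None if there are none)
    yield    = pairs in the same predicted block / all pairs in the bin
    pred[i] and blocks[i] are None for sites the method left unphased; positions are sorted.
    """
    total: Dict[int, int] = defaultdict(int)
    same_block: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # sorted positions: every later j is farther away
            b = dist // bin_size
            total[b] += 1
            if blocks[i] is None or blocks[i] != blocks[j]:
                continue  # not phased together: lowers yield, not accuracy
            same_block[b] += 1
            if (pred[i] ^ pred[j]) == (truth[i] ^ truth[j]):
                correct[b] += 1
    return {
        b: (correct[b] / same_block[b] if same_block[b] else None, same_block[b] / total[b])
        for b in sorted(total)
    }
```

The double loop is quadratic in the number of sites within `max_dist`, which is acceptable for a sketch; a production version would likely sample pairs or use a sliding window.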

The Francis Crick Institute