Search CORE

155 research outputs found

Joint Haplotype Assembly and Genotype Calling via Sequential Monte Carlo Algorithm

Author: Ahn Soyeon
Vikalo Haris
Publication date: 01/07/2015

Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of the data needed to assemble single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate error-prone data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of existing haplotype assembly techniques assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly.
Results: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both experimental data from the 1000 Genomes Project and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in both accuracy and speed.
Conclusions: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes.
National Science Foundation CCF-1320273
Electrical and Computer Engineering
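
A minimal Python sketch of the idea described above, not the ParticleHap implementation: each partial haplotype pair is extended at the next SNP by all four allele combinations (0|0, 0|1, 1|0, 1|1), so homozygous calls compete with heterozygous ones and genotypes are inferred jointly with phase. All extensions are scored exhaustively and only the `beam` most likely pairs are kept, a deterministic pruning step in the spirit of the method. Assumptions (hypothetical, not from the paper): reads are dicts mapping SNP index to allele (0 = reference, 1 = alternative), a single error rate `eps`, each read equally likely to come from either haplotype, and the helper names `read_loglik`, `pair_score` and `assemble`.

```python
import math
from itertools import product


def read_loglik(read, hap, eps):
    """log P(read | hap), restricted to the SNPs already assigned in the partial haplotype."""
    ll = 0.0
    for j, allele in read.items():
        if j < len(hap):
            ll += math.log(1 - eps) if hap[j] == allele else math.log(eps)
    return ll


def pair_score(reads, h1, h2, eps):
    """Sum over reads of log[0.5 P(r | h1) + 0.5 P(r | h2)], computed stably."""
    total = 0.0
    for r in reads:
        a, b = read_loglik(r, h1, eps), read_loglik(r, h2, eps)
        m = max(a, b)
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def assemble(reads, n_snps, eps=0.01, beam=8):
    """Extend partial haplotype pairs SNP by SNP, keeping the `beam` best-scoring pairs."""
    beams = [((), ())]
    for _ in range(n_snps):
        scores = {}
        for h1, h2 in beams:
            # Exhaustively try all four allele pairs, including homozygous ones,
            # so genotype calling happens jointly with phasing.
            for a, b in product((0, 1), repeat=2):
                n1, n2 = h1 + (a,), h2 + (b,)
                key = min((n1, n2), (n2, n1))  # the haplotype pair is unordered
                if key not in scores:
                    scores[key] = pair_score(reads, key[0], key[1], eps)
        beams = sorted(scores, key=scores.get, reverse=True)[:beam]
    h1, h2 = beams[0]
    genotypes = [x + y for x, y in zip(h1, h2)]  # alternative-allele count: 0, 1 or 2
    return list(h1), list(h2), genotypes
```

Rescoring every read at every step keeps the sketch short; an incremental weight update, as in a real sequential Monte Carlo recursion, would avoid recomputing the full likelihood for each candidate.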

Springer - Publisher Connector

PubMed Central

Texas ScholarWorks

Algorithmic approaches for the single individual haplotyping problem

Author: Lancia Giuseppe
Publication venue: 'EDP Sciences'
Publication date: 01/01/2016

Since its introduction in 2001, the Single Individual Haplotyping problem has received ever-increasing attention from the scientific community. In this paper we survey, in the form of an annotated bibliography, the developments in the study of the problem from its origin to the present day.

Archivio istituzionale della ricerca - Università degli Studi di Udine

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm

Author: A Panconesi
D Aguiar
D He
F Deng
F Geraci
G Lancia
GR Abecasis
H Matsumoto
Haris Vikalo
J Duitama
JH Kim
K-C Liang
KC Liang
LM Li
M Xie
MR Hoehe
MS Arulampalam
MS Bayzid
R Cilibrasi
R Lippert
R Nielsen
R Schwartz
RA Gibbs
RS Wang
S Levy
Soyeon Ahn
V Bansal
Y Wang
YY Zhao
Z Chen
ZZ Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction

Publication venue: 'Hindawi Limited'

Crossref

A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem

Author: Altshuler
Bansal
Chen
Daly
F. Geraci
Frazer
Metzker
Via
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Single nucleotide polymorphisms are the most common form of variation in human DNA and are involved in many research fields, from molecular biology to medical therapy. The technological opportunity to deal with long DNA sequences using shotgun sequencing has raised the problem of fragment recombination. In this regard, the Single Individual Haplotyping (SIH) problem has received considerable attention over the past few years.

Crossref

PubMed Central

PUblication MAnagement

Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison

Author: Zhao Zhiyu
Publication venue: ScholarWorks@UNO
Publication date: 07/08/2008

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems, including protein structure alignment and query, and genome sequence related problems, including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to improve accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance, with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose, and its connection to sublinear time computation is discussed. The diameter of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms, and we have obtained a series of separations regarding the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm.
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to within a factor of $n^{1-\epsilon}$ is proven to be NP-hard. Interestingly, several polynomial-time algorithms are presented for a number of practical cases.
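
The SNP-matrix setting mentioned in the abstract above can be made concrete with the minimum error correction (MEC) objective named in the HMEC entry. This is a minimal sketch of that objective, not Zhao's probabilistic model. Assumptions: rows are fragments written as strings over '0', '1' and '-' (uncovered), every SNP is heterozygous so the two haplotypes are a vector and its complement, and the helper names are hypothetical. The exhaustive search is exponential and only meant for tiny examples, since MEC is NP-hard in general.

```python
from itertools import product


def mec_cost(matrix, hap):
    """Entries that must be flipped so every fragment matches hap or its complement."""
    comp = [1 - a for a in hap]
    cost = 0
    for row in matrix:
        d1 = sum(1 for c, a in zip(row, hap) if c != '-' and int(c) != a)
        d2 = sum(1 for c, a in zip(row, comp) if c != '-' and int(c) != a)
        cost += min(d1, d2)  # assign the fragment to the closer haplotype
    return cost


def brute_force_mec(matrix):
    """Exhaustive search over all 2^n haplotypes; only feasible for tiny matrices."""
    n = len(matrix[0])
    best = None
    for bits in product((0, 1), repeat=n):
        c = mec_cost(matrix, list(bits))
        if best is None or c < best[0]:
            best = (c, list(bits))
    return best
```

For the matrix `["01-", "-10", "011", "10-", "-01"]`, the haplotype pair 010/101 needs a single correction (the third fragment), and no pair does better.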

University of New Orleans

Theory and Algorithms for the Haplotype Assembly Problem

Publication venue: 'International Press of Boston'
Publication date: 01/01/2010

Crossref

Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques

Author: Duitama Jorge
Hoehe Margret R.
Huebsch Thomas
McEwen Gayle K.
Palczewski Stefanie
Schulz Sabrina
Suk Eun-Kyung
Verstrepen Kevin
Publication venue: Oxford University Press
Publication date: 01/01/2011

Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but their accuracy has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole-genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight SIH algorithms and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, efficiently produces high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing nearly complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
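
The abstract does not define its accuracy measure here. One widely used measure for comparing an assembled haplotype with a gold standard such as NA12878's is the switch error count, sketched below as an illustration rather than as the paper's exact procedure. Assumptions: each argument is one haplotype listed over the same heterozygous SNPs, with no unphased gaps, and the function names are hypothetical.

```python
def switch_errors(inferred, truth):
    """Count changes of phase relation between consecutive heterozygous sites."""
    # True where the inferred haplotype agrees with the first true haplotype.
    same = [a == b for a, b in zip(inferred, truth)]
    return sum(1 for x, y in zip(same, same[1:]) if x != y)


def switch_error_rate(inferred, truth):
    """Switch errors divided by the number of adjacent site pairs."""
    n = min(len(inferred), len(truth))
    return switch_errors(inferred, truth) / (n - 1) if n > 1 else 0.0
```

Because only the phase relation between neighbors matters, the complement of the true haplotype scores zero switches, while a single wrongly phased site in the middle counts as two switches.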

Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

Author: Tork Bassam A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2013

The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.
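
The abstract does not spell out its likelihood model, so the following is only a minimal sketch of maximum likelihood frequency estimation for already assembled candidate quasispecies, using expectation-maximization. Assumptions (hypothetical): reads are aligned strings of the same length as the candidates with '-' for uncovered positions, errors are independent substitutions with rate `eps` (real 454 data also has indel errors, which this ignores), and the function names are illustrative.

```python
def read_likelihood(read, seq, eps):
    """P(read | seq) under independent substitution errors; '-' marks uncovered positions."""
    p = 1.0
    for r, s in zip(read, seq):
        if r != '-':
            p *= (1 - eps) if r == s else eps
    return p


def em_frequencies(reads, seqs, eps=0.01, iters=200, tol=1e-10):
    """Maximum-likelihood mixture weights of candidate sequences via expectation-maximization."""
    m = len(seqs)
    pi = [1.0 / m] * m  # start from uniform frequencies
    lik = [[read_likelihood(r, s, eps) for s in seqs] for r in reads]
    for _ in range(iters):
        counts = [0.0] * m
        for row in lik:  # E-step: soft-assign each read to the candidates
            w = [p * l for p, l in zip(pi, row)]
            z = sum(w)
            if z > 0:
                for j in range(m):
                    counts[j] += w[j] / z
        total = sum(counts)
        if total == 0:
            break
        new_pi = [c / total for c in counts]  # M-step: normalized expected counts
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Multiplying raw probabilities is fine for short toy reads; long reads would need log-space arithmetic to avoid underflow.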

CiteSeerX

ScholarWorks @ Georgia State University

Recommended from our members

Computational methods for understanding genetic variations from next generation sequencing data

Author: Ahn Soyeon, Ph. D.
Publication date: 13/09/2018

Studies of human genetic variation reveal critical information about genetic and complex diseases such as cancer, diabetes and heart disease, ultimately leading towards improvements in health and quality of life. Moreover, understanding genetic variations in viral populations is of utmost importance to virologists and helps in the search for vaccines. Next-generation sequencing technology is capable of acquiring massive amounts of data that can provide insight into the structure of diverse sets of genomic sequences. However, reconstructing heterogeneous sequences is computationally challenging due to the large dimension of the problem and limitations of the sequencing technology. This dissertation is focused on algorithms and analysis for two problems in which we seek to characterize genetic variations: (1) haplotype reconstruction for a single individual, the so-called single individual haplotyping (SIH) or haplotype assembly problem, and (2) reconstruction of viral populations, the so-called quasispecies reconstruction (QSR) problem. For the SIH problem, we have developed a method that relies on a probabilistic model of the data and employs the sequential Monte Carlo (SMC) algorithm to jointly determine the type of variation (i.e., perform genotype calling) and assemble haplotypes. For the QSR problem, we have developed two algorithms. The first algorithm combines agglomerative hierarchical clustering and Bayesian inference to reconstruct quasispecies characterized by low diversity. The second algorithm uses a tensor factorization framework with successive data removal to reconstruct quasispecies characterized by highly uneven frequencies of its components. Both algorithms outperform existing methods on both benchmarking tests and real data.
Electrical and Computer Engineering
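
The clustering half of the first QSR algorithm can be illustrated with a generic average-linkage agglomerative clustering of aligned reads, followed by a majority-vote consensus per cluster. This is a minimal sketch under stated assumptions, not the dissertation's method: the Bayesian inference step is omitted, reads are equal-length aligned strings with '-' for uncovered positions, distance is the Hamming distance over co-covered positions, and the stopping `threshold` and all function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Mismatches over positions covered ('-' = uncovered) in both reads."""
    return sum(1 for x, y in zip(a, b) if x != '-' and y != '-' and x != y)


def average_linkage(reads, threshold):
    """Merge the closest pair of clusters until the smallest average distance exceeds threshold."""
    clusters = [[i] for i in range(len(reads))]

    def dist(c1, c2):
        return sum(hamming(reads[i], reads[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def consensus(reads, members):
    """Majority base at each position among the cluster's covering reads."""
    out = []
    for pos in range(len(reads[members[0]])):
        bases = Counter(reads[i][pos] for i in members if reads[i][pos] != '-')
        out.append(bases.most_common(1)[0][0] if bases else '-')
    return ''.join(out)
```

Cluster sizes divided by the total number of reads then give rough frequency estimates for each consensus sequence; low-diversity populations are exactly where a fixed distance threshold becomes fragile, which is one reason to follow clustering with a probabilistic refinement step.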

Texas ScholarWorks