On the Complexity of the Single Individual SNP Haplotyping Problem

Author: Cilibrasi Rudi
Kelk Steven
Tromp John
van Iersel Leo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

We present several new results pertaining to haplotyping. These results concern the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) for different restrictions on the input data. Specifically, we look at the gapless case, where every row of the input corresponds to a gapless haplotype fragment, and the 1-gap case, where at most one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap case and still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard in the 1-gap case (and thus also in the general case), but is polynomial-time solvable in the gapless case.

Comment: 26 pages. Related to the WABI2005 paper, "On the Complexity of Several Haplotyping Problems", but with more/different results. This paper has just been submitted to the IEEE/ACM Transactions on Computational Biology and Bioinformatics and we are awaiting a decision on acceptance. It differs from the mid-August version of this paper because here we prove that 1-gap LHR is APX-hard. (In the earlier version of the paper we could prove only that it was NP-hard.)
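The MEC objective named in the abstract above can be illustrated with a minimal sketch. It assumes the standard formulation: each fragment is a string over 0, 1 and a hole symbol, and the cost of a haplotype pair is the number of fragment entries that must be flipped so that every fragment agrees with one of the two haplotypes. The function names, the `HOLE` symbol and the exhaustive search are illustrative assumptions, not the authors' method; the brute-force search is exponential and only meant for tiny inputs.

```python
from itertools import product

HOLE = "-"  # hypothetical symbol for an unread SNP position


def mismatches(fragment, haplotype):
    """Count positions where a fragment disagrees with a haplotype, ignoring holes."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != HOLE and f != h)


def mec_cost(fragments, h1, h2):
    """Corrections needed when each fragment is assigned to its closer haplotype."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Try every haplotype pair; returns (cost, h1, h2). Exponential in the SNP count."""
    n = len(fragments[0])
    best = None
    for h1 in product("01", repeat=n):
        for h2 in product("01", repeat=n):
            cost = mec_cost(fragments, h1, h2)
            if best is None or cost < best[0]:
                best = (cost, "".join(h1), "".join(h2))
    return best
```

For example, `brute_force_mec(["0011", "0011", "1100", "1101"])` has cost 1, because three distinct gap-free rows cannot all match two haplotypes without at least one correction.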

arXiv.org e-Print Archive

CiteSeerX

Maastricht University Research Portal

Repository TU/e

Crossref

CWI's Institutional Repository

Pure OAI Repository

International Migration, Integration and Social Cohesion online publications

NGS Based Haplotype Assembly Using Matrix Completion

Author: Kahaei MH
Majidian Sina
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

We apply matrix completion methods for haplotype assembly from NGS reads to develop the new HapSVT, HapNuc, and HapOPT algorithms. This is performed by applying a mathematical model to convert the reads to an incomplete matrix and estimating unknown components. This process is followed by quantizing and decoding the completed matrix in order to estimate haplotypes. These algorithms are compared to the state-of-the-art algorithms using simulated data as well as the real fosmid data. It is shown that the SNP missing rate and the haplotype block length of the proposed HapOPT are better than those of HapCUT2 with comparable accuracy in terms of reconstruction rate and switch error rate. A program implementing the proposed algorithms in MATLAB is freely available at https://github.com/smajidian/HapMC

arXiv.org e-Print Archive

Directory of Open Access Journals

Optimal algorithms for haplotype assembly from whole-genome sequence data

Author: A. Choi
A. Darwiche
Bansal
Browning
D. He
E. Eskin
Frazer
K. Pipatsrisawat
Levy
Lippert
Marchini
Stephens
Wheeler
Publication venue: Oxford University Press
Publication date: 01/06/2010

Motivation: Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of ‘haplotype assembly’ is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem.

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Spotlight on islands. On the origin and diversification of an ancient lineage of the Italian wall lizard Podarcis siculus in the western Pontine Islands

Author: Castiglia Riccardo
De Simone Emanuela
Havenstein Katja
Milana Valentina
Ripa Chiara
Senczuk Gabriele
Tiedemann Ralph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Groups of proximate continental islands may conceal more tangled phylogeographic patterns than oceanic archipelagos as a consequence of repeated sea level changes, which allow populations to experience gene flow during periods of low sea level stands and isolation by vicariant mechanisms during periods of high sea level stands. Here, we describe for the first time an ancient and diverging lineage of the Italian wall lizard Podarcis siculus from the western Pontine Islands. We used nuclear and mitochondrial DNA sequences of 156 individuals with the aim of unraveling their phylogenetic position, while microsatellite loci were used to test several a priori insular biogeographic models of migration with empirical data. Our results suggest that the western Pontine populations colonized the islands early during their Pliocene volcanic formation, while populations from the eastern Pontine Islands seem to have been introduced recently. The inter-island genetic makeup indicates an important role of historical migration, probably due to glacial land bridges connecting islands followed by a recent vicariant mechanism of isolation. Moreover, the most supported migration model predicted higher gene flow among islands which are geographically arranged in parallel. Considering the threatened status of small insular endemic populations, we suggest this new evolutionarily independent unit be given priority in conservation efforts.

Università degli Studi del Molise: IRIS

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza

The Parameterized Complexity of the Shared Center Problem

Author: B. Ma
D. Marx
I. Leykin
J. Gramm
J. Li
K. Doi
L. Wang
R. Impagliazzo
R.G. Downey
W. Ma
Z.-Z. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

Full-length de novo viral quasispecies assembly through variation graph construction

Author: Baaijens J.A. (Jasmijn)
Köster J. (Johannes)
Roest B. (Bastiaan) van der
Schönhuth A. (Alexander)
Stougie L. (Leen)
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/12/2019

CWI's Institutional Repository

On the design of clone-based haplotyping

Publication venue: BioMed Central
Publication date: 12/09/2013

Springer - Publisher Connector

Understanding the accuracy of statistical haplotype inference with sequence data of known phase

Author: Barrett
Carlson
Chung
Clark
Clayton
Douglas
Excoffier
Gabriel
Halperin
Hawley
Hill
Hudson
Johnson
Kidd
Kimmel
Kukita
Lin
Long
Marchini
Marth
Martinez-Arias
Mateu
Morris
Myers
Niu
Patil
Raymond
Reich
Sabeti
Salem
Schaid
Scheet
Schouten
Stephens
Tishkoff
Weir
Yan
Publication venue: 'Wiley'
Publication date: 01/01/2007

Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important applications in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We employ these phase-known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was low in our biological data even for regions as short as 25–50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the reliability of estimated confidence in phase inference is not high enough to allow for a reliable incorporation of site-specific uncertainty information in subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained when increasing sample size by considering the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed. Genet. Epidemiol. © 2007 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/57366/1/20185_ftp.pd

Deep Blue Documents at the University of Michigan

Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison

Author: Zhao Zhiyu
Publication venue: ScholarWorks@UNO
Publication date: 07/08/2008

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems including protein structure alignment and query, and genome sequence related problems including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose and its connection to the sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms and we have obtained a series of separations about the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm.
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to factor n1..f is proven to be NP-hard. Interestingly, for several practical cases, several polynomial time algorithms are presented.

University of New Orleans

Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time

Author: Cazaux Bastien
Kosolobov Dmitry
Norri Tuukka
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018

Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into a set P of disjoint segments such that each segment [a,b] in P has length at least L and the maximum, over [a,b] in P, of the number d(a,b)=|{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over the earlier O(mn^2) bound. This improvement makes it possible to apply the algorithm in a pan-genomic setting, where the input strings are aligned haplotype sequences of complete human chromosomes, with the goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give some experimental evidence on the practicality of the approach in this pan-genomic setting.
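The problem statement in the abstract above can be made concrete with a naive dynamic program. This is only a sketch under the assumption that the objective is to minimize the largest d(a,b) over the chosen segments; it is not the O(mn) algorithm the abstract reports and runs far slower. The function names are hypothetical.

```python
def distinct_count(strings, a, b):
    """d(a, b): number of distinct substrings R_i[a..b], 1-indexed and inclusive."""
    return len({s[a - 1:b] for s in strings})


def min_segmentation(strings, L):
    """Return (max d over segments, segments), or None if no valid partition exists."""
    n = len(strings[0])
    INF = float("inf")
    # best[j]: smallest achievable maximum d over valid segmentations of [1, j]
    best = [INF] * (n + 1)
    prev = [None] * (n + 1)
    best[0] = 0
    for j in range(L, n + 1):
        for i in range(0, j - L + 1):  # last segment is [i + 1, j], length j - i >= L
            if best[i] == INF:
                continue
            cand = max(best[i], distinct_count(strings, i + 1, j))
            if cand < best[j]:
                best[j], prev[j] = cand, i
    if best[n] == INF:
        return None
    # Walk the back-pointers to recover the segment boundaries.
    segments, j = [], n
    while j > 0:
        segments.append((prev[j] + 1, j))
        j = prev[j]
    return best[n], segments[::-1]
```

With the four strings 0000, 0011, 1100, 1111 and L = 2, the sketch returns the segments [1,2] and [3,4], each containing two distinct substrings, so two founder sequences suffice instead of four.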

Dagstuhl Research Online Publication Server