Search CORE

3,505 research outputs found

On the fixed parameter tractability and approximability of the minimum error correction problem

Author: B Browning
B Reed
D Aguiar
D He
G Lancia
J Guo
M Patterson
MR Garey
N Garg
P Bonizzoni
P Fouilhoux
R Cilibrasi
R Dondi
R Lippert
R Ostrovsky
V Bansal
Y Jiao
Y Pirola
ZZ Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum Error Correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. By using novel combinatorial properties of MEC instances, we are able to provide new results on the fixed-parameter tractability and approximability of MEC. In particular, we show that MEC is in FPT when parameterized by the number of corrections, and, on “gapless” instances, it is in FPT also when parameterized by the length of the fragments, whereas the result known in literature forces the reconstruction of complementary haplotypes. Then, we show that MEC cannot be approximated within any constant factor while it is approximable within factor O(log nm) where nm is the size of the input. Finally, we provide a practical 2-approximation algorithm for the Binary MEC, a variant of MEC that has been applied in the framework of clustering binary data

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

On the Complexity of the Single Individual SNP Haplotyping Problem

Author: Cilibrasi Rudi
Kelk Steven
Tromp John
van Iersel Leo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

We present several new results pertaining to haplotyping. These results concern the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) for different restrictions on the input data. Specifically, we look at the gapless case, where every row of the input corresponds to a gapless haplotype-fragment, and the 1-gap case, where at most one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap case and still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard in the 1-gap case (and thus also in the general case), but is polynomial time solvable in the gapless case.Comment: 26 pages. Related to the WABI2005 paper, "On the Complexity of Several Haplotyping Problems", but with more/different results. This papers has just been submitted to the IEEE/ACM Transactions on Computational Biology and Bioinformatics and we are awaiting a decision on acceptance. It differs from the mid-August version of this paper because here we prove that 1-gap LHR is APX-hard. (In the earlier version of the paper we could prove only that it was NP-hard.

arXiv.org e-Print Archive

CiteSeerX

Maastricht University Research Portal

Repository TU/e

Crossref

CWI's Institutional Repository

Pure OAI Repository

International Migration, Integration and Social Cohesion online publications

Haplotype Assembly: An Information Theoretic View

Author: Si Hongbo
Vikalo Haris
Vishwanath Sriram
Publication venue
Publication date: 11/05/2014
Field of study

This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differ from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on the required number of reads for reliable haplotype reconstruction, and both the necessary and sufficient conditions are presented with order-wise optimal bounds.Comment: 30 pages, 5 figures, 1 tabel, journa

arXiv.org e-Print Archive

Crossref

Haplotype-aware Diplotyping from Noisy Long Reads

Author: Ebler J.
Haukness M.
Marschall T.
Paten B.
Pesout T.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

MPG.PuRe

Viral population estimation using pyrosequencing

Author: A Dempster
A Rambaut
AMN Tsibris
B Gaschen
Baback Gharizadeh
C Wang
Chunlin Wang
D O'Meara
DC Douek
E Domingo
E Halperin
EH Simpson
ES Lander
Glenn Tesler
GS Gottlieb
GW Tyson
H Fakhrai-Rad
I Malet
IM Rouzine
J Kececioglu
JE Hopcroft
JF Simons
K Chen
KJ Metzner
L Bacheler
L Doukhan
L Excoffier
Lior Pachter
LR Ford
M Breitbart
M Eigen
M Margulies
M Stephens
MA Nowak
MJ Gonzales
ML Collins
ML Sogin
Mostafa Ronaghi
MT Tammi
N Beerenwinkel
Nicholas Eriksson
Niko Beerenwinkel
P Jenkins
PA Pevzner
R Schmid
R Shankarappa
Robert W. Shafer
RP Dilworth
S Huse
S-Y Rhee
S-Y Rhee
Soo-Yon Rhee
VA Johnson
Yumi Mitsuya
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008
Field of study

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.Comment: 23 pages, 13 figure

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Read-based Phasing of Related Individuals

Author: Garg S.
Marschall T.
Martin M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016
Field of study

Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information—reads and pedigree—has the potential to deliver results better than each individually. Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Availability and Implementation: https://bitbucket.org/whatshap/whatshap Contact: [email protected]

PubMed Central

MPG.PuRe

HapCol: accurate and memory-efficient haplotype assembly from long reads

Author: Gunnar W. Klau
Kuleshov
Nadia Pisanti
Paola Bonizzoni
Patterson
Riccardo Dondi
Simone Zaccaria
Yuri Pirola
Publication venue: 'Oxford University Press (OUP)'
Publication date
Field of study

Crossref

Computational recovery of enzyme haplotypes from a metagenome

Author: Nicholls Samuel
Publication venue
Publication date: 01/01/2018
Field of study

Aberystwyth Research Portal

University of Birmingham Research Portal

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017
Field of study

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus population. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.ISSN:0168-170

Repository for Publications and Research Data

Elsevier - Publisher Connector