
    Boosting Haplotype Inference with Local Search

    A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is known as haplotype inference by pure parsimony (HIPP). The HIPP problem is computationally difficult, being NP-hard. Recently, a SAT-based method (SHIPs) has been proposed to solve the HIPP problem. This method iteratively considers an increasing number of haplotypes, starting from an initial lower bound. Hence, one important aspect of SHIPs is the lower bounding procedure, which reduces the number of iterations of the basic algorithm and also indirectly simplifies the resulting SAT model. This paper describes the use of local search to improve existing lower bounding procedures. The new lower bounding procedure is guaranteed to be as tight as the existing procedures. In practice, the new procedure is in most cases considerably tighter, allowing a significant performance improvement on challenging problem instances.
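
    As a minimal Python sketch of the iterative scheme the abstract describes (not SHIPs' actual SAT encoding), the loop below assumes a hypothetical explainable_with(genotypes, r) oracle standing in for one encode-and-solve call; a tighter lower bound directly cuts the number of oracle calls, which is the point of the paper.

```python
def hipp_iterative(genotypes, lower_bound, explainable_with):
    """Smallest r >= lower_bound for which r haplotypes explain all
    genotypes; each oracle call stands in for one SAT encode-and-solve."""
    r = lower_bound
    while not explainable_with(genotypes, r):
        r += 1  # UNSAT: grow the candidate haplotype count and re-encode
    return r
```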

    A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

    Haplotype data is especially important in the study of complex diseases, since it contains more information than genotype data. However, obtaining haplotype data is technically difficult and expensive. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, haplotype inference by pure parsimony (HIPP), casts the problem as an optimization problem that has been proved NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes, iteratively searching for and deleting haplotypes that are not helpful for finding the optimal solution. This preprocessing can be coupled with any current HIPP solver that preprocesses the genotype data. To test it, we used two state-of-the-art solvers, RTIP and GAHAP, together with simulated and real HapMap data. Thanks to the reductions in computation time and memory brought by our preprocessing, problem instances that were previously unaffordable can now be solved efficiently.
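
    The paper's specific deletion criterion is not reproduced in the abstract, so the fixpoint loop below is only a hedged sketch of the general shape of such a preprocessing pass; is_helpful is a hypothetical placeholder for the group-based test the authors actually use.

```python
def preprocess(haplotypes, genotypes, is_helpful):
    """Delete haplotypes until every survivor passes the helpfulness test."""
    pool = set(haplotypes)
    changed = True
    while changed:
        changed = False
        for h in list(pool):
            if not is_helpful(h, pool, genotypes):  # hypothetical criterion
                pool.discard(h)
                changed = True
    return pool
```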

    A Column Generation Approach for Pure Parsimony Haplotyping


    A Class Representative Model for Pure Parsimony Haplotyping under Uncertain Data

    The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted increasing attention in recent years due to its importance in the analysis of fine-scale genetic data. Its applications range from mapping complex disease genes and inferring population histories to drug design, functional genomics, and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotype problem under Uncertain Data (PPH-UD). This version mainly arises when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model of Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any mixed integer programming solver, and usable in all cases for which the parsimony criterion is well suited to haplotype estimation.
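
    The class representative model itself is not spelled out in the abstract, so the sketch below instead shows the classic pair-enumeration ILP for pure parsimony on a toy instance, just to make the optimization problem concrete; it assumes the PuLP package and encodes genotype sites as 0/1 (homozygous) or 2 (heterozygous).

```python
from itertools import product

import pulp  # assumed available; any MIP front-end would do


def expansions(g):
    """All unordered haplotype pairs explaining genotype g."""
    het = [i for i, v in enumerate(g) if v == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


genotypes = [(2, 0, 2), (2, 2, 0), (0, 2, 2)]  # toy instance
pairs = {g: expansions(g) for g in genotypes}
haps = {h for ps in pairs.values() for p in ps for h in p}

prob = pulp.LpProblem("PPH", pulp.LpMinimize)
x = {h: pulp.LpVariable(f"x{i}", cat="Binary") for i, h in enumerate(haps)}
y, j = {}, 0
for g in genotypes:
    for p in pairs[g]:
        y[g, p] = pulp.LpVariable(f"y{j}", cat="Binary")
        j += 1

prob += pulp.lpSum(x.values())  # minimise the number of distinct haplotypes
for g in genotypes:
    prob += pulp.lpSum(y[g, p] for p in pairs[g]) == 1  # one explanation each
    for p in pairs[g]:
        for h in set(p):
            prob += y[g, p] <= x[h]  # a chosen pair forces its haplotypes

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("minimum distinct haplotypes:", int(pulp.value(prob.objective)))
```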

    Direct maximum parsimony phylogeny reconstruction from genotype data

    Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics, with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data, so phylogenetic applications for autosomal data must rely on other methods for first computationally inferring haplotypes from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
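
    A one-line calculation makes the underlying ambiguity concrete: a genotype with k heterozygous sites is consistent with 2^(k-1) unordered haplotype pairs, which is why committing to a single phasing before tree building can distort the parsimony count.

```python
def num_phasings(genotype):
    """Unordered haplotype pairs consistent with a genotype
    (0/1 = homozygous sites, 2 = heterozygous sites)."""
    k = sum(1 for v in genotype if v == 2)
    return max(1, 2 ** (k - 1))

print(num_phasings((2, 0, 2, 2)))  # 3 heterozygous sites -> 4 pairs
```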

    Discrete Algorithms for Analysis of Genotype Data

    The accessibility of high-throughput genotyping technology makes genome-wide association studies for common complex diseases possible. When dealing with common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered across the entire genome. Optimization formulations are introduced for searching for disease-associated risk and resistance factors and for predicting disease susceptibility from a given case-control study. Several discrete methods for disease association search have been developed, exploiting greedy strategies and topological properties of case-control studies. New disease susceptibility prediction methods based on the developed search methods have been validated on datasets from case-control studies for several common diseases. Our experiments show that the proposed algorithms compare favorably with existing association search and susceptibility prediction methods.
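
    As a hedged illustration of the greedy strategy mentioned above (the actual objectives are not given in the abstract), the sketch below repeatedly adds the SNP that most improves a case/control separation score; score is a hypothetical stand-in for such an objective.

```python
def greedy_search(snps, score, k):
    """Pick k SNPs one at a time, each maximising the score gain."""
    chosen = []
    for _ in range(k):
        candidates = [s for s in snps if s not in chosen]
        best = max(candidates, key=lambda s: score(chosen + [s]))
        chosen.append(best)
    return chosen
```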

    High performance computing for haplotyping: Models and platforms

    The reconstruction of the haplotype pair for each chromosome is a hot topic in bioinformatics and genome analysis. In Haplotype Assembly (HA), all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state of the art in HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. To evaluate GenHap's performance, we generated different instances of synthetic (yet realistic) data exploiting empirical error models of four different sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases as read length increases, since longer reads entail fewer sub-problems to distribute over multiple cores.
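
    A minimal sketch of the divide-et-impera parallelisation described above, assuming a hypothetical solve_block stand-in for GenHap's per-block genetic-algorithm optimiser: the SNP columns are split into blocks, each block is solved on its own core, and the partial haplotypes are concatenated downstream.

```python
from multiprocessing import Pool


def solve_block(block):
    # placeholder for GenHap's per-block genetic-algorithm optimiser
    return block


def assemble(fragment_matrix, n_blocks, workers=4):
    """Split SNP columns into blocks and solve each on its own core."""
    n_cols = len(fragment_matrix[0])
    step = max(1, n_cols // n_blocks)
    blocks = [[row[i:i + step] for row in fragment_matrix]
              for i in range(0, n_cols, step)]
    with Pool(workers) as pool:
        partials = pool.map(solve_block, blocks)
    return partials  # per-block haplotype segments, concatenated downstream


if __name__ == "__main__":
    matrix = [[0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 0, 1]]
    print(assemble(matrix, n_blocks=3, workers=2))
```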

    Statistical physics methods in computational biology

    The interest of statistical physics in combinatorial optimization is not new; it suffices to think of a famous tool such as simulated annealing. More recently, statistical physics has also resorted to statistical inference to address some "hard" optimization problems, developing a new class of message passing algorithms. Three applications to computational biology are presented in this thesis: 1) Boolean networks, a model for gene regulatory networks; 2) haplotype inference, to study the genetic information present in a population; and 3) clustering, a general machine learning tool.

    Algorithms for Computational Genetics Epidemiology

    The most intriguing problems in genetics epidemiology are to predict genetic disease susceptibility and to associate single nucleotide polymorphisms (SNPs) with diseases. In such studies, it is necessary to resolve the ambiguities in genetic data. The primary obstacle to ambiguity resolution is that the physical methods for separating two haplotypes from an individual genotype (phasing) are too expensive. Although computational haplotype inference is a well-explored problem, high error rates continue to degrade association accuracy. Secondly, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represents the rest of the SNPs (tagging). Tagging can achieve budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences, elevating the importance of informative SNP selection for compacting huge genetic datasets and making fine-grained genotype analysis feasible. Finally, even if complete and accurate data are available, it is unclear whether common statistical methods can determine the susceptibility to complex diseases. The dissertation explores the above computational problems with a variety of methods, including linear algebra, graph theory, linear programming, and greedy methods. The contributions include (1) a significant speed-up of popular phasing tools without compromising their quality, (2) state-of-the-art tagging tools applied to disease association, and (3) a graph-based method for disease tagging and predicting disease susceptibility.
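
    As a hedged sketch of the tagging idea described above (not the dissertation's algorithm), tag selection can be viewed as greedy set cover in which a chosen tag covers every SNP it predicts well; predicts is a hypothetical correlation test, e.g. an r-squared threshold.

```python
def select_tags(snps, predicts):
    """Greedy set cover: each chosen tag covers every SNP it predicts.
    Assumes predicts(s, s) is True, so the loop always makes progress."""
    uncovered, tags = set(snps), []
    while uncovered:
        best = max(snps, key=lambda t: sum(predicts(t, s) for s in uncovered))
        tags.append(best)
        uncovered = {s for s in uncovered if not predicts(best, s)}
    return tags
```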