
84 research outputs found

A Column Generation Approach for Pure Parsimony Haplotyping

Author: Dal Sasso Veronica
De Giovanni Luigi
Publication venue: OASIcs - OpenAccess Series in Informatics. 5th Student Conference on Operational Research (SCOR 2016)
Publication date: 01/01/2016

Dagstuhl Research Online Publication Server

Pure Parsimony Xor Haplotyping

Author: Bonizzoni Paola
Della Vedova Gianluca
Dondi Riccardo
Pirola Yuri
Rizzi Romeo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Haplotype resolution from xor-genotype data has recently been formulated as a new model for genetic studies. Xor-genotype data is a cheaply obtainable type of data that distinguishes heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project.
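As a rough illustration of what a xor-genotype records, the sketch below derives it from a haplotype pair and compares it with a regular genotype. The 0/1 allele encoding, the use of 2 for heterozygous sites, and the function names `xor_genotype` and `genotype` are assumptions made for this example; they are not taken from the paper.

```python
def xor_genotype(h1, h2):
    """Return the xor-genotype of two haplotypes given as 0/1 sequences.

    A 1 marks a heterozygous site, a 0 marks a homozygous site.
    The homozygous allele itself is not recorded.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return [a ^ b for a, b in zip(h1, h2)]


def genotype(h1, h2):
    """Return the regular genotype (assumed encoding: 0 or 1 at
    homozygous sites, 2 at heterozygous sites)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return [a if a == b else 2 for a, b in zip(h1, h2)]


# Two different haplotype pairs (hypothetical data).
pair_a = ([0, 1, 1, 0], [0, 0, 1, 1])
pair_b = ([1, 1, 0, 0], [1, 0, 0, 1])

print(xor_genotype(*pair_a))  # [0, 1, 0, 1]
print(xor_genotype(*pair_b))  # [0, 1, 0, 1]
print(genotype(*pair_a))      # [0, 2, 1, 2]
print(genotype(*pair_b))      # [1, 2, 0, 2]
```

The two pairs share a xor-genotype although their regular genotypes differ, which is the information loss the abstract refers to when it says the homozygous alleles are not identified.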

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Diversity Graphs

Author: Blain P
Davis C
Holder Allen G
Silva J
Vinzant C
Publication venue: Digital Commons @ Trinity
Publication date: 01/01/2009

Bipartite graphs have long been used to study and model matching problems, and in this paper we introduce the bipartite graphs that explain a recent matching problem in computational biology. The problem is to match haplotypes to genotypes in a way that minimizes the number of haplotypes, a problem called the Pure Parsimony problem. The goal of this work is not to address the computational or biological issues but rather to explore the mathematical structure through a study of the underlying graph theory.
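To make the Pure Parsimony problem concrete, here is a minimal brute-force sketch for tiny instances. It is not the method of any paper listed here, and it runs in exponential time. The genotype encoding (0 or 1 for homozygous sites, 2 for heterozygous sites) and the names `compatible_pairs` and `pure_parsimony` are assumptions made for this example.

```python
from itertools import combinations, product


def compatible_pairs(genotype):
    """Enumerate the unordered haplotype pairs that resolve a genotype."""
    het = [i for i, site in enumerate(genotype) if site == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = list(genotype)
        h2 = list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b        # one haplotype gets allele b
            h2[i] = 1 - b    # the other gets the complementary allele
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


def pure_parsimony(genotypes):
    """Return a smallest haplotype set that resolves every genotype."""
    options = [compatible_pairs(g) for g in genotypes]
    candidates = sorted({h for pairs in options for pair in pairs for h in pair})
    # Try haplotype subsets in order of increasing size.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            if all(any(pair <= chosen for pair in pairs) for pairs in options):
                return chosen
    return set()


# Hypothetical instance with three genotypes over two SNP sites.
genotypes = [(2, 2), (0, 2), (2, 0)]
solution = pure_parsimony(genotypes)
print(sorted(solution))  # [(0, 0), (0, 1), (1, 0)]
```

In bipartite terms, each genotype is linked to the haplotypes that can take part in resolving it, and the search looks for the fewest haplotype vertices that still leave every genotype with a complete resolving pair.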

Trinity University

Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

Author: Achenbach U.
Basekow R.
Diehl S.
Gebhardt C.
Gyetvai G.
Kersten B.
Neigenfind J.
Selbig J.

MPG.PuRe

Parsimony-based genetic algorithm for haplotype resolution and block partitioning

Author: Sazonova Nadezhda A.
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2007

This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on a genetic algorithm approach and the parsimony principle. The multilocus LD measure (Normalized Entropy Difference) is used as a block identification criterion. The proposed algorithm incorporates missing data as a part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries which represent measures of the strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets, including the HapMap data, and comparing the results to those of existing state-of-the-art algorithms. The results show that the proposed genetic algorithm provides haplotype decomposition accuracy within the range achieved by the other algorithms. The block structure output by our algorithm in general agrees with the block structure provided by the other algorithms for the same data. Thus, the proposed algorithm can be successfully used for block partitioning and haplotype phasing while providing some new valuable features such as scores for block boundaries and fully incorporated treatment of missing data. In addition, the proposed algorithm for haplotyping and block partitioning is used in the development of a new clustering algorithm for two-population mixed genotype samples. The proposed clustering algorithm extracts from the given genotype sample two clusters with substantially different block structures and finds the haplotype resolution and block partitioning for each cluster.

The Research Repository @ WVU (West Virginia University)

High performance computing for haplotyping: Models and platforms

Author: A Bracciali
A Rhoads
C Luo
D Maisto
D Sims
ES Lander
F Rodriguez
HJ Greenberg
J Hermisson
JC Na
K Zhang
KE McElroy
L Bianchi
L Rundo
M Jain
M Patterson
MA Quail
MJ Daly
MW Nachman
O Delaneau
P Edge
PR Loh
R Wang
RJ Roberts
S Benedettini
S Das
S Levy
S Sheehan
SB Gabriel
SP Otto
SR Browning
TC Wang
V Bansal
V Kuleshov
Y Choi
Y Pirola
ZZ Chen
Publication venue: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publication date: 01/01/2019

The reconstruction of the haplotype pair for each chromosome is a hot topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state-of-the-art on HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. In order to evaluate GenHap’s performance, we generated different instances of synthetic (yet realistic) data exploiting empirical error models of four different sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases along with the read length, involving a lower number of sub-problems to be distributed on multiple cores.

Repository TU/e

Crossref

Apollo (Cambridge)

Direct maximum parsimony phylogeny reconstruction from genotype data

Author: Fumei Lam
Guy E Blelloch
R Ravi
Russell Schwartz
Sridhar Srinath
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods for first computationally inferring haplotypes from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central


Maximum parsimony xor haplotyping by sparse dictionary selection

Author: Elmas Abdulkadir
Jajamovich Guido H.
Wang Xiaodong
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2013

Background: The xor-genotype is a cost-effective alternative to the genotype sequence of an individual. Recent methods developed for haplotype inference have aimed at finding the solution based on xor-genotype data. Given the xor-genotypes of a group of unrelated individuals, it is possible to infer the haplotype pairs for each individual with the aid of a small number of regular genotypes. Results: We propose a framework of maximum parsimony inference of haplotypes based on the search of a sparse dictionary, and we present a greedy method that can effectively infer the haplotype pairs given a set of xor-genotypes augmented by a small number of regular genotypes. We test the performance of the proposed approach on synthetic data sets with different numbers of individuals and SNPs, and compare its performance with that of the state-of-the-art xor-haplotyping methods PPXH and XOR-HAPLOGEN. Conclusions: Experimental results show good inference quality for the proposed method under all circumstances, especially on large data sets. Results on a real database, CFTR, also demonstrate significantly better performance. The proposed algorithm is also capable of finding accurate solutions with missing data and/or typing errors.

Columbia University Academic Commons

Springer - Publisher Connector

PubMed Central

A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

Author: Calvo Molinos Borja
Irurozki Ekhine
Lozano Alonso José Antonio
Publication date: 01/01/2010

Haplotype data is especially important in the study of complex diseases since it contains more information than genotype data. However, obtaining haplotype data is technically difficult and expensive. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not needed to find the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the reduction in computational time and memory achieved by our preprocess, problem instances that were previously unaffordable can now be solved efficiently.

Archivo Digital para la Docencia y la Investigación