989 research outputs found

Quartet consistency count method for reconstructing phylogenetic trees

Author: Cho Jin-Hwan
Joe Dosang
Kim Young Rock
Publication date: 11/10/2006

Among the distance-based algorithms for phylogenetic tree reconstruction, the neighbor-joining algorithm has been a widely used and effective method. We propose a new algorithm which counts the number of consistent quartets for cherry picking with tie breaking. We show that the success rate of the new algorithm is almost equal to that of neighbor-joining. This gives an explanation of the qualitative nature of neighbor-joining and that of dissimilarity maps from DNA sequence data. Moreover, the new algorithm always reconstructs correct trees from quartet consistent dissimilarity maps. Comment: 11 pages, 5 figures
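The abstract does not spell out the counting rule, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a quartet split ab|cd is read off a dissimilarity map with the four-point condition (the pairing with the strictly smallest sum wins) and that each pair is scored by the number of quartets in which it appears on one side of the winning split. The names `quartet_split` and `cherry_counts` are hypothetical, and the tie-breaking step mentioned in the abstract is left out.

```python
from itertools import combinations


def quartet_split(d, a, b, c, e):
    """Return the pairing with the strictly smallest distance sum, or None on a tie.

    Assumption: the split is chosen by the four-point condition on the
    dissimilarity matrix d (a list of lists).
    """
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    sums = [d[p[0]][p[1]] + d[q[0]][q[1]] for p, q in pairings]
    best = min(sums)
    if sums.count(best) != 1:
        return None  # no unique winner, so this quartet supports no pair
    return pairings[sums.index(best)]


def cherry_counts(d):
    """Count, for every pair of taxa, the quartets whose winning split contains it."""
    n = len(d)
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for quartet in combinations(range(n), 4):
        split = quartet_split(d, *quartet)
        if split is not None:
            for pair in split:
                counts[pair] += 1
    return counts


# Illustrative 5-taxon tree metric (unit edge lengths) with cherries {0,1} and {3,4}.
d = [
    [0, 2, 3, 4, 4],
    [2, 0, 3, 4, 4],
    [3, 3, 0, 3, 3],
    [4, 4, 3, 0, 2],
    [4, 4, 3, 2, 0],
]
counts = cherry_counts(d)
best_pair = max(counts, key=counts.get)
```

On a tree metric, a true cherry wins every quartet that contains it, so its count reaches the maximum possible value of C(n-2, 2); a cherry-picking step would then join a pair with the highest count.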

arXiv.org e-Print Archive

Optimizing Phylogenetic Supertrees Using Answer Set Programming

Author: Janhunen Tomi
Koponen Laura
Oikarinen Emilia
Säilä Laura
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2015

The supertree construction problem is about combining several phylogenetic trees with possibly conflicting information into a single tree that has all the leaves of the source trees as its leaves and in which the relationships between the leaves are as consistent with the source trees as possible. This leads to an optimization problem that is computationally challenging, and typically heuristic methods, such as matrix representation with parsimony (MRP), are used. In this paper we consider the use of answer set programming to solve the supertree construction problem in terms of two alternative encodings. The first is based on an existing encoding of trees using substructures known as quartets, while the other novel encoding captures the relationships present in trees through direct projections. We use these encodings to compute a genus-level supertree for the family of cats (Felidae). Furthermore, we compare our results to recent supertrees obtained by the MRP method. Comment: To appear in Theory and Practice of Logic Programming (TPLP), Proceedings of ICLP 201

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Active Learning of Multiple Source Multiple Destination Topologies

Author: Animashree An
Athina Markopoulou
Maciej Kurant
Michael Rabbat
Pegah Sattari
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2013

We consider the problem of inferring the topology of a network with $M$ sources and $N$ receivers (hereafter referred to as an $M$-by-$N$ network), by sending probes between the sources and receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., $1$-by-$N$'s or $2$-by-$2$'s) and then merge these components to identify the $M$-by-$N$ topology. In this paper, we focus on the second part, which had previously received less attention in the literature. In particular, we assume that a $1$-by-$N$ topology is given and that all $2$-by-$2$ components can be queried and learned using end-to-end probes. The problem is which $2$-by-$2$'s to query and how to merge them with the given $1$-by-$N$, so as to exactly identify the $2$-by-$N$ topology, and optimize a number of performance metrics, including the number of queries (which directly translates into measurement bandwidth), time complexity, and memory usage. We provide a lower bound, $\lceil \frac{N}{2} \rceil$, on the number of $2$-by-$2$'s required by any active learning algorithm and propose two greedy algorithms. The first algorithm follows the framework of multiple hypothesis testing, in particular Generalized Binary Search (GBS), since our problem is one of active learning, from $2$-by-$2$ queries. The second algorithm is called the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding $2$-by-$2$, and merges it with the given $1$-by-$N$; it requires exactly $N-1$ steps, which is much less than all $\binom{N}{2}$ possible $2$-by-$2$'s. Simulation results over synthetic and realistic topologies demonstrate that both algorithms correctly identify the $2$-by-$N$ topology and are near-optimal, but REA is more efficient in practice.
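To put the three query counts quoted in the abstract side by side, here is a small arithmetic sketch. It only evaluates the stated formulas (the lower bound of ceil(N/2), the N-1 steps of REA, and the C(N, 2) possible 2-by-2 components) and does not implement either algorithm; the function name `query_counts` is hypothetical.

```python
from math import comb


def query_counts(n):
    """Evaluate the query counts stated in the abstract for n receivers."""
    return {
        "lower_bound": (n + 1) // 2,  # ceil(n / 2), computed with integers
        "rea_steps": n - 1,           # steps taken by REA
        "all_pairs": comb(n, 2),      # every possible 2-by-2 component
    }


counts_10 = query_counts(10)
# counts_10 == {"lower_bound": 5, "rea_steps": 9, "all_pairs": 45}
```

For 10 receivers, REA's 9 steps sit between the lower bound of 5 and the 45 queries an exhaustive strategy would need, and the gap to the exhaustive count grows quadratically with N.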

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Caltech Authors

A New Quartet Tree Heuristic for Hierarchical Clustering

Author: Cilibrasi Rudi
Vitanyi Paul M. B.
Publication date: 01/01/2006

We consider the problem of constructing an optimal-weight tree from the 3*(n choose 4) weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as non-optimal topologies). We present a heuristic for reconstructing the optimal-weight tree, and a canonical manner to derive the quartet-topology weights from a given distance matrix. The method repeatedly transforms a bifurcating tree, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. This contrasts with other heuristic search methods from biological phylogeny, like DNAML or quartet puzzling, which repeatedly and incrementally construct a solution from a random order of objects, and subsequently add agreement values. Comment: 22 pages, 14 figures
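The "summed weight of the embedded quartet topologies" can be illustrated with a short sketch. It rests on two assumptions that the abstract does not state: the weight of topology ab|cd is taken to be dist(a, b) + dist(c, d), and the tree is an unrooted bifurcating tree given as an adjacency dictionary, so that the embedded topology of each quartet is the pairing with the smallest sum of path lengths. The names `hop_distances`, `embedded_topology`, and `tree_cost` are hypothetical, and the heuristic search over trees is not shown.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embedded_topology(hops, a, b, c, d):
    """Pairing the tree embeds for {a, b, c, d}: smallest sum of path lengths."""
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(pairings, key=lambda p: hops[p[0][0]][p[0][1]] + hops[p[1][0]][p[1][1]])


def tree_cost(adj, leaves, dist):
    """Sum the assumed weight dist[x][y] + dist[z][w] over all embedded topologies."""
    hops = {leaf: hop_distances(adj, leaf) for leaf in leaves}
    total = 0.0
    for a, b, c, d in combinations(leaves, 4):
        (x, y), (z, w) = embedded_topology(hops, a, b, c, d)
        total += dist[x][y] + dist[z][w]
    return total


# Illustrative data: a four-leaf tree embedding ab|cd, with a and b close, c and d close.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
leaves = ["a", "b", "c", "d"]
close = {frozenset("ab"), frozenset("cd")}
dist = {x: {y: 0.0 if x == y else (1.0 if frozenset((x, y)) in close else 5.0)
            for y in leaves} for x in leaves}
cost = tree_cost(adj, leaves, dist)
```

Under this assumed weighting a lower total is better, so a search heuristic would compare `tree_cost` before and after each tree transformation and keep the change only when the total improves.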

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

A Fast Quartet Tree Heuristic for Hierarchical Clustering

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication date: 12/09/2014

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3\binom{n}{4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package, on genomic data for which the latter are optimized. Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing. Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with arXiv:cs/0606048 in cs.D

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository