Active Learning of Multiple Source Multiple Destination Topologies
We consider the problem of inferring the topology of a network with M
sources and N receivers (hereafter referred to as an M-by-N network), by
sending probes between the sources and receivers. Prior work has shown that
this problem can be decomposed into two parts: first, infer smaller subnetwork
components (i.e., 1-by-N's or 2-by-2's) and then merge these components
to identify the M-by-N topology. In this paper, we focus on the second
part, which had previously received less attention in the literature. In
particular, we assume that a 1-by-N topology is given and that all
2-by-2 components can be queried and learned using end-to-end probes. The
problem is which 2-by-2's to query and how to merge them with the given
1-by-N, so as to exactly identify the 2-by-N topology, and optimize a
number of performance metrics, including the number of queries (which directly
translates into measurement bandwidth), time complexity, and memory usage. We
provide a lower bound, ⌈N/2⌉, on the number of
2-by-2's required by any active learning algorithm and propose two greedy
algorithms. The first algorithm follows the framework of multiple hypothesis
testing, in particular Generalized Binary Search (GBS), since our problem is
one of active learning, from 2-by-2 queries. The second algorithm is called
the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at
every step, it selects two receivers, queries the corresponding 2-by-2, and
merges it with the given 1-by-N; it requires exactly N-1 steps, which is
much less than all (N choose 2) possible 2-by-2's. Simulation results
over synthetic and realistic topologies demonstrate that both algorithms
correctly identify the 2-by-N topology and are near-optimal, but REA is
more efficient in practice.
Uniqueness, intractability and exact algorithms: reflections on level-k phylogenetic networks
Phylogenetic networks provide a way to describe and visualize evolutionary
histories that have undergone so-called reticulate evolutionary events such as
recombination, hybridization or horizontal gene transfer. The level k of a
network determines how non-treelike the evolution can be, with level-0 networks
being trees. We study the problem of constructing level-k phylogenetic networks
from triplets, i.e. phylogenetic trees for three leaves (taxa). We give, for
each k, a level-k network that is uniquely defined by its triplets. We
demonstrate the applicability of this result by using it to prove that (1) for
all k of at least one it is NP-hard to construct a level-k network consistent
with all input triplets, and (2) for all k it is NP-hard to construct a level-k
network consistent with a maximum number of input triplets, even when the input
is dense. As a response to this intractability we give an exact algorithm for
constructing level-1 networks consistent with a maximum number of input
triplets.
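To make the notion of a tree being consistent with a triplet concrete, here is a minimal Python sketch for the level-0 case only (ordinary rooted trees). It is not the paper's exact level-1 algorithm. The nested-tuple tree encoding and the function names (leaves, clusters, is_consistent, count_consistent) are assumptions made for illustration. A triplet (a, b, c) stands for ab|c, meaning a and b share an ancestor that is not an ancestor of c.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the leaf set found below every vertex of the tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result = {leaves(tree)}
    for child in tree:
        result |= clusters(child)
    return result


def is_consistent(tree, triplet):
    """True if the rooted tree displays the triplet ab|c.

    For a tree containing a, b and c, this holds exactly when some
    vertex has a and b below it but not c.
    """
    a, b, c = triplet
    return any(a in cl and b in cl and c not in cl for cl in clusters(tree))


def count_consistent(tree, triplets):
    """Number of input triplets the tree is consistent with."""
    return sum(is_consistent(tree, t) for t in triplets)


if __name__ == "__main__":
    tree = (("a", "b"), ("c", "d"))
    print(is_consistent(tree, ("a", "b", "c")))
    print(count_consistent(tree, [("a", "b", "c"), ("a", "c", "b")]))
```

The maximization problem in the abstract asks for a network that makes a count like the one returned by count_consistent as large as possible. For networks of level 1 and above, consistency can no longer be read off from clusters this simply.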
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the 3*(n choose 4) weighted quartet topologies on n objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologies is optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts with other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which, repeatedly,
incrementally construct a solution from a random order of objects, and
subsequently add agreement values.
Comment: 22 pages, 14 figures
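The count 3*(n choose 4) comes from the fact that every set of four objects can be split into two pairs in exactly three ways. The Python sketch below enumerates those topologies and derives a weight for each from a distance matrix. The weighting rule used here (the weight of uv|wx is d(u,v) + d(w,x)) and the function names are assumptions made for illustration, not a statement of the paper's exact procedure.

```python
from itertools import combinations
from math import comb


def quartet_topologies(objects):
    """Yield the three pairings ((u, v), (w, x)) of every 4-subset."""
    for a, b, c, d in combinations(objects, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


def quartet_weights(dist):
    """Map each topology uv|wx to dist[u][v] + dist[w][x].

    The additive rule is an assumption for this sketch; dist is a
    symmetric n-by-n matrix given as a list of lists.
    """
    n = len(dist)
    return {
        ((u, v), (w, x)): dist[u][v] + dist[w][x]
        for (u, v), (w, x) in quartet_topologies(range(n))
    }


if __name__ == "__main__":
    n = 5
    count = sum(1 for _ in quartet_topologies(range(n)))
    print(count == 3 * comb(n, 4))

    dist = [
        [0, 1, 4, 4],
        [1, 0, 4, 4],
        [4, 4, 0, 1],
        [4, 4, 1, 0],
    ]
    print(quartet_weights(dist))
```

In the small matrix above, objects 0 and 1 are close to each other, as are 2 and 3, so the pairing 01|23 gets the smallest summed distance of the three topologies on that quartet.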
Optimizing Phylogenetic Supertrees Using Answer Set Programming
The supertree construction problem is about combining several phylogenetic
trees with possibly conflicting information into a single tree that has all the
leaves of the source trees as its leaves and in which the relationships between
the leaves are as consistent with the source trees as possible. This leads to an
optimization problem that is computationally challenging, and typically
heuristic methods, such as matrix representation with parsimony (MRP), are
used. In this paper we consider the use of answer set programming to solve the
supertree construction problem in terms of two alternative encodings. The first
is based on an existing encoding of trees using substructures known as
quartets, while the other novel encoding captures the relationships present in
trees through direct projections. We use these encodings to compute a
genus-level supertree for the family of cats (Felidae). Furthermore, we compare
our results to recent supertrees obtained by the MRP method.
Comment: To appear in Theory and Practice of Logic Programming (TPLP),
Proceedings of ICLP 201