Search for the end of a path in the d-dimensional grid and in other graphs
We consider the worst-case query complexity of some variants of certain
PPAD-complete search problems. Suppose we are given a graph G and a
vertex u of G. We denote by G' the directed graph obtained from G by
directing all edges in both directions. D is a directed subgraph of G'
which is unknown to us, except that it consists of vertex-disjoint
directed paths and cycles and one of the paths originates in u. Our goal is
to find an endvertex of a path by using as few queries as possible. A query
specifies a vertex v of G, and the answer is the set of the edges of D
incident to v, together with their directions. We also show lower bounds for
the special case when D consists of a single path. Our proofs use the theory
of graph separators. Finally, we consider the case when the graph is a grid
graph. In this case, using the connection with separators, we give
asymptotically tight bounds as a function of the size of the grid, if the
dimension of the grid is considered as fixed. In order to do this, we prove a
separator theorem about grid graphs, which is interesting in its own right.
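The query model described in this abstract can be illustrated with a minimal Python sketch. It is not the separator-based method of the paper: it only shows the trivial strategy of following the path from its origin, which uses one query per vertex on the path. The class `QueryOracle` and the function `follow_path` are hypothetical names introduced here, and the hidden subgraph is assumed to be given as a list of directed edges.

```python
from typing import Hashable, List, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


class QueryOracle:
    """Hides a directed subgraph made of vertex-disjoint paths and cycles.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, edges: List[Edge]) -> None:
        # Each vertex has at most one outgoing and one incoming hidden edge.
        self._out = {a: b for a, b in edges}
        self._in = {b: a for a, b in edges}
        self.queries = 0

    def query(self, v: Vertex) -> Set[Edge]:
        """Return the hidden edges incident to v, with their directions."""
        self.queries += 1
        incident: Set[Edge] = set()
        if v in self._out:
            incident.add((v, self._out[v]))
        if v in self._in:
            incident.add((self._in[v], v))
        return incident


def follow_path(oracle: QueryOracle, origin: Vertex) -> Vertex:
    """Naive strategy: walk along outgoing edges until none is left.

    Terminates because the origin starts a path, so it is not on a cycle.
    """
    current = origin
    while True:
        outgoing = [e for e in oracle.query(current) if e[0] == current]
        if not outgoing:
            return current
        current = outgoing[0][1]
```

On a hidden path with k vertices this walk costs k queries, which is the baseline that cleverer strategies aim to beat.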
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the 3*(n choose 4) weighted quartet topologies on n objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
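The objective in this abstract, the summed weight of the quartet topologies embedded in a tree, can be sketched in Python as follows. This is an illustrative sketch only, not the CompLearn implementation. It assumes the tree is given as an adjacency dictionary whose internal nodes all have degree three, and that quartet weights are stored in a dictionary keyed by topologies of the form `((a, b), (c, d))` with `a < b < c < d` in sorted leaf order. The names `leaf_distances`, `embedded_topology` and `tree_cost` are hypothetical.

```python
from collections import deque
from itertools import combinations


def leaf_distances(adj, leaves):
    """Breadth-first search from every leaf; returns dist[leaf][node] in edges."""
    dist = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + 1
                    queue.append(y)
        dist[source] = seen
    return dist


def embedded_topology(dist, a, b, c, d):
    """Return the one pairing of {a, b, c, d} that the tree embeds.

    In a tree, the embedded pairing is the one with the smallest sum of
    within-pair path lengths (four-point condition).
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(pairings, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])


def tree_cost(adj, leaves, weights):
    """Sum the weights of the embedded topology over all (n choose 4) quartets."""
    dist = leaf_distances(adj, leaves)
    total = 0.0
    for a, b, c, d in combinations(sorted(leaves), 4):
        total += weights.get(embedded_topology(dist, a, b, c, d), 0.0)
    return total
```

A hill-climbing search of the kind the abstract mentions would repeatedly mutate the tree and keep a mutation only when this cost improves.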
Agglomerative Clustering of Growing Squares
We study an agglomerative clustering problem motivated by interactive glyphs
in geo-visualization. Consider a set of disjoint square glyphs on an
interactive map. When the user zooms out, the glyphs grow in size relative to
the map, possibly with different speeds. When two glyphs intersect, we wish to
replace them by a new glyph that captures the information of the intersecting
glyphs.
We present a fully dynamic kinetic data structure that maintains a set of
disjoint growing squares. Our data structure uses
space, supports queries in worst case time, and updates in
amortized time. This leads to an time
algorithm to solve the agglomerative clustering problem. This is a significant
improvement over the current best time algorithms.
Comment: 14 pages, 7 figure
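The event that drives the clustering in this abstract, two growing squares starting to intersect, can be illustrated with a brute-force Python sketch. It is not the kinetic data structure of the paper. The growth model is an assumption made here for illustration: each square is axis-aligned, has a fixed center, starts with side length zero, and has side length `rate * t` at time `t`, with every rate positive. The names `touch_time` and `first_event` are hypothetical.

```python
from itertools import combinations


def touch_time(s1, s2):
    """Time at which two growing squares first touch.

    Each square is (x, y, rate). Under the assumed model the half-sides are
    rate * t / 2, so the squares touch when the L-infinity distance between
    the centers equals (rate1 + rate2) * t / 2.
    """
    (x1, y1, r1), (x2, y2, r2) = s1, s2
    gap = max(abs(x1 - x2), abs(y1 - y2))
    return 2.0 * gap / (r1 + r2)


def first_event(squares):
    """Brute force over all pairs: return (time, i, j) of the earliest touch.

    Expects at least two squares.
    """
    return min(
        (touch_time(squares[i], squares[j]), i, j)
        for i, j in combinations(range(len(squares)), 2)
    )
```

Checking all pairs like this costs quadratic time per event, which is the kind of cost a dedicated data structure is meant to avoid.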
High-rate self-synchronizing codes
Self-synchronization in the presence of additive noise can be achieved by
allocating a certain number of bits of each codeword as markers for
synchronization. Difference systems of sets are combinatorial designs which
specify the positions of synchronization markers in codewords in such a way
that the resulting error-tolerant self-synchronizing codes may be realized as
cosets of linear codes. Ideally, difference systems of sets should sacrifice as
few bits as possible for a given code length, alphabet size, and
error-tolerance capability. However, it seems difficult to attain optimality
with respect to known bounds when the noise level is relatively low. In fact,
the majority of known optimal difference systems of sets are for exceptionally
noisy channels, requiring a substantial number of bits for synchronization. To
address this problem, we present constructions for difference systems of sets
that allow for higher information rates while sacrificing optimality to only a
small extent. Our constructions utilize optimal difference systems of sets as
ingredients and, when applied carefully, generate asymptotically optimal ones
with higher information rates. We also give direct constructions for optimal
difference systems of sets with high information rates and error-tolerance that
generate binary and ternary self-synchronizing codes.
Comment: 9 pages, no figure, 2 tables. Final accepted version for publication
in the IEEE Transactions on Information Theory. Material presented in part at
the International Symposium on Information Theory and its Applications,
Honolulu, HI USA, October 201
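As an illustration of the objects in this abstract, the following Python sketch checks the defining property of a difference system of sets. The abstract does not state the definition, so the sketch assumes the usual one: a collection of pairwise disjoint subsets of the integers modulo n such that every nonzero residue occurs at least rho times among the differences a - b (mod n) with a and b taken from different subsets. The function name `dss_redundancy` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def dss_redundancy(n, blocks):
    """Return the largest rho for which `blocks` is a difference system of sets.

    Counts the differences (a - b) mod n over all ordered pairs of distinct
    blocks and returns the minimum count over the residues 1..n-1.
    A result of 0 means some shift is never covered. Requires n >= 2.
    """
    flat = [x % n for block in blocks for x in block]
    if len(flat) != len(set(flat)):
        raise ValueError("blocks must be pairwise disjoint subsets of Z_n")
    counts = Counter()
    for block_i, block_j in permutations(blocks, 2):
        for a in block_i:
            for b in block_j:
                counts[(a - b) % n] += 1
    return min(counts[d] for d in range(1, n))
```

For example, `dss_redundancy(7, [[0], [1], [3]])` evaluates to 1: the three marker positions 0, 1 and 3 cover every nonzero shift modulo 7 exactly once, so only 3 of the 7 positions are spent on synchronization.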
A New Quartet Tree Heuristic for Hierarchical Clustering
We consider the problem of constructing an optimal-weight tree from the
3*(n choose 4) weighted quartet topologies on n objects, where optimality means
that the summed weight of the embedded quartet topologies is optimal (so it can
be the case that the optimal tree embeds all quartets as non-optimal
topologies). We present a heuristic for reconstructing the optimal-weight tree,
and a canonical manner to derive the quartet-topology weights from a given
distance matrix. The method repeatedly transforms a bifurcating tree, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. This contrasts with other heuristic search methods
from biological phylogeny, like DNAML or quartet puzzling, which repeatedly
and incrementally construct a solution from a random order of objects, and
subsequently add agreement values.
Comment: 22 pages, 14 figure
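To make the count 3*(n choose 4) concrete, the following Python sketch enumerates every quartet topology on n objects and attaches a weight derived from a distance matrix. The abstract does not state the derivation rule, so the rule used here is an assumption: the weight of the topology that pairs u with v and w with x is taken to be d(u, v) + d(w, x). The function name `quartet_topology_weights` is hypothetical.

```python
from itertools import combinations


def quartet_topology_weights(dist):
    """Map each quartet topology ((p, q), (r, s)) to an assumed weight.

    `dist` is a symmetric n x n matrix (list of lists). Each set of four
    objects a < b < c < d contributes its three possible pairings.
    Assumed rule (not given in the abstract): weight = dist[p][q] + dist[r][s].
    """
    n = len(dist)
    weights = {}
    for a, b, c, d in combinations(range(n), 4):
        for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            weights[((p, q), (r, s))] = dist[p][q] + dist[r][s]
    return weights
```

For n = 5 the dictionary holds 3 * 5 = 15 entries, three for each of the five possible quartets.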