Weighted ancestors in suffix trees
The classical, ubiquitous, predecessor problem is to construct a data
structure for a set of integers that supports fast predecessor queries. Its
generalization to weighted trees, a.k.a. the weighted ancestor problem, has
been extensively explored and successfully reduced to the predecessor problem.
It is known that any solution for both problems with an input set from a
polynomially bounded universe that preprocesses a weighted tree in O(n
polylog(n)) space requires \Omega(log log n) query time. Perhaps the most
important and frequent application of the weighted ancestors problem is for
suffix trees. It has been a long-standing open question whether the weighted
ancestors problem has better bounds for suffix trees. We answer this question
positively: we show that a suffix tree built for a text w[1..n] can be
preprocessed using O(n) extra space, so that queries can be answered in O(1)
time. Thus we improve the running times of several applications. Our
improvement is based on a number of data structure tools and a
periodicity-based insight into the combinatorial structure of a suffix tree.
Comment: 27 pages, LNCS format. A condensed version will appear in ESA 201
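To make the query concrete, here is a minimal sketch of a weighted ancestor structure built with binary lifting. It is a simple O(n log n)-space, O(log n)-query baseline, not the O(n)-space, O(1)-time structure the abstract announces. The class name `WeightedAncestors`, the convention that the root is its own parent, and the assumption that weights strictly increase from parent to child (as string depths do in a suffix tree) are all illustrative choices, not taken from the paper.

```python
from typing import List, Optional


class WeightedAncestors:
    """Binary-lifting baseline for weighted ancestor queries (illustrative only)."""

    def __init__(self, parent: List[int], weight: List[int]) -> None:
        # Assumption: parent[root] == root, and weight[parent[v]] < weight[v]
        # for every non-root node v.
        n = len(parent)
        self.weight = weight
        levels = max(1, n.bit_length())
        # up[k][v] is the 2^k-th ancestor of v (capped at the root).
        self.up = [parent[:]]
        for _ in range(1, levels):
            prev = self.up[-1]
            self.up.append([prev[prev[v]] for v in range(n)])

    def query(self, v: int, w: int) -> Optional[int]:
        """Return the highest ancestor of v (possibly v) with weight >= w."""
        if self.weight[v] < w:
            return None  # no ancestor of v is heavy enough
        # Try the largest jumps first; take a jump only if we stay at weight >= w.
        for jump in reversed(self.up):
            u = jump[v]
            if self.weight[u] >= w:
                v = u
        return v
```

For example, with `parent = [0, 0, 1, 2, 1]` and `weight = [0, 2, 5, 7, 3]`, the call `WeightedAncestors(parent, weight).query(3, 3)` walks up from node 3 and stops at node 2, the last node on the path whose weight is still at least 3.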
Distance labeling schemes for trees
We consider distance labeling schemes for trees: given a tree with nodes,
label the nodes with binary strings such that, given the labels of any two
nodes, one can determine, by looking only at the labels, the distance in the
tree between the two nodes.
A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg
(J. Graph Theory 2000) establish that labels must use
bits\footnote{Throughout this paper we use for .}. Gavoille et
al. (ESA 2001) show that for very small approximate stretch, labels use
bits. Several other papers investigate various
variants such as, for example, small distances in trees (Alstrup et al.,
SODA'03).
We improve the known upper and lower bounds of exact distance labeling by
showing that bits are needed and that bits are sufficient. We also give ()-stretch labeling
schemes using bits for constant .
()-stretch labeling schemes with polylogarithmic label size have
previously been established for doubling dimension graphs by Talwar (STOC
2004).
In addition, we present matching upper and lower bounds for distance labeling
for caterpillars, showing that labels must have size . For simple paths with nodes and edge weights in , we show that
labels must have size
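As an illustration of what an exact distance labeling scheme for trees looks like, the sketch below uses the textbook centroid-decomposition construction for unweighted trees. It is not the scheme of this paper and makes no attempt to match its constants: each node stores one (centroid, distance) pair per level of the decomposition, so O(log n) pairs, or O(log^2 n) bits. The function names `centroid_labels` and `distance` and the adjacency-list input are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List


def centroid_labels(adj: List[List[int]]) -> List[Dict[int, int]]:
    """Label every node of an unweighted tree with {centroid: distance} pairs."""
    n = len(adj)
    removed = [False] * n
    labels: List[Dict[int, int]] = [dict() for _ in range(n)]
    stack = [0]  # one representative node per remaining component
    while stack:
        start = stack.pop()
        # Collect the component of `start` in BFS order, ignoring removed nodes.
        order, parent = [start], {start: -1}
        for v in order:
            for u in adj[v]:
                if not removed[u] and u not in parent:
                    parent[u] = v
                    order.append(u)
        # Subtree sizes, computed bottom-up over the BFS order.
        size = {v: 1 for v in order}
        for v in reversed(order):
            if parent[v] != -1:
                size[parent[v]] += size[v]
        total = len(order)
        # Walk towards the heavy child until no child holds more than half.
        c = start
        while True:
            heavy = next(
                (u for u in adj[c]
                 if not removed[u] and parent.get(u) == c and size[u] > total // 2),
                None,
            )
            if heavy is None:
                break
            c = heavy
        # Record the distance from the centroid c to every node in its component.
        dist = {c: 0}
        queue = deque([c])
        while queue:
            v = queue.popleft()
            labels[v][c] = dist[v]
            for u in adj[v]:
                if not removed[u] and u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        removed[c] = True
        stack.extend(u for u in adj[c] if not removed[u])
    return labels


def distance(label_u: Dict[int, int], label_v: Dict[int, int]) -> int:
    """Decode the tree distance from the two labels alone."""
    return min(d + label_v[c] for c, d in label_u.items() if c in label_v)
```

The decoder works because the first centroid that separates two nodes lies on the path between them, so the sum of the two stored distances through that centroid is exact, and every other shared centroid can only give a larger sum.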
Minimum Cuts in Near-Linear Time
We significantly improve known time bounds for solving the minimum cut
problem on undirected graphs. We use a "semi-duality" between minimum cuts
and maximum spanning tree packings combined with our previously developed
random sampling techniques. We give a randomized algorithm that finds a minimum
cut in an m-edge, n-vertex graph with high probability in O(m log^3 n) time. We
also give a simpler randomized algorithm that finds all minimum cuts with high
probability in O(n^2 log n) time. This variant has an optimal RNC
parallelization. Both variants improve on the previous best time bound of O(n^2
log^3 n). Other applications of the tree-packing approach are new, nearly tight
bounds on the number of near minimum cuts a graph may have and a new data
structure for representing them in a space-efficient manner.
Labeling Schemes with Queries
We study the question of "how robust are the known lower bounds of labeling
schemes when one increases the number of consulted labels". Let be a
function on pairs of vertices. An -labeling scheme for a family of graphs
\mathcal{F} labels the vertices of all graphs in \mathcal{F} such that for every graph
G \in \mathcal{F} and every two vertices , the value can be inferred
by merely inspecting the labels of and .
This paper introduces a natural generalization: the notion of -labeling
schemes with queries, in which the value can be inferred by inspecting
not only the labels of and but possibly the labels of some additional
vertices. We show that inspecting the label of a single additional vertex (one
query) enables us to reduce the label size of many labeling schemes
significantly.
Almost-Tight Distributed Minimum Cut Algorithms
We study the problem of computing the minimum cut in weighted distributed
message-passing networks (the CONGEST model). Let be the minimum cut,
be the number of nodes in the network, and be the network diameter. Our
algorithm can compute exactly in time. To the best of our knowledge, this is the first paper that
explicitly studies computing the exact minimum cut in the distributed setting.
Previously, non-trivial sublinear-time algorithms for this problem were known
only for unweighted graphs when due to Pritchard and
Thurimella's -time and -time algorithms for
computing -edge-connected and -edge-connected components.
By using Karger's edge sampling technique, we can convert this
algorithm into a -approximation -time algorithm for any . This improves
over the previous -approximation -time algorithm and
-approximation -time algorithm of Ghaffari and Kuhn. Due to the lower
bound of by Das Sarma et al. which holds for any
approximation algorithm, this running time is tight up to a factor.
To get the stated running time, we developed an approximation algorithm which
combines the ideas of Thorup's algorithm and Matula's contraction algorithm. It
saves an factor as compared to applying Thorup's tree
packing theorem directly. Then, we combine Kutten and Peleg's tree partitioning
algorithm and Karger's dynamic programming to achieve an efficient distributed
algorithm that finds the minimum cut when we are given a spanning tree that
crosses the minimum cut exactly once.
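The last step, finding the best cut that a given spanning tree crosses exactly once, can be sketched sequentially as follows. This is a plain, centralized version of the subtree-sum idea behind Karger's dynamic programming, not the distributed algorithm of the paper. For each non-root node v, the cut that separates the subtree of v has weight (sum of weighted degrees in the subtree) minus twice (weight of edges whose lowest common ancestor lies in the subtree). The function name `min_one_respecting_cut`, the parent-array tree encoding with the root as its own parent, and the naive LCA walk are assumptions chosen for brevity.

```python
from typing import List, Tuple


def min_one_respecting_cut(
    parent: List[int], edges: List[Tuple[int, int, int]]
) -> Tuple[int, int]:
    """Return (cut weight, v) for the lightest cut that removes only the
    tree edge (v, parent[v]) from the spanning tree."""
    n = len(parent)
    root = next(v for v in range(n) if parent[v] == v)
    children: List[List[int]] = [[] for _ in range(n)]
    for v in range(n):
        if v != root:
            children[parent[v]].append(v)
    # Top-down order and depths.
    depth = [0] * n
    order = [root]
    for v in order:
        for c in children[v]:
            depth[c] = depth[v] + 1
            order.append(c)

    def lca(a: int, b: int) -> int:
        # Naive walk: repeatedly lift the deeper endpoint.
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            a = parent[a]
        return a

    delta = [0] * n  # weighted degree of each node
    rho = [0] * n    # weight of graph edges whose LCA is this node
    for a, b, w in edges:
        delta[a] += w
        delta[b] += w
        rho[lca(a, b)] += w
    # Turn per-node values into subtree sums, bottom-up.
    for v in reversed(order):
        if v != root:
            delta[parent[v]] += delta[v]
            rho[parent[v]] += rho[v]
    # An edge with both endpoints inside the subtree was counted twice in delta.
    return min((delta[v] - 2 * rho[v], v) for v in range(n) if v != root)
```

With `parent = [0, 0, 1, 0]` and edges `(0, 1, 3), (1, 2, 1), (0, 3, 2), (2, 3, 1), (1, 3, 4)`, the three candidate cuts weigh 8, 2 and 7, so the function returns `(2, 2)`: cutting off node 2 alone.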
Heaviest Induced Ancestors and Longest Common Substrings
Suppose we have two trees on the same set of leaves, in which nodes are
weighted such that children are heavier than their parents. We say a node from
the first tree and a node from the second tree are induced together if they
have a common leaf descendant. In this paper we describe data structures that
efficiently support the following heaviest-induced-ancestor query: given a node
from the first tree and a node from the second tree, find an induced pair of
their ancestors with maximum combined weight. Our solutions are based on a
geometric interpretation that enables us to find heaviest induced ancestors
using range queries. We then show how to use these results to build an
LZ-compressed index with which we can quickly find with high probability a
longest substring common to the indexed string and a given pattern.