6,514 research outputs found
Wavelet Trees Meet Suffix Trees
We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size , our
method builds the wavelet tree in time,
improving upon the state-of-the-art algorithm by a factor of .
As a consequence, given an array of n integers we can construct in time a data structure consisting of machine words and
capable of answering rank/select queries for the subranges of the array in
time. This is a -factor improvement in
query time compared to Chan and P\u{a}tra\c{s}cu and a -factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
words, takes time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees allow to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.Comment: 33 pages, 5 figures; preliminary version published at SODA 201
Pattern Matching for sets of segments
In this paper we present algorithms for a number of problems in geometric
pattern matching where the input consist of a collections of segments in the
plane. Our work consists of two main parts. In the first, we address problems
and measures that relate to collections of orthogonal line segments in the
plane. Such collections arise naturally from problems in mapping buildings and
robot exploration.
We propose a new measure of segment similarity called a \emph{coverage
measure}, and present efficient algorithms for maximising this measure between
sets of axis-parallel segments under translations. Our algorithms run in time
O(n^3\polylog n) in the general case, and run in time O(n^2\polylog n) for
the case when all segments are horizontal. In addition, we show that when
restricted to translations that are only vertical, the Hausdorff distance
between two sets of horizontal segments can be computed in time roughly
O(n^{3/2}{\sl polylog}n). These algorithms form significant improvements over
the general algorithm of Chew et al. that takes time . In the
second part of this paper we address the problem of matching polygonal chains.
We study the well known \Frd, and present the first algorithm for computing the
\Frd under general translations. Our methods also yield algorithms for
computing a generalization of the \Fr distance, and we also present a simple
approximation algorithm for the \Frd that runs in time O(n^2\polylog n).Comment: To appear in the 12 ACM Symposium on Discrete Algorithms, Jan 200
S-AMP for Non-linear Observation Models
Recently we extended Approximate message passing (AMP) algorithm to be able
to handle general invariant matrix ensembles. In this contribution we extend
our S-AMP approach to non-linear observation models. We obtain generalized AMP
(GAMP) algorithm as the special case when the measurement matrix has zero-mean
iid Gaussian entries. Our derivation is based upon 1) deriving expectation
propagation (EP) like algorithms from the stationary-points equations of the
Gibbs free energy under first- and second-moment constraints and 2) applying
additive free convolution in free probability theory to get low-complexity
updates for the second moment quantities.Comment: 6 page
Faster Fully-Dynamic Minimum Spanning Forest
We give a new data structure for the fully-dynamic minimum spanning forest
problem in simple graphs. Edge updates are supported in
amortized time per operation, improving the amortized bound of
Holm et al. (STOC'98, JACM'01). We assume the Word-RAM model with standard
instructions.Comment: 13 pages, 2 figure
Efficient estimation of AUC in a sliding window
In many applications, monitoring area under the ROC curve (AUC) in a sliding
window over a data stream is a natural way of detecting changes in the system.
The drawback is that computing AUC in a sliding window is expensive, especially
if the window size is large and the data flow is significant.
In this paper we propose a scheme for maintaining an approximate AUC in a
sliding window of length . More specifically, we propose an algorithm that,
given , estimates AUC within , and can maintain this
estimate in time, per update, as the window slides.
This provides a speed-up over the exact computation of AUC, which requires
time, per update. The speed-up becomes more significant as the size of
the window increases. Our estimate is based on grouping the data points
together, and using these groups to calculate AUC. The grouping is designed
carefully such that () the groups are small enough, so that the error stays
small, () the number of groups is small, so that enumerating them is not
expensive, and () the definition is flexible enough so that we can
maintain the groups efficiently.
Our experimental evaluation demonstrates that the average approximation error
in practice is much smaller than the approximation guarantee ,
and that we can achieve significant speed-ups with only a modest sacrifice in
accuracy
Communication Complexity and Secure Function Evaluation
We suggest two new methodologies for the design of efficient secure
protocols, that differ with respect to their underlying computational models.
In one methodology we utilize the communication complexity tree (or branching
for f and transform it into a secure protocol. In other words, "any function f
that can be computed using communication complexity c can be can be computed
securely using communication complexity that is polynomial in c and a security
parameter". The second methodology uses the circuit computing f, enhanced with
look-up tables as its underlying computational model. It is possible to
simulate any RAM machine in this model with polylogarithmic blowup. Hence it is
possible to start with a computation of f on a RAM machine and transform it
into a secure protocol.
We show many applications of these new methodologies resulting in protocols
efficient either in communication or in computation. In particular, we
exemplify a protocol for the "millionaires problem", where two participants
want to compare their values but reveal no other information. Our protocol is
more efficient than previously known ones in either communication or
computation
- …