Dynamic Ordered Sets with Exponential Search Trees
We introduce exponential search trees as a novel technique for converting
static polynomial space search structures for ordered sets into fully-dynamic
linear space data structures.
This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and
updating a dynamic set of n integer keys in linear space. Here searching an
integer y means finding the maximum key in the set which is smaller than or
equal to y. This problem is equivalent to the standard text book problem of
maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein:
Introduction to Algorithms, 2nd ed., MIT Press, 2001).
The best previous deterministic linear space bound was O(log n/loglog n) due
to Fredman and Willard from STOC 1990. No better deterministic search bound was
known using polynomial space.
We also get the following worst-case linear space trade-offs between the
number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log
n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however,
not likely to be optimal.
Our results are generalized to finger searching and string searching,
providing optimal results for both in terms of n.
Comment: Revision corrects some typos and states things better for
applications in subsequent paper
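The search operation defined in this abstract (find the maximum key that is smaller than or equal to y) can be illustrated with a minimal sketch. This is not an exponential search tree: it is a plain sorted list with binary search, so searching takes O(log n) time and updates take O(n) time. The class name `SortedKeys` and its method names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


class SortedKeys:
    """Dynamic ordered set of integers backed by a sorted list.

    Illustrative only: this shows the query semantics, not the
    O(sqrt(log n/loglog n)) structure described in the abstract.
    """

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Find the position just past any key equal to `key`.
        i = bisect_right(self._keys, key)
        # Skip the insertion if the key is already present.
        if i == 0 or self._keys[i - 1] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        i = bisect_right(self._keys, key)
        if i > 0 and self._keys[i - 1] == key:
            del self._keys[i - 1]

    def search(self, y):
        """Return the maximum key <= y, or None if no such key exists."""
        i = bisect_right(self._keys, y)
        return self._keys[i - 1] if i > 0 else None
```

For example, after inserting 10, 3, and 7, `search(8)` returns 7 and `search(2)` returns `None`.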
New Guarantees for Blind Compressed Sensing
Blind Compressed Sensing (BCS) is an extension of Compressed Sensing (CS)
where the optimal sparsifying dictionary is assumed to be unknown and subject
to estimation (in addition to the CS sparse coefficients). Since the emergence
of BCS, dictionary learning, a.k.a. sparse coding, has been studied as a matrix
factorization problem where its sample complexity, uniqueness and
identifiability have been addressed thoroughly. However, in spite of the strong
connections between BCS and sparse coding, recent results from the sparse
coding problem area have not been exploited within the context of BCS. In
particular, prior BCS efforts have focused on learning constrained and complete
dictionaries that limit the scope and utility of these efforts. In this paper,
we develop new theoretical bounds for perfect recovery for the general
unconstrained BCS problem. These unconstrained BCS bounds cover the case of
overcomplete dictionaries, and hence, they go well beyond the existing BCS
theory. Our perfect recovery results integrate the combinatorial theories of
sparse coding with some of the recent results from low-rank matrix recovery. In
particular, we propose an efficient CS measurement scheme that results in
practical recovery bounds for BCS. Moreover, we discuss the performance of BCS
under polynomial-time sparse coding algorithms.
Comment: To appear in the 53rd Annual Allerton Conference on Communication,
Control and Computing, University of Illinois at Urbana-Champaign, IL, USA,
201
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.
Comment: 13 pages, 1 figure
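The multi-stream model for exact matching can be sketched as follows. This sketch is an assumption and not the authors' method: it uses the classic Knuth-Morris-Pratt automaton, sharing one failure table of m entries across all streams and keeping one integer state per stream, so it uses O(m+s) words. Its time per symbol is only amortized constant within each stream, whereas the abstract claims constant time per arriving symbol. The names `MultiStreamMatcher` and `feed` are hypothetical.

```python
def failure_function(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    border (prefix that is also a suffix) of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """Exact matching of one fixed pattern against several streams."""

    def __init__(self, pattern, num_streams):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_function(pattern)  # shared table: O(m) words
        self.state = [0] * num_streams         # one state per stream: O(s) words

    def feed(self, stream_id, symbol):
        """Consume one symbol for one stream; return True if an occurrence
        of the pattern ends at this symbol in that stream."""
        k = self.state[stream_id]
        while k > 0 and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        matched = k == len(self.pattern)
        if matched:
            # Fall back so that overlapping occurrences are still found.
            k = self.fail[k - 1]
        self.state[stream_id] = k
        return matched
```

Symbols from different streams can be interleaved in any order, since each call touches only the state of the stream named by `stream_id`.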
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch for a dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
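The Dirichlet (pigeonhole) principle behind the split index can be illustrated for one mismatch: if two words of equal length differ in at most one position and both are split at the same point, at least one of the two halves must match exactly. The sketch below is a simplified illustration under that assumption, not the authors' compact C++ implementation. The class name `SplitIndex`, the bucket layout, and the method `query` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class SplitIndex:
    """Toy index for dictionary matching with at most one mismatch."""

    def __init__(self, words):
        # Key: (word length, which half, contents of that half).
        self.buckets = defaultdict(list)
        for w in set(words):
            mid = len(w) // 2
            self.buckets[(len(w), 0, w[:mid])].append(w)
            self.buckets[(len(w), 1, w[mid:])].append(w)

    def query(self, q):
        """Return all indexed words within Hamming distance 1 of q."""
        mid = len(q) // 2
        found = set()
        for part, piece in ((0, q[:mid]), (1, q[mid:])):
            # Candidates share one half exactly; verify the whole word.
            for w in self.buckets.get((len(q), part, piece), ()):
                if hamming(w, q) <= 1:
                    found.add(w)
        return sorted(found)
```

For instance, with the words "house", "mouse", "horse", and "hose" indexed, `query("hoose")` returns `['horse', 'house']`: both share the exact first half "ho" with the query and differ from it in one position.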