Succinct Filters for Sets of Unknown Sizes
The membership problem asks to maintain a set S ⊆ [u], supporting insertions and membership queries, i.e., testing if a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate ε is allowed, the data structure is called a filter.
The space usage of standard dictionaries and filters usually depends on an upper bound on the size of S, while the actual set can be much smaller.
Pagh, Segev and Wieder [Pagh et al., 2013] were the first to study filters whose space usage varies with the current |S|. They showed that, in order to match the space to the current set size n = |S|, any filter data structure must use (1-o(1))n(log(1/ε)+(1-O(ε))log log n) bits, in contrast to the well-known lower bound of N log(1/ε) bits, where N is an upper bound on |S|. They also presented a data structure with almost optimal space of (1+o(1))n(log(1/ε)+O(log log n)) bits, provided that n > u^0.001, with expected amortized constant insertion time and worst-case constant lookup time.
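To make the gap between the two bounds concrete, the snippet below plugs numbers into their leading-order terms. It assumes base-2 logarithms and drops the (1 ± o(1)) and (1 - O(ε)) factors, so it is back-of-envelope arithmetic rather than a statement about any particular data structure; the function names and parameter values are hypothetical.

```python
import math

def adaptive_bits_per_key(n: int, eps: float) -> float:
    """Leading-order bits per key when space tracks the current size n:
    log2(1/eps) + log2(log2(n)), ignoring lower-order factors."""
    return math.log2(1 / eps) + math.log2(math.log2(n))

def fixed_capacity_bits(N: int, eps: float) -> float:
    """Leading-order total bits N * log2(1/eps) for a filter provisioned for N keys."""
    return N * math.log2(1 / eps)

n, N, eps = 2**20, 2**30, 2**-8
per_key = adaptive_bits_per_key(n, eps)     # 8 + log2(20), about 12.32 bits
adaptive_total = n * per_key                # about 1.3e7 bits
fixed_total = fixed_capacity_bits(N, eps)   # 2^33, about 8.6e9 bits
```

With these numbers the log log n term costs about 4.3 extra bits per key, while provisioning for N = 2^30 keys when only 2^20 are present uses several hundred times more space.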
In this work, we present a filter data structure with improvements in two aspects:
- it has constant worst-case time for all insertions and lookups with high probability;
- it uses space (1+o(1))n(log(1/ε)+log log n) bits when n > u^0.001, achieving the optimal leading constant for all ε = o(1).

We also present a dictionary that uses (1+o(1))n log(u/n) bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.
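For orientation, the sketch below shows the classical reduction from a filter to a dictionary: store a short hash (fingerprint) of each key in an exact set, and answer queries by looking up the query's fingerprint. It is not the construction from this work. The class name, the use of BLAKE2b as the hash, and the parameters are assumptions made for illustration. Note that the fingerprint range has to be fixed from a capacity bound up front, which is exactly the dependence on an upper bound N that the work above avoids.

```python
import hashlib
import math

class FingerprintFilter:
    """Hypothetical filter built from an exact set of hashed fingerprints.

    Keys are hashed into a range of size M = ceil(capacity / eps). With at most
    `capacity` stored fingerprints and the hash treated as uniform, a non-member
    matches some stored fingerprint with probability <= capacity / M <= eps.
    Members always match, so there are no false negatives.
    """

    def __init__(self, capacity: int, eps: float):
        self.range_size = math.ceil(capacity / eps)
        # A Python set of ints is NOT space-efficient; a real filter would pack
        # fingerprints (e.g. by quotienting) into about log2(1/eps) bits each.
        self.fingerprints: set[int] = set()

    def _fingerprint(self, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.range_size

    def insert(self, key: str) -> None:
        self.fingerprints.add(self._fingerprint(key))

    def __contains__(self, key: str) -> bool:
        return self._fingerprint(key) in self.fingerprints

f = FingerprintFilter(capacity=1000, eps=0.01)
for i in range(1000):
    f.insert(f"member-{i}")
no_false_negatives = all(f"member-{i}" in f for i in range(1000))
# Empirical false-positive rate over keys that were never inserted.
fp_rate = sum(f"other-{i}" in f for i in range(100_000)) / 100_000
```

The measured false-positive rate should land near ε = 0.01; inserting more than `capacity` keys would push it above ε, since the fingerprint range cannot grow after construction.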
Learning detectors quickly using structured covariance matrices
The computer vision community is increasingly interested in the rapid estimation
of object detectors. Canonical hard-negative mining strategies are slow because they
require multiple passes over the large negative training set. Recent work has
demonstrated that if the distribution of negative examples is assumed to be
stationary, then Linear Discriminant Analysis (LDA) can learn comparable
detectors without ever revisiting the negative set. Even with this insight,
however, the time to learn a single object detector can still be on the order
of tens of seconds on a modern desktop computer. This paper proposes to
leverage the resulting structured covariance matrix to obtain detectors with
identical performance in orders of magnitude less time and memory. We elucidate
an important connection to the correlation filter literature, demonstrating
that these can also be trained without ever revisiting the negative set.
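The LDA detector is w = Σ⁻¹(μ_pos - μ_neg), where Σ and μ_neg come from the negative set. Under the stationarity assumption the covariance between feature positions depends only on their offset, so Σ is Toeplitz (block-Toeplitz for multi-channel 2-D features). The sketch below is a simplified illustration, not the paper's algorithm: it assumes 1-D single-channel features and a wrap-around (circulant) approximation of Σ. Under that assumption the solve becomes an elementwise division in the Fourier domain, the same algebra correlation filters rely on, and costs O(n log n) instead of O(n³). The autocovariance, means, and function names are hypothetical.

```python
import numpy as np

def circulant(c):
    """Dense circulant matrix with C[i, j] = c[(j - i) % n] (used only as a check)."""
    return np.stack([np.roll(c, i) for i in range(len(c))])

def lda_weights_fft(mu_pos, mu_neg, c):
    """LDA weights w = Sigma^{-1} (mu_pos - mu_neg) for a circulant Sigma.

    c is the assumed symmetric autocovariance, c[k] == c[(n - k) % n].
    A circulant Sigma is diagonalised by the DFT with eigenvalues fft(c),
    so solving the linear system is an elementwise division.
    """
    b = np.asarray(mu_pos, dtype=float) - np.asarray(mu_neg, dtype=float)
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))

n = 64
k = np.arange(n)
c = 0.5 ** np.minimum(k, n - k)   # hypothetical stationary autocovariance
c[0] += 0.1                       # small ridge term for conditioning
mu_neg = np.full(n, 0.2)          # stationarity: the negative mean is constant
mu_pos = np.random.default_rng(0).normal(size=n)  # stand-in positive-class mean

w_fft = lda_weights_fft(mu_pos, mu_neg, c)                 # O(n log n)
w_dense = np.linalg.solve(circulant(c), mu_pos - mu_neg)   # O(n^3) reference
```

Both solves give the same detector; only the FFT path avoids forming or factoring the n×n covariance.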
Bayesian wavelet de-noising with the caravan prior
According to both domain expert knowledge and empirical evidence, wavelet
coefficients of real signals tend to exhibit clustering patterns, in that they
contain connected regions of coefficients of similar magnitude (large or
small). A wavelet de-noising approach that takes this feature of the signal into
account may in practice outperform simpler methods, both in terms of the
estimation error and the visual appearance of the estimates. Motivated
by this observation, we present a Bayesian approach to wavelet de-noising,
where dependencies between neighbouring wavelet coefficients are a priori
modelled via a Markov chain-based prior, which we term the caravan prior.
Posterior computations in our method are performed via the Gibbs sampler. Using
representative synthetic and real data examples, we conduct a detailed
comparison of our approach with a benchmark empirical Bayes de-noising method
(due to Johnstone and Silverman). We show that the caravan prior fares well and
is therefore a useful addition to the wavelet de-noising toolbox.
Comment: 32 pages, 15 figures, 4 tables
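The abstract does not spell out the caravan prior, so the sketch below substitutes a generic, hypothetical two-state model in the same spirit. Each coefficient has a hidden "small/large" state that follows a Markov chain along the coefficient sequence, the coefficient is zero-mean Gaussian with a state-dependent variance, and the observation adds Gaussian noise. The sketch assumes the coefficients have already been produced by some wavelet transform (omitted) and uses a collapsed Gibbs sampler. Each state is resampled given its neighbours with the coefficient integrated out, and the conditional posterior mean is averaged over sweeps. The function name, parameters, and defaults are illustrative only.

```python
import numpy as np

def markov_gibbs_denoise(y, sigma=1.0, tau=(0.1, 5.0), stay=0.95,
                         n_iter=300, burn_in=50, seed=0):
    """Posterior-mean estimate of theta from y = theta + N(0, sigma^2) noise.

    Hypothetical two-state prior (an illustration, NOT the caravan prior):
      z_i in {0: small, 1: large} is a Markov chain with P(z_i == z_{i-1}) = stay,
      theta_i | z_i ~ N(0, tau[z_i]^2).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    tau2 = np.asarray(tau, dtype=float) ** 2
    marg_var = tau2 + sigma ** 2          # Var(y_i | z_i) with theta_i integrated out
    shrink = tau2 / marg_var              # E[theta_i | y_i, z_i] = shrink[z_i] * y_i
    # Log marginal likelihood of each y_i under each state, shape (n, 2).
    loglik = -0.5 * (np.log(2 * np.pi * marg_var) + y[:, None] ** 2 / marg_var)
    log_trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    z = (np.abs(y) > 2 * sigma).astype(int)   # crude initial states
    est = np.zeros(n)
    for it in range(n_iter):
        for i in range(n):                    # one Gibbs sweep over the states
            logp = loglik[i].copy()
            if i > 0:
                logp += log_trans[z[i - 1]]       # P(z_i | z_{i-1})
            if i < n - 1:
                logp += log_trans[:, z[i + 1]]    # P(z_{i+1} | z_i)
            d = np.clip(logp[0] - logp[1], -50.0, 50.0)
            z[i] = int(rng.random() < 1.0 / (1.0 + np.exp(d)))  # P(z_i = 1 | rest)
        if it >= burn_in:
            est += shrink[z] * y              # Rao-Blackwellised E[theta | z, y]
    return est / (n_iter - burn_in)

# Synthetic coefficients: a single connected cluster of large values.
rng = np.random.default_rng(1)
theta = np.zeros(200)
theta[50:80] = rng.normal(0.0, 5.0, 30)
y = theta + rng.normal(0.0, 1.0, 200)
est = markov_gibbs_denoise(y)
mse_est = np.mean((est - theta) ** 2)
mse_raw = np.mean((y - theta) ** 2)
```

The Markov coupling lets a coefficient borrow strength from its neighbours, so isolated noisy values inside a "small" region are shrunk hard while the clustered large values are mostly kept.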
A Hash Table Without Hash Functions, and How to Get the Most Out of Your Random Bits
This paper considers the basic question of how strong a probabilistic
guarantee a hash table, storing -bit key/value
pairs, can offer. Past work on this question has been bottlenecked by limitations
of the known families of hash functions: the only hash tables to achieve
failure probabilities less than 1 / 2^{polylog n} require access to
fully random hash functions; if the same hash tables are implemented using
the known explicit families of hash functions, their failure probabilities
become 1 / poly(n).
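For concreteness, a standard example of an explicit hash family is the polynomial (Carter-Wegman) construction. A uniformly random polynomial of degree k - 1 over a prime field is k-wise independent, and its seed is just k field elements, i.e. about k·log2(p) random bits. This is a textbook illustration of what an explicit family looks like, not the family analysed in the paper. The prime 2^61 - 1, the function names, and the assumption that keys are already integers below p are choices made for the sketch.

```python
import random
from collections import Counter

def poly_hash(coeffs, x: int, p: int) -> int:
    """Evaluate coeffs[0] + coeffs[1]*x + ... mod p by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc

def draw_poly_hash(k: int, p: int = 2**61 - 1, rng=None):
    """Draw a k-wise independent hash x -> [0, p) from a k-coefficient seed."""
    rng = rng or random.Random()
    coeffs = [rng.randrange(p) for _ in range(k)]
    return lambda x: poly_hash(coeffs, x, p)

h = draw_poly_hash(k=4, rng=random.Random(42))
bucket = h(123456789) % 1024      # e.g. map a key to one of 1024 buckets

# Exhaustive check of pairwise (k = 2) independence over a tiny field p = 13:
# for distinct x, y every pair (h(x), h(y)) arises from exactly one seed.
pairs = Counter((poly_hash([a0, a1], 3, 13), poly_hash([a0, a1], 7, 13))
                for a0 in range(13) for a1 in range(13))
```

The short seed is the appeal of such families; the abstract's point is that the independence they provide is what limits the failure probabilities that can be proven for hash tables built on them.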
To get around these obstacles, we show how to construct a randomized data
structure that has the same guarantees as a hash table, but that avoids
the direct use of hash functions. Building on this, we are able to construct a
hash table using random bits that achieves failure probability for an arbitrary positive constant .
In fact, we show that this guarantee can even be achieved by a succinct
dictionary, that is, by a dictionary that uses space within a
factor of the information-theoretic optimum.
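As a reference point for "information-theoretic optimum", the snippet below counts the bits needed to name an arbitrary n-element subset of a universe of size u, namely log2 C(u, n), and compares it with writing every key out in full. It ignores the value bits of key/value pairs, an assumption made to keep the sketch short; a succinct structure must stay within a (1 + o(1)) factor of this count. The function name and the parameter values are hypothetical.

```python
import math

def log2_binom(u: int, n: int) -> float:
    """log2 C(u, n): bits needed to name an arbitrary n-element subset of [u]."""
    return (math.lgamma(u + 1) - math.lgamma(n + 1)
            - math.lgamma(u - n + 1)) / math.log(2)

u, n = 2**32, 2**20
optimum = log2_binom(u, n)        # information-theoretic minimum, in bits
naive = n * math.log2(u)          # n keys written out in full: 32 bits each
leading = n * math.log2(u / n)    # the n log(u/n) leading term: 12 bits each
```

Here the optimum comes to about 13.4 bits per key versus 32 for the naive encoding. The difference from n log(u/n) is roughly n·log2(e) bits, which is lower order only when u/n grows.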
Finally, we also construct a succinct hash table whose probabilistic
guarantees fall on a different extreme, offering a failure probability of 1 /
poly(n) while using only random bits. This latter result
matches (up to low-order terms) a guarantee previously achieved by
Dietzfelbinger et al., but with increased space efficiency and with several
surprising technical components.