
    Succinct Filters for Sets of Unknown Sizes

    The membership problem asks to maintain a set S ⊆ [u], supporting insertions and membership queries, i.e., testing whether a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate ε is allowed, the data structure is called a filter. The space usage of standard dictionaries and filters usually depends on an upper bound on the size of S, while the actual set can be much smaller. Pagh, Segev and Wieder [Pagh et al., 2013] were the first to study filters whose space usage varies with the current |S|. They showed that, in order to match the space to the current set size n = |S|, any filter data structure must use (1-o(1))n(log(1/ε)+(1-O(ε))log log n) bits, in contrast to the well-known lower bound of N log(1/ε) bits, where N is an upper bound on |S|. They also presented a data structure with almost optimal space of (1+o(1))n(log(1/ε)+O(log log n)) bits provided that n > u^0.001, with expected amortized constant insertion time and worst-case constant lookup time. In this work, we present a filter data structure with improvements in two aspects: (1) it has constant worst-case time for all insertions and lookups with high probability; (2) it uses space (1+o(1))n(log(1/ε)+log log n) bits when n > u^0.001, achieving the optimal leading constant for all ε = o(1). We also present a dictionary that uses (1+o(1))n log(u/n) bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.
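
    To make the two space bounds in this abstract concrete, the following minimal sketch compares their leading terms numerically. It is an illustration only: the function names are hypothetical, logarithms are assumed to be base 2, and the (1 ± o(1)) and (1 - O(ε)) factors are dropped.

```python
import math


def static_filter_bits(capacity: int, eps: float) -> float:
    """Leading term N * log2(1/eps): a filter sized for a fixed upper bound N."""
    return capacity * math.log2(1 / eps)


def dynamic_filter_bits(n: int, eps: float) -> float:
    """Leading term n * (log2(1/eps) + log2(log2(n))): a filter sized to the current n.

    Lower-order factors from the abstract are ignored (an assumption of this sketch).
    """
    return n * (math.log2(1 / eps) + math.log2(math.log2(n)))


if __name__ == "__main__":
    N = 2**20      # assumed upper bound on |S|
    eps = 2**-8    # assumed false positive rate
    print(f"static (N = {N}): {static_filter_bits(N, eps):.0f} bits")
    for n in (2**10, 2**16, 2**20):
        print(f"dynamic (n = {n}): {dynamic_filter_bits(n, eps):.0f} bits")
```

    Under these assumptions, the size-adaptive bound is far smaller while the set is small, but costs an extra log log n bits per element once n approaches N.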

    Learning detectors quickly using structured covariance matrices

    Computer vision is increasingly becoming interested in the rapid estimation of object detectors. Canonical hard negative mining strategies are slow as they require multiple passes over the large negative training set. Recent work has demonstrated that if the distribution of negative examples is assumed to be stationary, then Linear Discriminant Analysis (LDA) can learn comparable detectors without ever revisiting the negative set. Even with this insight, however, the time to learn a single object detector can still be on the order of tens of seconds on a modern desktop computer. This paper proposes to leverage the resulting structured covariance matrix to obtain detectors with identical performance in orders of magnitude less time and memory. We elucidate an important connection to the correlation filter literature, demonstrating that these can also be trained without ever revisiting the negative set.

    Bayesian wavelet de-noising with the caravan prior

    According to both domain expert knowledge and empirical evidence, wavelet coefficients of real signals tend to exhibit clustering patterns, in that they contain connected regions of coefficients of similar magnitude (large or small). A wavelet de-noising approach that takes into account such a feature of the signal may in practice outperform other, more vanilla methods, both in terms of the estimation error and visual appearance of the estimates. Motivated by this observation, we present a Bayesian approach to wavelet de-noising, where dependencies between neighbouring wavelet coefficients are a priori modelled via a Markov chain-based prior, that we term the caravan prior. Posterior computations in our method are performed via the Gibbs sampler. Using representative synthetic and real data examples, we conduct a detailed comparison of our approach with a benchmark empirical Bayes de-noising method (due to Johnstone and Silverman). We show that the caravan prior fares well and is therefore a useful addition to the wavelet de-noising toolbox. Comment: 32 pages, 15 figures, 4 tables

    A Hash Table Without Hash Functions, and How to Get the Most Out of Your Random Bits

    This paper considers the basic question of how strong of a probabilistic guarantee can a hash table, storing n (1 + Θ(1)) log n-bit key/value pairs, offer? Past work on this question has been bottlenecked by limitations of the known families of hash functions: The only hash tables to achieve failure probabilities less than 1 / 2^{polylog n} require access to fully-random hash functions; if the same hash tables are implemented using the known explicit families of hash functions, their failure probabilities become 1 / poly(n). To get around these obstacles, we show how to construct a randomized data structure that has the same guarantees as a hash table, but that avoids the direct use of hash functions. Building on this, we are able to construct a hash table using O(n) random bits that achieves failure probability 1 / n^{n^{1 - ε}} for an arbitrary positive constant ε. In fact, we show that this guarantee can even be achieved by a succinct dictionary, that is, by a dictionary that uses space within a 1 + o(1) factor of the information-theoretic optimum. Finally we also construct a succinct hash table whose probabilistic guarantees fall on a different extreme, offering a failure probability of 1 / poly(n) while using only Õ(log n) random bits. This latter result matches (up to low-order terms) a guarantee previously achieved by Dietzfelbinger et al., but with increased space efficiency and with several surprising technical components.