13,940 research outputs found

    Lower Bounds for Oblivious Near-Neighbor Search

    Get PDF
    We prove an Ω(dlgn/(lglgn)2)\Omega(d \lg n/ (\lg\lg n)^2) lower bound on the dynamic cell-probe complexity of statistically oblivious\mathit{oblivious} approximate-near-neighbor search (ANN\mathsf{ANN}) over the dd-dimensional Hamming cube. For the natural setting of d=Θ(logn)d = \Theta(\log n), our result implies an Ω~(lg2n)\tilde{\Omega}(\lg^2 n) lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for ANN\mathsf{ANN}. This is the first super-logarithmic unconditional\mathit{unconditional} lower bound for ANN\mathsf{ANN} against general (non black-box) data structures. We also show that any oblivious static\mathit{static} data structure for decomposable search problems (like ANN\mathsf{ANN}) can be obliviously dynamized with O(logn)O(\log n) overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).Comment: 28 page

    Information-based complexity, feedback and dynamics in convex programming

    Get PDF
    We study the intrinsic limitations of sequential convex optimization through the lens of feedback information theory. In the oracle model of optimization, an algorithm queries an {\em oracle} for noisy information about the unknown objective function, and the goal is to (approximately) minimize every function in a given class using as few queries as possible. We show that, in order for a function to be optimized, the algorithm must be able to accumulate enough information about the objective. This, in turn, puts limits on the speed of optimization under specific assumptions on the oracle and the type of feedback. Our techniques are akin to the ones used in statistical literature to obtain minimax lower bounds on the risks of estimation procedures; the notable difference is that, unlike in the case of i.i.d. data, a sequential optimization algorithm can gather observations in a {\em controlled} manner, so that the amount of information at each step is allowed to change in time. In particular, we show that optimization algorithms often obey the law of diminishing returns: the signal-to-noise ratio drops as the optimization algorithm approaches the optimum. To underscore the generality of the tools, we use our approach to derive fundamental lower bounds for a certain active learning problem. Overall, the present work connects the intuitive notions of information in optimization, experimental design, estimation, and active learning to the quantitative notion of Shannon information.Comment: final version; to appear in IEEE Transactions on Information Theor

    Privately Releasing Conjunctions and the Statistical Query Barrier

    Full text link
    Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? + We show that the number of statistical queries necessary and sufficient for this task is---up to polynomial factors---equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. + We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning theoretic point of view, our main applications are in privacy-preserving data analysis: Here, our second result leads to the first algorithm that efficiently releases differentially private answers to of all Boolean conjunctions with 1% average error. This presents significant progress on a key open problem in privacy-preserving data analysis. Our first result on the other hand gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ-model as a new barrier in the design of differentially private algorithms

    Bloom Filters in Adversarial Environments

    Get PDF
    Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set SS of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any xSx \in S it should always answer `Yes', and for any xSx \notin S it should answer `Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size nn and error ε\varepsilon, that is secure against tt queries and uses only O(nlog1ε+t)O(n \log{\frac{1}{\varepsilon}}+t) bits of memory. In comparison, nlog1εn\log{\frac{1}{\varepsilon}} is the best possible under a non-adaptive adversary

    Indexability, concentration, and VC theory

    Get PDF
    Degrading performance of indexing schemes for exact similarity search in high dimensions has long since been linked to histograms of distributions of distances and other 1-Lipschitz functions getting concentrated. We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning.Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded, improved and corrected version of the SISAP'2010 invited paper, this e-print, v3
    corecore