
    Preventing False Discovery in Interactive Data Analysis is Hard

    We show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to n^{3+o(1)} adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. Our result stands in stark contrast to the well-known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, a recent work by Dwork et al. shows how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm, and how to answer a quadratic number of adaptive queries via a computationally efficient algorithm. The latter result implies that our result is tight up to a linear factor in n. Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in statistics is decades old, to the best of our knowledge our result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative research environment may be inherent.
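
    The adaptive overfitting phenomenon the abstract describes is easy to demonstrate numerically. Below is a minimal sketch (our illustration, not the paper's construction): a pure-noise label is probed by statistical queries answered with empirical means on a fixed sample, and a second-round query chosen based on the first-round answers reports an agreement rate far from the true expectation.

```python
# Minimal sketch of adaptive statistical queries overfitting a fixed sample.
# All names and parameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000                      # n samples, d candidate predicates
X = rng.integers(0, 2, size=(n, d))   # data: each column is a 0/1 predicate
y = rng.integers(0, 2, size=n)        # a label that is pure noise

# Round 1 (adaptive step): one statistical query per predicate, answered
# with its empirical agreement rate with y on the fixed sample.
agree = (X == y[:, None]).mean(axis=0)

# Round 2: based on those answers, flip the negatively correlated
# predicates and query the majority vote of the rest.
Z = np.where(agree >= 0.5, X, 1 - X)
majority = (Z.mean(axis=1) > 0.5).astype(int)

empirical = (majority == y).mean()    # the answer on the sample
print(f"empirical agreement: {empirical:.2f}, true expectation: 0.50")
# y is independent noise, so the true agreement rate of ANY predicate is
# 0.5, yet the empirical answer is far higher: a false discovery.
```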

    Lower Bounds on the Oracle Complexity of Nonsmooth Convex Optimization via Information Theory

    We present an information-theoretic approach to lower bound the oracle complexity of nonsmooth black-box convex optimization, unifying previous lower-bounding techniques by identifying a combinatorial problem, namely string guessing, as a single source of hardness. As a measure of complexity we use distributional oracle complexity, which subsumes randomized oracle complexity as well as worst-case oracle complexity. We obtain strong lower bounds on distributional oracle complexity for the box [-1,1]^n, as well as for the L^p-ball for p ≥ 1 (in both the low-scale and large-scale regimes), matching worst-case upper bounds; hence we close the gap between distributional complexity (and in particular randomized complexity) and worst-case complexity. Furthermore, the bounds remain essentially the same for high-probability and bounded-error oracle complexity, and even for the combination of the two, i.e., bounded-error high-probability oracle complexity. This considerably extends the applicability of known bounds.
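
    For readers unfamiliar with the black-box model, the sketch below illustrates what "oracle complexity" counts in this setting: the optimizer sees the nonsmooth convex objective only through a first-order oracle returning a value and a subgradient, and complexity is measured in oracle calls. The objective and the projected subgradient method are illustrative assumptions, not the paper's lower-bound construction.

```python
# Minimal sketch of the black-box oracle model for nonsmooth convex
# optimization; the counter records the quantity oracle complexity bounds.
import numpy as np

def make_oracle(A, b):
    """First-order oracle for the nonsmooth convex f(x) = max_i (a_i.x + b_i)."""
    calls = {"n": 0}
    def oracle(x):
        calls["n"] += 1
        vals = A @ x + b
        i = int(np.argmax(vals))
        return vals[i], A[i]          # function value and one subgradient
    return oracle, calls

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
oracle, calls = make_oracle(A, b)

# Projected subgradient method on the box [-1,1]^n, one domain in the paper.
x = np.zeros(5)
best = np.inf
for t in range(1, 501):
    fx, g = oracle(x)
    best = min(best, fx)
    x = np.clip(x - g / np.sqrt(t), -1.0, 1.0)   # step 1/sqrt(t), project to box
print(f"best value {best:.4f} after {calls['n']} oracle queries")
```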

    Quantum algorithms for testing properties of distributions

    Suppose one has access to oracles generating samples from two unknown probability distributions P and Q on some N-element set. How many samples does one need to test whether the two distributions are close to or far from each other in the L_1-norm? This and related questions have been extensively studied in recent years in the field of property testing. In the present paper we study quantum algorithms for testing properties of distributions. It is shown that the L_1-distance between P and Q can be estimated with constant precision using approximately N^{1/2} queries in the quantum setting, whereas classical computers need \Omega(N) queries. We also describe quantum algorithms for testing uniformity and orthogonality with query complexity O(N^{1/3}). The classical query complexity of these problems is known to be \Omega(N^{1/2}).
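
    To make the testing task concrete, here is a classical plug-in estimator of the L_1 distance from samples. It is a naive baseline sketch of our own (not the paper's quantum algorithm, which needs only about N^{1/2} queries), shown only to fix what "estimating the L_1-distance" means.

```python
# Naive classical plug-in estimate of ||P - Q||_1 from i.i.d. samples on
# {0, ..., N-1}; the distributions below are illustrative assumptions.
import numpy as np

def l1_estimate(samples_p, samples_q, N):
    """Plug-in estimate of the L_1 distance between two sampled distributions."""
    p_hat = np.bincount(samples_p, minlength=N) / len(samples_p)
    q_hat = np.bincount(samples_q, minlength=N) / len(samples_q)
    return np.abs(p_hat - q_hat).sum()

rng = np.random.default_rng(2)
N = 100
P = np.full(N, 1 / N)                                # uniform distribution
Q = P.copy(); Q[:N // 2] *= 1.5; Q[N // 2:] *= 0.5   # a perturbed distribution
samples_p = rng.choice(N, size=20 * N, p=P)
samples_q = rng.choice(N, size=20 * N, p=Q)
print(f"estimated L1 distance: {l1_estimate(samples_p, samples_q, N):.3f}"
      f" (true: {np.abs(P - Q).sum():.3f})")
```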

    Quantum Simulation Logic, Oracles, and the Quantum Advantage

    Query complexity is a common tool for comparing quantum and classical computation, and it has produced many examples of how quantum algorithms differ from classical ones. Here we investigate in detail the role that oracles play in the advantage of quantum algorithms. We do so by using a simulation framework, Quantum Simulation Logic (QSL), to construct oracles and algorithms that solve some problems with the same success probability and number of queries as the quantum algorithms. The framework can be simulated using only classical resources, at a constant overhead compared to the quantum resources used in quantum computation. Our results clarify the assumptions made and the conditions needed when using quantum oracles. Using the same assumptions on oracles within the simulation framework, we show that for some specific algorithms, such as the Deutsch-Jozsa and Simon's algorithms, there simply is no advantage in terms of query complexity. This does not detract from the fact that quantum query complexity provides examples of how a quantum computer can be expected to behave, which in turn has proved useful for finding new quantum algorithms outside of the oracle paradigm; the most prominent example is Shor's algorithm for integer factorization.
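
    As a point of reference for the query-complexity comparison, the sketch below simulates the classical deterministic side of the Deutsch-Jozsa problem: deciding whether a function promised to be constant or balanced is which, by querying it as an oracle. A classical deterministic algorithm may need 2^(n-1) + 1 queries in the worst case, while the quantum algorithm uses one. This is an illustrative sketch of the query model, not the QSL construction from the paper.

```python
# Classical deterministic query count for the Deutsch-Jozsa promise problem.
# The oracle functions below are illustrative assumptions.

def classical_dj(oracle, n):
    """Return ('constant'|'balanced', queries used) for a promised f on n bits."""
    first = oracle(0)
    # If f is balanced, any 2^(n-1) + 1 inputs must contain both output values.
    for queries, x in enumerate(range(1, 2 ** (n - 1) + 1), start=2):
        if oracle(x) != first:
            return "balanced", queries
    return "constant", 2 ** (n - 1) + 1

n = 4
balanced_f = lambda x: bin(x).count("1") % 2   # parity is balanced on n bits
constant_f = lambda x: 1
print(classical_dj(balanced_f, n))   # ('balanced', 2) -- lucky early exit here
print(classical_dj(constant_f, n))   # ('constant', 9) -- worst case 2^(n-1)+1
```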

    Zipf's law and L. Levin's probability distributions

    Zipf's law in its basic incarnation is an empirical probability distribution governing the frequency of usage of words in a language. As Terence Tao recently remarked, it still lacks a convincing and satisfactory mathematical explanation. In this paper I suggest that, at least in certain situations, Zipf's law can be explained as a special case of the a priori distribution introduced and studied by L. Levin. The Zipf ranking by diminishing probability then appears as the ordering determined by growing Kolmogorov complexity. One argument justifying this assertion is an appeal to a recent interpretation by Yu. Manin and M. Marcolli of asymptotic bounds for error-correcting codes in terms of phase transitions. In the respective partition function, the Kolmogorov complexity of a code plays the role of its energy.
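
    The asserted correspondence can be checked numerically in an idealized form: if the r-th object in the complexity ordering has complexity close to log2(r), then Levin-style weights 2^(-K) normalize exactly to Zipf's 1/r rank-frequency law. The complexity values below are a stand-in assumption, not computed Kolmogorov complexity (which is uncomputable).

```python
# Idealized numerical check: a priori weights 2^(-K) with K(r) = log2(r)
# reproduce Zipf's law p(r) ~ 1/r. The K values are a modeling assumption.
import numpy as np

ranks = np.arange(1, 1001)
K = np.log2(ranks)              # idealized complexity of the r-th object
m = 2.0 ** (-K)                 # Levin-style a priori weight 2^(-K)
p = m / m.sum()                 # normalize into a probability distribution

zipf = (1 / ranks) / (1 / ranks).sum()   # Zipf's law with exponent 1
print(np.max(np.abs(p - zipf)))          # ~0: they coincide up to float error
```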