Preventing False Discovery in Interactive Data Analysis is Hard
We show that, under a standard hardness assumption, there is no
computationally efficient algorithm that, given $n$ samples from an unknown
distribution, can give valid answers to $n^{3+o(1)}$ adaptively chosen
statistical queries. A statistical query asks for the expectation of a
predicate over the underlying distribution, and an answer to a statistical
query is valid if it is "close" to the correct expectation over the
distribution.
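A minimal sketch of the objects in this definition, assuming Boolean predicates and the naive mechanism that answers each query with its sample mean (neither choice is taken from the paper). `hoeffding_tolerance` is a standard Hoeffding-plus-union-bound calculation, also not from the paper. It shows why the non-adaptive case is easy: for queries fixed in advance, the tolerance grows only like $\sqrt{\log k}$, so $k$ can be exponential in $n$.

```python
import math
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")

def empirical_answer(sample: Sequence[X], predicate: Callable[[X], bool]) -> float:
    """Answer the statistical query E_{x~D}[predicate(x)] by its sample mean."""
    return sum(1 for x in sample if predicate(x)) / len(sample)

def is_valid(answer: float, true_expectation: float, tol: float) -> bool:
    """An answer is tol-valid if it lies within tol of the population expectation."""
    return abs(answer - true_expectation) <= tol

def hoeffding_tolerance(n: int, k: int, delta: float) -> float:
    """Tolerance that sample means achieve simultaneously on k NON-adaptive queries
    with probability >= 1 - delta (Hoeffding's inequality + union bound over k)."""
    return math.sqrt(math.log(2 * k / delta) / (2 * n))

# Even 2**50 pre-specified queries on 1000 samples get tolerance below 0.15.
print(hoeffding_tolerance(1000, 2**50, 0.05))
```

The guarantee in `hoeffding_tolerance` relies on each predicate being fixed before the sample is drawn; it says nothing about queries chosen after seeing earlier answers.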
Our result stands in stark contrast to the well-known fact that exponentially
many statistical queries can be answered validly and efficiently if the queries
are chosen non-adaptively (no query may depend on the answers to previous
queries). Moreover, recent work by Dwork et al. shows how to accurately
answer exponentially many adaptively chosen statistical queries via a
computationally inefficient algorithm, and how to answer a quadratic number of
adaptive queries via a computationally efficient algorithm. The latter result
implies that our result is tight up to a linear factor in the number of samples $n$.
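The danger of adaptivity can be seen in a toy experiment that involves no hardness assumption at all. The sketch below is not the paper's construction: it uses the naive sample-mean mechanism, a made-up distribution in which features and label are independent fair bits, and a hypothetical two-round analyst. The final query's true expectation is exactly 0.5, yet the value the mechanism reports for it on the sample lands far above 0.5, i.e. the answer is invalid. This shows only that the naive mechanism fails; the paper's result is the much stronger claim that every efficient mechanism fails.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[List[int], int]  # (features in {0,1}^d, label in {0,1})

def draw(n: int, d: int, rng: random.Random) -> List[Point]:
    """Features and label are independent fair bits: there is no real signal."""
    return [([rng.getrandbits(1) for _ in range(d)], rng.getrandbits(1)) for _ in range(n)]

def empirical_answer(sample: List[Point], predicate: Callable[[Point], bool]) -> float:
    """Naive mechanism: answer every statistical query with the sample mean."""
    return sum(1 for p in sample if predicate(p)) / len(sample)

def adaptive_analyst(sample: List[Point], d: int) -> Callable[[Point], bool]:
    # Round 1 (d queries): how often does feature j agree with the label?
    answers = [empirical_answer(sample, lambda p, j=j: p[0][j] == p[1]) for j in range(d)]
    # Round 2 (1 query, chosen from the answers): a majority vote that keeps
    # feature j if it agreed more than half the time and flips it otherwise.
    signs = [1 if a > 0.5 else -1 for a in answers]

    def final_query(p: Point) -> bool:
        x, y = p
        vote = sum(s if xj == 1 else -s for s, xj in zip(signs, x))
        return (1 if vote > 0 else 0) == y  # "the vote predicts the label"

    return final_query

rng = random.Random(0)
n, d = 101, 301  # both odd, so no answer equals 0.5 and the vote never ties
sample = draw(n, d, rng)
query = adaptive_analyst(sample, d)
on_sample = empirical_answer(sample, query)             # what the mechanism reports
on_fresh = empirical_answer(draw(5000, d, rng), query)  # estimates the true value, 0.5
print(f"reported {on_sample:.2f}, fresh-data estimate {on_fresh:.2f}")
```

The sizes `n` and `d` are arbitrary illustrative choices; the overfitting grows as the number of first-round queries becomes large relative to the sample size.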
Conceptually, our result demonstrates that achieving statistical validity
alone can be a source of computational intractability in adaptive settings. For
example, in modern large-scale collaborative research, data analysts
typically choose a particular approach based on previous findings. False
discovery occurs if a research finding is supported by the data but not by the
underlying distribution. While the study of preventing false discovery in
statistics is decades old, to the best of our knowledge our result is the first
to demonstrate a computational barrier. In particular, our result suggests that
the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent.
Lower Bounds on the Oracle Complexity of Nonsmooth Convex Optimization via Information Theory
We present an information-theoretic approach to lower bound the oracle
complexity of nonsmooth black box convex optimization, unifying previous lower
bounding techniques by identifying a combinatorial problem, namely string
guessing, as a single source of hardness. As a measure of complexity we use
distributional oracle complexity, which subsumes randomized oracle complexity
as well as worst-case oracle complexity. We obtain strong lower bounds on
distributional oracle complexity for the box $[-1,1]^n$, as well as for the
$L^p$-ball for $p \geq 1$ (for both low-scale and large-scale regimes),
matching worst-case upper bounds, and hence we close the gap between
distributional complexity, and in particular, randomized complexity, and
worst-case complexity. Furthermore, the bounds remain essentially the same for
high-probability and bounded-error oracle complexity, and even for the combination
of the two, i.e., bounded-error high-probability oracle complexity. This
considerably extends the applicability of known bounds.
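To make "oracle complexity" concrete, here is a sketch of a black-box first-order oracle that counts its queries, driven by the projected subgradient method on the box $[-1,1]^n$. The test function `max_abs`, the starting point, and the step-size rule are illustrative assumptions. This is an upper-bound algorithm, not the paper's lower-bound technique; the lower bounds concern how small `oracle.calls` can be for any method reaching a given accuracy.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
FirstOrder = Callable[[Vector], Tuple[float, Vector]]

class CountingOracle:
    """Black-box first-order oracle: returns (f(x), a subgradient) and counts queries."""

    def __init__(self, f: FirstOrder) -> None:
        self._f = f
        self.calls = 0

    def __call__(self, x: Vector) -> Tuple[float, Vector]:
        self.calls += 1
        return self._f(x)

def max_abs(x: Vector) -> Tuple[float, Vector]:
    """f(x) = max_i |x_i|: convex, nonsmooth, minimised at the origin."""
    i = max(range(len(x)), key=lambda k: abs(x[k]))
    g = [0.0] * len(x)
    g[i] = 1.0 if x[i] >= 0 else -1.0  # a valid subgradient of f at x
    return abs(x[i]), g

def projected_subgradient(oracle: CountingOracle, x0: Vector, steps: int) -> float:
    """Projected subgradient method on the box [-1, 1]^n; returns the best value seen."""
    x, best = list(x0), math.inf
    for t in range(steps):
        fx, g = oracle(x)             # exactly one oracle query per iteration
        best = min(best, fx)
        eta = 1.0 / math.sqrt(t + 1)  # diminishing step size
        # Gradient-style step, then coordinate-wise projection back onto the box.
        x = [min(1.0, max(-1.0, xi - eta * gi)) for xi, gi in zip(x, g)]
    return best

oracle = CountingOracle(max_abs)
best = projected_subgradient(oracle, [1.0, 0.5, -0.75], steps=50)
print(oracle.calls, best)
```

The method only ever sees `(f(x), g)` pairs through the oracle, which is exactly the black-box model in which oracle complexity is measured.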
Quantum algorithms for testing properties of distributions
Suppose one has access to oracles generating samples from two unknown
probability distributions P and Q on some N-element set. How many samples does
one need to test whether the two distributions are close or far from each other
in the $L_1$ norm? This and related questions have been extensively studied
in recent years in the field of property testing. In the present paper we
study quantum algorithms for testing properties of distributions. It is shown
that the $L_1$ distance between P and Q can be estimated to constant
precision using approximately $N^{1/2}$ queries in the quantum setting, whereas
classical computers need $\Omega(N)$ queries. We also describe quantum algorithms
for testing uniformity and orthogonality with query complexity $O(N^{1/3})$. The
classical query complexity of these problems is known to be $\Omega(N^{1/2})$.
Comment: 20 pages
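For contrast with the quantum bounds, a sketch of a classical baseline for uniformity testing: the collision statistic, a standard classical technique and not the paper's quantum algorithm. The fraction of colliding sample pairs is an unbiased estimate of $\sum_i P(i)^2$, which equals $1/N$ for the uniform distribution and exceeds it otherwise. By the birthday paradox, collisions start to appear only after about $N^{1/2}$ samples, which is the intuition behind the classical $\Omega(N^{1/2})$ figure. The `slack` threshold in `looks_uniform` is a hypothetical choice.

```python
from collections import Counter
from typing import Hashable, Sequence

def collision_rate(samples: Sequence[Hashable]) -> float:
    """Fraction of unordered sample pairs that collide: an unbiased estimate of
    sum_i P(i)^2, which is 1/N exactly when P is uniform on N elements."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    pairs = m * (m - 1) // 2
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return collisions / pairs

def looks_uniform(samples: Sequence[Hashable], N: int, slack: float = 1.5) -> bool:
    """Hypothetical decision rule: accept if the collision rate is not much above 1/N."""
    return collision_rate(samples) <= slack / N

print(looks_uniform(list(range(10)) * 2, N=10), looks_uniform([0] * 20, N=10))
```

A point mass on one element makes every pair collide, so its rate is 1, far above $1/N$.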
Quantum Simulation Logic, Oracles, and the Quantum Advantage
Query complexity is a common tool for comparing quantum and classical
computation, and it has produced many examples of how quantum algorithms differ
from classical ones. Here we investigate in detail the role that oracles play
in the advantage of quantum algorithms. We do so by using a simulation
framework, Quantum Simulation Logic (QSL), to construct oracles and algorithms
that solve some problems with the same success probability and number of
queries as the quantum algorithms. The framework can be simulated using only
classical resources at a constant overhead as compared to the quantum resources
used in quantum computation. Our results clarify the assumptions made and the
conditions needed when using quantum oracles. Using the same assumptions on
oracles within the simulation framework we show that for some specific
algorithms, like the Deutsch-Jozsa and Simon's algorithms, there simply is no
advantage in terms of query complexity. This does not detract from the fact
that quantum query complexity provides examples of how a quantum computer can
be expected to behave, which in turn has proved useful for finding new quantum
algorithms outside of the oracle paradigm, where the most prominent example is
Shor's algorithm for integer factorization.
Comment: 48 pages, 46 figures
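As background for the query-complexity comparison, a brute-force statevector simulation of the standard Deutsch-Jozsa algorithm, which decides with a single oracle query whether a promised-constant-or-balanced $f$ is constant. This is not Quantum Simulation Logic: simulating the phase oracle here evaluates $f$ on all $2^n$ inputs and stores $2^n$ amplitudes, in contrast to the constant-overhead simulation the abstract describes. The function names are illustrative.

```python
import math
from typing import Callable, List

def walsh_hadamard(amps: List[float]) -> List[float]:
    """Apply H on every qubit to a real statevector of length 2^n (fast Walsh-Hadamard)."""
    a = list(amps)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(len(a))
    return [v * scale for v in a]

def deutsch_jozsa(f: Callable[[int], int], n: int) -> str:
    """Decide whether f: {0, ..., 2^n - 1} -> {0, 1} is constant or balanced (promise)."""
    size = 2 ** n
    state = walsh_hadamard([1.0] + [0.0] * (size - 1))  # uniform superposition
    # The single quantum query, as a phase oracle |x> -> (-1)^f(x) |x>.
    # Classically simulating this one step evaluates f on all 2^n inputs.
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]
    state = walsh_hadamard(state)
    p_zero = state[0] ** 2  # P(measure |0...0>): 1 if constant, 0 if balanced
    return "constant" if p_zero > 0.5 else "balanced"

print(deutsch_jozsa(lambda x: 1, 3), deutsch_jozsa(lambda x: x & 1, 3))
```

A deterministic classical algorithm needs $2^{n-1}+1$ queries in the worst case for this promise problem; the abstract's point is that this gap depends on how the oracle itself is modelled.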
Zipf's law and L. Levin's probability distributions
Zipf's law in its basic incarnation is an empirical probability distribution
governing the frequency of usage of words in a language. As Terence Tao
recently remarked, it still lacks a convincing and satisfactory mathematical
explanation.
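A minimal sketch of how the rank-frequency table behind Zipf's law is computed from a text, assuming a crude lowercase tokenizer (the regular expression is an illustrative choice). In its basic form, Zipf's law says the count of the word at rank $r$ is roughly $C/r$, so `rank * count` should stay roughly constant on a large corpus; a toy sentence is far too small to show this.

```python
import re
from collections import Counter
from typing import List, Tuple

def rank_frequency(text: str) -> List[Tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(r, w, c) for r, (w, c) in enumerate(Counter(words).most_common(), start=1)]

def zipf_products(table: List[Tuple[int, str, int]]) -> List[int]:
    """Under Zipf's law count ~ C / rank, so rank * count should be roughly constant."""
    return [r * c for r, _, c in table]

table = rank_frequency("the cat saw the dog and the cat ran")
print(table[:2], zipf_products(table)[:2])
```

Here `table[:2]` is `[(1, 'the', 3), (2, 'cat', 2)]`.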
In this paper I suggest that at least in certain situations, Zipf's law can
be explained as a special case of the a priori distribution introduced and
studied by L. Levin. The Zipf ranking corresponding to diminishing probability
appears then as the ordering determined by the growing Kolmogorov complexity.
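For reference, the standard facts this proposal builds on, stated as textbook results rather than as the paper's own derivation: the normalized Zipf distribution with exponent 1 over $N$ ranked words, Levin's coding theorem relating his a priori distribution $m$ to prefix Kolmogorov complexity $K$, and the usual upper bound on $K$ for positive integers, which makes $m(n)$ decay like $1/n$ up to a logarithmic factor.

```latex
% Zipf's law with exponent 1, normalised over N ranks:
p_r = \frac{r^{-1}}{H_N}, \qquad H_N = \sum_{k=1}^{N} \frac{1}{k}.
% Levin's coding theorem: ordering by increasing K is ordering by decreasing m.
-\log_2 m(x) = K(x) + O(1).
% For integers n >= 2, K(n) <= \log_2 n + 2\log_2\log_2 n + O(1), hence
m(n) \;\geq\; \frac{c}{n\,(\log_2 n)^2} \quad \text{for some constant } c > 0.
```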
One argument justifying this assertion is the appeal to a recent
interpretation by Yu. Manin and M. Marcolli of asymptotic bounds for
error-correcting codes in terms of phase transitions. In the respective
partition function, Kolmogorov complexity of a code plays the role of its
energy.
This version contains minor corrections and additions.
Comment: 19 pages