Testing probability distributions underlying aggregated data
In this paper, we study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution $D$ over $[n]$. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from
$D$ and, respectively, for any $i \in [n]$,
- query the probability mass $D(i)$ (query access); or
- get the total mass of $\{1, \dots, i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems and providing new insight into the limitations of
the models. Finally, we show that while testing algorithms in these models are
in most cases strictly more efficient, some tasks remain hard even with this
additional power.
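As a rough illustration of the two access models, here is a minimal Python sketch of the oracles for an explicitly known distribution over $[n]$. The class and method names (`SampleOracle`, `DualOracle`, `CumulativeDualOracle`, `query`, `cumulative`) are hypothetical and not taken from the paper; a tester would only call these methods, never read the underlying probability vector.

```python
import random
from itertools import accumulate
from typing import List, Sequence


class SampleOracle:
    """Hypothetical sampling oracle for a distribution D over [n] = {1, ..., n}."""

    def __init__(self, probs: Sequence[float], seed: int = 0) -> None:
        # probs[i - 1] holds D(i); check that it is a probability vector.
        if not probs or min(probs) < 0 or abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probs must be a probability vector")
        self._probs: List[float] = list(probs)
        self._rng = random.Random(seed)

    def sample(self) -> int:
        # Draw i in [n] with probability D(i).
        n = len(self._probs)
        return self._rng.choices(range(1, n + 1), weights=self._probs)[0]


class DualOracle(SampleOracle):
    """Dual access: samples plus query access to the mass D(i)."""

    def query(self, i: int) -> float:
        return self._probs[i - 1]


class CumulativeDualOracle(SampleOracle):
    """Cumulative dual access: samples plus the prefix mass D({1, ..., i})."""

    def __init__(self, probs: Sequence[float], seed: int = 0) -> None:
        super().__init__(probs, seed)
        # Precompute prefix sums so each cumulative query is O(1).
        self._prefix = list(accumulate(self._probs))

    def cumulative(self, i: int) -> float:
        return self._prefix[i - 1]
```

In this sketch two cumulative queries recover a point mass, since $D(i)$ equals `cumulative(i) - cumulative(i - 1)` (taking the mass of the empty prefix to be 0), so cumulative access is at least as informative as query access.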
Testing List H-Homomorphisms
Let $H$ be an undirected graph. In the List $H$-Homomorphism Problem, given
an undirected graph $G$ with a list constraint $L(v) \subseteq V(H)$ for each
variable $v \in V(G)$, the objective is to find a list $H$-homomorphism
$f: V(G) \to V(H)$, that is, $f(v) \in L(v)$ for every $v \in V(G)$ and
$(f(u), f(v)) \in E(H)$ whenever $(u, v) \in E(G)$.
We consider the following problem: given oracle access to a map
$f: V(G) \to V(H)$, the objective is to decide with high probability whether $f$
is a list $H$-homomorphism or far from any list $H$-homomorphism. The
efficiency of an algorithm is measured by the number of accesses to $f$.
In this paper, we classify graphs $H$ with respect to the query complexity
for testing list $H$-homomorphisms and show that the following trichotomy holds: (i)
List $H$-homomorphisms are testable with a constant number of queries if and
only if $H$ is a reflexive complete graph or an irreflexive complete bipartite
graph. (ii) List $H$-homomorphisms are testable with a sublinear number of
queries if and only if $H$ is a bi-arc graph. (iii) Testing list
$H$-homomorphisms requires a linear number of queries if $H$ is not a bi-arc
graph.
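The definition above can be checked directly when $f$ is available in full. The Python sketch below, with hypothetical helper names, evaluates the definition exactly and also counts oracle accesses for a naive edge-sampling spot check. The spot check is an assumption made for illustration: it is not the tester from the paper, it has one-sided error, and it never inspects vertices that have no incident edges.

```python
import random
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def adjacency(edges: Iterable[Edge]) -> Set[Edge]:
    # Store each undirected edge in both orientations; a loop (a, a) marks a reflexive vertex.
    adj: Set[Edge] = set()
    for a, b in edges:
        adj.add((a, b))
        adj.add((b, a))
    return adj


def is_list_hom(g_vertices: Sequence[Vertex], g_edges: Sequence[Edge],
                h_edges: Sequence[Edge], lists: Dict[Vertex, Set[Vertex]],
                f: Dict[Vertex, Vertex]) -> bool:
    """Exact check of the definition; reads f on every vertex of G."""
    h_adj = adjacency(h_edges)
    if any(f[v] not in lists[v] for v in g_vertices):
        return False
    return all((f[u], f[v]) in h_adj for u, v in g_edges)


def naive_spot_check(g_edges: Sequence[Edge], h_edges: Sequence[Edge],
                     lists: Dict[Vertex, Set[Vertex]], f: Dict[Vertex, Vertex],
                     trials: int, seed: int = 0) -> Tuple[bool, int]:
    """Hypothetical naive check: sample edges of G, reject on any violated
    constraint, and return (accept, number of accesses to f)."""
    rng = random.Random(seed)
    h_adj = adjacency(h_edges)
    queries = 0
    for _ in range(trials):
        u, v = rng.choice(g_edges)
        fu, fv = f[u], f[v]  # two oracle accesses to f
        queries += 2
        if fu not in lists[u] or fv not in lists[v] or (fu, fv) not in h_adj:
            return False, queries
    return True, queries
```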
Differentially Private Release and Learning of Threshold Functions
We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$-differentially private algorithms for releasing approximate answers to
threshold functions. A threshold function $c_x$ over a totally ordered domain $X$
evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We
give the first nontrivial lower bound for releasing thresholds with
$(\epsilon, \delta)$-differential privacy, showing that the task is impossible
over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^* |X|)$, which grows with the size of the domain. Inspired by the
techniques used to prove this lower bound, we give an algorithm for releasing
thresholds with $n \le 2^{(1 + o(1)) \log^* |X|}$ samples. This improves the
previous best upper bound of $8^{(1 + o(1)) \log^* |X|}$ (Beimel et al., RANDOM
'13).
Our sample complexity upper and lower bounds also apply to the tasks of
learning distributions with respect to Kolmogorov distance and of properly PAC
learning thresholds with differential privacy. The lower bound gives the first
separation between the sample complexity of properly learning a concept class
with differential privacy and learning without privacy. For
properly learning thresholds in $\ell$ dimensions, this lower bound extends to
$n \ge \Omega(\ell \cdot \log^* |X|)$.
To obtain our results, we give reductions in both directions between releasing
and properly learning thresholds and the simpler interior point problem. Given
a database $D$ of elements from $X$, the interior point problem asks for an
element between the smallest and largest elements in $D$. We introduce new
recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving
impossibility results for other basic problems in differential privacy.
Comment: 43 pages
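To make the objects above concrete, the Python sketch below spells out a threshold function $c_x$, the interior point condition, and the iterated logarithm $\log^*$ that appears in the bounds. It is deliberately not differentially private: without privacy, any element of the database (for example a median) is an interior point, so the difficulty studied in the paper comes entirely from the privacy constraint. All function names are hypothetical.

```python
import math
from statistics import median_low
from typing import Callable, Sequence


def threshold(x: int) -> Callable[[int], int]:
    """Return the threshold function c_x with c_x(y) = 1 if y <= x, else 0."""
    return lambda y: 1 if y <= x else 0


def is_interior_point(db: Sequence[int], z: int) -> bool:
    # z solves the interior point problem for db if min(db) <= z <= max(db).
    return min(db) <= z <= max(db)


def nonprivate_interior_point(db: Sequence[int]) -> int:
    # Without privacy the task is trivial: median_low always returns an element of db.
    return median_low(db)


def log_star(x: float) -> int:
    # Iterated logarithm: how many times log2 must be applied before x <= 1.
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For example, `log_star(2**65536)` evaluates to 5, which shows how slowly the $\log^* |X|$ term grows with the domain size.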
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment $x \in \{0, 1\}^n$ to a CNF formula $\varphi$ is
shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows
$x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have
Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
$x$.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly tight. In particular, for the first two
problems we obtain nearly-polynomial factors of $2^{(\log n)^{1 - o(1)}}$; only
$(1 + o(1))$-factor lower bounds (under SETH) were known before.
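For orientation, the trivial algorithm for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors compares every pair, so its running time is quadratic in the number of vectors (times the dimension); the result above says that, under SETH, no truly-subquadratic algorithm can even approximate the optimum within the stated factors. The sketch below is only this exhaustive baseline, with a hypothetical function name, and is not a technique from the paper.

```python
from typing import Sequence, Tuple


def bichromatic_max_ip(
    a_vecs: Sequence[Sequence[int]], b_vecs: Sequence[Sequence[int]]
) -> Tuple[int, int, int]:
    """Exhaustive search: return (max inner product, index in A, index in B).

    Runs in O(|A| * |B| * d) time for d-dimensional {0,1}-vectors and
    returns (-1, -1, -1) if either set is empty.
    """
    best = (-1, -1, -1)
    for i, a in enumerate(a_vecs):
        for j, b in enumerate(b_vecs):
            # For 0/1 entries, the inner product counts shared 1-coordinates.
            ip = sum(x & y for x, y in zip(a, b))
            if ip > best[0]:
                best = (ip, i, j)
    return best
```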
Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs
In this paper, we consider lower bounds on the query complexity for testing
CSPs in the bounded-degree model.
First, for any "symmetric" predicate $P: \{0,1\}^k \to \{0,1\}$ except EQU,
where $k \ge 3$, we show that every (randomized) algorithm that distinguishes
satisfiable instances of CSP(P) from instances $(|P^{-1}(0)|/2^k - \epsilon)$-far
from satisfiability requires $\Omega(n^{1/2 + \delta})$ queries, where $n$ is the
number of variables and $\delta > 0$ is a constant that depends on $P$ and
$\epsilon$. This breaks a natural lower bound of $\Omega(n^{1/2})$, which is
obtained by the birthday paradox. We also show that every one-sided error
tester requires $\Omega(n)$ queries for such $P$. These results are hereditary
in the sense that the same results hold for any predicate $Q$ such that
$P^{-1}(1) \subseteq Q^{-1}(1)$. For EQU, we give a one-sided error tester
whose query complexity is $\tilde{O}(n^{1/2})$. Also, for 2-XOR (or,
equivalently, E2LIN2), we show an $\Omega(n^{1/2 + \delta})$ lower bound for
distinguishing instances that are $\epsilon$-close to satisfiability from
instances that are $(1/2 - \epsilon)$-far from it.
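The $\Omega(n^{1/2})$ barrier mentioned above comes from birthday-paradox reasoning. The Python sketch below only computes the underlying quantity, the probability that $q$ uniform draws from $n$ items are all distinct, and compares it with the standard estimate $e^{-q(q-1)/(2n)}$; this probability stays close to 1 until $q$ is on the order of $\sqrt{n}$. How a tester's queries map to such draws is not modeled here, and the function names are hypothetical.

```python
import math


def no_collision_prob(n: int, q: int) -> float:
    """Exact probability that q independent uniform draws from n items are all distinct."""
    p = 1.0
    for k in range(q):
        # The (k+1)-th draw must avoid the k distinct items seen so far.
        p *= (n - k) / n
    return p


def no_collision_estimate(n: int, q: int) -> float:
    # Standard approximation exp(-q(q-1) / (2n)), accurate when q is much smaller than n.
    return math.exp(-q * (q - 1) / (2 * n))
```

For example, with $n = 10^6$ the exact probability is about 0.995 for $q = 100$ but drops to about 0.011 for $q = 3000 = 3\sqrt{n}$.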
Next, for the general k-CSP over the binary domain, we show that every
algorithm that distinguishes satisfiable instances from instances
$(1 - 2k/2^k - \epsilon)$-far from satisfiability requires $\Omega(n)$ queries. The
matching NP-hardness is not known, even assuming the Unique Games Conjecture or
the $d$-to-$1$ Conjecture. As a corollary, for Maximum Independent Set on
graphs with $n$ vertices and a degree bound $d$, we show that every
approximation algorithm within a factor $d/\mathrm{poly}\log d$ and an additive error
of $\epsilon n$ requires $\Omega(n)$ queries. Previously, only super-constant
lower bounds were known.
Preventing False Discovery in Interactive Data Analysis is Hard
We show that, under a standard hardness assumption, there is no
computationally efficient algorithm that, given $n$ samples from an unknown
distribution, can give valid answers to $n^{3 + o(1)}$ adaptively chosen
statistical queries. A statistical query asks for the expectation of a
predicate over the underlying distribution, and an answer to a statistical
query is valid if it is "close" to the correct expectation over the
distribution.
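A statistical query can be made concrete as follows. In this Python sketch, an assumed additive tolerance `tau` stands in for "close", and a query is answered by its empirical mean on the sample. For a single query fixed in advance, Hoeffding's inequality makes such an answer valid with high probability; the result above concerns what goes wrong when later queries may depend on earlier answers, which this sketch does not attempt to handle. The uniform distribution and all names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def draw_uniform_sample(n: int, support: int, seed: int = 0) -> List[int]:
    # Hypothetical "unknown" distribution: uniform over {0, ..., support - 1}.
    rng = random.Random(seed)
    return [rng.randrange(support) for _ in range(n)]


def empirical_answer(sample: Sequence[int], predicate: Callable[[int], bool]) -> float:
    # Answer the query E[predicate(x)] by the fraction of sample points satisfying it.
    return sum(1 for x in sample if predicate(x)) / len(sample)


def is_valid(answer: float, true_value: float, tau: float) -> bool:
    # "Valid" here means within additive error tau of the true expectation.
    return abs(answer - true_value) <= tau
```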
Our result stands in stark contrast to the well known fact that exponentially
many statistical queries can be answered validly and efficiently if the queries
are chosen non-adaptively (no query may depend on the answers to previous
queries). Moreover, a recent work by Dwork et al. shows how to accurately
answer exponentially many adaptively chosen statistical queries via a
computationally inefficient algorithm; and how to answer a quadratic number of
adaptive queries via a computationally efficient algorithm. The latter result
implies that our result is tight up to a linear factor in $n$.
Conceptually, our result demonstrates that achieving statistical validity
alone can be a source of computational intractability in adaptive settings. For
example, in the modern large collaborative research environment, data analysts
typically choose a particular approach based on previous findings. False
discovery occurs if a research finding is supported by the data but not by the
underlying distribution. While the study of preventing false discovery in
statistics is decades old, to the best of our knowledge our result is the first
to demonstrate a computational barrier. In particular, our result suggests that
the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent.
Limitations of semidefinite programs for separable states and entangled games
Semidefinite programs (SDPs) are a framework for exact or approximate
optimization that have widespread application in quantum information theory. We
introduce a new method for using reductions to construct integrality gaps for
SDPs. These are based on new limitations on the sum-of-squares (SoS) hierarchy
in approximating two particularly important sets in quantum information theory,
where previously no $\omega(1)$-round integrality gaps were known: the set of
separable (i.e. unentangled) states, or equivalently, the $2 \to 4$
norm of a matrix, and the set of quantum correlations, i.e. conditional
probability distributions achievable with local measurements on a shared
entangled state. In both cases no-go theorems were previously known based on
computational assumptions such as the Exponential Time Hypothesis (ETH) which
asserts that 3-SAT requires exponential time to solve. Our unconditional
results achieve the same parameters as all of these previous results (for
separable states) or as some of the previous results (for quantum
correlations). In some cases we can make use of the framework of
Lee-Raghavendra-Steurer (LRS) to establish integrality gaps for any SDP, not
only the SoS hierarchy. Our hardness result on separable states also yields a
dimension lower bound of approximate disentanglers, answering a question of
Watrous and Aaronson et al. These results can be viewed as limitations on the
monogamy principle, the PPT test, the ability of Tsirelson-type bounds to
restrict quantum correlations, as well as the SDP hierarchies of
Doherty-Parrilo-Spedalieri, Navascues-Pironio-Acin and Berta-Fawzi-Scholz.
Comment: 47 pages. v2. small changes, fixes and clarifications. published
version
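For readers unfamiliar with it, the $2 \to 4$ norm is $\|A\|_{2 \to 4} = \max_{\|x\|_2 = 1} \|Ax\|_4$. The Python sketch below evaluates the ratio $\|Ax\|_4 / \|x\|_2$ and keeps the best value over random Gaussian directions, which can only give a lower bound on the norm. It is a heuristic illustration with hypothetical names, not an SDP or SoS relaxation; such relaxations would instead bound the norm from above.

```python
import math
import random
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def ratio_2_to_4(a: Matrix, x: Sequence[float]) -> float:
    # ||A x||_4 / ||x||_2 for a nonzero vector x.
    ax = [sum(entry * xi for entry, xi in zip(row, x)) for row in a]
    num = sum(v ** 4 for v in ax) ** 0.25
    den = math.sqrt(sum(xi * xi for xi in x))
    return num / den


def norm_2_to_4_lower_bound(a: Matrix, trials: int = 2000, seed: int = 0) -> float:
    """Random-search lower bound on ||A||_{2->4} = max ||A x||_4 over unit x."""
    rng = random.Random(seed)
    n = len(a[0])
    best = 0.0
    for _ in range(trials):
        # A Gaussian vector is nonzero with probability 1 and has a uniformly random direction.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, ratio_2_to_4(a, x))
    return best
```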