3,065 research outputs found

Testing probability distributions underlying aggregated data

Author: A. Blum
C. Dwork
C.L. Canonne
L. Birgé
L. Paninski
M. Parnas
P. Valiant
R. Rubinfeld
S. Chakraborty
S.K. Ma
T. Batu
Publication date: 01/01/2014

In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and respectively, for any $i\in[n]$, (i) query the probability mass $D(i)$ (query access); or (ii) get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^i D(j)$ (cumulative access). These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can be in most cases strictly more efficient, some tasks remain hard even with this additional power.
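The three kinds of access described in this abstract can be illustrated with a small toy oracle. This is only a sketch under stated assumptions: the class name `DualAccessOracle`, its method names, and the example distribution are hypothetical and do not come from the paper, and the distribution is stored explicitly, which a real tester would not have.

```python
import bisect
import random
from itertools import accumulate


class DualAccessOracle:
    """Toy oracle for a known distribution D over [n] = {1, ..., n}."""

    def __init__(self, probs, seed=0):
        # probs[i - 1] holds D(i); the masses are assumed to sum to 1.
        self.probs = list(probs)
        self.cdf = list(accumulate(self.probs))
        self.rng = random.Random(seed)

    def sample(self):
        """Sampling access: draw i from D by inverting the CDF."""
        u = self.rng.random()
        idx = bisect.bisect_right(self.cdf, u)
        # Guard against a CDF that ends slightly below 1 due to rounding.
        return min(idx, len(self.probs) - 1) + 1

    def query(self, i):
        """Query access: return the probability mass D(i)."""
        return self.probs[i - 1]

    def cumulative(self, i):
        """Cumulative access: return D(1) + ... + D(i)."""
        return self.cdf[i - 1]


oracle = DualAccessOracle([0.5, 0.25, 0.125, 0.125])
draws = [oracle.sample() for _ in range(1000)]
```

In the dual model an algorithm would use `sample` together with `query`; in the cumulative dual model it would use `sample` together with `cumulative`.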

arXiv.org e-Print Archive

CiteSeerX

Crossref

DSpace@MIT

Testing List H-Homomorphisms

Author: Yoshida Yuichi
Publication date: 01/01/2011

Let $H$ be an undirected graph. In the List $H$-Homomorphism Problem, given an undirected graph $G$ with a list constraint $L(v) \subseteq V(H)$ for each variable $v \in V(G)$, the objective is to find a list $H$-homomorphism $f:V(G) \to V(H)$, that is, $f(v) \in L(v)$ for every $v \in V(G)$ and $(f(u),f(v)) \in E(H)$ whenever $(u,v) \in E(G)$. We consider the following problem: given oracle access to a map $f:V(G) \to V(H)$, the objective is to decide with high probability whether $f$ is a list $H$-homomorphism or far from every list $H$-homomorphism. The efficiency of an algorithm is measured by the number of accesses to $f$. In this paper, we classify graphs $H$ with respect to the query complexity of testing list $H$-homomorphisms and show that the following trichotomy holds: (i) List $H$-homomorphisms are testable with a constant number of queries if and only if $H$ is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List $H$-homomorphisms are testable with a sublinear number of queries if and only if $H$ is a bi-arc graph. (iii) Testing list $H$-homomorphisms requires a linear number of queries if $H$ is not a bi-arc graph.
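The definition of a list $H$-homomorphism in this abstract can be written as a direct check. The sketch below is an exact verifier that reads all of $f$, so it uses a linear number of accesses; it is not one of the paper's testers. The function name and the dictionary-based graph encoding are assumptions made for illustration.

```python
def is_list_h_homomorphism(f, g_edges, h_edges, lists):
    """Return True if f: V(G) -> V(H) is a list H-homomorphism.

    g_edges and h_edges are iterables of undirected edges (u, v);
    lists maps each vertex v of G to its allowed set L(v).
    """
    # Store each undirected edge of H in both orientations.
    h_adj = set()
    for a, b in h_edges:
        h_adj.add((a, b))
        h_adj.add((b, a))
    # List constraint: f(v) must lie in L(v) for every v in V(G).
    if any(f[v] not in allowed for v, allowed in lists.items()):
        return False
    # Edge constraint: every edge of G must map to an edge of H.
    return all((f[u], f[v]) in h_adj for u, v in g_edges)


# H is a single irreflexive edge {0, 1}; G is the path a - b - c.
h_edges = [(0, 1)]
g_edges = [("a", "b"), ("b", "c")]
lists = {"a": {0, 1}, "b": {0, 1}, "c": {0}}
good = is_list_h_homomorphism({"a": 0, "b": 1, "c": 0}, g_edges, h_edges, lists)
bad = is_list_h_homomorphism({"a": 0, "b": 0, "c": 0}, g_edges, h_edges, lists)
```

Here `good` satisfies both constraints, while `bad` maps the edge (a, b) to the non-edge (0, 0) of $H$.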

arXiv.org e-Print Archive

CiteSeerX

Differentially Private Release and Learning of Threshold Functions

Author: Bun Mark
Nissim Kobbi
Stemmer Uri
Vadhan Salil
Publication date: 28/04/2015

We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\epsilon,\delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+ o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1 + o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\epsilon,\delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy. Comment: 43 pages.
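The two basic objects in this abstract, a threshold function $c_x$ and an interior point of a database, can be made concrete with a short sketch. The function names are hypothetical, and the baseline below is deliberately non-private: it only shows what a correct answer looks like and provides none of the differential privacy guarantees the paper studies.

```python
def threshold(x):
    """Return c_x, where c_x(y) = 1 if y <= x and 0 otherwise."""
    return lambda y: 1 if y <= x else 0


def is_interior_point(database, candidate):
    """Check that candidate lies between the smallest and largest elements."""
    return min(database) <= candidate <= max(database)


def nonprivate_interior_point(database):
    """Return a middle element, which is trivially an interior point.

    This baseline is NOT differentially private.
    """
    return sorted(database)[len(database) // 2]


c_5 = threshold(5)
db = [7, 2, 9, 4]
point = nonprivate_interior_point(db)
```

Without privacy the problem is trivial for any nonempty database; the difficulty quantified in the abstract arises only once $(\epsilon,\delta)$ differential privacy is required.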

arXiv.org e-Print Archive

Crossref

Distributed PCP Theorems for Hardness of Approximation in P

Author: Abboud Amir
Rubinstein Aviad
Williams Ryan
Publication date: 01/01/1952

We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
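For orientation, the exact quadratic-time baseline for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors can be sketched as follows. This is an illustrative brute force, not an algorithm from the paper; the function name and the tuple encoding of vectors are assumptions.

```python
def bichromatic_max_inner_product(a_vectors, b_vectors):
    """Return the largest inner product <a, b> with a in A and b in B.

    Tries every pair, so the running time is quadratic in the number
    of vectors (times the dimension).
    """
    best = 0
    for a in a_vectors:
        for b in b_vectors:
            # For 0/1 vectors this counts the coordinates where both are 1.
            inner = sum(x * y for x, y in zip(a, b))
            best = max(best, inner)
    return best


A = [(1, 0, 1, 1), (0, 1, 0, 0)]
B = [(1, 1, 0, 1), (0, 0, 1, 1)]
result = bichromatic_max_inner_product(A, B)
```

The abstract's hardness result says that, under SETH, no truly-subquadratic algorithm can even approximate this quantity within the stated factors.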

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Crossref

Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs

Author: Yoshida Yuichi
Publication date: 19/07/2010

In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. First, for any "symmetric" predicate $P:\{0,1\}^{k} \to \{0,1\}$ except EQU where $k\geq 3$, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances $(|P^{-1}(0)|/2^k-\epsilon)$-far from satisfiability requires $\Omega(n^{1/2+\delta})$ queries, where $n$ is the number of variables and $\delta>0$ is a constant that depends on $P$ and $\epsilon$. This breaks a natural lower bound $\Omega(n^{1/2})$, which is obtained by the birthday paradox. We also show that every one-sided error tester requires $\Omega(n)$ queries for such $P$. These results are hereditary in the sense that the same results hold for any predicate $Q$ such that $P^{-1}(1) \subseteq Q^{-1}(1)$. For EQU, we give a one-sided error tester whose query complexity is $\tilde{O}(n^{1/2})$. Also, for 2-XOR (or, equivalently, E2LIN2), we show an $\Omega(n^{1/2+\delta})$ lower bound for distinguishing instances that are $\epsilon$-close to satisfiability from instances that are $(1/2-\epsilon)$-far from satisfiability. Next, for the general k-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances $(1-2k/2^k-\epsilon)$-far from satisfiability requires $\Omega(n)$ queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the $d$-to-$1$ Conjecture. As a corollary, for Maximum Independent Set on graphs with $n$ vertices and a degree bound $d$, we show that every approximation algorithm within a factor $d/\operatorname{poly}\log d$ and an additive error of $\epsilon n$ requires $\Omega(n)$ queries. Previously, only super-constant lower bounds were known.

arXiv.org e-Print Archive

CiteSeerX

Preventing False Discovery in Interactive Data Analysis is Hard

Author: Hardt Moritz
Ullman Jonathan
Publication date: 06/08/2014

We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given $n$ samples from an unknown distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. Our result stands in stark contrast to the well known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, a recent work by Dwork et al. shows how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm; and how to answer a quadratic number of adaptive queries via a computationally efficient algorithm. The latter result implies that our result is tight up to a linear factor in $n$.

Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in Statistics is decades old, to the best of our knowledge our result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative research environment may be inherent.
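The notions of a statistical query and a valid answer used in this abstract can be sketched in a few lines. The empirical-mean answering rule and the explicit `tolerance` parameter are assumptions made for illustration; the abstract only says that a valid answer must be "close" to the true expectation, and all names below are hypothetical.

```python
def empirical_answer(samples, predicate):
    """Answer a statistical query with the fraction of samples satisfying it."""
    return sum(1 for s in samples if predicate(s)) / len(samples)


def is_valid(answer, true_expectation, tolerance):
    """An answer is valid if it is within tolerance of the true expectation."""
    return abs(answer - true_expectation) <= tolerance


samples = [0, 1, 1, 0, 1, 1, 1, 0]
answer = empirical_answer(samples, lambda s: s == 1)
```

When an analyst chooses later predicates based on earlier answers computed from the same samples, answers of this kind can stay close to the data while drifting away from the underlying distribution, which is the false discovery phenomenon the abstract describes.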

arXiv.org e-Print Archive

CiteSeerX

Crossref

Limitations of semidefinite programs for separable states and entangled games

Author: Harrow Aram W.
Natarajan Anand
Wu Xiaodi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/10/2018

Semidefinite programs (SDPs) are a framework for exact or approximate optimization that has widespread application in quantum information theory. We introduce a new method for using reductions to construct integrality gaps for SDPs. These are based on new limitations on the sum-of-squares (SoS) hierarchy in approximating two particularly important sets in quantum information theory, where previously no $\omega(1)$-round integrality gaps were known: the set of separable (i.e. unentangled) states, or equivalently, the $2 \rightarrow 4$ norm of a matrix, and the set of quantum correlations; i.e. conditional probability distributions achievable with local measurements on a shared entangled state. In both cases no-go theorems were previously known based on computational assumptions such as the Exponential Time Hypothesis (ETH), which asserts that 3-SAT requires exponential time to solve. Our unconditional results achieve the same parameters as all of these previous results (for separable states) or as some of the previous results (for quantum correlations). In some cases we can make use of the framework of Lee-Raghavendra-Steurer (LRS) to establish integrality gaps for any SDP, not only the SoS hierarchy. Our hardness result on separable states also yields a dimension lower bound of approximate disentanglers, answering a question of Watrous and Aaronson et al. These results can be viewed as limitations on the monogamy principle, the PPT test, the ability of Tsirelson-type bounds to restrict quantum correlations, as well as the SDP hierarchies of Doherty-Parrilo-Spedalieri, Navascues-Pironio-Acin and Berta-Fawzi-Scholz. Comment: 47 pages. v2: small changes, fixes and clarifications. Published version.

arXiv.org e-Print Archive

DSpace@MIT

Caltech Authors