    Testing probability distributions underlying aggregated data

    In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and respectively, for any $i\in[n]$,
      - query the probability mass $D(i)$ (query access); or
      - get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^i D(j)$ (cumulative access).
    These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can be in most cases strictly more efficient, some tasks remain hard even with this additional power.
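    The three kinds of access described in this abstract (sampling, query, cumulative) can be illustrated with a small sketch. The class name `DualAccessOracle` and its method names are hypothetical and are not taken from the paper; the sketch only shows what each oracle returns for an explicitly known distribution, not any testing algorithm.

```python
import bisect
import random
from itertools import accumulate


class DualAccessOracle:
    """Hypothetical wrapper exposing sample, query, and cumulative access to D over [n]."""

    def __init__(self, masses, seed=None):
        # masses[i - 1] is D(i); elements of the domain are numbered 1..n.
        if abs(sum(masses) - 1.0) > 1e-9:
            raise ValueError("masses must sum to 1")
        self._masses = list(masses)
        self._cdf = list(accumulate(self._masses))
        self._rng = random.Random(seed)

    def sample(self):
        # Inverse-CDF sampling: find the first i whose cumulative mass exceeds u.
        u = self._rng.random()
        idx = bisect.bisect_right(self._cdf, u)
        # Clamp to guard against floating-point rounding in the last CDF entry.
        return min(idx, len(self._masses) - 1) + 1

    def query(self, i):
        # Query access: the probability mass D(i).
        return self._masses[i - 1]

    def cumulative(self, i):
        # Cumulative access: D(1) + ... + D(i).
        return self._cdf[i - 1]


oracle = DualAccessOracle([0.1, 0.2, 0.3, 0.4], seed=0)
draws = [oracle.sample() for _ in range(5)]
```

    In the dual model an algorithm would call `sample` and `query`; in the cumulative dual model it would call `sample` and `cumulative`.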

    Testing List H-Homomorphisms

    Let $H$ be an undirected graph. In the List $H$-Homomorphism Problem, given an undirected graph $G$ with a list constraint $L(v) \subseteq V(H)$ for each variable $v \in V(G)$, the objective is to find a list $H$-homomorphism $f:V(G) \to V(H)$, that is, $f(v) \in L(v)$ for every $v \in V(G)$ and $(f(u),f(v)) \in E(H)$ whenever $(u,v) \in E(G)$. We consider the following problem: given oracle access to a map $f:V(G) \to V(H)$, the objective is to decide with high probability whether $f$ is a list $H$-homomorphism or far from any list $H$-homomorphism. The efficiency of an algorithm is measured by the number of accesses to $f$. In this paper, we classify graphs $H$ with respect to the query complexity for testing list $H$-homomorphisms and show that the following trichotomy holds: (i) List $H$-homomorphisms are testable with a constant number of queries if and only if $H$ is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List $H$-homomorphisms are testable with a sublinear number of queries if and only if $H$ is a bi-arc graph. (iii) Testing list $H$-homomorphisms requires a linear number of queries if $H$ is not a bi-arc graph.
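    The definition of a list $H$-homomorphism above can be checked directly when the whole map $f$ is available. The following is a minimal sketch of that exact check; the function name and argument layout are assumptions made for illustration. It reads every value of $f$, so it is not one of the sublinear-query testers the abstract refers to.

```python
def is_list_homomorphism(f, g_edges, h_edges, lists):
    """Return True if f satisfies every list constraint and maps edges of G to edges of H.

    f       -- dict mapping each vertex of G to a vertex of H
    g_edges -- iterable of undirected edges (u, v) of G
    h_edges -- iterable of undirected edges (a, b) of H (a loop is written (a, a))
    lists   -- dict mapping each vertex v of G to its allowed set L(v)
    """
    # Store H's edges in both orientations because H is undirected.
    h_adj = set()
    for a, b in h_edges:
        h_adj.add((a, b))
        h_adj.add((b, a))

    # List constraints: f(v) must lie in L(v) for every vertex v of G.
    if any(f[v] not in allowed for v, allowed in lists.items()):
        return False

    # Edge preservation: (f(u), f(v)) must be an edge of H for every edge (u, v) of G.
    return all((f[u], f[v]) in h_adj for u, v in g_edges)


# H is a single edge {0, 1}; G is the path a - b - c.
h_edges = [(0, 1)]
g_edges = [("a", "b"), ("b", "c")]
lists = {"a": {0, 1}, "b": {0, 1}, "c": {0, 1}}
proper = is_list_homomorphism({"a": 0, "b": 1, "c": 0}, g_edges, h_edges, lists)
```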

    Differentially Private Release and Learning of Threshold Functions

    We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\epsilon,\delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+ o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1 + o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\epsilon,\delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy. Comment: 43 pages
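    The two objects defined in this abstract, the threshold function $c_x$ and the interior point condition, are simple enough to state as code. The sketch below is purely definitional and uses hypothetical function names; it involves no privacy mechanism and says nothing about how the paper's algorithms work.

```python
def threshold(x):
    """Return the threshold function c_x, where c_x(y) = 1 if y <= x and 0 otherwise."""
    return lambda y: 1 if y <= x else 0


def is_interior_point(database, candidate):
    """Check that candidate lies between the smallest and largest elements of the database."""
    return min(database) <= candidate <= max(database)


c_5 = threshold(5)
values = [c_5(y) for y in (3, 5, 7)]
ok = is_interior_point([4, 9, 6], 7)
```

    Note that any database element is trivially an interior point; the difficulty studied in the abstract comes from having to output one under differential privacy.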

    Distributed PCP Theorems for Hardness of Approximation in P

    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.

    Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs

    In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. First, for any "symmetric" predicate $P:\{0,1\}^{k} \to \{0,1\}$ except EQU where $k\geq 3$, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP($P$) from instances $(|P^{-1}(0)|/2^k-\epsilon)$-far from satisfiability requires $\Omega(n^{1/2+\delta})$ queries, where $n$ is the number of variables and $\delta>0$ is a constant that depends on $P$ and $\epsilon$. This breaks a natural lower bound $\Omega(n^{1/2})$, which is obtained by the birthday paradox. We also show that every one-sided error tester requires $\Omega(n)$ queries for such $P$. These results are hereditary in the sense that the same results hold for any predicate $Q$ such that $P^{-1}(1) \subseteq Q^{-1}(1)$. For EQU, we give a one-sided error tester whose query complexity is $\tilde{O}(n^{1/2})$. Also, for 2-XOR (or, equivalently E2LIN2), we show an $\Omega(n^{1/2+\delta})$ lower bound for distinguishing instances between $\epsilon$-close to and $(1/2-\epsilon)$-far from satisfiability. Next, for the general $k$-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances $(1-2k/2^k-\epsilon)$-far from satisfiability requires $\Omega(n)$ queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the $d$-to-$1$ Conjecture. As a corollary, for Maximum Independent Set on graphs with $n$ vertices and a degree bound $d$, we show that every approximation algorithm within a factor $d/\operatorname{poly}\log d$ and an additive error of $\epsilon n$ requires $\Omega(n)$ queries. Previously, only super-constant lower bounds were known.

    Preventing False Discovery in Interactive Data Analysis is Hard

    We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given $n$ samples from an unknown distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. Our result stands in stark contrast to the well known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, a recent work by Dwork et al. shows how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm; and how to answer a quadratic number of adaptive queries via a computationally efficient algorithm. The latter result implies that our result is tight up to a linear factor in $n$. Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in Statistics is decades old, to the best of our knowledge our result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative research environment may be inherent.
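    To make the notions of a statistical query and a valid answer concrete, here is a minimal sketch. The function names and the explicit `tolerance` parameter are assumptions; the abstract only says a valid answer is "close" to the true expectation. The sketch answers a single query with the empirical mean over the samples, which is the naive approach and not a mechanism from the paper.

```python
def empirical_answer(samples, predicate):
    """Answer a statistical query with the fraction of samples that satisfy the predicate."""
    return sum(1 for s in samples if predicate(s)) / len(samples)


def is_valid(answer, true_expectation, tolerance):
    """An answer is valid if it is within tolerance of the expectation over the distribution."""
    return abs(answer - true_expectation) <= tolerance


samples = [1, 2, 3, 4]
answer = empirical_answer(samples, lambda s: s % 2 == 0)
```

    When later predicates are chosen after seeing earlier answers, the empirical mean can drift away from the true expectation, which is the adaptive setting the abstract is concerned with.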

    Limitations of semidefinite programs for separable states and entangled games

    Semidefinite programs (SDPs) are a framework for exact or approximate optimization that has widespread application in quantum information theory. We introduce a new method for using reductions to construct integrality gaps for SDPs. These are based on new limitations on the sum-of-squares (SoS) hierarchy in approximating two particularly important sets in quantum information theory, where previously no $\omega(1)$-round integrality gaps were known: the set of separable (i.e. unentangled) states, or equivalently, the $2 \rightarrow 4$ norm of a matrix, and the set of quantum correlations; i.e. conditional probability distributions achievable with local measurements on a shared entangled state. In both cases no-go theorems were previously known based on computational assumptions such as the Exponential Time Hypothesis (ETH), which asserts that 3-SAT requires exponential time to solve. Our unconditional results achieve the same parameters as all of these previous results (for separable states) or as some of the previous results (for quantum correlations). In some cases we can make use of the framework of Lee-Raghavendra-Steurer (LRS) to establish integrality gaps for any SDP, not only the SoS hierarchy. Our hardness result on separable states also yields a dimension lower bound of approximate disentanglers, answering a question of Watrous and Aaronson et al. These results can be viewed as limitations on the monogamy principle, the PPT test, the ability of Tsirelson-type bounds to restrict quantum correlations, as well as the SDP hierarchies of Doherty-Parrilo-Spedalieri, Navascues-Pironio-Acin and Berta-Fawzi-Scholz. Comment: 47 pages. v2: small changes, fixes and clarifications. Published version