27 research outputs found

    On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions

    We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$. Comment: To appear in the Proceedings of the American Mathematical Society

    Differential Privacy and the Fat-Shattering Dimension of Linear Queries

    In this paper, we consider the task of answering linear queries under the constraint of differential privacy. This is a general and well-studied class of queries that captures other commonly studied classes, including predicate queries and histogram queries. We show that the accuracy to which a set of linear queries can be answered is closely related to its fat-shattering dimension, a property that characterizes the learnability of real-valued functions in the agnostic-learning setting. Comment: Appears in APPROX 201

    An Improved Private Mechanism for Small Databases

    We study the problem of answering a workload of linear queries $\mathcal{Q}$, on a database of size at most $n = o(|\mathcal{Q}|)$ drawn from a universe $\mathcal{U}$ under the constraint of (approximate) differential privacy. Nikolov, Talwar, and Zhang [NTZ] proposed an efficient mechanism that, for any given $\mathcal{Q}$ and $n$, answers the queries with average error that is at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$ worse than the best possible. Here we improve on this guarantee and give a mechanism whose competitiveness ratio is at most polynomial in $\log n$ and $\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in place of an ad-hoc noise distribution, we use a distribution which is in a sense optimal for the projection mechanism, and analyze it using convex duality and the restricted invertibility principle. Comment: To appear in ICALP 2015, Track

    Tight Lower Bounds for Differentially Private Selection

    A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks (repeated use of the exponential mechanism or the sparse vector technique) approximately solve this problem given a dataset of $n = O(\sqrt{k} \log d)$ samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size $n = \Omega(\sqrt{k} \log d)$ is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of $d$ queries, but often only some much smaller set of $k$ queries are relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.
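
The "repeated use of the exponential mechanism" mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's construction: the function name `top_k_exponential_mechanism`, the even split of the privacy budget across the $k$ rounds (basic composition), and the default sensitivity of 1 are all assumptions made for illustration.

```python
import math
import random


def top_k_exponential_mechanism(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Pick k distinct item indices by running the exponential mechanism k times.

    Assumptions (not from the abstract): the budget epsilon is split evenly
    over the k rounds, and each score changes by at most `sensitivity` when
    one record of the dataset changes.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of items")
    rng = rng or random.Random()
    eps_per_round = epsilon / k if k else 0.0
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        # Shift by the current maximum so that exp() never overflows.
        best = max(scores[i] for i in remaining)
        weights = [
            math.exp(eps_per_round * (scores[i] - best) / (2.0 * sensitivity))
            for i in remaining
        ]
        # Sample one index with probability proportional to its weight.
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    quality = [10.0, 0.0, 0.0, 9.0, 0.0]
    print(top_k_exponential_mechanism(quality, k=2, epsilon=1.0, rng=random.Random(0)))
```

With a small `epsilon` the selection is close to uniform, and with a very large `epsilon` it concentrates on the highest-scoring items. The sketch does not attempt the tighter privacy accounting behind the $\sqrt{k}$ dependence quoted in the abstract.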

    Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

    Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and an $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n}, d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4}, d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
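
The definition of a $k$-way marginal query in this abstract can be made concrete with a small, non-private sketch. The function name `marginal_count` and the toy database are hypothetical; the code only computes the exact count and adds no privacy-preserving noise.

```python
from typing import Sequence


def marginal_count(
    database: Sequence[Sequence[int]],
    attrs: Sequence[int],
    values: Sequence[int],
) -> int:
    """Exact answer to a k-way marginal query.

    `attrs` plays the role of the attribute subset S and `values` the role of
    the binary vector beta; both names are illustrative assumptions.
    """
    if len(attrs) != len(values):
        raise ValueError("attrs and values must have the same length")
    # Count the rows whose restriction to the chosen attributes matches values.
    return sum(
        1
        for row in database
        if all(row[j] == v for j, v in zip(attrs, values))
    )


# Toy database: n = 4 people, d = 3 binary attributes.
database = [
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 1),
    (1, 0, 0),
]

if __name__ == "__main__":
    # 2-way marginal: attribute 0 equals 1 and attribute 2 equals 1.
    print(marginal_count(database, attrs=(0, 2), values=(1, 1)))  # prints 2
```

A private release mechanism would return a perturbed version of this count; the error bounds quoted in the abstract describe how large that perturbation must or may be.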

    Privately Releasing Conjunctions and the Statistical Query Barrier

    Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? First, we show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. Second, we show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning theoretic point of view, our main applications are in privacy-preserving data analysis: here, our second result leads to the first algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents significant progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ-model as a new barrier in the design of differentially private algorithms.