14 research outputs found

Optimal error of query sets under the differentially-private matrix mechanism

Authors: Li Chao; Miklau Gerome
Publication date: 01/01/2012

A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries. This work considers methods for producing synthetic data under differential privacy and investigates what makes a set of queries "easy" or "hard" to answer. We consider answering sets of linear counting queries using the matrix mechanism, a recent differentially-private mechanism that can reduce error by adding complex correlated noise adapted to a specified workload. Our main result is a novel lower bound on the minimum total error required to simultaneously release answers to a set of workload queries. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. The bound is most informative for $(\epsilon,\delta)$-differential privacy but also applies to $\epsilon$-differential privacy.

Comment: 35 pages; Short version to appear in the 16th International Conference on Database Theory (ICDT), 201
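As a rough illustration of why adapting noise to the workload matters, the sketch below compares the expected total squared error of two simple strategies under $\epsilon$-differential privacy with Laplace noise: perturbing every workload answer directly, versus perturbing the histogram cells and deriving the answers from them. The function names, the prefix-sum workload, and the add/remove-one-record sensitivity convention are assumptions made for this example; it is neither the lower bound nor the strategy optimization from the paper.

```python
def l1_sensitivity(matrix):
    """Largest column L1 norm: the most one record can change the answers."""
    n_cols = len(matrix[0])
    return max(sum(abs(row[j]) for row in matrix) for j in range(n_cols))


def direct_laplace_error(workload, epsilon):
    """Expected total squared error when noise is added to each answer.

    Each of the m answers gets Laplace noise of scale b = sensitivity / epsilon,
    and a Laplace variable with scale b has variance 2 * b**2.
    """
    scale = l1_sensitivity(workload) / epsilon
    return len(workload) * 2.0 * scale ** 2


def identity_strategy_error(workload, epsilon):
    """Expected total squared error when noise is added to histogram cells.

    Each cell gets independent Laplace noise of scale 1 / epsilon, so a query
    with coefficients w has error variance (2 / epsilon**2) * sum(w_j ** 2).
    """
    cell_variance = 2.0 / epsilon ** 2
    return cell_variance * sum(w * w for row in workload for w in row)


def prefix_workload(n):
    """Hypothetical example workload: query i counts cells 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    W = prefix_workload(4)
    print("direct Laplace:   ", direct_laplace_error(W, epsilon=1.0))
    print("identity strategy:", identity_strategy_error(W, epsilon=1.0))
```

For this small prefix workload the identity strategy has the lower expected error, because a single record touches up to four prefix queries but only one histogram cell. The matrix mechanism generalizes this by answering a chosen set of strategy queries and reconstructing the workload answers from them.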

arXiv.org e-Print Archive

CiteSeerX

ScholarWorks@UMass Amherst

HAL Descartes

Hal-Diderot

Impossibility of dimension reduction in the nuclear norm

Authors: Naor Assaf; Pisier Gilles; Schechtman Gideon
Publication date: 24/10/2017

Let $\mathsf{S}_1$ (the Schatten–von Neumann trace class) denote the Banach space of all compact linear operators $T:\ell_2\to \ell_2$ whose nuclear norm $\|T\|_{\mathsf{S}_1}=\sum_{j=1}^\infty\sigma_j(T)$ is finite, where $\{\sigma_j(T)\}_{j=1}^\infty$ are the singular values of $T$. We prove that for arbitrarily large $n\in \mathbb{N}$ there exists a subset $\mathcal{C}\subseteq \mathsf{S}_1$ with $|\mathcal{C}|=n$ that cannot be embedded with bi-Lipschitz distortion $O(1)$ into any $n^{o(1)}$-dimensional linear subspace of $\mathsf{S}_1$. $\mathcal{C}$ is not even a $O(1)$-Lipschitz quotient of any subset of any $n^{o(1)}$-dimensional linear subspace of $\mathsf{S}_1$. Thus, $\mathsf{S}_1$ does not admit a dimension reduction result à la Johnson and Lindenstrauss (1984), which complements the work of Harrow, Montanaro and Short (2011) on the limitations of quantum dimension reduction under the assumption that the embedding into low dimensions is a quantum channel. Such a statement was previously known with $\mathsf{S}_1$ replaced by the Banach space $\ell_1$ of absolutely summable sequences via the work of Brinkman and Charikar (2003). In fact, the above set $\mathcal{C}$ can be taken to be the same set as the one that Brinkman and Charikar considered, viewed as a collection of diagonal matrices in $\mathsf{S}_1$. The challenge is to demonstrate that $\mathcal{C}$ cannot be faithfully realized in an arbitrary low-dimensional subspace of $\mathsf{S}_1$, while Brinkman and Charikar obtained such an assertion only for subspaces of $\mathsf{S}_1$ that consist of diagonal operators (i.e., subspaces of $\ell_1$). We establish this by proving that the Markov 2-convexity constant of any finite dimensional linear subspace $X$ of $\mathsf{S}_1$ is at most a universal constant multiple of $\sqrt{\log \mathrm{dim}(X)}$
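The nuclear norm in the abstract is the sum of the singular values. For a real $2\times 2$ matrix this has a closed form, since $(\sigma_1+\sigma_2)^2=\|A\|_F^2+2|\det A|$, which allows a small dependency-free sketch. The function name and the restriction to real $2\times 2$ matrices are choices made for this illustration only and do not come from the paper.

```python
import math


def nuclear_norm_2x2(a, b, c, d):
    """Nuclear norm (sum of singular values) of the real matrix [[a, b], [c, d]].

    Uses sigma_1**2 + sigma_2**2 = squared Frobenius norm and
    sigma_1 * sigma_2 = |det|, so (sigma_1 + sigma_2)**2 = frob2 + 2 * |det|.
    """
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt(frob2 + 2.0 * abs(det))


if __name__ == "__main__":
    # Diagonal matrix diag(3, -4): the nuclear norm is |3| + |-4|.
    print(nuclear_norm_2x2(3.0, 0.0, 0.0, -4.0))
    # Rank-one matrix of all ones: singular values are 2 and 0.
    print(nuclear_norm_2x2(1.0, 1.0, 1.0, 1.0))
```

For a diagonal matrix the result equals the $\ell_1$ norm of the diagonal entries, which is the sense in which subspaces of diagonal operators in $\mathsf{S}_1$ correspond to subspaces of $\ell_1$.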

arXiv.org e-Print Archive

Crossref

Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies

Authors: Ding Bolin; He Xi; Machanavajjhala Ashwin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/06/2014

Privacy definitions provide ways of trading off the privacy of individuals in a statistical database against the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. In particular, we allow data publishers to extend differential privacy using a policy, which specifies (a) secrets, or information that must be kept secret, and (b) constraints that may be known about the data. While the secret specification allows increased utility by lessening protection for certain individual properties, the constraint specification provides added protection against an adversary who knows correlations in the data (arising from constraints). We formalize policies and present novel algorithms that can handle general specifications of sensitive information and certain count constraints. We show that there are reasonable policies under which our privacy mechanisms for k-means clustering, histograms and range queries introduce significantly less noise than their differentially private counterparts. We quantify the privacy-utility trade-offs for various policies analytically and empirically on real datasets.

Comment: Full version of the paper at SIGMOD'14 Snowbird, Utah US

arXiv.org e-Print Archive

Crossref

Linear and Range Counting under Metric-based Local Differential Privacy

Authors: Ding Bolin; He Xi; Xiang Zhuolun; Zhou Jingren
Publication date: 16/05/2020

Local differential privacy (LDP) enables private data sharing and analytics without the need for a trusted data collector. Error-optimal primitives (for, e.g., estimating means and item frequencies) under LDP have been well studied. For analytical tasks such as range queries, however, the best known error bound is dependent on the domain size of private data, which is potentially prohibitive. This deficiency is inherent, as LDP enforces the same level of indistinguishability between any pair of private data values for each data owner. In this paper, we utilize an extension of $\epsilon$-LDP called Metric-LDP or $E$-LDP, where a metric $E$ defines heterogeneous privacy guarantees for different pairs of private data values and thus provides a more flexible knob than $\epsilon$ does to relax LDP and tune utility-privacy trade-offs. We show that, under such privacy relaxations, for analytical workloads such as linear counting, multi-dimensional range counting queries, and quantile queries, we can achieve significant gains in utility. In particular, for range queries under $E$-LDP where the metric $E$ is the $L^1$-distance function scaled by $\epsilon$, we design mechanisms with errors independent of the domain sizes; instead, their errors depend on the metric $E$, which specifies in what granularity the private data is protected. We believe that the primitives we design for $E$-LDP will be useful in developing mechanisms for other analytical tasks, and encourage the adoption of LDP in practice

arXiv.org e-Print Archive

Crossref

Convex Optimization for Linear Query Processing under Approximate Differential Privacy

Authors: Hao Zhifeng; Yang Yin; Yuan Ganzhao; Zhang Zhenjie
Publication date: 16/05/2016

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate "strategy", may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-linear and non-convex. Hence, in the past much effort has been devoted to solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, is guaranteed to find the optimal solution of this optimization problem. This paper points out that under $(\epsilon, \delta)$-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. Then, we propose an efficient algorithm based on Newton's method, which we prove to always converge to the optimal solution with linear global convergence rate and quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.

Comment: to appear in ACM SIGKDD 201

arXiv.org e-Print Archive

Crossref

Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy

Authors: Hao Zhifeng; Winslett Marianne; Xiao Xiaokui; Yang Yin; Yuan Ganzhao; Zhang Zhenjie
Publication date: 01/01/2015

Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for the adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results, while satisfying the privacy guarantees. Previous work, notably \cite{LHR+10}, has suggested that with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution to find such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose the low-rank mechanism (LRM), the first practical differentially private technique for answering batch linear queries with high accuracy. LRM works for both exact (i.e., $\epsilon$-) and approximate (i.e., $(\epsilon, \delta)$-) differential privacy definitions. We derive the utility guarantees of LRM, and provide guidance on how to set the privacy parameters given the user's utility expectation. Extensive experiments using real data demonstrate that our proposed method consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.

Comment: ACM Transactions on Database Systems (ACM TODS). arXiv admin note: text overlap with arXiv:1212.230
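The noise-injection idea described in this abstract can be sketched with the standard Laplace mechanism, which answers each linear query with independent Laplace noise scaled to the L1 sensitivity of the whole batch. This is the naive per-query baseline, not LRM; the function names, the example workload and histogram, and the add/remove-one-record sensitivity convention are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    # expovariate takes a rate, so rate 1/scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_batch(workload, histogram, epsilon, rng):
    """Answer every linear query in `workload` under epsilon-DP.

    `workload` is a list of coefficient rows and `histogram` holds cell counts.
    The noise scale is (max column L1 norm of the workload) / epsilon.
    """
    n_cols = len(histogram)
    sensitivity = max(
        sum(abs(row[j]) for row in workload) for j in range(n_cols)
    )
    scale = sensitivity / epsilon
    true_answers = [
        sum(w * x for w, x in zip(row, histogram)) for row in workload
    ]
    return [t + laplace_noise(scale, rng) for t in true_answers]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    workload = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
    histogram = [10, 20, 30, 40]
    print(answer_batch(workload, histogram, epsilon=0.5, rng=rng))
```

Because every cell appears in two of the three example queries, the batch sensitivity is 2, so each answer receives noise of scale 2 / epsilon. Strategy-based methods such as those surveyed above aim to reduce this error by choosing what to perturb.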

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)