Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably \cite{LHR+10}, has suggested that with an
appropriate strategy, processing a batch of correlated queries as a whole
achieves considerably higher accuracy than answering them individually.
However, to our knowledge there is currently no practical solution to find such
a strategy for an arbitrary query batch; existing methods either return
strategies of poor quality (often worse than naive methods) or require
prohibitively expensive computations for even moderately large domains.
Motivated by this, we propose the low-rank mechanism (LRM), the first practical
differentially private technique for answering batch linear queries with high
accuracy. LRM works for both exact (i.e., $\epsilon$-) and approximate (i.e.,
$(\epsilon, \delta)$-) differential privacy definitions. We derive the
utility guarantees of LRM, and provide guidance on how to set the privacy
parameters given the user's utility expectation. Extensive experiments using
real data demonstrate that our proposed method consistently outperforms
state-of-the-art query processing solutions under differential privacy, by
large margins.
Comment: ACM Transactions on Database Systems (ACM TODS). arXiv admin note:
text overlap with arXiv:1212.230
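As a rough illustration of the batch setting described above, the following Python sketch compares two ways of answering a hypothetical workload of prefix-range counts under $\epsilon$-differential privacy with the Laplace mechanism. The first perturbs each workload answer directly. The second answers the identity strategy (a noisy histogram) and derives the workload answers from it. This is not LRM. The workload, the counts, and all function names are assumptions made for illustration.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def l1_sensitivity(matrix):
    # Adding or removing one record changes one histogram cell by 1, so the
    # L1 sensitivity of x -> matrix @ x is the largest column L1 norm.
    return max(sum(abs(row[j]) for row in matrix) for j in range(len(matrix[0])))


def answer_directly(workload, x, epsilon, rng):
    # Naive: perturb each workload answer, calibrated to the workload's own sensitivity.
    scale = l1_sensitivity(workload) / epsilon
    return [v + laplace(scale, rng) for v in matvec(workload, x)]


def answer_via_identity(workload, x, epsilon, rng):
    # Strategy A = identity: release a noisy histogram (sensitivity 1), then derive
    # every workload answer from it by post-processing (no extra privacy cost).
    noisy_x = [v + laplace(1.0 / epsilon, rng) for v in x]
    return matvec(workload, noisy_x)


def prefix_workload(n):
    # Hypothetical workload: all prefix-range counts x[0] + ... + x[i].
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def total_squared_error(answer_fn, workload, x, epsilon, trials, rng):
    # Empirical mean (over trials) of the summed squared error across all queries.
    truth = matvec(workload, x)
    total = 0.0
    for _ in range(trials):
        est = answer_fn(workload, x, epsilon, rng)
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, epsilon = 16, 1.0
    x = [rng.randint(0, 100) for _ in range(n)]  # made-up counts
    W = prefix_workload(n)
    for fn in (answer_directly, answer_via_identity):
        print(fn.__name__, total_squared_error(fn, W, x, epsilon, 2000, rng))
```

For the prefix workload, the direct approach has sensitivity n. Its expected total squared error is therefore 2n³/ε², which is 8192 for n = 16 and ε = 1. The identity strategy gives n(n+1)/ε², which is 272. LRM goes further by searching for a low-rank factorization of the workload, which this sketch does not attempt.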
Structurally Tractable Uncertain Data
Many data management applications must deal with data which is uncertain,
incomplete, or noisy. However, on existing uncertain data representations, we
cannot tractably perform the important query evaluation tasks of determining
query possibility, certainty, or probability: these problems are hard on
arbitrary uncertain input instances. We thus ask whether we could restrict the
structure of uncertain data so as to guarantee the tractability of exact query
evaluation. We present our tractability results for tree and tree-like
uncertain data, and a vision for probabilistic rule reasoning. We also study
uncertainty about order, proposing a suitable representation, and study
uncertain data conditioned by additional observations.
Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201
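To make the tractability claim concrete for tree-shaped data, here is a minimal Python sketch under an assumed model in the spirit of probabilistic XML. Each node of a tree is kept independently with a given probability when its parent is kept. The query asks whether some node with a given label exists. Because sibling subtrees are independent, the query probability can be computed in one bottom-up pass instead of by enumerating exponentially many possible worlds. The model and all names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    # Hypothetical model: the node is kept independently with probability
    # `prob`, given that its parent is kept.
    prob: float = 1.0
    children: List["Node"] = field(default_factory=list)


def prob_exists_below(node: Node, target: str) -> float:
    """P(some node labeled `target` is present in this subtree | `node` is present)."""
    if node.label == target:
        return 1.0
    p_none = 1.0
    for child in node.children:
        # Child subtrees are independent given the parent, so the probabilities
        # that each of them contains no match simply multiply. A child subtree has
        # no match if the child is dropped, or kept but its subtree has no match.
        p_none *= 1.0 - child.prob * prob_exists_below(child, target)
    return 1.0 - p_none


def query_probability(root: Node, target: str) -> float:
    # One bottom-up pass: linear in the tree size, not in the number of worlds.
    return root.prob * prob_exists_below(root, target)


if __name__ == "__main__":
    doc = Node("r", 1.0, [Node("a", 0.5), Node("b", 0.8, [Node("a", 0.5)])])
    print(query_probability(doc, "a"))  # ≈ 0.7
```

Under this model, query possibility and certainty correspond to the probability being nonzero or equal to one. Richer queries would need a larger per-node state, which this sketch does not cover.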
An Adaptive Mechanism for Accurate Query Answering under Differential Privacy
We propose a novel mechanism for answering sets of counting queries under
differential privacy. Given a workload of counting queries, the mechanism
automatically selects a different set of "strategy" queries to answer
privately, using those answers to derive answers to the workload. The main
algorithm proposed in this paper approximates the optimal strategy for any
workload of linear counting queries. With no cost to the privacy guarantee, the
mechanism improves significantly on prior approaches and achieves near-optimal
error for many workloads, when applied under $(\epsilon, \delta)$-differential
privacy. The result is an adaptive mechanism which can help users achieve good
utility without requiring that they reason carefully about the best formulation
of their task.
Comment: VLDB2012. arXiv admin note: substantial text overlap with
arXiv:1103.136
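The strategy-then-derive step described above can be sketched in plain Python under two assumptions. First, Gaussian noise provides $(\epsilon, \delta)$-differential privacy, calibrated with the classic bound σ = Δ₂·√(2 ln(1.25/δ))/ε, which is valid for ε < 1. Second, the data vector is estimated from the noisy strategy answers by least squares. The strategy here is fixed by hand, and the paper's adaptive selection of the strategy is not reproduced. All names are hypothetical.

```python
import math
import random


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve(M, b):
    """Gauss-Jordan elimination with partial pivoting; assumes M is nonsingular."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def derive_workload_answers(W, A, y):
    # Least-squares estimate x_hat = (A^T A)^{-1} A^T y (A must have full column
    # rank), then answer the workload as W x_hat. This is pure post-processing.
    At = transpose(A)
    x_hat = solve(matmul(At, A), matvec(At, y))
    return matvec(W, x_hat)


def l2_sensitivity(A):
    # Largest column L2 norm: the effect of one record on the strategy answers.
    return max(math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0])))


def gaussian_answer(W, A, x, epsilon, delta, rng):
    # Classic Gaussian-mechanism calibration, valid for 0 < epsilon < 1.
    sigma = l2_sensitivity(A) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    y = [v + rng.gauss(0.0, sigma) for v in matvec(A, x)]
    return derive_workload_answers(W, A, y)


if __name__ == "__main__":
    A = [[1, 0], [0, 1], [1, 1]]  # hypothetical strategy: two cells and their total
    W = [[1, 1], [1, -1]]         # hypothetical workload: total and difference
    print(gaussian_answer(W, A, [30, 50], 0.5, 1e-5, random.Random(0)))
```

The derived answers are post-processing of the strategy answers, so they consume no additional privacy budget. A better-chosen strategy can therefore lower the error without changing the guarantee.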
Database Learning: Toward a Database that Becomes Smarter Every Time
In today's databases, previous query answers rarely benefit answering future
queries. For the first time, to the best of our knowledge, we change this
paradigm in an approximate query processing (AQP) context. We make the
following observation: the answer to each query reveals some degree of
knowledge about the answer to another query because their answers stem from the
same underlying distribution that has produced the entire dataset. Exploiting
and refining this knowledge should allow us to answer queries more
analytically, rather than by reading enormous amounts of raw data. Also,
processing more queries should continuously enhance our knowledge of the
underlying distribution, and hence lead to increasingly faster response times
for future queries.
We call this novel idea---learning from past query answers---Database
Learning. We exploit the principle of maximum entropy to produce answers, which
are in expectation guaranteed to be more accurate than existing sample-based
approximations. Empowered by this idea, we build a query engine on top of Spark
SQL, called Verdict. We conduct extensive experiments on real-world query
traces from a large customer of a major database vendor. Our results
demonstrate that Verdict supports 73.7% of these queries, speeding them up by
up to 23.0x for the same accuracy level compared to existing AQP systems.
Comment: This manuscript is an extended report of the work published in ACM
SIGMOD conference 201
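The maximum-entropy idea can be illustrated with a small Python sketch. Among all distributions with given means and covariances, the multivariate Gaussian has maximum entropy. Suppose one assumes such a prior over the answers to a past query and a new query; the means, covariances, and error variances below are hypothetical inputs. A noisy past answer can then sharpen a raw sample-based estimate of the new answer through Gaussian conditioning. This illustrates the principle only and is not Verdict's implementation or API.

```python
def condition_on_past(prior_mean, prior_cov, past_answer, past_err_var):
    """Gaussian posterior of the new answer after observing a noisy past answer.

    prior_mean = (mu_past, mu_new); prior_cov = ((s_pp, s_pn), (s_pn, s_nn)).
    The observed past answer is assumed to be theta_past plus independent
    Gaussian error with variance past_err_var.
    """
    mu_p, mu_n = prior_mean
    (s_pp, s_pn), (_, s_nn) = prior_cov
    gain = s_pn / (s_pp + past_err_var)
    mean = mu_n + gain * (past_answer - mu_p)
    var = s_nn - gain * s_pn  # assumed > 0 (not perfectly determined)
    return mean, var


def fuse(mean_a, var_a, mean_b, var_b):
    # Combine two independent Gaussian estimates: precision-weighted average.
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    return var * (mean_a / var_a + mean_b / var_b), var


def improved_answer(prior_mean, prior_cov, past_answer, past_err_var,
                    raw_estimate, raw_err_var):
    # Condition on the past answer, then fuse with the raw sample-based estimate.
    mean, var = condition_on_past(prior_mean, prior_cov, past_answer, past_err_var)
    return fuse(mean, var, raw_estimate, raw_err_var)


if __name__ == "__main__":
    print(improved_answer((0.0, 0.0), ((1.0, 0.9), (0.9, 1.0)), 1.0, 0.0, 1.0, 0.19))
```

The returned posterior variance is never larger than `raw_err_var`. This mirrors, in a toy setting, the claim that answers informed by past queries are in expectation more accurate than the sample-based estimate alone.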
Efficient Batch Query Answering Under Differential Privacy
Differential privacy is a rigorous privacy condition achieved by randomizing
query answers. This paper develops efficient algorithms for answering multiple
queries under differential privacy with low error. We pursue this goal by
advancing a recent approach called the matrix mechanism, which generalizes
standard differentially private mechanisms. This new mechanism works by first
answering a different set of queries (a strategy) and then inferring the
answers to the desired workload of queries. Although a few strategies are known
to work well on specific workloads, finding the strategy which minimizes error
on an arbitrary workload is intractable. We prove a new lower bound on the
optimal error of this mechanism, and we propose an efficient algorithm that
approaches this bound for a wide range of workloads.
Comment: 6 figures, 22 pages
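For reference, the quantity such algorithms try to minimize can be written out. The setting is ε-differential privacy with Laplace noise, a workload W (m × n), and a strategy A with full column rank. The standard matrix-mechanism expression for the total expected squared error is shown below. The paper's new lower bound is not reproduced here.

```latex
\begin{align*}
\Delta_A &= \max_j \sum_i |a_{ij}|, \\
\hat{\mathbf{y}} &= A\mathbf{x} + \mathbf{z}, \qquad z_i \sim \mathrm{Lap}(\Delta_A/\epsilon) \text{ i.i.d.}, \\
\widehat{W\mathbf{x}} &= W A^{+}\hat{\mathbf{y}}, \qquad A^{+} = (A^{\top}A)^{-1}A^{\top}, \\
\mathbb{E}\,\bigl\lVert \widehat{W\mathbf{x}} - W\mathbf{x} \bigr\rVert_2^2
  &= \frac{2\Delta_A^2}{\epsilon^2}\,\lVert W A^{+} \rVert_F^2
   = \frac{2\Delta_A^2}{\epsilon^2}\,\operatorname{tr}\bigl(W (A^{\top}A)^{-1} W^{\top}\bigr).
\end{align*}
```

The derivation rests on three facts. Because A⁺A = I, the error vector is W A⁺ z. The noise covariance is E[z zᵀ] = 2(Δ_A/ε)² I. Finally, A⁺(A⁺)ᵀ = (AᵀA)⁻¹, which turns the Frobenius norm into the trace.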
Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses
A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, whether a total sales amount of 1,000 items indicates good or bad sales performance is still unclear. From the decision makers' point of view, the semantics that convey the meaning of the data, rather than the raw numbers, are very important. In this paper, we explore the use of fuzzy technology to provide such semantics for the summarizations and aggregates developed in data warehousing systems. A three-layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses.
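The qualitative and quantifier layers of such a model can be illustrated with a short Python sketch using trapezoidal membership functions. The linguistic terms, their breakpoints, and the fuzzy quantifier "most" are hypothetical choices for illustration. "Most" is evaluated here with a Zadeh-style sigma-count. None of these are the operators defined in the paper.

```python
import math

INF = math.inf


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical linguistic terms for a total sales amount (items); the
# breakpoints are illustrative only.
SALES_TERMS = {
    "poor": (-INF, -INF, 500, 900),
    "average": (500, 900, 1100, 1500),
    "good": (1100, 1500, INF, INF),
}


def describe(total_sales):
    # Qualitative layer: map a numeric aggregate to membership degrees.
    return {term: trapezoid(total_sales, *abcd) for term, abcd in SALES_TERMS.items()}


def most_are(values, term):
    # Quantifier layer (assumed sigma-count semantics): the truth of
    # "most values are <term>" is mu_most(average membership in <term>).
    share = sum(describe(v)[term] for v in values) / len(values)
    return trapezoid(share, 0.3, 0.8, INF, INF)  # hypothetical "most"


if __name__ == "__main__":
    print(describe(1300))
    print(most_are([1000, 1300, 2000, 400], "good"))
```

With these made-up breakpoints, a total of 1,000 items falls fully into "average". A total of 1,300 is split evenly between "average" and "good".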
Convex Optimization for Linear Query Processing under Approximate Differential Privacy
Differential privacy enables organizations to collect accurate aggregates
over sensitive data with strong, rigorous guarantees on individuals' privacy.
Previous work has found that under differential privacy, computing multiple
correlated aggregates as a batch, using an appropriate \emph{strategy}, may
yield higher accuracy than computing each of them independently. However,
finding the best strategy that maximizes result accuracy is non-trivial, as it
involves solving a complex constrained optimization program that appears to be
non-linear and non-convex. Hence, much past effort has been devoted to
solving this non-convex optimization program. Existing approaches include
various sophisticated heuristics and expensive numerical solutions. None of
them, however, is guaranteed to find the optimal solution of this optimization
problem.
This paper points out that under $(\epsilon, \delta)$-differential privacy,
the optimal solution of the above constrained optimization problem in search of
a suitable strategy can be found, rather surprisingly, by solving a simple and
elegant convex optimization program. Then, we propose an efficient algorithm
based on Newton's method, which we prove to always converge to the optimal
solution with linear global convergence rate and quadratic local convergence
rate. Empirical evaluations demonstrate the accuracy and efficiency of the
proposed solution.
Comment: to appear in ACM SIGKDD 201
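The sketch below shows one way a convex reformulation becomes possible. It assumes Gaussian noise whose standard deviation is proportional to the strategy's L2 sensitivity. The error then equals a constant c(ε, δ), which depends only on the privacy parameters, times Δ₂(A)²·tr(W(AᵀA)⁻¹Wᵀ). This is an illustrative derivation and may differ in detail from the paper's exact program.

```latex
\begin{align*}
\Delta_2(A) &= \max_j \lVert \mathbf{a}_j \rVert_2, \qquad M = A^{\top}A \succ 0
  \;\Rightarrow\; \Delta_2(A)^2 = \max_j M_{jj}, \\
\operatorname{err}(A) &= c(\epsilon,\delta)\,\Delta_2(A)^2\,
  \operatorname{tr}\bigl(W (A^{\top}A)^{-1} W^{\top}\bigr)
  = c(\epsilon,\delta)\,\Bigl(\max_j M_{jj}\Bigr)\,\operatorname{tr}\bigl(W^{\top}W\,M^{-1}\bigr), \\
&\text{err}(tA) = \text{err}(A) \text{ for all } t > 0
  \;\Rightarrow\; \text{w.l.o.g. } \max_j M_{jj} \le 1, \\
&\min_{M \succ 0}\; \operatorname{tr}\bigl(W^{\top}W\,M^{-1}\bigr)
  \quad \text{s.t.} \quad M_{jj} \le 1,\; j = 1, \dots, n.
\end{align*}
```

The objective tr(WᵀW M⁻¹) is convex on the positive definite cone, and the constraints are linear in M. A strategy can be recovered from any feasible M as A = M^{1/2}. Newton's method, as proposed in the paper, is one way to solve this program; its convergence analysis is not sketched here.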