Communication Cost Reduction for Subgraph Counting under Local Differential Privacy via Hash Functions
We suggest the use of hash functions to cut down the communication costs when
counting subgraphs under edge local differential privacy. While various
algorithms exist for computing graph statistics, including the count of
subgraphs, under edge local differential privacy, many suffer from high
communication costs, making them less efficient for large graphs. Though data
compression is a typical approach in differential privacy, its application in
local differential privacy requires a form of compression that every node can
reproduce. In our study, we introduce linear congruence hashing. With a
sampling rate of , our method can cut communication costs by a factor of
, albeit at the cost of increasing variance in the published graph
statistic by a factor of . The experimental results indicate that, when
matched for communication costs, our method achieves a reduction in the
-error for triangle counts by up to 1000 times compared to the
performance of leading algorithms.
Comment: 13 pages, 3 figures
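The abstract above relies on a form of compression that every node can reproduce. A minimal sketch of that idea, assuming a hash of the form h(x) = ((a*x + b) mod p) mod s with public coefficients: every node can evaluate the same hash, so all nodes agree on which node pairs are sampled without exchanging the sample itself. The names `make_hash` and `sampled_neighbours`, the prime `P`, and the pair encoding are hypothetical choices for illustration, not the paper's construction, and the sketch omits the randomized perturbation that local differential privacy would still require.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1, picked here only for illustration

def make_hash(a, b, s, p=P):
    """Return h(x) = ((a*x + b) mod p) mod s, which maps integers to s buckets."""
    def h(x):
        return ((a * x + b) % p) % s
    return h

def sampled_neighbours(node, neighbours, h, n):
    """Keep the neighbours whose unordered pair with `node` hashes to bucket 0."""
    kept = []
    for v in neighbours:
        lo, hi = min(node, v), max(node, v)
        if h(lo * n + hi) == 0:  # encode the unordered pair as one integer
            kept.append(v)
    return kept

rng = random.Random(0)
a, b = rng.randrange(1, P), rng.randrange(0, P)  # public coefficients known to all nodes
h = make_hash(a, b, s=4)
n = 1000  # assumed number of nodes
kept_by_3 = sampled_neighbours(3, [v for v in range(n) if v != 3], h, n)
```

Because the pair is encoded symmetrically, node 3 keeps node 7 exactly when node 7 keeps node 3, and roughly one pair in `s` is retained.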
Continuous Release of Data Streams under both Centralized and Local Differential Privacy
In this paper, we study the problem of publishing a stream of real-valued
data satisfying differential privacy (DP). One major challenge is that the
maximal possible value can be quite large; thus it is necessary to estimate a
threshold so that numbers above it are truncated to reduce the amount of noise
that must be added to all the data. The estimation must be done based on the data
in a private fashion. We develop such a method that uses the Exponential
Mechanism with a quality function that approximates well the utility goal while
maintaining a low sensitivity. Given the threshold, we then propose a novel
online hierarchical method and several post-processing techniques.
Building on these ideas, we formalize the steps into a framework for private
publishing of stream data. Our framework consists of three components: a
threshold optimizer that privately estimates the threshold, a perturber that
adds calibrated noise to the stream, and a smoother that improves the result
using post-processing. Within our framework, we design an algorithm satisfying
the more stringent setting of DP called local DP (LDP). To our knowledge, this
is the first LDP algorithm for publishing streaming data. Using four real-world
datasets, we demonstrate that our mechanism outperforms the state of the art by
6 to 10 orders of magnitude in terms of utility (measured by the mean
squared error of answering a random range query).
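The role of the threshold in the abstract above can be illustrated with a short sketch: values are truncated to a threshold so that the noise scale depends on the threshold rather than on the largest possible value. The sketch assumes non-negative values, a threshold that is already given (the private threshold estimation, the hierarchical method, and the smoother are not shown), and that each individual affects a single stream element. The function names are hypothetical, and this is the standard Laplace mechanism rather than the paper's algorithm.

```python
import random

def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_stream(stream, threshold, epsilon, rng):
    """Clip each value to [0, threshold], then add Laplace noise of scale threshold / epsilon."""
    out = []
    for x in stream:
        clipped = min(max(x, 0.0), threshold)  # truncation bounds the sensitivity by `threshold`
        out.append(clipped + laplace(threshold / epsilon, rng))
    return out

rng = random.Random(1)
noisy = perturb_stream([3.0, 12.0, 950.0, 7.5], threshold=100.0, epsilon=1.0, rng=rng)
```

A smaller threshold lowers the noise scale but biases every value above it, which is the trade-off the threshold estimation has to balance.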
Intertwining Order Preserving Encryption and Differential Privacy
Ciphertexts of an order-preserving encryption (OPE) scheme preserve the order
of their corresponding plaintexts. However, OPEs are vulnerable to inference
attacks that exploit this preserved order. On the other hand, differential privacy
(DP) has become the de facto standard for achieving data privacy. One of the most
attractive properties of DP is that any post-processing (inferential)
computation performed on the noisy output of a DP algorithm does not degrade
its privacy guarantee. In this paper, we intertwine the two approaches and
propose a novel differentially private order preserving encryption scheme,
OP. Under OP, the leakage of order from the ciphertexts is
differentially private. As a result, at the very least, OP ensures a
formal guarantee (specifically, a relaxed DP guarantee) even in the face of
inference attacks. To the best of our knowledge, this is the first work to
intertwine DP with a property-preserving encryption scheme. We demonstrate
OP's practical utility in answering range queries via extensive
empirical evaluation on four real-world datasets. For instance, OP
misses only around in every correct records on average for a dataset
of size with an attribute of domain size and
Estimating numerical distributions under local differential privacy
When collecting information, local differential privacy (LDP) relieves users' concerns about privacy leakage, as each user's private information is randomized before being sent to the aggregator. We study the problem of recovering the distribution over a numerical domain while satisfying LDP. While one can discretize a numerical domain and then apply the protocols developed for categorical domains, we show that taking advantage of the numerical nature of the domain results in a better trade-off between privacy and utility. We introduce a new reporting mechanism, called the square wave (SW) mechanism, which exploits the numerical nature of the domain in reporting. We also develop an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions. Extensive experiments demonstrate that our proposed approach, SW with EMS, consistently outperforms other methods in a variety of utility metrics.
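One way to picture a square-wave-shaped report is sketched below, under stated assumptions: the input `v` lies in [0, 1], the report lies in [-b, 1 + b], and values within distance `b` of `v` are reported with a density that is e^epsilon times the density elsewhere. The half-width `b` is left as a free parameter, the function name is hypothetical, and the EMS reconstruction step is not shown. This is an illustration of the idea, not necessarily the exact mechanism defined in the paper.

```python
import math
import random

def square_wave_report(v, epsilon, b, rng):
    """Randomize v in [0, 1] into a report in [-b, 1 + b] with a square-wave density."""
    e = math.exp(epsilon)
    p = e / (2 * b * e + 1)  # density inside [v - b, v + b]; outside it is p / e
    if rng.random() < 2 * b * p:
        # "near" region of width 2b centred on the true value
        return rng.uniform(v - b, v + b)
    # "far" region [-b, v - b) and (v + b, 1 + b], total length 1, sampled uniformly
    u = rng.random()
    return -b + u if u < v else b + u

rng = random.Random(3)
reports = [square_wave_report(0.4, epsilon=1.0, b=0.25, rng=rng) for _ in range(1000)]
```

Since the two density levels differ by a factor of exactly e^epsilon for any input, the ratio of output densities between two inputs never exceeds e^epsilon, which is the LDP requirement.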