Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation
In order to generate synthetic basket data sets for better benchmark testing,
it is important to integrate characteristics from real-life databases into the
synthetic basket data sets. The characteristics that could be used for this
purpose include the frequent itemsets and association rules. The problem of
generating synthetic basket data sets from frequent itemsets is generally
referred to as inverse frequent itemset mining. In this paper, we show that the
problem of approximate inverse frequent itemset mining is NP-complete. We
then propose and analyze an approximation algorithm for this problem and
discuss privacy issues related to the synthetic basket data set. In
particular, we propose an approximate algorithm to determine the privacy
leakage in a synthetic basket data set.
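As a hedged illustration of the inverse problem's objective (not the paper's algorithm), the sketch below scores how closely a candidate synthetic basket data set reproduces a set of target itemset supports; all item names and support values are made up.

```python
# Illustrative sketch, not the paper's algorithm: score how closely a
# candidate synthetic basket data set reproduces target itemset supports.
def support(baskets, itemset):
    """Fraction of baskets containing every item of `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

# Hypothetical target supports, standing in for itemsets mined from a
# real-life database.
targets = {frozenset({"milk"}): 0.6,
           frozenset({"bread"}): 0.5,
           frozenset({"milk", "bread"}): 0.4}

# A candidate synthetic data set to evaluate.
synthetic = [frozenset(b) for b in
             [{"milk", "bread"}, {"milk"}, {"bread", "milk"},
              {"milk"}, {"bread"}]]

# Approximate inverse frequent itemset mining asks for a data set that
# keeps the worst-case support deviation small.
err = max(abs(support(synthetic, s) - t) for s, t in targets.items())
print(f"max support deviation: {err:.2f}")   # 0.20 for this toy data set
```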
On the Privacy of Optimization Approaches
Ensuring privacy of sensitive data is essential in many contexts, such as
healthcare data, banks, e-commerce, wireless sensor networks, and social
networks. It is common that different entities coordinate or want to rely on a
third party to solve a specific problem. At the same time, no entity wants to
publish its problem data during the solution procedure unless there is a
privacy guarantee. Unlike cryptography and differential privacy based
approaches, the methods based on optimization lack a quantification of the
privacy they can provide. The main contribution of this paper is to provide a
mechanism to quantify the privacy of a broad class of optimization approaches.
In particular, we formally define a one-to-many relation that relates a given
adversarially observed message to an uncertainty set of the problem data. This
relation quantifies the potential ambiguity in the problem data due to the
employed optimization approach. The privacy definitions are then formalized
based on the uncertainty sets. The properties of the proposed privacy measure
are analyzed. The key ideas are illustrated with examples, including
localization and average consensus.
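A minimal sketch of the uncertainty-set idea under a strong simplifying assumption (the adversary observes only a linear projection y = A x of the problem data; the matrix sizes here are arbitrary placeholders): every x' consistent with the observation lies in an affine set, whose dimension gives one crude measure of the remaining ambiguity.

```python
# Illustrative sketch, not the paper's formal definitions: if an adversary
# observes only y = A @ x for a wide matrix A, every x' with A @ x' = y is
# consistent with the observation, so the private data x is known only up
# to the affine uncertainty set x + null(A).
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # adversary sees a 2-dim projection of 4-dim data
x_true = rng.standard_normal(4)   # private problem data
y = A @ x_true                    # adversarially observed message

# Dimension of the uncertainty set = nullity of A: the degrees of freedom
# that remain hidden from the adversary.
nullity = A.shape[1] - np.linalg.matrix_rank(A)
print(f"uncertainty set is an affine subspace of dimension {nullity}")

# Any point x_true + v with v in null(A) yields the same observed message.
V = null_space(A)
x_alt = x_true + V @ rng.standard_normal(V.shape[1])
assert np.allclose(A @ x_alt, y)
```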
Privacy-Preserving Filtering for Event Streams
Many large-scale information systems such as intelligent transportation
systems, smart grids or smart buildings collect data about the activities of
their users to optimize their operations. To encourage participation and
adoption of these systems, it is becoming increasingly important that the
design process take privacy issues into consideration. In a typical scenario,
signals originate from many sensors capturing events involving the users, and
several statistics of interest need to be continuously published in real time.
This paper considers the problem of providing differential privacy guarantees
for such multi-input multi-output systems processing event streams. We show how
to construct and optimize various extensions of the zero-forcing equalization
mechanism, which we previously proposed for single-input single-output systems.
Some of these extensions can take a model of the input signals into account. We
illustrate our privacy-preserving filter design methodology through the problem
of privately monitoring and forecasting occupancy in a building equipped with
multiple motion detection sensors. Comment: This version subsumes both the
previous version and arXiv:1304.231
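The toy sketch below shows the general perturb-then-filter pattern for a differentially private event stream; it uses a plain Laplace mechanism and a moving-average post-filter, which is far simpler than the zero-forcing equalization mechanisms the paper constructs, and every parameter value is illustrative.

```python
# Minimal sketch of event-level differential privacy for a published
# stream, assuming each event changes one per-step count by at most 1
# (sensitivity 1); not the paper's mechanism.
import numpy as np

rng = np.random.default_rng(1)
events = rng.integers(0, 5, size=200)   # per-step event counts from the sensors

epsilon = 1.0                           # privacy budget per event
noisy = events + rng.laplace(scale=1.0 / epsilon, size=events.size)

# Post-filter the noisy stream before publication; smoothing trades
# temporal resolution for lower noise variance (roughly by 1/window).
window = 10
kernel = np.ones(window) / window
published = np.convolve(noisy, kernel, mode="same")
truth_smoothed = np.convolve(events, kernel, mode="same")
rmse = np.sqrt(np.mean((published - truth_smoothed) ** 2))
print(f"rmse of published vs. smoothed true stream: {rmse:.3f}")
```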
Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
Non-interactive Local Differential Privacy (LDP) requires data analysts to
collect data from users through a noisy channel at once. In this paper, we
extend the frontier of non-interactive LDP learning and estimation in several
respects. For learning with smooth generalized linear losses, we propose an
approximate stochastic gradient oracle estimated from non-interactive LDP
channel, using Chebyshev expansion. Combined with inexact gradient methods, we
obtain an efficient algorithm with quasi-polynomial sample complexity bound.
In the high-dimensional setting, we discover that, under a bounded-norm
assumption on the data points, high-dimensional sparse linear regression and
mean estimation can be achieved with logarithmic dependence on the dimension,
using random
projection and approximate recovery. We also extend our methods to Kernel Ridge
Regression. Our work is the first to make learning and estimation possible
for a broad range of learning tasks under the non-interactive LDP model.
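As a hedged sketch of the collect-once setting (not the paper's Chebyshev-expansion gradient oracle), the example below has each user randomize a bounded scalar locally with the Laplace mechanism and report it a single time; the analyst then estimates the mean from the one-shot noisy reports.

```python
# Illustrative sketch of non-interactive LDP: data pass through a noisy
# channel once, and all analysis happens on the collected reports.
import numpy as np

rng = np.random.default_rng(2)
n, epsilon = 10_000, 1.0
x = rng.uniform(0, 1, size=n)            # private values in [0, 1]

# One-shot local randomization; the sensitivity of a value in [0, 1] is 1.
reports = x + rng.laplace(scale=1.0 / epsilon, size=n)

# The analyst never interacts with users again; the Laplace noise has mean
# zero, so the sample mean of the reports is an unbiased estimate.
print(f"true mean {x.mean():.3f}, LDP estimate {reports.mean():.3f}")
```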
A Two-Stage Architecture for Differentially Private Kalman Filtering and LQG Control
Large-scale monitoring and control systems enabling a more intelligent
infrastructure increasingly rely on sensitive data obtained from private
agents, e.g., location traces collected from the users of an intelligent
transportation system. In order to encourage the participation of these
agents, it then becomes critical to design algorithms that process information
in a
privacy-preserving way. This article revisits the Kalman filtering and Linear
Quadratic Gaussian (LQG) control problems, subject to privacy constraints. We
aim to enforce differential privacy, a formal, state-of-the-art definition of
privacy ensuring that the output of an algorithm is not too sensitive to the
data collected from any single participating agent. A two-stage architecture is
proposed that first aggregates and combines the individual agent signals before
adding privacy-preserving noise and post-filtering the result to be published.
We show that this architecture offers a significant performance improvement
over input perturbation schemes as the number of input signals increases, and
that an optimal static aggregation stage can be computed by solving a
semidefinite program. The two-stage architecture, which we develop first for
Kalman
filtering, is then adapted to the LQG control problem by leveraging the
separation principle. Numerical simulations illustrate the performance
improvements over differentially private algorithms without first-stage signal
aggregation. Comment: Long version of a paper presented at GlobalSIP 2017.
Submitted for journal publication.
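A toy sketch of the two-stage idea under strong simplifications: a static averaging stage over identical scalar signals, an illustrative noise level, and a steady-state scalar Kalman post-filter. The paper instead optimizes the aggregation stage via a semidefinite program and treats general vector systems.

```python
# Toy two-stage pipeline: aggregate agent signals, add privacy noise,
# then post-filter; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, T = 50, 100
state = np.cumsum(rng.standard_normal(T))       # scalar random-walk state
signals = state + rng.standard_normal((n, T))   # one noisy signal per agent

# Stage 1: aggregate. Averaging shrinks each agent's influence on the
# published signal (and hence its sensitivity) by a factor of 1/n.
aggregated = signals.mean(axis=0)
sigma_dp = 1.0 / n                              # illustrative privacy noise scale
private_in = aggregated + rng.normal(scale=sigma_dp, size=T)

# Stage 2: post-filter with a steady-state scalar Kalman filter for the
# random-walk model x[t+1] = x[t] + w, observed with variance r.
q, r = 1.0, 1.0 / n + sigma_dp**2               # process / effective measurement var
p = (q + np.sqrt(q**2 + 4 * q * r)) / 2         # steady-state a-priori variance
k = p / (p + r)                                 # steady-state Kalman gain
est = np.zeros(T)
for t in range(1, T):
    est[t] = est[t - 1] + k * (private_in[t] - est[t - 1])
print(f"rmse: {np.sqrt(np.mean((est - state) ** 2)):.3f}")
```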
Privacy-Preserving Nonlinear Observer Design Using Contraction Analysis
Real-time information processing applications such as those enabling a more
intelligent infrastructure are increasingly focused on analyzing
privacy-sensitive data obtained from individuals. To produce accurate
statistics about the habits of a population of users of a system, this data
might need to be processed through model-based estimators. Moreover, models of
population dynamics, originating for example from epidemiology or the social
sciences, are often necessarily nonlinear. Motivated by these trends, this
paper presents an approach to design nonlinear privacy-preserving model-based
observers, relying on additive input or output noise to give differential
privacy guarantees to the individuals providing the input data. For the case of
output perturbation, contraction analysis allows us to design convergent
observers as well as set the level of privacy-preserving noise appropriately.
Two examples illustrate the approach: estimating the edge formation
probabilities in a dynamic social network, and syndromic surveillance relying
on an epidemiological model. Comment: 23 pages, 3 figures.
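A minimal sketch of the output-perturbation route, assuming a scalar SIS-style epidemic model and a hand-picked observer gain; the paper uses contraction analysis to certify observer convergence and to size the privacy noise, neither of which this toy example does.

```python
# Sketch: a model-based observer driven only by a privacy-perturbed
# output; model, gain, and noise level are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(4)
beta, gamma, h, L, T = 0.3, 0.1, 0.1, 0.5, 500

def f(i):
    """SIS-style infection dynamics for the infected fraction i."""
    return beta * i * (1 - i) - gamma * i

i_true, i_hat, sigma = 0.05, 0.5, 0.05   # true state, observer guess, noise level
for _ in range(T):
    y = i_true + rng.normal(scale=sigma)          # published, perturbed output
    i_hat += h * f(i_hat) + L * h * (y - i_hat)   # observer sees only y
    i_true += h * f(i_true)                       # true epidemic evolves in parallel
print(f"true state {i_true:.3f}, observer estimate {i_hat:.3f}")
```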
When an attacker meets a cipher-image in 2018: A Year in Review
This paper reviews, from the viewpoint of an image cryptanalyst, the
technical contradictions encountered when an attacker meets the cipher-images
produced by the image encryption schemes (algorithms) proposed in 2018. The
most representative works among them are selected and classified according to
their essential structures. Almost all image cryptanalysis works published in
2018 are surveyed, as their number is small. The challenging problems in the
design and analysis of image encryption schemes are summarized to draw the
attention of both designers and attackers (cryptanalysts), which may promote
the solution of scenario-oriented image security problems with new
technologies. Comment: 12 pages.
Real-time semiparametric regression for distributed data sets
This paper proposes a method for semiparametric regression analysis of
large-scale data that are distributed over multiple hosts. The approach
enables modeling of nonlinear relationships; both the batch setting, where
analysis starts after all data have been collected, and the real-time setting
are addressed. The methodology is extended to operate in evolving
environments,
where it can no longer be assumed that model parameters remain constant over
time. Two areas of application for the methodology are presented: regression
modeling when there are multiple data owners and regression modeling within the
MapReduce framework. A website, realtime-semiparametric-regression.net,
illustrates the use of the proposed method on United States domestic airline
data in real time.
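A hedged sketch of the multiple-data-owner case with a plain linear model standing in for the semiparametric one: each host shares only the sufficient statistics X'X and X'y, so the pooled least-squares fit is recovered without any raw records leaving a host. The paper's method additionally handles spline bases and real-time updating, which this sketch omits.

```python
# Distributed regression via sufficient statistics: hosts share only
# X'X and X'y, never raw data; all sizes and values are illustrative.
import numpy as np

rng = np.random.default_rng(5)
hosts = [rng.standard_normal((100, 3)) for _ in range(4)]
beta_true = np.array([1.0, -2.0, 0.5])
ys = [X @ beta_true + 0.1 * rng.standard_normal(len(X)) for X in hosts]

# Each host computes local summaries; only these leave the host.
XtX = sum(X.T @ X for X in hosts)
Xty = sum(X.T @ y for X, y in zip(hosts, ys))

beta_hat = np.linalg.solve(XtX, Xty)   # identical to the pooled OLS fit
print(beta_hat.round(3))
```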
Private Posterior Distributions from Variational Approximations
Privacy preserving mechanisms such as differential privacy inject additional
randomness in the form of noise in the data, beyond the sampling mechanism.
Ignoring this additional noise can lead to inaccurate and invalid inferences.
In this paper, we incorporate the privacy mechanism explicitly into the
likelihood function by treating the original data as missing, with an end goal
of estimating posterior distributions over model parameters. This leads to a
principled way of performing valid statistical inference using private data;
however, the corresponding likelihoods are intractable. We therefore derive
fast and accurate variational approximations to tackle the intractable
likelihoods that arise due to privacy. We focus on estimating posterior
distributions of parameters of the naive Bayes log-linear model, where the
sufficient statistics of this model are shared using a differentially private
interface. Using a simulation study, we show that the posterior approximations
outperform the naive method of ignoring the noise addition mechanism.
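A toy illustration of folding the privacy mechanism into the likelihood, not the paper's naive Bayes log-linear setup or its variational machinery: a binomial count is released through the Laplace mechanism, and the noise-aware posterior marginalizes the unobserved true count on a grid (exactly, since the example is tiny, showing what the variational approach approximates), versus a naive posterior that treats the noisy count as exact.

```python
# Noise-aware vs. naive posterior for a privately released count;
# n, p_true, and epsilon are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p_true, epsilon = 50, 0.3, 0.5
k_true = rng.binomial(n, p_true)
k_noisy = k_true + rng.laplace(scale=1.0 / epsilon)   # released statistic

grid = np.linspace(0.001, 0.999, 999)                 # flat prior over p
ks = np.arange(n + 1)
# Noise-aware likelihood: p(k_noisy | p) = sum_k Binom(k | n, p) Lap(k_noisy - k)
lap = stats.laplace.pdf(k_noisy - ks, scale=1.0 / epsilon)
like = np.array([stats.binom.pmf(ks, n, p) @ lap for p in grid])
post = like / like.sum()

# Naive posterior: pretend the rounded noisy count is the exact count.
naive_k = int(np.clip(round(k_noisy), 0, n))
naive = stats.binom.pmf(naive_k, n, grid)
naive /= naive.sum()
print(f"noise-aware mean {grid @ post:.3f}, naive mean {grid @ naive:.3f}")
```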
Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming
Certifying the safety or robustness of neural networks against input
uncertainties and adversarial attacks is an emerging challenge in the area of
safe machine learning and control. To provide such a guarantee, one must be
able to bound the output of neural networks when their input changes within a
bounded set. In this paper, we propose a semidefinite programming (SDP)
framework to address this problem for feed-forward neural networks with general
activation functions and input uncertainty sets. Our main idea is to abstract
various properties of activation functions (e.g., monotonicity, bounded slope,
bounded values, and repetition across layers) with the formalism of quadratic
constraints. We then analyze the safety properties of the abstracted network
via the S-procedure and semidefinite programming. Our framework spans the
trade-off between conservatism and computational efficiency and applies to
problems beyond safety verification. We evaluate the performance of our
approach via numerical problem instances of various sizes.
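To illustrate the underlying task (bounding a network's outputs when the input varies over a bounded set), here is a sketch using interval bound propagation, a cheap and looser baseline named explicitly as a stand-in; the paper's quadratic-constraint SDP yields tighter certificates. The weights are random placeholders.

```python
# Interval bound propagation: certified (if loose) output bounds for a
# two-layer ReLU network over an l-infinity ball around x0.
import numpy as np

def ibp_layer(lo, hi, W, b):
    """Propagate elementwise bounds [lo, hi] through x -> W @ x + b."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

rng = np.random.default_rng(7)
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((2, 8)), rng.standard_normal(2)

x0, eps = np.zeros(4), 0.1                     # input set: ||x - x0||_inf <= eps
lo, hi = x0 - eps, x0 + eps
lo, hi = ibp_layer(lo, hi, W1, b1)
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)  # ReLU is monotone: apply to bounds
lo, hi = ibp_layer(lo, hi, W2, b2)
print("certified output box:", np.stack([lo, hi]).round(3))
```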