
    Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation

    In order to generate synthetic basket data sets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket data sets. The characteristics that can be used for this purpose include frequent itemsets and association rules. The problem of generating synthetic basket data sets from frequent itemsets is generally referred to as inverse frequent itemset mining. In this paper, we show that the problem of approximate inverse frequent itemset mining is NP-complete. We then propose and analyze an approximation algorithm for it, and discuss privacy issues related to the synthetic basket data sets. In particular, we propose an approximate algorithm to determine the privacy leakage in a synthetic basket data set.
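
    A minimal sketch of the feasibility check at the heart of this problem, not the paper's algorithm: given target itemset supports as if mined from a real database, measure how closely a candidate synthetic basket database reproduces them. The targets and baskets below are made up for illustration.

    ```python
    def support(baskets, itemset):
        """Fraction of baskets containing every item in `itemset`."""
        s = set(itemset)
        return sum(s <= b for b in baskets) / len(baskets)

    def max_support_error(baskets, targets):
        """Largest deviation between target and realized supports."""
        return max(abs(support(baskets, i) - t) for i, t in targets.items())

    # Hypothetical support constraints, as if mined from a real database.
    targets = {("bread",): 0.6, ("milk",): 0.5, ("bread", "milk"): 0.4}

    # A candidate synthetic database of 10 baskets.
    synthetic = [{"bread", "milk"}] * 4 + [{"bread"}] * 2 + [{"milk"}] + [set()] * 3
    print(max_support_error(synthetic, targets))  # 0.0 -> supports matched exactly
    ```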

    On the Privacy of Optimization Approaches

    Ensuring privacy of sensitive data is essential in many contexts, such as healthcare data, banks, e-commerce, wireless sensor networks, and social networks. It is common that different entities coordinate with, or want to rely on, a third party to solve a specific problem. At the same time, no entity wants to publish its problem data during the solution procedure unless there is a privacy guarantee. Unlike cryptography- and differential-privacy-based approaches, methods based on optimization lack a quantification of the privacy they can provide. The main contribution of this paper is to provide a mechanism to quantify the privacy of a broad class of optimization approaches. In particular, we formally define a one-to-many relation, which relates a given adversarial observed message to an uncertainty set of the problem data. This relation quantifies the potential ambiguity in the problem data due to the employed optimization approaches. Privacy definitions are then formalized based on the uncertainty sets, and the properties of the proposed privacy measure are analyzed. The key ideas are illustrated with examples, including localization and average consensus, among others.
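
    A toy illustration of the one-to-many relation, under an assumed masking scheme (random positive scaling of a linear cost vector, not necessarily one studied in the paper): the third party observes only the scaled data, and every positive rescaling of it is an equally valid explanation, forming the uncertainty set.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    c = np.array([3.0, -1.0, 2.0])   # private cost vector of min c^T x
    s = rng.uniform(0.5, 2.0)        # secret positive scaling kept by the owner
    c_obs = s * c                    # the message the third party observes

    # Scaling by s > 0 does not change the argmin over a fixed feasible set,
    # so the outsourced solution is still useful to the data owner, while
    # the adversary's uncertainty set {c_obs / t : t > 0} never shrinks to c.
    uncertainty_set = [c_obs / t for t in (0.5, 1.0, 1.5, 2.0)]
    print(c_obs, uncertainty_set[0])
    ```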

    Privacy-Preserving Filtering for Event Streams

    Many large-scale information systems such as intelligent transportation systems, smart grids or smart buildings collect data about the activities of their users to optimize their operations. To encourage participation and adoption of these systems, it is becoming increasingly important that the design process take privacy issues into consideration. In a typical scenario, signals originate from many sensors capturing events involving the users, and several statistics of interest need to be continuously published in real-time. This paper considers the problem of providing differential privacy guarantees for such multi-input multi-output systems processing event streams. We show how to construct and optimize various extensions of the zero-forcing equalization mechanism, which we previously proposed for single-input single-output systems. Some of these extensions can take a model of the input signals into account. We illustrate our privacy-preserving filter design methodology through the problem of privately monitoring and forecasting occupancy in a building equipped with multiple motion detection sensors. Comment: This version subsumes both the previous version and arXiv:1304.231
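
    A minimal sketch of differentially private publishing of an event-stream statistic, using plain per-step Laplace noise with a moving-average post-filter rather than the zero-forcing equalization mechanism of the paper; all parameters are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    counts = rng.poisson(5.0, size=200)   # per-step event counts from sensors
    eps = 0.5                             # per-step privacy budget
    # Assume one user contributes at most one event per step: sensitivity 1.
    noisy = counts + rng.laplace(0.0, 1.0 / eps, size=counts.size)

    kernel = np.ones(10) / 10             # post-filter: simple moving average
    published = np.convolve(noisy, kernel, mode="same")
    print(float(np.mean(np.abs(published - counts))))  # utility of the release
    ```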

    Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible

    Non-interactive Local Differential Privacy (LDP) requires data analysts to collect data from users through a noisy channel at once. In this paper, we extend the frontiers of non-interactive LDP learning and estimation in several directions. For learning with smooth generalized linear losses, we propose an approximate stochastic gradient oracle estimated from the non-interactive LDP channel using Chebyshev expansion. Combined with inexact gradient methods, this yields an efficient algorithm with a quasi-polynomial sample complexity bound. In the high-dimensional setting, we show that under an $\ell_2$-norm assumption on the data points, high-dimensional sparse linear regression and mean estimation can be achieved with logarithmic dependence on the dimension, using random projection and approximate recovery. We also extend our methods to Kernel Ridge Regression. Our work is the first to make learning and estimation possible for a broad range of learning tasks under the non-interactive LDP model.
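
    A minimal sketch of the non-interactive LDP model itself, on the simplest task (mean estimation with local Laplace noise); the paper's Chebyshev-expansion gradient oracle and random-projection schemes build on this same one-shot collection pattern. Data and parameters are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.beta(2.0, 5.0, size=100_000)  # private values in [0, 1]
    eps = 1.0                             # local privacy budget per user

    # Non-interactive: each user randomizes once and is never queried again.
    reports = x + rng.laplace(0.0, 1.0 / eps, size=x.size)  # sensitivity 1

    print(float(x.mean()), float(reports.mean()))  # unbiased, just noisier
    ```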

    A Two-Stage Architecture for Differentially Private Kalman Filtering and LQG Control

    Large-scale monitoring and control systems enabling a more intelligent infrastructure increasingly rely on sensitive data obtained from private agents, e.g., location traces collected from the users of an intelligent transportation system. In order to encourage the participation of these agents, it then becomes critical to design algorithms that process information in a privacy-preserving way. This article revisits the Kalman filtering and Linear Quadratic Gaussian (LQG) control problems, subject to privacy constraints. We aim to enforce differential privacy, a formal, state-of-the-art definition of privacy ensuring that the output of an algorithm is not too sensitive to the data collected from any single participating agent. A two-stage architecture is proposed that first aggregates and combines the individual agent signals before adding privacy-preserving noise and post-filtering the result to be published. We show that this architecture offers a significant performance improvement over input perturbation schemes as the number of input signals increases, and that an optimal static aggregation stage can be computed by solving a semidefinite program. The two-stage architecture, which we develop first for Kalman filtering, is then adapted to the LQG control problem by leveraging the separation principle. Numerical simulations illustrate the performance improvements over differentially private algorithms without first-stage signal aggregation. Comment: Long version of a paper presented at GlobalSIP 2017. Submitted for journal publication.
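
    A minimal sketch of why aggregation before perturbation helps, with an unweighted average standing in for the paper's optimized aggregation stage and no Kalman post-filter: averaging n signals divides the sensitivity to any single agent by n, so the privacy noise added after aggregation is much smaller than noise added to every input.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n, T, sigma = 100, 500, 2.0        # agents, time steps, DP noise scale
    signals = rng.normal(0.0, 1.0, (n, T)) + np.sin(np.arange(T) / 20)
    target = signals.mean(axis=0)      # statistic to publish

    # Input perturbation: noise on every agent's signal, then aggregate.
    input_pert = (signals + rng.normal(0.0, sigma, (n, T))).mean(axis=0)
    # Two-stage: aggregate first, then add noise scaled by sensitivity ~ 1/n.
    two_stage = target + rng.normal(0.0, sigma / n, size=T)

    print(np.mean((input_pert - target) ** 2),  # ~ sigma^2 / n
          np.mean((two_stage - target) ** 2))   # ~ sigma^2 / n^2
    ```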

    Privacy-Preserving Nonlinear Observer Design Using Contraction Analysis

    Real-time information processing applications such as those enabling a more intelligent infrastructure are increasingly focused on analyzing privacy-sensitive data obtained from individuals. To produce accurate statistics about the habits of a population of users of a system, these data might need to be processed through model-based estimators. Moreover, models of population dynamics, originating for example from epidemiology or the social sciences, are often necessarily nonlinear. Motivated by these trends, this paper presents an approach to designing nonlinear privacy-preserving model-based observers, relying on additive input or output noise to give differential privacy guarantees to the individuals providing the input data. For the case of output perturbation, contraction analysis allows us to design convergent observers as well as to set the level of privacy-preserving noise appropriately. Two examples illustrate the approach: estimating the edge formation probabilities in a dynamic social network, and syndromic surveillance relying on an epidemiological model. Comment: 23 pages, 3 figures
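
    A minimal sketch of output perturbation feeding a convergent observer, on an assumed scalar linear system rather than the nonlinear models treated in the paper: Laplace noise on the published measurement provides the differential privacy, and a contracting error dynamics (|a - L| < 1) keeps the estimation error bounded instead of accumulating the noise.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    a, b, L, eps, T = 0.9, 0.1, 0.5, 1.0, 300  # dynamics, observer gain, budget
    x, xhat, err = 1.0, 0.0, []
    for _ in range(T):
        y = x + rng.laplace(0.0, 1.0 / eps)    # privacy-preserving measurement
        xhat = a * xhat + b + L * (y - xhat)   # observer update
        x = a * x + b                          # true dynamics
        err.append(abs(x - xhat))
    print(float(np.mean(err[-50:])))           # bounded steady-state error
    ```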

    When an attacker meets a cipher-image in 2018: A Year in Review

    This paper reviews the technical contradictions encountered when an attacker meets the cipher-images produced by the image encryption schemes (algorithms) proposed in 2018, from the viewpoint of an image cryptanalyst. The most representative works among them are selected and classified according to their essential structures. Almost all image cryptanalysis works published in 2018 are surveyed, owing to their small number. The challenging problems in the design and analysis of image encryption schemes are summarized to draw the attention of both designers and attackers (cryptanalysts) of image encryption schemes, which may promote the solution of scenario-oriented image security problems with new technologies. Comment: 12 pages

    Real-time semiparametric regression for distributed data sets

    This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships and addresses both the batch setting, where analysis starts after all data have been collected, and the real-time setting. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners, and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real time.
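
    A minimal sketch of the distributed sufficient-statistics pattern that makes such analyses possible, simplified to linear ridge regression (the paper handles full semiparametric models): each host shares only X'X and X'y, which add across hosts and can be updated in real time as new rows arrive. All data below are simulated.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    p, lam = 3, 1e-2
    XtX, Xty = np.zeros((p, p)), np.zeros(p)

    for host in range(4):                  # e.g., four separate data owners
        X = rng.normal(size=(250, p))      # rows held only by this host
        y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=250)
        XtX += X.T @ X                     # host-local summaries, additive
        Xty += X.T @ y                     # across hosts and over time

    beta = np.linalg.solve(XtX + lam * np.eye(p), Xty)
    print(beta)                            # close to [1, -2, 0.5]
    ```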

    Private Posterior distributions from Variational approximations

    Privacy-preserving mechanisms such as differential privacy inject additional randomness, in the form of noise, into the data beyond the sampling mechanism. Ignoring this additional noise can lead to inaccurate and invalid inferences. In this paper, we incorporate the privacy mechanism explicitly into the likelihood function by treating the original data as missing, with the end goal of estimating posterior distributions over model parameters. This leads to a principled way of performing valid statistical inference using private data; however, the corresponding likelihoods are intractable. We therefore derive fast and accurate variational approximations to tackle such intractable likelihoods that arise due to privacy. We focus on estimating posterior distributions of the parameters of a naive Bayes log-linear model, where the sufficient statistics of the model are shared through a differentially private interface. Using a simulation study, we show that the posterior approximations outperform the naive method of ignoring the noise addition mechanism.
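
    A minimal sketch of folding the privacy mechanism into the likelihood, on a toy Beta-Bernoulli model with a grid posterior rather than the paper's naive Bayes log-linear model and variational approximation: instead of plugging the noisy count into a conjugate update, the posterior marginalizes over the unseen true count under the Laplace mechanism.

    ```python
    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(6)
    n, theta_true, eps = 200, 0.3, 0.5
    c = rng.binomial(n, theta_true)           # true sufficient statistic
    c_noisy = c + rng.laplace(0.0, 1.0 / eps) # differentially private release

    theta = np.linspace(0.001, 0.999, 999)    # grid over the parameter
    counts = np.arange(n + 1)
    # p(c_noisy | c) under the Laplace mechanism, for every possible c.
    noise = 0.5 * eps * np.exp(-eps * np.abs(c_noisy - counts))
    # Noise-aware likelihood: marginalize the latent true count.
    like = binom.pmf(counts[None, :], n, theta[:, None]) @ noise
    post = like / like.sum()                  # grid posterior, flat prior
    print(float(theta[np.argmax(post)]), c / n)  # mode tracks the truth
    ```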

    Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming

    Certifying the safety or robustness of neural networks against input uncertainties and adversarial attacks is an emerging challenge in the area of safe machine learning and control. To provide such a guarantee, one must be able to bound the output of a neural network when its input changes within a bounded set. In this paper, we propose a semidefinite programming (SDP) framework to address this problem for feed-forward neural networks with general activation functions and input uncertainty sets. Our main idea is to abstract various properties of activation functions (e.g., monotonicity, bounded slope, bounded values, and repetition across layers) with the formalism of quadratic constraints. We then analyze the safety properties of the abstracted network via the S-procedure and semidefinite programming. Our framework spans the trade-off between conservatism and computational efficiency and applies to problems beyond safety verification. We evaluate the performance of our approach on numerical problem instances of various sizes.
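
    A minimal sketch of the quadratic-constraint abstraction and S-procedure on a toy one-neuron "network" f(x) = relu(2x) - 1 with x in [-1, 1]; the paper's SDP covers general multi-layer networks, so this is only the scalar skeleton, written with cvxpy. The ReLU is abstracted by y(y - 2x) = 0, y >= 0, y >= 2x, the input set by 1 - x^2 >= 0, and certifying gamma >= max f becomes a 3x3 LMI over v = [x, y, 1].

    ```python
    import cvxpy as cp
    import numpy as np

    # Quadratic forms v^T M v in v = [x, y, 1] for each constraint.
    P  = np.array([[-1, 0, 0], [0, 0, 0], [0, 0, 1.0]])     # 1 - x^2 >= 0
    Q  = np.array([[0, -1, 0], [-1, 1, 0], [0, 0, 0.0]])    # y^2 - 2xy = 0
    R1 = np.array([[0, 0, 0], [0, 0, .5], [0, .5, 0.0]])    # y >= 0
    R2 = np.array([[0, 0, -1], [0, 0, .5], [-1, .5, 0.0]])  # y - 2x >= 0
    E  = np.zeros((3, 3)); E[2, 2] = 1.0                    # constant term

    gamma = cp.Variable()
    l0, l1, l2 = (cp.Variable(nonneg=True) for _ in range(3))
    mu = cp.Variable()  # free sign: multiplier of the equality constraint

    # S-procedure: gamma - (y - 1) - sum(multipliers * constraints) is PSD.
    M = (gamma + 1) * E - R1 - l0 * P - mu * Q - l1 * R1 - l2 * R2
    cp.Problem(cp.Minimize(gamma), [M >> 0]).solve()
    print(gamma.value)  # ~1.0: certified (and here tight) bound on max f
    ```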