    Exclusive Strategy for Generalization Algorithms in Micro-data Disclosure

    Abstract. When generalization algorithms are known to the public, an adversary can obtain a more precise estimation of the secret table than what can be deduced from the disclosed generalization result. Therefore, whether a generalization algorithm can satisfy a privacy property should be judged based on such an estimation. In this paper, we show that the computation of the estimation is inherently a recursive process that exhibits a high complexity when generalization algorithms take a straightforward inclusive strategy. To facilitate the design of more efficient generalization algorithms, we suggest an alternative exclusive strategy, which adopts a seemingly drastic approach to eliminate the need for recursion. Surprisingly, the data utility of the two strategies is actually not comparable, and the exclusive strategy can provide better data utility in certain cases.
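
    As a minimal illustration of what "generalization" means in this setting (hypothetical attribute names and bucket widths; this is not the paper's inclusive or exclusive strategy), quasi-identifier values can be coarsened into ranges so that multiple records become indistinguishable:

```python
# Illustrative micro-data generalization: replace exact quasi-identifier
# values with intervals so several records share the same generalized value.
# (Hypothetical example; the paper's algorithms are strategies for choosing
# such generalizations safely when the algorithm itself is public.)

def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a width-sized interval, e.g. 34 -> '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def generalize_table(rows, width=10):
    """Replace each record's exact age with its generalized interval."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in rows]

table = [{"age": 34, "disease": "flu"}, {"age": 37, "disease": "cold"}]
anonymized = generalize_table(table)  # both rows now report age "30-39"
```

    The point of the paper is that when an adversary also knows which algorithm produced such a table, the set of secret tables consistent with the output shrinks, so safety must be judged against that sharper estimate.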

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy. (Forthcoming in the American Economic Review.)
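
    The marginal-cost-equals-marginal-benefit rule can be sketched numerically. The functional forms below are illustrative assumptions (not the paper's calibration): a concave benefit from accuracy, which grows with the privacy-loss parameter epsilon, and a convex social cost of privacy loss.

```python
# Sketch of the optimal-epsilon idea: choose the privacy parameter where
# net social benefit (accuracy benefit minus privacy cost) is maximized,
# i.e. where marginal benefit equals marginal cost. The functions below
# are hypothetical stand-ins, not the paper's estimated demand curves.
import math

def accuracy_benefit(eps: float) -> float:
    # Diminishing returns: accuracy gains flatten as epsilon grows.
    return math.log1p(eps)

def privacy_cost(eps: float) -> float:
    # Privacy harm grows convexly with epsilon.
    return 0.1 * eps ** 2

def optimal_epsilon(grid):
    # Grid search for the welfare-maximizing epsilon.
    return max(grid, key=lambda e: accuracy_benefit(e) - privacy_cost(e))

grid = [i / 100 for i in range(1, 1001)]  # epsilon in (0, 10]
eps_star = optimal_epsilon(grid)
```

    With these toy curves the optimum lands near epsilon of 1.79, where the derivative of the benefit, 1/(1+eps), equals the derivative of the cost, 0.2*eps.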

    Privacy Protection in Data Mining


    Preserving Privacy Against Side-Channel Leaks

    Privacy preservation has received significant attention in various domains. Various models and techniques have been proposed to achieve optimal privacy with minimal cost. However, side-channel leaks (such as publicly known algorithms of data publishing, observable traffic information in web applications, and fine-grained readings in smart metering) further complicate the process of privacy preservation. In this thesis, we make a first effort to investigate a general framework for modeling side-channel attacks across different domains and apply the framework to various categories of applications. In privacy-preserving data publishing with publicly known algorithms, we first theoretically study a generic strategy independent of data utility measures and syntactic privacy properties; we then propose an efficient approach to preserving diversity. In privacy-preserving traffic padding in web applications, we first propose a formal PPTP model to quantify privacy and cost, based on the key observation of the similarity between data publishing and traffic padding; we then introduce randomness into the previous solutions to provide a background-knowledge-resistant privacy guarantee. In privacy-preserving smart metering, we propose a lightweight approach to simultaneously preserving privacy in both billing and consumption aggregation, based on a key observation about the privacy issues beyond fine-grained readings.
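
    The traffic-padding idea can be illustrated with a minimal sketch (bucket sizes and names are hypothetical; the thesis's PPTP model groups objects far more carefully): round each observable response size up to a bucket boundary so that distinct payloads become indistinguishable on the wire.

```python
# Minimal sketch of traffic padding against a size side channel: pad
# each response up to the next bucket boundary so that many different
# payload sizes collapse to the same observable size. (Illustrative
# only; not the thesis's randomized, background-knowledge-resistant
# scheme.)

def pad_size(nbytes: int, bucket: int = 512) -> int:
    """Round a payload size up to the next multiple of `bucket`."""
    return ((nbytes + bucket - 1) // bucket) * bucket

sizes = [100, 480, 500, 513]
padded = [pad_size(s) for s in sizes]  # 100, 480, 500 all become 512
```

    The privacy/cost trade-off is visible even here: larger buckets hide more payloads behind one observable size but waste more bandwidth, which is exactly the kind of cost the PPTP model quantifies.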

    Simulatable Auditing in Micro-Databases

    How to protect individuals’ privacy while releasing microdata tables for analysis purposes has attracted significant attention. We study the case where different microdata tables generalized over the same underlying secret table may be released upon users’ queries. To satisfy privacy constraints, an auditing system must determine whether the next query can be safely answered based on the history of answered queries. However, when answering a new query is not safe, denying it may not be, either, since a denial itself may still convey some sensitive information to the user. We first model this issue in the context of releasing microdata tables. Inspired by the Simulatable Auditing technique in statistical databases, we propose a safe strategy for auditing queries that ask for microdata tables generalized over secret tables. The strategy can provide provably safe answers and good data utility. We also study how to efficiently maintain the history of answered queries for the auditing purpose. To the best of our knowledge, this is the first study on the simulatable auditing issue of microdata queries.
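
    The core principle of simulatable auditing can be sketched as follows (toy tables, queries, and safety predicate are all hypothetical): the deny/answer decision is computed only from the query history, which the user already knows, so the decision itself leaks nothing new.

```python
# Sketch of the simulatable-auditing principle: answer a new query only
# if EVERY candidate table consistent with the answered history would
# yield a safe answer. Because this check never consults the one true
# table, the user could run it themselves, so a denial is harmless.
# (Illustrative; the paper audits generalized microdata tables, not
# scalar queries.)

def is_safe(history, new_query, candidate_tables, privacy_ok):
    """history: list of (query, answer) pairs already released."""
    consistent = [t for t in candidate_tables
                  if all(q(t) == a for q, a in history)]
    return all(privacy_ok(new_query(t)) for t in consistent)

tables = [{"salary": s} for s in (10, 20, 30, 40)]
history = [(lambda t: t["salary"] > 15, True)]  # released answer: True

# A query answered identically on every consistent table is safe:
assert is_safe(history, lambda t: t["salary"] > 15, tables, lambda a: a is True)
# Revealing the exact salary fails if some consistent table is too revealing:
assert not is_safe(history, lambda t: t["salary"], tables, lambda a: a > 20)
```

    The design choice worth noting is the quantifier: safety is decided over all tables consistent with the history, not over the actual secret table, which is what makes the decision simulatable by the user.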

    RANDOMIZATION BASED PRIVACY PRESERVING CATEGORICAL DATA ANALYSIS

    The success of data mining relies on the availability of high-quality data. To ensure quality data mining, effective information sharing between organizations becomes a vital requirement in today’s society. Since data mining often involves sensitive information of individuals, the public has expressed a deep concern about their privacy. Privacy-preserving data mining is the study of eliminating privacy threats while, at the same time, preserving useful information in the released data for data mining. This dissertation investigates data utility and privacy of randomization-based models in privacy-preserving data mining for categorical data. For the analysis of data utility in the randomization model, we first investigate the accuracy analysis for association rule mining in market basket data. Then we propose a general framework to conduct theoretical analysis of how the randomization process affects the accuracy of various measures adopted in categorical data analysis. We also examine data utility when randomization mechanisms are not provided to data miners to achieve better privacy. We investigate how various objective association measures between two variables may be affected by randomization. We then extend this to multiple variables by examining the feasibility of hierarchical loglinear modeling. Our results provide a reference for data miners about what they can and cannot do with certainty on randomized data directly, without knowledge of the original distribution of the data and the distortion information. Data privacy and data utility are commonly considered a pair of conflicting requirements in privacy-preserving data mining applications. In this dissertation, we investigate privacy issues in randomization models. In particular, we focus on attribute disclosure under linking attacks in data publishing. We propose efficient solutions to determine optimal distortion parameters such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects: reconstructed distributions, accuracy of answering queries, and preservation of correlations. Our empirical results show that randomization incurs significantly smaller utility loss.
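
    The classic randomization mechanism for a binary categorical attribute is randomized response, which the dissertation's "known distortion parameters" setting builds on. A minimal sketch (parameter names are ours): each respondent reports truthfully with probability p and flips the value otherwise, and the analyst inverts the known distortion to recover the true proportion.

```python
# Sketch of randomized response for one binary attribute. Each value is
# reported truthfully with probability p; since p is known, the true
# proportion pi can be estimated from E[reported] = p*pi + (1-p)*(1-pi).
# (Illustrative; the dissertation treats general categorical data and
# multi-variable analyses.)
import random

def randomize(value: bool, p: float, rng: random.Random) -> bool:
    """Report the true value with probability p, the flipped one otherwise."""
    return value if rng.random() < p else not value

def estimate_true_proportion(reports, p: float) -> float:
    """Invert the known distortion to recover an unbiased estimate of pi."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
truth = [i < 3000 for i in range(10000)]          # true proportion 0.30
reports = [randomize(v, p=0.75, rng=rng) for v in truth]
pi_hat = estimate_true_proportion(reports, p=0.75)  # close to 0.30
```

    This also illustrates the dissertation's second theme: if the distortion parameter p is withheld from the data miner, the inversion above is impossible, and only distortion-invariant conclusions can be drawn from the randomized data.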

    Localizing unauthorized updates in published micro-data tables through secret order-based watermarking

    The study of the micro-data disclosure issue has largely focused on the privacy preservation aspect, whereas the integrity of a published micro-data table has received limited attention. Unauthorized updates to such a table may lead users to believe in misleading data. Traditional cryptographic stamp-based approaches allow users to detect unauthorized updates using credentials issued by the data owner. However, localizing the exact corrupted tuples would require a large number of cryptographic stamps to be stored, leading to prohibitive storage requirements. In this thesis, we exploit the fact that tuples in a micro-data table must be stored in some particular order, which has no inherent meaning under the relational model. We propose a series of algorithms for embedding watermarks through reordering the tuples. The embedded watermarks allow users to detect, localize, and restore corrupted tuples with a single secret key issued by the data owner, and no additional storage is required. At the same time, our algorithms also allow for efficient updates by the data owner or legitimate users who know the secret key. The proposed algorithms are implemented and evaluated through experiments with real data.
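
    A minimal sketch of the underlying idea (function names and key are hypothetical; the thesis's algorithms additionally localize and restore corruptions): publish the tuples sorted by a keyed hash, so anyone holding the secret key can check whether the order is intact, while the order looks arbitrary to everyone else.

```python
# Sketch of order-based integrity: store tuples sorted by a secret keyed
# hash. A modified tuple's hash almost surely violates the keyed order,
# so key holders can flag suspect positions with no extra storage.
# (Illustrative; not the thesis's full detect/localize/restore scheme.)
import hashlib
import hmac

def keyed_rank(key: bytes, row: tuple) -> bytes:
    """Pseudorandom rank of a tuple under the secret key."""
    return hmac.new(key, repr(row).encode(), hashlib.sha256).digest()

def publish(key: bytes, rows):
    """Order tuples by their keyed rank before publishing."""
    return sorted(rows, key=lambda r: keyed_rank(key, r))

def find_suspects(key: bytes, published):
    """Positions where the keyed order is violated (possible tampering)."""
    ranks = [keyed_rank(key, r) for r in published]
    return [i for i in range(1, len(ranks)) if ranks[i] < ranks[i - 1]]

key = b"secret-key"
rows = [("alice", 34), ("bob", 29), ("carol", 41), ("dave", 52)]
table = publish(key, rows)
assert find_suspects(key, table) == []   # untampered table verifies
table[1] = ("mallory", 99)               # an unauthorized update
# find_suspects(key, table) now flags an out-of-order position with
# overwhelming probability: the replaced tuple's keyed rank is unlikely
# to land in exactly the slot the original occupied.
```

    The appeal of the approach, as the abstract notes, is that the carrier of the watermark is the physical tuple order itself, which relational semantics leave free, so no additional storage is consumed.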

    Private Graph Data Release: A Survey

    The application of graph analytics to various domains has yielded tremendous societal and economic benefits in recent years. However, the increasingly widespread adoption of graph analytics comes with a commensurate increase in the need to protect private information in graph databases, especially in light of the many privacy breaches in real-world graph data that was supposed to protect sensitive information. This paper provides a comprehensive survey of private graph data release algorithms that seek to strike a fine balance between privacy and utility, with a specific focus on provably private mechanisms. Many of these mechanisms fall under natural extensions of the Differential Privacy framework to graph data, but we also investigate more general privacy formulations, such as Pufferfish Privacy, that can address the limitations of Differential Privacy. A wide-ranging survey of the applications of private graph data release mechanisms to social networks, finance, supply chains, health, and energy is also provided. This survey and the taxonomy it provides should benefit practitioners and researchers alike in the increasingly important area of private graph data release and analysis.
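
    The simplest instance of the differentially private graph mechanisms the survey covers is releasing a low-sensitivity graph statistic with calibrated noise. A sketch under edge-level neighbors (the epsilon value and graph are hypothetical): the edge count has sensitivity 1, so Laplace noise with scale 1/epsilon suffices.

```python
# Sketch of edge differential privacy for one graph statistic: release
# the edge count with Laplace noise scaled to its sensitivity (adding
# or removing a single edge changes the count by exactly 1).
# (Illustrative only; the surveyed mechanisms handle much richer
# statistics and whole-graph releases.)
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_edge_count(edges, epsilon: float, rng: random.Random) -> float:
    # Sensitivity 1 under edge-level neighbors => noise scale 1/epsilon.
    return len(edges) + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 5 edges
noisy = private_edge_count(edges, epsilon=1.0, rng=rng)
```

    Node-level neighbors (where a whole node and its edges may differ) blow up the sensitivity of even this count, which is one reason the survey also examines relaxations such as Pufferfish Privacy.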

    Economic Analysis and Statistical Disclosure Limitation

    This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies.