27,200 research outputs found
An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy
Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
In privacy-preserving data publishing, approaches using Value Generalization
Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs
play a key role in the utility of published datasets as they dictate how the
anonymization of the data occurs. For categorical attributes, it is imperative
to preserve the semantics of the original data in order to achieve a higher
utility. Despite this, semantics have not being formally considered in the
specification of VGHs. Moreover, there are no methods that allow the users to
assess the quality of their VGH. In this paper, we propose a measurement
scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in
terms of semantic consistency and taxonomic organization, with the aim of
producing higher-quality anonymizations. We demonstrate, through a case study,
how our evaluation scheme can be used to compare the quality of multiple VGHs
and can help to identify faulty VGHs.Comment: 18 pages, 7 figures, presented in the Privacy in Statistical
Databases Conference 2014 (Ibiza, Spain
Emerging privacy challenges and approaches in CAV systems
The growth of Internet-connected devices, Internet-enabled services and Internet of Things systems continues at a rapid pace, and their application to transport systems is heralded as game-changing. Numerous developing CAV (Connected and Autonomous Vehicle) functions, such as traffic planning, optimisation, management, safety-critical and cooperative autonomous driving applications, rely on data from various sources. The efficacy of these functions is highly dependent on the dimensionality, amount and accuracy of the data being shared. It holds, in general, that the greater the amount of data available, the greater the efficacy of the function. However, much of this data is privacy-sensitive, including personal, commercial and research data. Location data and its correlation with identity and temporal data can help infer other personal information, such as home/work locations, age, job, behavioural features, habits, social relationships. This work categorises the emerging privacy challenges and solutions for CAV systems and identifies the knowledge gap for future research, which will minimise and mitigate privacy concerns without hampering the efficacy of the functions
Generating Artificial Data for Private Deep Learning
In this paper, we propose generating artificial data that retain statistical
properties of real data as the means of providing privacy with respect to the
original dataset. We use generative adversarial network to draw
privacy-preserving artificial data samples and derive an empirical method to
assess the risk of information disclosure in a differential-privacy-like way.
Our experiments show that we are able to generate artificial data of high
quality and successfully train and validate machine learning models on this
data while limiting potential privacy loss.Comment: Privacy-Enhancing Artificial Intelligence and Language Technologies,
AAAI Spring Symposium Series, 201
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniquesa) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.Comment: VLDB201
- âŠ