Privacy Preserving Utility Mining: A Survey
In the big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 pages
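The survey's preliminaries concern utility-oriented itemsets. As a rough illustration only, the sketch below assumes the common high-utility itemset model (each transaction records item quantities, each item has a unit profit, and an itemset's utility is summed over the transactions that contain it) and adds a deliberately naive hiding heuristic. The toy data, the `PROFIT` table, and all function names are hypothetical and do not correspond to any specific algorithm from the survey.

```python
import copy
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2},
    {"b": 4, "c": 1},
    {"a": 3, "b": 2},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions=TRANSACTIONS, profit=PROFIT):
    """Sum of quantity * unit profit over all transactions containing every item."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(min_util, transactions=TRANSACTIONS, profit=PROFIT):
    """Brute-force enumeration: fine for a toy example, exponential in general."""
    items = sorted(profit)
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            utility = itemset_utility(combo, transactions, profit)
            if utility >= min_util:
                found[combo] = utility
    return found


def hide_itemset(sensitive, min_util, transactions=TRANSACTIONS, profit=PROFIT):
    """Naive sanitization: decrement quantities until the sensitive itemset
    falls below min_util. Returns a modified copy; the input is untouched."""
    db = copy.deepcopy(transactions)
    victim = max(sensitive, key=lambda item: profit[item])  # highest unit profit
    while itemset_utility(sensitive, db, profit) >= min_util:
        supporting = [tx for tx in db if all(item in tx for item in sensitive)]
        target = max(supporting, key=lambda tx: tx[victim])
        target[victim] -= 1
        if target[victim] == 0:
            del target[victim]  # transaction no longer supports the itemset
    return db


print(high_utility_itemsets(20))  # {('a',): 30, ('a', 'b'): 31, ('a', 'c'): 20}
sanitized = hide_itemset(("a", "b"), 20)
print(itemset_utility(("a", "b"), sanitized))  # 16
```

On this toy database, hiding {a, b} at a threshold of 20 also pushes the non-sensitive itemsets {a} and {a, c} below the threshold; side effects on non-sensitive patterns like these are what PPUM evaluation criteria are meant to capture.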
PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
Preserving privacy of users is a key requirement of web-scale analytics and
reporting applications, and has witnessed a renewed focus in light of recent
data breaches and new regulations such as GDPR. We focus on the problem of
computing robust, reliable analytics in a privacy-preserving manner, while
satisfying product requirements. We present PriPeARL, a framework for
privacy-preserving analytics and reporting, inspired by differential privacy.
We describe the overall design and architecture, and the key modeling
components, focusing on the unique challenges associated with privacy,
coverage, utility, and consistency. We perform an experimental study in the
context of ads analytics and reporting at LinkedIn, thereby demonstrating the
tradeoffs between privacy and utility needs, and the applicability of
privacy-preserving mechanisms to real-world data. We also highlight the lessons
learned from the production deployment of our system at LinkedIn.
Comment: Conference information: ACM International Conference on Information
and Knowledge Management (CIKM 2018)
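PriPeARL is described here only as inspired by differential privacy, so the following is a generic sketch rather than its actual mechanism. It releases a count with Laplace noise, assuming each member changes the count by at most 1. As an assumed design choice for consistency, it derives the random seed from a secret plus the query itself, so repeating an identical query returns the identical noisy value instead of fresh noise that could be averaged away. The `secret` value, the query-key format, and the function names are hypothetical.

```python
import hashlib
import random


def laplace_noise(rng: random.Random, scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, query_key: str, epsilon: float,
                  secret: str = "hypothetical-secret") -> int:
    """Release a count with Laplace noise of scale 1 / epsilon (sensitivity 1 assumed).

    The RNG is seeded from a hash of (secret, query_key), so the same query
    always receives the same noise.
    """
    digest = hashlib.sha256(f"{secret}|{query_key}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    noisy = true_count + laplace_noise(rng, 1.0 / epsilon)
    # Rounding and clamping are post-processing and do not weaken the guarantee.
    return max(0, round(noisy))


key = "campaign=42|day=2018-06-01|metric=clicks"  # hypothetical query key
print(private_count(120, key, epsilon=1.0))
```

Rounding and clamping at zero make the reported values look like ordinary counts; because they only post-process the noisy value, the differential-privacy guarantee of the Laplace step is unchanged.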
Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices
Smart devices with built-in sensors, computational capabilities, and network
connectivity have become increasingly pervasive. The crowds of smart devices
offer opportunities to collectively sense and perform computing tasks at an
unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine
learning framework for a crowd of smart devices, which can solve a wide range
of learning problems for crowdsensing data with differential privacy
guarantees. Crowd-ML endows a crowdsensing system with an ability to learn
classifiers or predictors online from crowdsensing data privately with minimal
computational overhead on devices and servers, making the framework suitable for
practical, large-scale deployment. We analyze the performance and the
scalability of Crowd-ML, and implement the system with off-the-shelf
smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML
with real and simulated experiments under various conditions.
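The abstract does not spell out Crowd-ML's protocol, so this is only a generic sketch of devices sharing privatized updates with a server: each device computes a logistic-regression gradient on its own sample, clips it, and adds Laplace noise before sending it, and the server runs plain SGD on the noisy reports. The clipping bound, learning rate, synthetic data, and function names are assumptions for illustration.

```python
import numpy as np


def clipped_gradient(w, x, y, clip):
    """Logistic-loss gradient for one sample (label y in {0, 1}), scaled to L1 norm <= clip."""
    prob = 1.0 / (1.0 + np.exp(-(x @ w)))
    grad = (prob - y) * x
    norm = np.abs(grad).sum()
    return grad if norm <= clip else grad * (clip / norm)


def device_report(w, x, y, epsilon, clip, rng):
    """Runs on the device: only the noisy gradient leaves it.

    Two clipped gradients differ by at most 2 * clip in L1 norm, so per-coordinate
    Laplace noise of scale 2 * clip / epsilon makes this report epsilon-DP with
    respect to the device's sample.
    """
    grad = clipped_gradient(w, x, y, clip)
    return grad + rng.laplace(0.0, 2.0 * clip / epsilon, size=grad.shape)


def server_train(X, Y, epsilon, clip=1.0, lr=0.5, seed=0):
    """Runs on the server: plain SGD over one noisy report per sample."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for x, y in zip(X, Y):
        w -= lr * device_report(w, x, y, epsilon, clip, rng)
    return w


# Hypothetical crowdsensing data: 2000 devices, one 2-D sample each.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(2000, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)
for eps in (1.0, 1e6):
    w = server_train(X, Y, epsilon=eps)
    print(eps, np.mean((X @ w > 0) == (Y == 1)))
```

Comparing a small and a very large epsilon gives a quick feel for the privacy and utility trade-off; a real system would typically batch reports and tune the clipping bound rather than fix it at 1.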
Matrix Decomposition for Data Disclosure Control and Data Mining Applications
Access to huge amounts of various data with private information creates a dual demand for preserving data privacy and ensuring the correctness of knowledge discovery, two apparently contradictory tasks. Low-rank approximations generated by matrix decompositions are a fundamental element of the privacy-preserving data mining (PPDM) applications in this dissertation. Two categories of PPDM are studied: data value hiding (DVH) and data pattern hiding (DPH). A matrix-decomposition-based framework is designed to incorporate matrix decomposition techniques into data preprocessing to distort original data sets. For the central challenge of DVH, namely how to protect sensitive/confidential attribute values without jeopardizing underlying data patterns, we propose singular value decomposition (SVD)-based and nonnegative matrix factorization (NMF)-based models. We also discuss data distortion and data utility metrics. Our experimental results on benchmark data sets demonstrate that the proposed models have the potential to outperform standard data perturbation models in balancing data privacy and data utility.
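The core of an SVD-based distortion model is to release a low-rank approximation instead of the original matrix. A minimal sketch, assuming a purely numeric data matrix with records as rows; the rank `k`, the random data, and the Frobenius-norm distortion measure are illustrative choices, not necessarily the dissertation's exact models or metrics.

```python
import numpy as np


def svd_distort(A, k):
    """Release the best rank-k approximation of A (truncated SVD) instead of A itself."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def relative_distortion(A, A_released):
    """Frobenius-norm distortion ||A - A'||_F / ||A||_F of the released values."""
    return np.linalg.norm(A - A_released) / np.linalg.norm(A)


rng = np.random.default_rng(0)
A = rng.normal(size=(100, 8))  # hypothetical data: 100 records, 8 numeric attributes
for k in (2, 4, 6):
    print(k, round(float(relative_distortion(A, svd_distort(A, k))), 3))
```

By the Eckart-Young theorem, the distortion of the rank-k release equals the square root of the sum of the squared discarded singular values divided by that of all singular values, so choosing k directly trades value hiding (more distortion) against preserved structure (utility).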
Based on an equivalence between NMF and K-means clustering, a simultaneous data value and pattern hiding strategy is developed for data mining activities that use K-means clustering. Three schemes are designed to make slight alterations to submatrices such that user-specified cluster properties of data subjects are hidden. Performance evaluation demonstrates the efficacy of the proposed strategy, since some optimal solutions can be computed with zero side effects on nonconfidential memberships. Accordingly, this dual privacy protection reduces privacy preservation to releasing a single modified data set while enhancing performance.
In addition, an improved incremental SVD-updating algorithm is applied to speed up the real-time performance of the SVD-based model under frequent data updates. The performance and effectiveness of the improved algorithm have been examined on synthetic and real data sets. Experimental results indicate that introducing incremental matrix decomposition produces a significant speedup. It also potentially supports the use of the SVD technique in On-Line Analytical Processing (OLAP) for business data analysis.
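Incremental updating avoids recomputing an SVD from scratch when new records arrive. The sketch below shows a standard row-append update in the style of Brand's incremental SVD; it is not the dissertation's improved algorithm, whose details the abstract does not give. It assumes a thin SVD of rank k, a block of p new rows with k + p no larger than the number of columns, and new rows that are not already contained in the current row space (otherwise the QR factor below is rank-deficient). The function name is hypothetical.

```python
import numpy as np


def svd_append_rows(U, s, Vt, B):
    """Given A = U @ diag(s) @ Vt (thin, rank k), return an SVD of vstack([A, B]).

    Assumes k + p <= n and that B has a full-rank component outside the row
    space spanned by Vt (the generic case for random data).
    """
    k, p = s.shape[0], B.shape[0]
    m = U.shape[0]
    V = Vt.T
    P = B @ V                  # p x k: coordinates of the new rows in the current basis
    R = B - P @ Vt             # p x n: part of the new rows orthogonal to that basis
    Q, Rq = np.linalg.qr(R.T)  # R.T = Q @ Rq, Q has p orthonormal columns
    # Small (k + p) x (k + p) core: vstack([A, B]) = L @ K @ hstack([V, Q]).T
    K = np.block([[np.diag(s), np.zeros((k, p))],
                  [P, Rq.T]])
    Uk, s_new, Vkt = np.linalg.svd(K)
    L = np.block([[U, np.zeros((m, p))],
                  [np.zeros((p, k)), np.eye(p)]])
    return L @ Uk, s_new, (np.hstack([V, Q]) @ Vkt.T).T
```

Only the small core matrix K is decomposed, which is where the speedup comes from. Truncating the result back to its leading k components keeps later updates cheap, at the price of an approximation.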
State of the Art in Privacy Preserving Data Mining
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when Data Mining techniques are used. Such a trend, especially in the context of public databases or of sensitive information related to critical infrastructures, nowadays represents a non-negligible threat. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. This is a very complex task, and the scientific literature contains several different approaches to the problem. In this work we present a survey of the current PPDM methodologies that seem promising for the future.
JRC.G.6-Sensors, radar technologies and cybersecurity
Enabling Multi-level Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data records. A widely studied perturbation-based PPDM
approach introduces random perturbation to individual values to preserve
privacy before data is published. Previous solutions of this approach are
limited by their tacit assumption of single-level trust in data miners.
In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the
more trusted a data miner is, the less perturbed copy of the data it can
access. Under this setting, a malicious data miner may have access to
differently perturbed copies of the same data through various means, and may
combine these diverse copies to jointly infer additional information about the
original data that the data owner does not intend to release. Preventing such
diversity attacks is the key challenge of providing MLT-PPDM services.
We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity
attacks with respect to our privacy goal. That is, for data miners who have
access to an arbitrary collection of the perturbed copies, our solution prevents
them from jointly reconstructing the original data more accurately than the
best effort using any individual copy in the collection. Our solution allows a
data owner to generate perturbed copies of its data for arbitrary trust levels
on demand. This feature offers data owners maximum flexibility.
Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on
Knowledge and Data Engineering
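One simple way to correlate perturbation across trust levels, shown here as an illustrative sketch rather than as the paper's exact construction, is to derive each less-trusted copy from the next more-trusted one by adding fresh independent Gaussian noise. The copies then form a Markov chain, so a collection of copies carries no more information about the original values than its least-perturbed member. The noise levels, the attacker's estimator, and the helper names are assumptions.

```python
import numpy as np


def correlated_copies(x, sigmas, rng):
    """Copies with total noise std sigmas[0] < sigmas[1] < ..., each built by
    adding fresh Gaussian noise to the previous (more trusted) copy."""
    copies, prev_var = [], 0.0
    current = np.asarray(x, dtype=float)
    for sigma in sigmas:
        current = current + rng.normal(0.0, np.sqrt(sigma**2 - prev_var), size=current.shape)
        copies.append(current)
        prev_var = sigma**2
    return copies


def independent_copies(x, sigmas, rng):
    """Naive alternative: every copy gets its own independent noise."""
    x = np.asarray(x, dtype=float)
    return [x + rng.normal(0.0, sigma, size=x.shape) for sigma in sigmas]


def inverse_variance_combine(copies, sigmas):
    """A natural attacker estimator: weight each copy by 1 / sigma**2."""
    weights = np.array([1.0 / sigma**2 for sigma in sigmas])
    weights /= weights.sum()
    return sum(w * c for w, c in zip(weights, copies))


rng = np.random.default_rng(0)
x = rng.normal(size=200_000)  # hypothetical original values
sigmas = [1.0, 2.0]
for make in (independent_copies, correlated_copies):
    copies = make(x, sigmas, rng)
    mse = np.mean((inverse_variance_combine(copies, sigmas) - x) ** 2)
    print(make.__name__, round(float(mse), 2))
```

In expectation, combining the two copies lowers the error below that of the best single copy (about 0.8 versus 1.0) when their noise is independent, but not when it is correlated as above (about 1.12), which is the behavior a defense against diversity attacks needs.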
Sharing Computer Network Logs for Security and Privacy: A Motivation for New Methodologies of Anonymization
Logs are one of the most fundamental resources to any security professional.
It is widely recognized by the government and industry that it is both
beneficial and desirable to share logs for the purpose of security research.
However, such sharing is not happening, or not to the degree or magnitude that is
desired. Organizations are reluctant to share logs because of the risk of
exposing sensitive information to potential attackers. We believe this
reluctance remains high because current anonymization techniques are weak and
one-size-fits-all (or, better put, one size tries to fit all). We must develop
standards and make anonymization available at varying levels, striking a
balance between privacy and utility. Organizations have different needs and
trust other organizations to different degrees. They must be able to map
multiple anonymization levels with defined risks to the trust levels they share
with (would-be) receivers. It is not until there are industry standards for
multiple levels of anonymization that we will be able to move forward and
achieve the goal of widespread sharing of logs for security researchers.
Comment: 17 pages, 1 figure
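The call for anonymization at varying levels, mapped to the trust placed in each receiver, can be made concrete with a toy policy. The sketch below assumes IPv4 addresses, a hypothetical mapping from trust level to the number of leading address bits a receiver may see, and a toy log format; plain prefix truncation is just one familiar anonymization technique, not a method proposed in the paper.

```python
import ipaddress

# Hypothetical policy: leading IPv4 bits visible to each trust level.
PREFIX_BITS_BY_TRUST = {"public": 8, "partner": 16, "trusted": 24, "internal": 32}


def anonymize_ip(addr: str, trust_level: str) -> str:
    """Keep only the leading bits allowed for this trust level and zero the rest."""
    bits = PREFIX_BITS_BY_TRUST[trust_level]
    network = ipaddress.ip_network(f"{addr}/{bits}", strict=False)
    return str(network.network_address)


def anonymize_log_line(line: str, trust_level: str) -> str:
    """Toy log format (assumed): 'src_ip dst_ip bytes'."""
    src, dst, nbytes = line.split()
    return f"{anonymize_ip(src, trust_level)} {anonymize_ip(dst, trust_level)} {nbytes}"


print(anonymize_log_line("198.51.100.23 203.0.113.9 1500", "partner"))
# 198.51.0.0 203.0.0.0 1500
```

Each level has an explicit, documented risk (how much of the address survives), which is the kind of mapping between anonymization levels and trust levels the paper argues organizations need.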