2,457 research outputs found
A Privacy-Preserving Framework for Collaborative Association Rule Mining in Cloud
Collaborative Data Mining facilitates multiple organizations to integrate their datasets and extract useful knowledge from their joint datasets for mutual benefits. The knowledge extracted in this manner is found to be superior to the knowledge extracted locally from a single organization’s dataset. With the rapid development of outsourcing, there is a growing interest for organizations to outsource their data mining tasks to a cloud environment to effectively address their economic and performance demands. However, due to privacy concerns and stringent compliance regulations, organizations do not want to share their private datasets neither with the cloud nor with other participating organizations. In this paper, we address the problem of outsourcing association rule mining task to a federated cloud environment in a privacy-preserving manner. Specifically, we propose a privacy-preserving framework that allows a set of users, each with a private dataset, to outsource their encrypted databases and the cloud returns the association rules extracted from the aggregated encrypted databases to the participating users. Our proposed solution ensures the confidentiality of the outsourced data and also minimizes the users’ participation during the association rule mining process. Additionally, we show that the proposed solution is secure under the standard semi-honest model and demonstrate its practicality
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to take many
other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term such a process as privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table
Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation
With the wide deployment of public cloud computing infrastructures, using
clouds to host data query services has become an appealing solution for the
advantages on scalability and cost-saving. However, some data might be
sensitive that the data owner does not want to move to the cloud unless the
data confidentiality and query privacy are guaranteed. On the other hand, a
secured query service should still provide efficient query processing and
significantly reduce the in-house workload to fully realize the benefits of
cloud computing. We propose the RASP data perturbation method to provide secure
and efficient range query and kNN query services for protected data in the
cloud. The RASP data perturbation method combines order preserving encryption,
dimensionality expansion, random noise injection, and random projection, to
provide strong resilience to attacks on the perturbed data and queries. It also
preserves multidimensional ranges, which allows existing indexing techniques to
be applied to speedup range query processing. The kNN-R algorithm is designed
to work with the RASP range query algorithm to process the kNN queries. We have
carefully analyzed the attacks on data and queries under a precisely defined
threat model and realistic security assumptions. Extensive experiments have
been conducted to show the advantages of this approach on efficiency and
security.Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201
Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
The k-nearest neighbors (k-NN) algorithm is a popular and effective
classification algorithm. Due to its large storage and computational
requirements, it is suitable for cloud outsourcing. However, k-NN is often run
on sensitive data such as medical records, user images, or personal
information. It is important to protect the privacy of data in an outsourced
k-NN system.
Prior works have all assumed the data owners (who submit data to the
outsourced k-NN system) are a single trusted party. However, we observe that in
many practical scenarios, there may be multiple mutually distrusting data
owners. In this work, we present the first framing and exploration of privacy
preservation in an outsourced k-NN system with multiple data owners. We
consider the various threat models introduced by this modification. We discover
that under a particularly practical threat model that covers numerous
scenarios, there exists a set of adaptive attacks that breach the data privacy
of any exact k-NN system. The vulnerability is a result of the mathematical
properties of k-NN and its output. Thus, we propose a privacy-preserving
alternative system supporting kernel density estimation using a Gaussian
kernel, a classification algorithm from the same family as k-NN. In many
applications, this similar algorithm serves as a good substitute for k-NN. We
additionally investigate solutions for other threat models, often through
extensions on prior single data owner systems
k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data Mining has wide applications in many areas such as banking, medicine,
scientific research and among government agencies. Classification is one of the
commonly used tasks in data mining applications. For the past decade, due to
the rise of various privacy issues, many theoretical and practical solutions to
the classification problem have been proposed under different security models.
However, with the recent popularity of cloud computing, users now have the
opportunity to outsource their data, in encrypted form, as well as the data
mining tasks to the cloud. Since the data on the cloud is in encrypted form,
existing privacy preserving classification techniques are not applicable. In
this paper, we focus on solving the classification problem over encrypted data.
In particular, we propose a secure k-NN classifier over encrypted data in the
cloud. The proposed k-NN protocol protects the confidentiality of the data,
user's input query, and data access patterns. To the best of our knowledge, our
work is the first to develop a secure k-NN classifier over encrypted data under
the semi-honest model. Also, we empirically analyze the efficiency of our
solution through various experiments.Comment: 29 pages, 2 figures, 3 tables arXiv admin note: substantial text
overlap with arXiv:1307.482
Preserving Both Privacy and Utility in Network Trace Anonymization
As network security monitoring grows more sophisticated, there is an
increasing need for outsourcing such tasks to third-party analysts. However,
organizations are usually reluctant to share their network traces due to
privacy concerns over sensitive information, e.g., network and system
configuration, which may potentially be exploited for attacks. In cases where
data owners are convinced to share their network traces, the data are typically
subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces
real IP addresses with prefix-preserving pseudonyms. However, most such
techniques either are vulnerable to adversaries with prior knowledge about some
network flows in the traces, or require heavy data sanitization or
perturbation, both of which may result in a significant loss of data utility.
In this paper, we aim to preserve both privacy and utility through shifting the
trade-off from between privacy and utility to between privacy and computational
cost. The key idea is for the analysts to generate and analyze multiple
anonymized views of the original network traces; those views are designed to be
sufficiently indistinguishable even to adversaries armed with prior knowledge,
which preserves the privacy, whereas one of the views will yield true analysis
results privately retrieved by the data owner, which preserves the utility. We
present the general approach and instantiate it based on CryptoPAn. We formally
analyze the privacy of our solution and experimentally evaluate it using real
network traces provided by a major ISP. The results show that our approach can
significantly reduce the level of information leakage (e.g., less than 1\% of
the information leaked by CryptoPAn) with comparable utility
A Review on: Association Rule Mining Using Privacy for Partitioned Database
Data Analysis techniques that are Association manage mining and Frequent thing set mining are two prominent and broadly utilized for different applications. The conventional framework concentrated independently on vertically parceled database and on a level plane apportioned databases on the premise of this presenting a framework which concentrate on both on a level plane and vertically divided databases cooperatively with protection safeguarding component. Information proprietors need to know the continuous thing sets or affiliation rules from an aggregate information set and unveil or uncover as few data about their crude information as could reasonably be expected to other information proprietors and outsiders. To guarantee information protection a Symmetric Encryption Technique is utilized to show signs of improvement result. Cloud supported successive thing set mining arrangement used to exhibit an affiliation govern mining arrangement. The subsequent arrangements are intended for outsourced databases that permit various information proprietors to proficiently share their information safely without trading off on information protection. Information security is one of the key procedures in outsourcing information to different outside clients. Customarily Fast Distribution Mining calculation was proposed for securing conveyed information. These business locales an issue by secure affiliation governs over parceled information in both even and vertical. A Frequent thing sets calculation and Distributed affiliation administer digging calculation is used for doing above method adequately in divided information, which incorporates administrations of the information in outsourcing process for disseminated databases. This work keeps up or keeps up proficient security over vertical and flat perspective of representation in secure mining applications
Privacy-preserving targeted advertising scheme for IPTV using the cloud
In this paper, we present a privacy-preserving scheme for targeted advertising via the Internet Protocol TV (IPTV). The scheme uses a communication model involving a collection of viewers/subscribers, a content provider (IPTV), an advertiser, and a cloud server. To provide high quality directed advertising service, the advertiser can utilize not only demographic information of subscribers, but also their watching habits. The latter includes watching history, preferences for IPTV content and watching rate, which are published on the cloud server periodically (e.g. weekly) along with anonymized demographics. Since the published data may leak sensitive information about subscribers, it is safeguarded using cryptographic techniques in addition to the anonymization of demographics. The techniques used by the advertiser, which can be manifested in its queries to the cloud, are considered (trade) secrets and therefore are protected as well. The cloud is oblivious to the published data, the queries of the advertiser as well as its own responses to these queries. Only a legitimate advertiser, endorsed with a so-called {\em trapdoor} by the IPTV, can query the cloud and utilize the query results. The performance of the proposed scheme is evaluated with experiments, which show that the scheme is suitable for practical usage
- …