12,466 research outputs found
Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
The k-nearest neighbors (k-NN) algorithm is a popular and effective
classification algorithm. Due to its large storage and computational
requirements, it is suitable for cloud outsourcing. However, k-NN is often run
on sensitive data such as medical records, user images, or personal
information. It is important to protect the privacy of data in an outsourced
k-NN system.
Prior works have all assumed the data owners (who submit data to the
outsourced k-NN system) are a single trusted party. However, we observe that in
many practical scenarios, there may be multiple mutually distrusting data
owners. In this work, we present the first framing and exploration of privacy
preservation in an outsourced k-NN system with multiple data owners. We
consider the various threat models introduced by this modification. We discover
that under a particularly practical threat model that covers numerous
scenarios, there exists a set of adaptive attacks that breach the data privacy
of any exact k-NN system. The vulnerability is a result of the mathematical
properties of k-NN and its output. Thus, we propose a privacy-preserving
alternative system supporting kernel density estimation using a Gaussian
kernel, a classification algorithm from the same family as k-NN. In many
applications, this similar algorithm serves as a good substitute for k-NN. We
additionally investigate solutions for other threat models, often through
extensions on prior single data owner systems
DeepSecure: Scalable Provably-Secure Deep Learning
This paper proposes DeepSecure, a novel framework that enables scalable
execution of the state-of-the-art Deep Learning (DL) models in a
privacy-preserving setting. DeepSecure targets scenarios in which neither of
the involved parties including the cloud servers that hold the DL model
parameters or the delegating clients who own the data is willing to reveal
their information. Our framework is the first to empower accurate and scalable
DL analysis of data generated by distributed clients without sacrificing the
security to maintain efficiency. The secure DL computation in DeepSecure is
performed using Yao's Garbled Circuit (GC) protocol. We devise GC-optimized
realization of various components used in DL. Our optimized implementation
achieves more than 58-fold higher throughput per sample compared with the
best-known prior solution. In addition to our optimized GC realization, we
introduce a set of novel low-overhead pre-processing techniques which further
reduce the GC overall runtime in the context of deep learning. Extensive
evaluations of various DL applications demonstrate up to two
orders-of-magnitude additional runtime improvement achieved as a result of our
pre-processing methodology. This paper also provides mechanisms to securely
delegate GC computations to a third party in constrained embedded settings
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to take many
other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term such a process as privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table
Privacy-Preserving Cloud-Assisted Data Analytics
Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users\u27 data may contain private information that needs to be protected.
Cloud computing has become more and more popular in both academia and industry communities. By pooling infrastructure and servers together, it can offer virtually unlimited resources easily accessible via the Internet. Various services could be provided by cloud platforms including machine learning and data analytics.
The goal of this dissertation is to develop privacy-preserving cloud-assisted data analytics solutions to address the aforementioned challenges, leveraging the powerful and easy-to-access cloud. In particular, we propose the following systems.
To address the problem of limited computation power at user and the need of privacy protection in data analytics, we consider geometric programming (GP) in data analytics, and design a secure, efficient, and verifiable outsourcing protocol for GP. Our protocol consists of a transform scheme that converts GP to DGP, a transform scheme with computationally indistinguishability, and an efficient scheme to solve the transformed DGP at the cloud side with result verification. Evaluation results show that the proposed secure outsourcing protocol can achieve significant time savings for users.
To address the problem of limited data at individual users, we propose two distributed learning systems such that users can collaboratively train machine learning models without losing privacy. The first one is a differentially private framework to train logistic regression models with distributed data sources. We employ the relevance between input data features and the model output to significantly improve the learning accuracy. Moreover, we adopt an evaluation data set at the cloud side to suppress low-quality data sources and propose a differentially private mechanism to protect user\u27s data quality privacy. Experimental results show that the proposed framework can achieve high utility with low quality data, and strong privacy guarantee.
The second one is an efficient privacy-preserving federated learning system that enables multiple edge users to collaboratively train their models without revealing dataset. To reduce the communication overhead, we select well-aligned and large-enough magnitude gradients for uploading which leads to quick convergence. To minimize the noise added and improve model utility, each user only adds a small amount of noise to his selected gradients, encrypts the noise gradients before uploading, and the cloud server will only get the aggregate gradients that contain enough noise to achieve differential privacy. Evaluation results show that the proposed system can achieve high accuracy, low communication overhead, and strong privacy guarantee.
In future work, we plan to design a privacy-preserving data analytics with fair exchange, which ensures the payment fairness. We will also consider designing distributed learning systems with heterogeneous architectures
The Potential for Machine Learning Analysis over Encrypted Data in Cloud-based Clinical Decision Support - Background and Review
This paper appeared at the 8th Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2015), Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 164, Anthony Maeder and Jim Warren, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is includedIn an effort to reduce the risk of sensitive data exposure in untrusted networks such as the public cloud, increasing attention has recently been given to encryption schemes that allow specific computations to occur on encrypted data, without the need for decryption. This relies on the fact that some encryption algorithms display the property of homomorphism, which allows them to manipulate data in a meaningful way while still in encrypted form. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information with the aim of supporting decision-making in clinical practice. This review paper examines the history and current status of homomoprhic encryption and its potential for preserving the privacy of patient data underpinning cloud-based CDS applications
k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data Mining has wide applications in many areas such as banking, medicine,
scientific research and among government agencies. Classification is one of the
commonly used tasks in data mining applications. For the past decade, due to
the rise of various privacy issues, many theoretical and practical solutions to
the classification problem have been proposed under different security models.
However, with the recent popularity of cloud computing, users now have the
opportunity to outsource their data, in encrypted form, as well as the data
mining tasks to the cloud. Since the data on the cloud is in encrypted form,
existing privacy preserving classification techniques are not applicable. In
this paper, we focus on solving the classification problem over encrypted data.
In particular, we propose a secure k-NN classifier over encrypted data in the
cloud. The proposed k-NN protocol protects the confidentiality of the data,
user's input query, and data access patterns. To the best of our knowledge, our
work is the first to develop a secure k-NN classifier over encrypted data under
the semi-honest model. Also, we empirically analyze the efficiency of our
solution through various experiments.Comment: 29 pages, 2 figures, 3 tables arXiv admin note: substantial text
overlap with arXiv:1307.482
A Privacy-Preserving Outsourced Data Model in Cloud Environment
Nowadays, more and more machine learning applications, such as medical
diagnosis, online fraud detection, email spam filtering, etc., services are
provided by cloud computing. The cloud service provider collects the data from
the various owners to train or classify the machine learning system in the
cloud environment. However, multiple data owners may not entirely rely on the
cloud platform that a third party engages. Therefore, data security and privacy
problems are among the critical hindrances to using machine learning tools,
particularly with multiple data owners. In addition, unauthorized entities can
detect the statistical input data and infer the machine learning model
parameters. Therefore, a privacy-preserving model is proposed, which protects
the privacy of the data without compromising machine learning efficiency. In
order to protect the data of data owners, the epsilon-differential privacy is
used, and fog nodes are used to address the problem of the lower bandwidth and
latency in this proposed scheme. The noise is produced by the
epsilon-differential mechanism, which is then added to the data. Moreover, the
noise is injected at the data owner site to protect the owners data. Fog nodes
collect the noise-added data from the data owners, then shift it to the cloud
platform for storage, computation, and performing the classification tasks
purposes
- …