Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data
k-means clustering is one of the most popular clustering algorithms in
data mining. Recently, much research has focused on the algorithm for cases
where the dataset is divided among multiple parties or is too large to be
handled by the data owner. In the latter case, servers are usually hired to
perform the clustering. The data owner divides the dataset among the servers,
which together perform k-means and return the cluster labels to the owner. The
major challenge in this setting is to prevent the servers from gaining
substantial information about the owner's actual data. Several algorithms have
been designed in the past that provide cryptographic solutions for
privacy-preserving k-means. We provide a new method to perform k-means over a
large dataset using multiple servers. Our technique avoids heavy cryptographic
computations and instead uses a simple randomization technique to preserve the
privacy of the data. The k-means computed has exactly the same efficiency and
accuracy as k-means computed over the original dataset without any
randomization. We argue that our algorithm is secure against an
honest-but-curious, passive adversary.
Comment: 19 pages, 4 tables. International Conference on Information Systems
Security. Springer, Cham, 201
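The abstract does not spell out the randomization, so the sketch below swaps in a different, well-known distance-preserving mask purely for illustration: the owner applies a random rotation/reflection `Q` plus translation `t` (hypothetical helper `random_isometry`), splits the rows horizontally across servers, and the servers run Lloyd's algorithm while exchanging only per-cluster sums and counts. Because Euclidean distances and means commute with this mask, the labels match plain k-means started from the same rows. This is an assumption-laden sketch, not the authors' method, and it makes no security claim for the mask itself.

```python
import numpy as np

def random_isometry(dim, rng):
    """Hypothetical masking step: random orthogonal matrix Q and translation t."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))  # sign fix so Q is a uniformly random orthogonal matrix
    return q, rng.standard_normal(dim)

def local_step(points, centroids):
    """Work done by one server on its own rows: assign points, return per-cluster sums/counts."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros_like(centroids)
    np.add.at(sums, labels, points)  # accumulate each point into its cluster's sum
    counts = np.bincount(labels, minlength=len(centroids))
    return sums, counts, labels

def multi_server_kmeans(partitions, init_centroids, iters=20):
    """Lloyd's algorithm where only per-cluster sums and counts leave each server."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(iters):
        stats = [local_step(p, centroids) for p in partitions]
        sums = sum(s for s, _, _ in stats)
        counts = sum(c for _, c, _ in stats)
        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    labels = np.concatenate([local_step(p, centroids)[2] for p in partitions])
    return labels, centroids

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
init_idx = [0, 50, 100]  # same initial rows for both runs

# Baseline: plain k-means on the owner's original data (a single "server").
plain_labels, plain_centroids = multi_server_kmeans([X], X[init_idx])

# Owner masks the data, then splits the rows horizontally across three servers.
Q, t = random_isometry(2, rng)
Y = X @ Q.T + t
masked_labels, masked_centroids = multi_server_kmeans(np.array_split(Y, 3), Y[init_idx])

print((plain_labels == masked_labels).all())  # True: distances, hence assignments, are preserved
```

The owner can map the servers' centroids back with `(masked_centroids - t) @ Q`, since `Q` is orthogonal.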
Protecting privacy of users in brain-computer interface applications
Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on large amounts of personal data for training and inference. Among the most intimate data sources being exploited is electroencephalogram (EEG) data, which is so rich in information that application developers can easily extract knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to perform meaningful ML on EEG data while protecting the privacy of users. To this end, we propose cryptographic protocols based on secure multiparty computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e., such that no individual's EEG signals are revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals just as would be possible in the unencrypted case, at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret-sharing-based SMC in general, namely, with 15 players involved in all the computations.
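In the commodity-based model, a trusted initializer hands out correlated randomness (here, Beaver multiplication triples) before any inputs exist; the parties then compute on additive secret shares. The sketch below assumes two parties, integer inputs, and an illustrative prime modulus `P`; all function names are hypothetical. It computes a secret-shared inner product, the basic building block of terms such as X^T y in linear regression. Real EEG features would need a fixed-point encoding, which is omitted here.

```python
import secrets

P = 2**61 - 1  # prime modulus for the share arithmetic (an illustrative choice)

def share(x, n=2):
    """Split integer x into n additive shares that sum to x mod P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(parts):
    return sum(parts) % P

def dealer_triple(n=2):
    """Trusted initializer: hands out shares of a random (a, b, a*b) before inputs exist."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n), share(b, n), share(a * b % P, n)

def beaver_multiply(x_sh, y_sh, triple):
    """Multiply two shared values; only the masked values d and e are opened."""
    a_sh, b_sh, c_sh = triple
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])  # d = x - a
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])  # e = y - b
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term d*e is added by one party only
    return z_sh

def shared_dot(xs_sh, ys_sh):
    """Inner product of two secret-shared vectors, e.g. one entry of X^T y."""
    n = len(xs_sh[0])
    total = [0] * n
    for x_sh, y_sh in zip(xs_sh, ys_sh):
        z_sh = beaver_multiply(x_sh, y_sh, dealer_triple(n))
        total = [(t + z) % P for t, z in zip(total, z_sh)]  # addition of shares is local
    return total

features = [3, 5, 7]  # e.g. integer-encoded EEG features from three users
targets = [2, 4, 6]
result = shared_dot([share(v) for v in features], [share(v) for v in targets])
print(reconstruct(result))  # 68 = 3*2 + 5*4 + 7*6
```

The correctness follows from c + d·b + e·a + d·e = xy when c = ab, d = x − a, e = y − b; opening d and e reveals nothing about x and y because a and b are uniformly random.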
Functional encryption based approaches for practical privacy-preserving machine learning
Machine learning (ML) is increasingly being used in a wide variety of application domains. However, deploying ML solutions poses a significant challenge because of increasing privacy concerns and the requirements imposed by privacy-related regulations. To tackle serious privacy concerns in ML-based applications, significant recent research efforts have focused on developing privacy-preserving ML (PPML) approaches by integrating into the ML pipeline existing anonymization mechanisms or emerging privacy protection approaches such as differential privacy, secure computation, and other architectural frameworks. While promising, existing secure-computation-based approaches have significant computational efficiency issues and hence are not practical.
In this dissertation, we address several challenges related to PPML and propose practical secure-computation-based approaches to solve them. We consider both two-tier cloud-based and three-tier hybrid cloud-edge-based PPML architectures and address both emerging deep learning models and federated learning approaches. The proposed approaches enable us to outsource data or update a locally trained model in a privacy-preserving manner by computing over encrypted datasets or local models. Our proposed secure computation solutions are based on functional encryption (FE) techniques. Evaluation shows that the proposed approaches are efficient, more practical than existing approaches, and provide strong privacy guarantees. We also address issues related to the trustworthiness of various entities within the proposed PPML infrastructures, including cloud service providers and a third-party authority (TPA), which plays a critical role in the proposed FE-based PPML solutions. To ensure that such entities can be trusted, we propose a transparency and accountability framework using blockchain. We show that the proposed transparency framework is effective and guarantees security properties. Experimental evaluation shows that the proposed framework is efficient.
FedSL: Federated Split Learning on Distributed Sequential Data in Recurrent Neural Networks
Federated Learning (FL) and Split Learning (SL) are privacy-preserving
Machine-Learning (ML) techniques that enable training ML models over data
distributed among clients without requiring direct access to their raw data.
Existing FL and SL approaches work on horizontally or vertically partitioned
data and cannot handle sequentially partitioned data where segments of
multiple-segment sequential data are distributed across clients. In this paper,
we propose a novel federated split learning framework, FedSL, to train models
on distributed sequential data. The most common ML models to train on
sequential data are Recurrent Neural Networks (RNNs). Since the proposed
framework is privacy-preserving, segments of multiple-segment sequential data
cannot be shared between clients or between clients and the server. To circumvent
this limitation, we propose a novel SL approach tailored for RNNs. An RNN is
split into sub-networks, and each sub-network is trained on one client
containing single segments of multiple-segment training sequences. During local
training, the sub-networks on different clients communicate with each other to
capture latent dependencies between consecutive segments of multiple-segment
sequential data on different clients, but without sharing raw data or complete
model parameters. After training local sub-networks with local sequential data
segments, all clients send their sub-networks to a federated server where
sub-networks are aggregated to generate a global model. The experimental
results on simulated and real-world datasets demonstrate that the proposed
method successfully trains models on distributed sequential data while
preserving privacy, and outperforms previous FL and centralized learning
approaches, achieving higher accuracy in fewer communication rounds
- …
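A minimal forward-pass sketch of the idea, assuming vanilla tanh RNN cells and FedAvg-style weighted parameter averaging (the abstract does not specify the cell type or the aggregation rule). Client A runs its sub-network over its segment and sends only the final hidden state to client B, which continues over the next segment. The server then averages the sub-networks into a global model. Backpropagation across clients is omitted, and all names (`init_rnn`, `rnn_forward`, `fedavg`) are hypothetical.

```python
import numpy as np

def init_rnn(input_dim, hidden_dim, rng):
    """Parameters of one client's sub-network (a vanilla tanh RNN cell)."""
    return {"W_x": rng.normal(0.0, 0.3, size=(hidden_dim, input_dim)),
            "W_h": rng.normal(0.0, 0.3, size=(hidden_dim, hidden_dim)),
            "b": np.zeros(hidden_dim)}

def rnn_forward(params, segment, h0):
    """Run the cell over one local segment, starting from an incoming hidden state."""
    h = h0
    for x_t in segment:
        h = np.tanh(params["W_x"] @ x_t + params["W_h"] @ h + params["b"])
    return h

def fedavg(param_list, weights):
    """Server-side aggregation: weighted average of each parameter tensor."""
    total = float(sum(weights))
    return {k: sum(w * p[k] for w, p in zip(weights, param_list)) / total
            for k in param_list[0]}

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
seg_a = rng.normal(size=(5, input_dim))  # first segment, held only by client A
seg_b = rng.normal(size=(7, input_dim))  # next segment, held only by client B

# Local phase: client A's hidden state is the only thing sent to client B.
sub_a = init_rnn(input_dim, hidden_dim, rng)
sub_b = init_rnn(input_dim, hidden_dim, rng)
h_a = rnn_forward(sub_a, seg_a, np.zeros(hidden_dim))
h_b = rnn_forward(sub_b, seg_b, h_a)

# Aggregation phase: the server averages the sub-networks into a global model.
global_params = fedavg([sub_a, sub_b], [len(seg_a), len(seg_b)])

# With shared weights, chaining hidden states across clients equals one pass over the full sequence.
h_split = rnn_forward(global_params, seg_b, rnn_forward(global_params, seg_a, np.zeros(hidden_dim)))
h_full = rnn_forward(global_params, np.vstack([seg_a, seg_b]), np.zeros(hidden_dim))
print(np.allclose(h_split, h_full))  # True
```

The final check shows why passing hidden states is enough to capture dependencies between consecutive segments: neither client ever sees the other's raw segment.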