11 research outputs found
Private Collaborative Neural Network Learning
Machine learning algorithms, such as neural networks, build better predictive models when they have access to larger datasets. In many domains, such as medicine and finance, each institution has access to only limited amounts of data, and creating larger datasets typically requires collaboration. However, there are privacy-related constraints on these collaborations for legal, ethical, and competitive reasons. In this work, we present a feasible protocol for learning neural networks in a collaborative way while preserving the privacy of each record. This is achieved by combining Differential Privacy and Secure Multi-Party Computation with machine learning.
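As a rough illustration of the record-level protection this combination targets, a Gaussian mechanism can be applied to per-record gradients before they are (securely) aggregated. This is a minimal plaintext sketch; the clipping bound, noise scale, and function name are illustrative assumptions, not values or APIs from the paper:

```python
import numpy as np

def private_gradient(grad, clip_norm=1.0, noise_scale=1.0, rng=None):
    """Clip a per-record gradient to `clip_norm` and add Gaussian noise
    (the Gaussian mechanism). Parameters are illustrative, not the paper's."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_scale * clip_norm, size=grad.shape)
    return clipped + noise

g = np.array([3.0, 4.0])      # gradient with L2 norm 5
noisy = private_gradient(g)   # clipped to norm <= 1, then noised
```

In a collaborative setting, the noised (or secret-shared) gradients, rather than raw records, are what the parties would exchange.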
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure
multiparty computation (SMC) which is vulnerable to inference or differential
privacy which can lead to low accuracy given a large number of parties with
relatively small amounts of data each. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach outperforms state-of-the-art
solutions.
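The scaling idea described above, reducing each party's noise contribution as the number of honest parties grows, can be sketched in a few lines. The function name, the choice of Gaussian noise, and the exact calibration are assumptions for illustration, not the paper's protocol:

```python
import math

def per_party_noise_scale(total_sigma, n_parties, trust_rate):
    """Std. dev. of the Gaussian noise each party adds so that the summed
    noise of the assumed-honest parties still has std >= total_sigma.
    `trust_rate` is the assumed fraction of honest, non-colluding parties."""
    honest = max(1, math.floor(n_parties * trust_rate))
    # The sum of `honest` independent N(0, s^2) draws has std s * sqrt(honest),
    # so each party only needs s = total_sigma / sqrt(honest).
    return total_sigma / math.sqrt(honest)

# More parties at a fixed trust rate -> less noise needed per party.
print(per_party_noise_scale(1.0, 10, 0.5))   # ~0.447
print(per_party_noise_scale(1.0, 100, 0.5))  # ~0.141
```

SMC is what keeps the individual (under-noised) contributions hidden, so only the fully aggregated, fully noised sum is ever revealed.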
TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service
Machine learning methods are widely used for a variety of prediction
problems. \emph{Prediction as a service} is a paradigm in which service
providers with technological expertise and computational resources may perform
predictions for clients. However, data privacy severely restricts the
applicability of such services, unless measures to keep client data private
(even from the service provider) are designed. Equally important is to minimize
the amount of computation and communication required between client and server.
Fully homomorphic encryption offers a possible way out: clients may
encrypt their data, and the server may then perform arithmetic
computations. The main drawback of using fully homomorphic encryption is the
amount of time required to evaluate large machine learning models on encrypted
data. We combine ideas from the machine learning literature, particularly work
on binarization and sparsification of neural networks, together with
algorithmic tools to speed up and parallelize computation on encrypted data.
Comment: Accepted at the International Conference on Machine Learning (ICML), 2018
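The binarization trick the abstract alludes to can be shown in plaintext: with ±1 weights and activations, each multiply-accumulate collapses to counting sign agreements, which is what makes evaluation over encrypted bits cheap. A minimal sketch with illustrative function names:

```python
import numpy as np

def binarize(w):
    """Replace real-valued weights with their sign (+1 / -1)."""
    return np.where(w >= 0, 1, -1)

def binary_dot(x, w):
    """Dot product of +/-1 vectors: (#agreements - #disagreements).
    Over encrypted bits this maps to XNOR-and-popcount circuits."""
    return int(np.sum(x * w))

w = binarize(np.array([0.3, -0.7, 1.2]))   # -> [ 1, -1,  1]
x = np.array([1, 1, -1])
print(binary_dot(x, w))                     # 1 - 1 - 1 = -1
```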
SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation
Nowadays, gathering high-quality training data from multiple data controllers
with privacy preservation is a key challenge in training high-quality machine
learning models. The potential solutions could dramatically break down the
barriers among isolated data corpora and consequently enlarge the range of data
available for processing. To this end, both academic researchers and industry
vendors have recently been strongly motivated to propose two mainstream
families of solutions: 1) Secure Multi-party Learning (MPL for short); and
2) Federated Learning (FL for short). These two kinds of solutions have their
own advantages and limitations when evaluated in terms of privacy preservation,
modes of communication, communication overhead, data formats, accuracy of the
trained models, and application scenarios.
Motivated to demonstrate the research progress and discuss the insights on
the future directions, we thoroughly investigate these protocols and frameworks
of both MPL and FL. First, we define the problem of training machine
learning models over multiple data sources with privacy preservation (TMMPP for
short). Then, we compare the recent studies of TMMPP from the aspects of the
technical routes, parties supported, data partitioning, threat model, and
supported machine learning models, to show the advantages and limitations.
Next, we introduce the state-of-the-art platforms which support online training
over multiple data sources. Finally, we discuss the potential directions to
resolve the problem of TMMPP.
Comment: 17 pages, 4 figures
A Survey on Privacy and Security in Deep Learning Models (Derin Öğrenme Modellerinde Mahremiyet ve Güvenlik Üzerine Bir Derleme Çalışması)
With the recent revolutionary advances in deep learning, expectations for artificial intelligence are growing day by day. Deep learning, a research area that can be applied effectively in many fields such as speech recognition, natural language processing (NLP), and image processing, achieves higher accuracy than classical machine learning. Models developed with deep learning use large amounts of data during training and inference, and these data may consist of personal data. It is very important that the processing of these data does not violate the personal data protection law (KVKK). Therefore, ensuring the privacy and security of the data is a crucial concern. In this study, the architectures commonly used in developing deep learning models are presented. The tools commonly encountered in the literature for improving data privacy and security, namely secure multi-party computation, differential privacy, the garbled circuit protocol, and homomorphic encryption, are summarized. Recent studies employing these tools in various system designs are surveyed and examined in two categories: the training phase and the inference phase of a deep learning model. Current attacks applicable to various models in the literature, and the methods developed to defend against them, are presented. In addition, current research areas are identified. Accordingly, future research may be directed toward reducing the complexity of cryptography-based methods and toward developing measurement and evaluation methods for determining the reliability of the developed models.
Secure Noise Sampling for DP in MPC with Finite Precision
Using secure multi-party computation (MPC) to generate noise and add this noise to a function output allows individuals to achieve formal differential privacy (DP) guarantees without needing to trust any third party or sacrifice the utility of the output. However, securely generating and adding this noise is a challenge considering real-world implementations on finite-precision computers, since many DP mechanisms guarantee privacy only when noise is sampled from continuous distributions requiring infinite precision.
We introduce efficient MPC protocols that securely realize noise sampling for several plaintext DP mechanisms that are secure against existing precision-based attacks: the discrete Laplace and Gaussian mechanisms, the snapping mechanism, and the integer-scaling Laplace and Gaussian mechanisms. Due to their inherent trade-offs, the favorable mechanism for a specific application depends on the available computation resources, type of function evaluated, and desired (epsilon,delta)-DP guarantee.
The benchmarks of our protocols implemented in the state-of-the-art MPC framework MOTION (Braun et al., TOPS '22) demonstrate highly efficient online runtimes of less than 32 ms per query, and down to about 1 ms per query with batching, in the two-party setting. The respective offline phases are also practical, requiring only 51 ms to 5.6 seconds per query depending on the batch size.
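For intuition, one of the target distributions, the discrete Laplace, can be sampled exactly at finite precision as the difference of two geometric draws. This plaintext sketch shows the distribution the MPC protocols realize securely; it is not the paper's protocol, and the function name is illustrative:

```python
import math
import random

def sample_discrete_laplace(scale, rng=random):
    """Sample k with probability proportional to exp(-|k|/scale), exactly,
    using only finite-precision integer arithmetic on top of uniform draws:
    the difference of two i.i.d. geometric variables is discrete-Laplace."""
    p = 1.0 - math.exp(-1.0 / scale)  # success probability of each geometric
    g1 = 0
    while rng.random() > p:  # count failures before the first success
        g1 += 1
    g2 = 0
    while rng.random() > p:
        g2 += 1
    return g1 - g2  # symmetric around 0, integer-valued
```

Because the output is an integer, adding it to an integer-valued query result avoids the floating-point precision attacks that plague naive continuous-Laplace implementations.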