11 research outputs found

    Private Collaborative Neural Network Learning

    Machine learning algorithms, such as neural networks, create better predictive models when they have access to larger datasets. In many domains, such as medicine and finance, each institution has access to only limited amounts of data, and creating larger datasets typically requires collaboration. However, such collaborations are constrained by privacy requirements for legal, ethical, and competitive reasons. In this work, we present a feasible protocol for learning neural networks collaboratively while preserving the privacy of each record. This is achieved by combining Differential Privacy and Secure Multi-Party Computation with Machine Learning.
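The differential-privacy half of such a combination can be illustrated with a minimal sketch: each party clips and perturbs its local gradient before any secure aggregation takes place, so that even the revealed sum protects individual records. The function name, clipping bound, and noise scale below are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def privatize_update(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip a gradient to bound per-record sensitivity, then add
    Gaussian noise -- the differential-privacy step each party applies
    before its update enters a secure-aggregation protocol.
    (clip_norm and noise_std are illustrative placeholders.)"""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)

# Each party privatizes locally; SMC would then reveal only the sum.
updates = [privatize_update(np.full(4, float(i)), rng=np.random.default_rng(i))
           for i in range(3)]
aggregate = np.sum(updates, axis=0)
```

In a real deployment the aggregation step itself would run inside the SMC protocol, so no party ever sees another's privatized update.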

    A Hybrid Approach to Privacy-Preserving Federated Learning

    Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model, while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC), which is vulnerable to inference, or differential privacy, which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases, without sacrificing privacy, while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach outperforms state-of-the-art solutions.
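The noise-reduction idea can be sketched as follows: under secure aggregation, only the sum of the parties' noise must meet the central DP target, so each party's share can shrink as the number of honest contributors grows. The function and the trust-rate parameterization below are illustrative assumptions, not the paper's exact scheme.

```python
import math

def per_party_noise_std(target_std, n_parties, trust_rate):
    """Only the SUM of the noise must reach target_std when updates are
    combined via secure aggregation.  Assuming at least
    trust_rate * n_parties honest parties each contribute an
    independent Gaussian share, each share's standard deviation can
    shrink by the square root of the honest count.
    (Illustrative scaling; parameter names are hypothetical.)"""
    honest = max(1, int(math.floor(trust_rate * n_parties)))
    return target_std / math.sqrt(honest)

# More parties -> less noise injected per party for the same guarantee.
few = per_party_noise_std(1.0, 10, 0.8)    # 8 honest parties assumed
many = per_party_noise_std(1.0, 100, 0.8)  # 80 honest parties assumed
```

This is why the hybrid approach scales: with pure local differential privacy each party would have to add the full `target_std` on its own, and the aggregate noise would grow with the number of parties.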

    TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service

    Machine learning methods are widely used for a variety of prediction problems. Prediction as a service is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services unless measures to keep client data private (even from the service provider) are designed. Equally important is minimizing the amount of computation and communication required between client and server. Fully homomorphic encryption offers a possible way out, whereby clients may encrypt their data, on which the server may then perform arithmetic computations. The main drawback of using fully homomorphic encryption is the amount of time required to evaluate large machine learning models on encrypted data. We combine ideas from the machine learning literature, particularly work on binarization and sparsification of neural networks, with algorithmic tools to speed up and parallelize computation over encrypted data.
    Comment: Accepted at the International Conference on Machine Learning (ICML), 201
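The binarization idea the abstract mentions can be sketched in the clear: constraining weights and activations to {-1, +1} turns each layer into small-integer arithmetic, which an encrypted backend can evaluate with cheap boolean gates instead of full-precision multiplications. The helper names below are hypothetical, not from the TAPAS codebase.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1}.  Binary weights and activations
    are what let an FHE backend replace multiplications with cheap
    boolean operations."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dense(x_bin, w_bin):
    """One binarized fully connected layer: sign(x . W).
    (Hypothetical helper for illustration.)"""
    return binarize(x_bin.astype(np.int32) @ w_bin.astype(np.int32))

rng = np.random.default_rng(0)
x = binarize(rng.normal(size=(1, 8)))   # binarized input
w = binarize(rng.normal(size=(8, 4)))   # binarized weight matrix
y = binary_dense(x, w)                  # activations again in {-1, +1}
```

Under encryption, the same dot product of {-1, +1} vectors can be computed with XNOR and popcount-style circuits, which is where the speed-up comes from.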

    SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

    Nowadays, gathering high-quality training data from multiple data controllers while preserving privacy is a key challenge in training high-quality machine learning models. Potential solutions could dramatically break the barriers among isolated data corpora and consequently enlarge the range of data available for processing. To this end, both academic researchers and industrial vendors have recently been strongly motivated to propose two mainstream families of solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short). These two solutions have their advantages and limitations when evaluated on privacy preservation, ways of communication, communication overhead, format of data, accuracy of trained models, and application scenarios. Motivated to demonstrate the research progress and discuss insights on future directions, we thoroughly investigate the protocols and frameworks of both MPL and FL. At first, we define the problem of training machine learning models over multiple data sources with privacy preservation (TMMPP for short). Then, we compare recent studies of TMMPP in terms of technical routes, parties supported, data partitioning, threat model, and supported machine learning models, to show their advantages and limitations. Next, we introduce state-of-the-art platforms that support online training over multiple data sources. Finally, we discuss potential directions for resolving the problem of TMMPP.
    Comment: 17 pages, 4 figures

    Derin Öğrenme Modellerinde Mahremiyet ve Güvenlik Üzerine Bir Derleme Çalışması (A Survey on Privacy and Security in Deep Learning Models)

    With the recent revolutionary advances in deep learning, expectations for artificial intelligence are rising day by day. Deep learning, a research area that can be applied effectively in many fields such as speech recognition, natural language processing (NLP), and image processing, achieves higher success than classical machine learning. Models developed with deep learning use large amounts of data during training and inference, and this data may include personal data. It is essential that processing this data does not violate the Turkish personal data protection law (KVKK), so ensuring the privacy and security of the data is a critical concern. This study presents the architectures commonly used in developing deep learning models. The tools frequently encountered in the literature for improving data privacy and security (secure multi-party computation, differential privacy, the garbled circuit protocol, and homomorphic encryption) are summarized. Recent studies employing these tools in various system designs are surveyed and examined in two categories: the training phase and the inference phase of deep learning models. Current attacks applicable to various models in the literature, and the methods developed to defend against them, are presented. In addition, current research areas are identified. Accordingly, future research directions may include reducing the complexity of cryptography-based methods and developing measurement and evaluation methods to determine the trustworthiness of the developed models.

    Secure Noise Sampling for DP in MPC with Finite Precision

    Using secure multi-party computation (MPC) to generate noise and add it to a function output allows individuals to achieve formal differential privacy (DP) guarantees without needing to trust any third party or sacrifice the utility of the output. However, securely generating and adding this noise is a challenge in real-world implementations on finite-precision computers, since many DP mechanisms guarantee privacy only when noise is sampled from continuous distributions requiring infinite precision. We introduce efficient MPC protocols that securely realize noise sampling for several plaintext DP mechanisms that are secure against existing precision-based attacks: the discrete Laplace and Gaussian mechanisms, the snapping mechanism, and the integer-scaling Laplace and Gaussian mechanisms. Due to their inherent trade-offs, the preferable mechanism for a specific application depends on the available computational resources, the type of function evaluated, and the desired (epsilon, delta)-DP guarantee. Benchmarks of our protocols, implemented in the state-of-the-art MPC framework MOTION (Braun et al., TOPS'22), demonstrate highly efficient online runtimes of less than 32 ms/query, and down to about 1 ms/query with batching in the two-party setting. The respective offline phases are also practical, requiring only 51 ms to 5.6 seconds/query depending on the batch size.
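One of the mechanisms named above, the discrete Laplace mechanism, admits exact sampling with integer-valued outputs, which is what makes it robust to the floating-point attacks on continuous samplers. The plaintext sketch below uses the standard construction as a difference of two geometric variables; it is illustrative only, and a real protocol would run the equivalent logic on secret-shared randomness inside MPC.

```python
import math
import random

def discrete_laplace(eps, rng=None):
    """Sample P(k) proportional to exp(-eps * |k|) exactly, as the
    difference of two i.i.d. geometric variables.  The output is an
    integer, so there is no continuous value to truncate or round,
    sidestepping precision-based attacks.  (Plaintext sketch only.)"""
    rng = rng or random.Random(0)
    p = 1.0 - math.exp(-eps)  # success probability of each geometric

    def geometric():
        # failures before the first success, support {0, 1, 2, ...}
        k = 0
        while rng.random() >= p:
            k += 1
        return k

    return geometric() - geometric()

samples = [discrete_laplace(1.0, random.Random(i)) for i in range(2000)]
```

The distribution is symmetric about zero, so the empirical mean of many samples is close to 0; adding one such sample to an integer query result yields the discrete Laplace mechanism's DP guarantee.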