Privacy Preserving Inference for Deep Neural Networks: Optimizing Homomorphic Encryption for Efficient and Secure Classification
The application of machine learning in healthcare, finance, social media, and other sensitive sectors demands not only high accuracy but also privacy. With the emergence of the Cloud as a computation and one-to-many access paradigm, training and classification/inference tasks have been outsourced to the Cloud. However, its usage is limited by legal and ethical constraints regarding privacy. In this work, we propose a privacy-preserving neural-network classification model based on Homomorphic Encryption (HE), in which the user sends an encrypted instance to the cloud and receives an encrypted inference from it, preserving the privacy of the user's query. In contrast to existing works, we demonstrate the realistic limitations of HE for privacy-preserving machine learning by varying its parameters for enhanced security and accuracy. We showcase scenarios where the choice of HE parameters impedes accurate classification and present an optimized setting for achieving reliable classification. We present several results demonstrating its effectiveness on the MNIST dataset, with greatly improved inference time per query compared to the state of the art.
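As a rough illustration of the encrypted query flow this abstract describes, the sketch below uses the open-source TenSEAL library with the CKKS scheme; the linear model, layer sizes, and HE parameters are illustrative assumptions rather than the paper's actual configuration.

```python
# Minimal sketch of HE-based private inference with TenSEAL (CKKS).
# The model, sizes, and HE parameters below are illustrative assumptions.
import tenseal as ts
import numpy as np

# Client: create a CKKS context; poly_modulus_degree and the coefficient
# modulus chain trade off security, precision, and ciphertext size.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2**40
context.generate_galois_keys()  # needed for rotations in matrix products

# Server: a toy linear classifier (the weights would be a trained model).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))   # 10 classes, 784 = 28x28 MNIST pixels
b = rng.normal(size=10)

# Client: encrypt one flattened image and send the ciphertext.
x = rng.random(784)
enc_x = ts.ckks_vector(context, x.tolist())

# Server: compute encrypted logits without ever seeing x.
enc_logits = enc_x.matmul(W.T.tolist()) + b.tolist()

# Client: decrypt and classify locally.
logits = np.array(enc_logits.decrypt())
print("predicted class:", logits.argmax())
print("max error vs plaintext:", np.abs(logits - (W @ x + b)).max())
```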
Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications
We present Chameleon, a novel hybrid (mixed-protocol) framework for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs. Chameleon combines the best aspects of generic SFE protocols with those based on additive secret sharing. In particular, the framework performs linear operations in the ring $\mathbb{Z}_{2^l}$ using additively secret-shared values and nonlinear operations using Yao's Garbled Circuits or the Goldreich-Micali-Wigderson protocol. Chameleon departs from the common assumption of additive or linear secret sharing models where three or more parties need to communicate in the online phase: the framework allows two parties with private inputs to communicate in the online phase under the assumption of a third node generating correlated randomness in an offline phase. Almost all of the heavy cryptographic operations are precomputed in an offline phase, which substantially reduces the communication overhead. Chameleon is both scalable and significantly more efficient than the ABY framework (NDSS'15) it is based on. Our framework supports signed fixed-point numbers. In particular, Chameleon's vector dot product of signed fixed-point numbers improves the efficiency of mining and classification of encrypted data for algorithms based upon heavy matrix multiplications. Our evaluation of Chameleon on a 5-layer convolutional deep neural network shows 133x and 4.2x faster executions than Microsoft CryptoNets (ICML'16) and MiniONN (CCS'17), respectively.
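To make the online/offline split concrete, here is a toy Python sketch of a secret-shared dot product in the ring $\mathbb{Z}_{2^l}$, with a helper generating Beaver multiplication triples offline; it illustrates the general technique Chameleon builds on, not the framework's actual implementation.

```python
# Toy sketch of the two-party online phase described above: linear ops on
# additive shares in the ring Z_{2^l}, with a third "helper" generating
# correlated randomness (Beaver triples) offline. Purely illustrative;
# this is not Chameleon's actual code.
import secrets

L = 64
MOD = 1 << L  # the ring Z_{2^l} with l = 64

def share(x):
    """Split x into two additive shares modulo 2^l."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD

def beaver_triple():
    """Offline phase: helper samples a, b and shares a, b, and a*b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share((a * b) % MOD)

def mul_shares(x_sh, y_sh, triple):
    """Online phase: multiply secret-shared x and y using one triple."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    # Both parties open the masked differences e = x - a and f = y - b.
    e = (x_sh[0] - a0 + x_sh[1] - a1) % MOD
    f = (y_sh[0] - b0 + y_sh[1] - b1) % MOD
    # Party i computes its share of x*y; the public e*f term is added once.
    z0 = (c0 + e * b0 + f * a0 + e * f) % MOD
    z1 = (c1 + e * b1 + f * a1) % MOD
    return z0, z1

# Secret-shared dot product of two small vectors.
xs, ys = [3, 1, 4], [2, 7, 1]
acc0 = acc1 = 0
for x, y in zip(xs, ys):
    z0, z1 = mul_shares(share(x), share(y), beaver_triple())
    acc0, acc1 = (acc0 + z0) % MOD, (acc1 + z1) % MOD

assert (acc0 + acc1) % MOD == sum(x * y for x, y in zip(xs, ys))
print("dot product:", (acc0 + acc1) % MOD)  # 3*2 + 1*7 + 4*1 = 17
```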
A Generative Framework for Low-Cost Result Validation of Outsourced Machine Learning Tasks
The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as autonomous driving, integrity verification of the outsourced ML workload is more critical--a facet that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time validation of outsourced ML workloads. Fides features a novel and efficient distillation technique--Greedy Distillation Transfer Learning--that dynamically distills and fine-tunes a space- and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with high likelihood, whether the service model is under attack. Fides also offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification.
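The divergence-based detection idea can be sketched in a few lines: compare the outsourced service model's class probabilities against a local verification model's and flag large divergence. The KL measure and threshold below are assumptions for illustration, not Fides' published algorithm.

```python
# Illustrative sketch of client-side attack detection via divergence between
# the outsourced service model's output and a small local verification
# model's output. The threshold and vectors are assumptions, not Fides' own.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class-probability vectors."""
    p, q = np.clip(p, eps, 1), np.clip(q, eps, 1)
    return float(np.sum(p * np.log(p / q)))

def looks_attacked(service_probs, verifier_probs, threshold=0.5):
    """Flag the service result if it diverges too far from the verifier."""
    return kl_divergence(verifier_probs, service_probs) > threshold

# Honest case: both models roughly agree on the class distribution.
honest = np.array([0.85, 0.10, 0.05])
verifier = np.array([0.80, 0.15, 0.05])
print(looks_attacked(honest, verifier))    # False

# Tampered case: the service output has been flipped to another class.
tampered = np.array([0.05, 0.90, 0.05])
print(looks_attacked(tampered, verifier))  # True
```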
Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey
With the rapid advancement of artificial intelligence technology, the usage of machine learning models is gradually becoming part of our daily lives. High-quality models rely not only on efficient optimization algorithms but also on training and learning processes built upon vast amounts of data and computational power. However, in practice, due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally. This has led them to explore alternative approaches such as outsourced learning and federated learning. While these methods address the feasibility of model training effectively, they introduce concerns about the trustworthiness of the training process, since computations are not performed locally. Similarly, there are trustworthiness issues associated with outsourced model inference. These two problems can be summarized as the trustworthiness problem of model computations: how can one verify that the results computed by other participants are derived according to the specified algorithm, model, and input data? To address this challenge, verifiable machine learning (VML) has emerged. This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology. We first analyze the potential verifiability issues that may exist in different machine learning scenarios. Subsequently, we provide a formal definition of ZKP-VML. We then conduct a detailed analysis and classification of existing works based on their technical approaches. Finally, we discuss the key challenges and future directions in the field of ZKP-based VML.
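To ground the verifiability question, the toy sketch below shows the naive commit-and-re-execute baseline that ZKP-VML improves upon: hash commitments bind a result to a claimed model and input, but checking still costs a full re-execution and reveals everything. All names and the toy model are hypothetical; this is not a zero-knowledge proof.

```python
# Naive baseline for the verifiability question above: the worker commits
# to model, input, and result with hashes, and the verifier re-executes to
# check. This binds the result to the claimed model/input but offers no
# zero knowledge and no savings over recomputation -- exactly the gap that
# ZKP-VML schemes close with succinct, privacy-preserving proofs.
import hashlib
import json

def commit(obj) -> str:
    """Hash commitment to a JSON-serializable object (toy; no hiding nonce)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def model(weights, x):
    """The agreed-upon computation: a toy linear score."""
    return sum(w * xi for w, xi in zip(weights, x))

# Worker: run the computation and publish commitments plus the result.
weights, x = [2, -1, 3], [1, 4, 2]
claim = {
    "model_commit": commit(weights),
    "input_commit": commit(x),
    "result": model(weights, x),
}

# Verifier: given the opened weights and input, check the commitments and
# re-execute. A ZKP would replace this full re-execution with a short proof.
assert claim["model_commit"] == commit(weights)
assert claim["input_commit"] == commit(x)
assert claim["result"] == model(weights, x)
print("claim verified by re-execution:", claim["result"])  # 2 - 4 + 6 = 4
```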
The High-Level Practical Overview of Open-Source Privacy-Preserving Machine Learning Solutions
This paper aims to provide a high-level overview of practical approaches to machine learning that respect the privacy and confidentiality of customer information, known as Privacy-Preserving Machine Learning (PPML). First, the security approaches in offline-learning privacy methods are assessed. These focus on modern cryptographic methods, such as Homomorphic Encryption and Secure Multi-Party Computation, as well as on dedicated combined hardware and software platforms like the Trusted Execution Environment - Intel® Software Guard Extensions (Intel® SGX). Combining these security approaches with different machine learning architectures leads to our Proof of Concept (POC), in which the accuracy and speed of the security solutions are examined. The next step is exploring and comparing open-source Python-based solutions for PPML. Four solutions were selected from almost 40 separate, state-of-the-art systems: SyMPC, TF-Encrypted, TenSEAL, and Gramine. Three different neural network architectures were designed to show the different libraries' capabilities. The POC solves the image classification problem based on the MNIST dataset. As the computational results show, the accuracy of all considered secure approaches is similar: the maximum difference between the non-secure and secure flows does not exceed 1.2%. In terms of secure computation, the most efficient PPML library is the one based on the Trusted Execution Environment, followed by Secure Multi-Party Computation and Homomorphic Encryption. However, most of these are at least 1000 times slower than the non-secure evaluation, which is not acceptable for real-world scenarios. Future work could combine different security approaches, explore other new and existing state-of-the-art libraries, or implement support for hardware-accelerated secure computation.
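As a minimal sketch of how such slowdowns can be measured, the following micro-benchmark compares a plaintext dot product against the same operation under TenSEAL's CKKS scheme (one of the four surveyed libraries); the workload and parameters are illustrative assumptions, and the printed slowdown will vary by machine.

```python
# Hedged micro-benchmark sketch for the ~1000x slowdown claim above, using
# TenSEAL. The workload (a single 784-dim dot product) and the HE
# parameters are illustrative assumptions.
import time
import numpy as np
import tenseal as ts

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2**40
context.generate_galois_keys()

rng = np.random.default_rng(0)
x, w = rng.random(784), rng.random(784)

t0 = time.perf_counter()
plain = float(x @ w)                      # non-secure baseline
t_plain = time.perf_counter() - t0

enc_x = ts.ckks_vector(context, x.tolist())
t0 = time.perf_counter()
enc_result = enc_x.dot(w.tolist())        # secure (HE) flow
t_secure = time.perf_counter() - t0

print(f"plaintext: {t_plain * 1e6:.1f} us, encrypted: {t_secure * 1e3:.1f} ms, "
      f"slowdown: {t_secure / t_plain:.0f}x, "
      f"error: {abs(enc_result.decrypt()[0] - plain):.2e}")
```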
Joint Linear and Nonlinear Computation with Data Encryption for Efficient Privacy-Preserving Deep Learning
Deep Learning (DL) has shown unrivalled performance in many applications such as image classification, speech recognition, anomaly detection, and business analytics. While end users and enterprises own enormous amounts of data, DL talent and computing power are mostly concentrated in technology giants with cloud servers. Thus, data owners, i.e., the clients, are motivated to outsource their data, along with computationally intensive tasks, to the server in order to leverage the server's abundant computational resources and DL talent for developing cost-effective DL solutions. However, trust is required between the server and the client to complete the computation tasks (e.g., conducting inference for newly input data from the client based on a well-trained model at the server); otherwise, there could be a data breach (e.g., leaking data from the client or the proprietary model parameters from the server). Privacy-preserving DL takes data privacy into account by adopting various data-encryption-based techniques. However, the efficiency of linear and nonlinear computation for each DL layer remains a fundamental challenge in practice due to the intrinsic intractability and complexity of privacy-preserving primitives (e.g., Homomorphic Encryption (HE) and Garbled Circuits (GC)). As such, this dissertation targets deeply optimizing state-of-the-art frameworks, as well as newly designing efficient modules for joint linear and nonlinear computation with data encryption, to further boost the overall performance of privacy-preserving DL. Four contributions are made.
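The joint linear/nonlinear structure can be sketched as follows: the server evaluates a linear layer under encryption, masks the result to convert it into additive shares, and a garbled circuit evaluates the ReLU on those shares. In this toy Python sketch, plain integers stand in for HE ciphertexts and garbled_relu() is a hypothetical stand-in for the real GC protocol; nothing here is the dissertation's actual design.

```python
# Toy sketch of the hybrid flow described above: a linear layer evaluated
# "under encryption" (plain ints stand in for HE ciphertexts here), then a
# masked handoff that additively shares the result so a garbled circuit can
# evaluate the nonlinear ReLU. garbled_relu() is a stand-in for the real GC
# protocol; everything below is illustrative.
import secrets

MOD = 1 << 32

def server_linear_then_mask(x_enc, w, b):
    """Server: linear layer on the 'ciphertext', then mask with random s."""
    y = (w * x_enc + b) % MOD          # would be HE ops on a real ciphertext
    s = secrets.randbelow(MOD)
    return (y - s) % MOD, s            # client gets y - s, server keeps s

def garbled_relu(client_share, server_share):
    """Stand-in for GC: jointly compute ReLU(y) and re-share the output."""
    y = (client_share + server_share) % MOD
    y_signed = y - MOD if y >= MOD // 2 else y   # interpret as signed
    out = max(y_signed, 0)
    s = secrets.randbelow(MOD)
    return (out - s) % MOD, s

# One linear + ReLU layer on a private input x = -3 with w = 2, b = 1.
x = -3 % MOD
client_sh, server_sh = server_linear_then_mask(x, w=2, b=1)
c_out, s_out = garbled_relu(client_sh, server_sh)
print("ReLU(2*(-3)+1) =", (c_out + s_out) % MOD)  # ReLU(-5) = 0
```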
Paralinguistic Privacy Protection at the Edge
Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers regardless of deliberate or false triggers. As our emotional patterns and sensitive attributes like our identity, gender, and mental well-being are easily inferred using deep acoustic models, we encounter a new generation of privacy risks by using these services. One approach to mitigating the risk of paralinguistic-based privacy breaches is to exploit a combination of cloud-based processing with privacy-preserving, on-device paralinguistic information learning and filtering before transmitting voice data. In this paper, we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge prior to offloading to the cloud. We evaluate EDGY's on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate, and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds with a 0.2% relative improvement in ABX score or minimal performance penalties in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.
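One of the optimizations mentioned, model quantization, can be sketched with PyTorch's post-training dynamic quantization; the encoder below is an illustrative stand-in, not EDGY's actual architecture.

```python
# Hedged sketch of one optimization named above -- post-training dynamic
# quantization -- applied to a small encoder so it can run on-device. The
# architecture and sizes are illustrative stand-ins, not EDGY's own model.
import torch
import torch.nn as nn

# Toy paralinguistic-feature encoder: maps an 80-dim frame embedding to a
# compact representation from which sensitive attributes can be filtered.
encoder = nn.Sequential(
    nn.Linear(80, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
)

# Quantize Linear layers to int8 weights; activations stay float and are
# quantized dynamically at inference time (no calibration data needed).
quantized = torch.quantization.quantize_dynamic(
    encoder, {nn.Linear}, dtype=torch.qint8
)

frames = torch.randn(1, 80)                 # one frame of audio features
with torch.no_grad():
    z_fp32 = encoder(frames)
    z_int8 = quantized(frames)

print("output drift after quantization:",
      (z_fp32 - z_int8).abs().max().item())
```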