66 research outputs found
GPU-based Private Information Retrieval for On-Device Machine Learning Inference
On-device machine learning (ML) inference can enable the use of private user
data on user devices without revealing them to remote servers. However, a pure
on-device solution to private ML inference is impractical for many applications
that rely on embedding tables that are too large to be stored on-device. In
particular, recommendation models typically use multiple embedding tables each
on the order of 1-10 GBs of data, making them impractical to store on-device.
To overcome this barrier, we propose the use of private information retrieval
(PIR) to efficiently and privately retrieve embeddings from servers without
sharing any private information. As off-the-shelf PIR algorithms are usually
too computationally intensive to directly use for latency-sensitive inference
tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR
with the downstream ML application to obtain further speedup. Our GPU
acceleration strategy improves system throughput by more than over
an optimized CPU PIR implementation, and our PIR-ML co-design provides an over
additional throughput improvement at fixed model quality. Together,
for various on-device ML applications such as recommendation and language
modeling, our system on a single V100 GPU can serve up to queries per
second -- a throughput improvement over a CPU-based baseline --
while maintaining model accuracy
Recommended from our members
FlexFHE: A System for Homomorphically Encrypting DNA and Operating on Encrypted Data Securely in Untrusted Environments
DNA data contains sensitive health information and personally identifiable data. Currently, even if DNA data is stored in encrypted databases, it must be decrypted for health professionals and researchers to analyze, which means that DNA data exists in plaintext on unsecured, untrusted servers and machines during analysis. This thesis describes a complete system for homomorphically encrypting DNA data in a trusted context and then running analytic operations on the encrypted DNA data in an untrusted context, thus allowing healthcare professionals and researchers to run both high volume analytics on many individuals’ sequenced DNA and run complex analytics on a single individual’s sequenced DNA without ever handling plaintext data.
Symmetric encryption is used as a mechanism for controlling which queries are made on the data. The threat model addressed by this system allows an authorized party to run only authorized queries on a genome, while restricting any additional access.
The system implemented achieves substring search, substring search with wildcards representing mutations, and percent match between two nucleotide sequences by converting genomic data into one-hot binary matrixes and encrypting each bit individually using OpenFHE’s LWE Encryption implemented using the CGGI scheme. While runtime for each operation is O(nm), each operation is maximally parallelized using OpenMP, thus allowing for accelerated performance on machines with multiple CPUs without the need for batching
GraphSC: Parallel secure computation made easy
Abstract-We propose introducing modern parallel programming paradigms to secure computation, enabling their secure execution on large datasets. To address this challenge, we present GraphSC, a framework that (i) provides a programming paradigm that allows non-cryptography experts to write secure code; (ii) brings parallelism to such secure implementations; and (iii) meets the needs for obliviousness, thereby not leaking any private information. Using GraphSC, developers can efficiently implement an oblivious version of graph-based algorithms (including sophisticated data mining and machine learning algorithms) that execute in parallel with minimal communication overhead. Importantly, our secure version of graph-based algorithms incurs a small logarithmic overhead in comparison with the non-secure parallel version. We build GraphSC and demonstrate, using several algorithms as examples, that secure computation can be brought into the realm of practicality for big data analysis. Our secure matrix factorization implementation can process 1 million ratings in 13 hours, which is a multiple order-of-magnitude improvement over the only other existing attempt, which requires 3 hours to process 16K ratings
REDsec: Running Encrypted Discretized Neural Networks in Seconds
Machine learning as a service (MLaaS) has risen to become a prominent technology due to the large development time, amount of data, hardware costs, and level of expertise required to develop a machine learning model. However, privacy concerns prevent the adoption of MLaaS for applications with sensitive data. A promising privacy preserving solution is to use fully homomorphic encryption (FHE) to perform the ML computations. Recent advancements have lowered computational costs by several orders of magnitude, opening doors for secure practical applications to be developed. In this work we introduce the REDsec framework that optimizes FHE-based private machine learning inference by leveraging ternary neural networks. Such neural networks, whose weights are constrained to {-1,0,1}, have special properties that we exploit to operate efficiently in the homomorphic domain. REDsec introduces novel features, including a new data re-use scheme that enables bidirectional bridging between the integer and binary domains for the first time in FHE. This enables us to implement very efficient binary operations for multiplication and activations, as well as efficient integer domain additions. Our approach is complemented by a new GPU acceleration library, dubbed (RED)cuFHE, which supports both binary and integer operations on multiple GPUs. REDsec brings unique benefits by supporting user-defined models as input (bring-your-own-network), automation of plaintext training, and efficient evaluation of private inference leveraging TFHE. In our analysis, we perform inference experiments with the MNIST, CIFAR-10, and ImageNet datasets and report performance improvements compared to related works
GPS: Integration of Graphene, PALISADE, and SGX for Large-scale Aggregations of Distributed Data
Secure computing methods such as fully homomorphic encryption and hardware solutions such as Intel Software Guard Extension (SGX) have been applied to provide security for user input in privacy-oriented computation outsourcing. Fully homomorphic encryption is amenable to parallelization and hardware acceleration to improve its scalability and latency, but is limited in the complexity of functions it can efficiently evaluate. SGX is capable of arbitrarily complex calculations, but due to expensive memory paging and context switches, computations in SGX are bound by practical limits. These limitations make either of fully homomorphic encryption or SGX alone unsuitable for large-scale multi-user computations with complex intermediate calculations.
In this paper, we present GPS, a novel framework integrating the Graphene, PALISADE, and SGX technologies. GPS combines the scalability of homomorphic encryption with the arbitrary computational abilities of SGX, forming a more functional and efficient system for outsourced secure computations with large numbers of users. We implement GPS using linear regression training as an instantiation, and our experimental results indicate a base speedup of 1.03x to 8.69x (depending on computation parameters) over an SGX-only linear regression training without multithreading or hardware acceleration. Experiments and projections show improvements over the SGX-only training of 3.28x to 10.43x using multithreading and 4.99x to 12.67 with GPU acceleration
LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers
Prior works have attempted to build private inference frameworks for
transformer-based large language models (LLMs) in a server-client setting,
where the server holds the model parameters and the client inputs the private
data for inference. However, these frameworks impose significant overhead when
the private inputs are forward propagated through the original LLMs. In this
paper, we show that substituting the computation- and communication-heavy
operators in the transformer architecture with privacy-computing friendly
approximations can greatly reduce the private inference costs with minor impact
on model performance. Compared to the state-of-the-art Iron (NeurIPS 2022), our
privacy-computing friendly model inference pipeline achieves a
acceleration in computation and an 80\% reduction in communication overhead,
while retaining nearly identical accuracy
- …