Towards Fast and Scalable Private Inference
Privacy and security have rapidly emerged as first-order design constraints.
Users now demand more protection over who can see their data (confidentiality)
as well as how it is used (control). Here, existing cryptographic techniques
for security fall short: they secure data when stored or communicated but must
decrypt it for computation. Fortunately, a new paradigm of computing exists,
which we refer to as privacy-preserving computation (PPC). Emerging PPC
technologies can be leveraged for secure outsourced computation or to enable
two parties to compute without revealing either party's secret data. Despite
their phenomenal potential to revolutionize user protection in the digital age,
their realization has been limited by exorbitant computational, communication,
and storage overheads.
This paper reviews recent efforts on addressing various PPC overheads using
private inference (PI) in neural networks as a motivating application. First,
the problem and various technologies, including homomorphic encryption (HE),
secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are
introduced. Next, a characterization of their overheads when used to implement
PI is covered. The characterization motivates the need for both GCs and HE
accelerators. Then two solutions are presented: HAAC for accelerating GCs and
RPU for accelerating HE. To conclude, results and effects are shown with a
discussion on what future work is needed to overcome the remaining overheads of
PI.
Comment: Appears in the 20th ACM International Conference on Computing Frontiers.
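Of the primitives this survey introduces, additive secret sharing is the simplest to illustrate concretely. The toy sketch below (hypothetical code, not from any of the surveyed systems) splits a secret into shares that individually look uniformly random, and shows why addition is "free" in SS-based protocols: each party just adds its shares locally.

```python
import secrets

Q = 2**61 - 1  # public modulus; all arithmetic is mod Q

def share(x, n=2):
    """Split x into n additive shares that sum to x mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Each party holds one share; no single share reveals anything about x.
a_shares = share(42)
b_shares = share(100)
# Addition requires no communication: parties add their shares pointwise.
c_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]
print(reconstruct(c_shares))  # 142
```

Multiplication, by contrast, requires interaction (e.g., Beaver triples or OT), which is where the communication overheads characterized in the paper arise.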
REED: Chiplet-Based Scalable Hardware Accelerator for Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) has emerged as a promising technology for processing encrypted data without the need for decryption. Despite its potential, its practical implementation has faced challenges due to substantial computational overhead. To address this issue, we propose the chiplet-based FHE accelerator design REED, which enables scalability and offers high throughput, thereby enhancing homomorphic-encryption deployment in real-world scenarios. The design accounts for the well-known wafer-yield issues during fabrication, which significantly impact production costs. In contrast to state-of-the-art approaches, we also address data-exchange overhead by proposing a non-blocking inter-chiplet communication strategy. We incorporate novel pipelined Number Theoretic Transform and automorphism techniques, leveraging parallelism to provide high throughput.
Experimental results demonstrate that the REED 2.5D integrated circuit consumes 177 mm² of chip area and 82.5 W average power in a 7nm technology, and achieves an impressive speedup of up to 5,982× compared to a CPU (24-core, 2× Intel X5690), with 2× better energy efficiency and 50% lower development cost than a state-of-the-art ASIC accelerator. To evaluate its practical impact, we are the first to benchmark encrypted deep neural network training. Overall, this work successfully enhances the practicality and deployability of fully homomorphic encryption in real-world scenarios.
SHIELD: Scalable Homomorphic Implementation of Encrypted Data-Classifiers
Homomorphic encryption (HE) systems enable computations on encrypted data, without decrypting and without knowledge of the secret key. In this work, we describe an optimized Ring Learning With Errors (RLWE) based implementation of a variant of the HE system recently proposed by Gentry, Sahai and Waters (GSW). Although this system was widely believed to be less efficient than its contemporaries, we demonstrate quite the opposite behavior for a large class of applications. We first highlight and carefully exploit the algebraic features of the system to achieve significant speedup over the state-of-the-art HE implementation, namely the IBM homomorphic encryption library (HElib). We introduce several optimizations on top of our HE implementation, and use the resulting scheme to construct a homomorphic Bayesian spam filter, secure multiple-keyword search, and a homomorphic evaluator for binary decision trees. Our results show a 10× improvement in performance (under the same security settings and CPU platforms) compared to IBM HElib for these applications. Our system is built to be easily portable to GPUs (unlike IBM HElib), which yields an additional speedup of up to 103.5×, for an overall speedup of 1,035×.
Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption
Incorporating fully homomorphic encryption (FHE) into the inference process
of a convolutional neural network (CNN) draws enormous attention as a viable
approach for achieving private inference (PI). FHE allows delegating the entire
computation process to the server while ensuring the confidentiality of
sensitive client-side data. However, practical FHE implementation of a CNN
faces significant hurdles, primarily due to FHE's substantial computational and
memory overhead. To address these challenges, we propose a set of
optimizations, which includes GPU/ASIC acceleration, an efficient activation
function, and an optimized packing scheme. We evaluate our method using the
ResNet models on the CIFAR-10 and ImageNet datasets, achieving several orders
of magnitude improvement compared to prior work and reducing the latency of the
encrypted CNN inference to 1.4 seconds on an NVIDIA A100 GPU. We also show that
the latency drops to a mere 0.03 seconds with a custom hardware design.
Comment: 3 pages, 1 figure, appears at DISCC 2023 (2nd Workshop on Data
Integrity and Secure Cloud Computing, in conjunction with the 56th
International Symposium on Microarchitecture, MICRO 2023).
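FHE schemes natively evaluate only additions and multiplications, which is why works like this one replace ReLU with an "efficient activation function", typically a low-degree polynomial surrogate. As an illustrative sketch (not the paper's actual activation, whose construction is not given here), the toy code below fits a degree-2 least-squares polynomial to ReLU on [-1, 1]; the result uses only FHE-evaluable operations.

```python
def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations (V^T V)c = V^T y."""
    m, n = len(xs), deg + 1
    V = [[x ** j for j in range(n)] for x in xs]
    A = [[sum(V[k][i] * V[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    b = [sum(V[k][i] * ys[k] for k in range(m)) for i in range(n)]
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for cc in range(col, n):
                A[r][cc] -= f * A[col][cc]
            b[r] -= f * b[col]
    c = [0.0] * n                             # back substitution
    for r in range(n - 1, -1, -1):
        c[r] = (b[r] - sum(A[r][j] * c[j] for j in range(r + 1, n))) / A[r][r]
    return c                                  # coefficients c0 + c1*x + c2*x^2 + ...

xs = [i / 100 - 1 for i in range(201)]        # sample grid on [-1, 1]
relu = [max(x, 0.0) for x in xs]
c0, c1, c2 = polyfit(xs, relu, 2)
approx = lambda x: c0 + c1 * x + c2 * x * x   # adds/mults only: FHE-friendly
max_err = max(abs(approx(x) - r) for x, r in zip(xs, relu))
```

The trade-off driving such designs: a higher-degree polynomial approximates ReLU more closely but consumes more multiplicative depth, which in FHE translates directly into larger parameters and latency.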
A Programmable SoC-Based Accelerator for Privacy-Enhancing Technologies and Functional Encryption
A multitude of privacy-enhancing technologies (PETs) has been presented recently to solve the privacy problems of contemporary services utilizing cloud computing. Many of them are based on additively homomorphic encryption (AHE), which allows the computation of additions on encrypted data. The main technical obstacles to adopting PETs in practical systems are the performance overheads relative to current privacy-violating alternatives. In this article, we present a hardware/software (HW/SW) codesign for programmable systems-on-chip (SoCs) that is designed for accelerating applications based on Paillier encryption. Our implementation is a microcode-based multicore architecture suitable for accelerating various PETs that use AHE with large-integer modular arithmetic. We instantiate the implementation in a Xilinx Zynq-7000 programmable SoC and provide performance evaluations in real hardware. We also investigate its efficiency in a high-end Xilinx UltraScale+ programmable SoC. We evaluate the implementation with two target use cases relevant to PETs: privacy-preserving computation of squared Euclidean distances over encrypted data and multi-input functional encryption (FE) for inner products. Both represent the first hardware-acceleration results for such operations, and the latter in particular is among the very first published implementations of FE on any platform.
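Since the accelerator targets Paillier encryption, a minimal software sketch of the scheme helps fix ideas. The parameters below are toy-sized and wholly insecure (real deployments use 2048-bit and larger moduli, which is exactly why the large-integer modular arithmetic above needs hardware acceleration); the key property shown is that multiplying ciphertexts modulo n² adds the underlying plaintexts.

```python
import math, secrets

# Toy Paillier keypair (tiny primes, illustration only -- NOT secure).
p, q = 1_000_003, 1_000_033
n = p * q
n2 = n * n
g = n + 1                                    # standard choice of generator
lam = math.lcm(p - 1, q - 1)                 # lambda = lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # mu = L(g^lam mod n^2)^-1 mod n

def enc(m):
    """Encrypt m with fresh randomness r coprime to n."""
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Decrypt via L(c^lam mod n^2) * mu mod n, where L(u) = (u-1)/n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b.
c = (enc(20) * enc(22)) % n2
print(dec(c))  # 42
```

The accelerated use cases follow directly from this property: squared Euclidean distances and inner products decompose into sums, which the server can compute entirely on ciphertexts.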
CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
Fully homomorphic encryption (FHE) is in the spotlight as a definitive
solution for privacy, but the high computational overhead of FHE poses a
challenge to its practical adoption. Although prior studies have attempted to
design ASIC accelerators to mitigate the overhead, their designs require
excessive amounts of chip resources (e.g., areas) to contain and process
massive data for FHE operations.
We propose CiFHER, a chiplet-based FHE accelerator with a resizable
structure, to tackle the challenge with a cost-effective multi-chip module
(MCM) design. First, we devise a flexible architecture of a chiplet core whose
configuration can be adjusted to conform to the global organization of chiplets
and design constraints. The distinctive feature of our core is a recomposable
functional unit providing varying computational throughput for number-theoretic
transform (NTT), the most dominant function in FHE. Then, we establish
generalized data mapping methodologies to minimize the network overhead when
organizing the chips into the MCM package in a tiled manner, which becomes a
significant bottleneck due to the technology constraints of MCMs. Also, we
analyze the effectiveness of various algorithms, including a novel limb
duplication algorithm, on the MCM architecture. A detailed evaluation shows
that a CiFHER package composed of 4 to 64 compact chiplets provides performance
comparable to state-of-the-art monolithic ASIC FHE accelerators with
significantly lower package-wide power consumption while reducing the area of a
single core to as small as 4.28 mm².
Comment: 15 pages, 9 figures.
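The number-theoretic transform that dominates FHE workloads is simply a DFT over a finite field. A minimal sketch follows, with toy parameters (n = 4, q = 13, ω = 5, a primitive 4th root of unity mod 13) chosen only so the arithmetic is checkable by hand; production accelerators like CiFHER implement O(n log n) butterfly versions of this at n in the tens of thousands.

```python
# Toy cyclic NTT: n = 4, modulus q = 13, omega = 5 (5^2 = -1, 5^4 = 1 mod 13).
q, n, w = 13, 4, 5

def ntt(a, root):
    """Naive O(n^2) transform: A[k] = sum_j a[j] * root^(j*k) mod q."""
    return [sum(a[j] * pow(root, j * k, q) for j in range(n)) % q
            for k in range(n)]

def intt(A):
    """Inverse transform: use omega^-1 and rescale by n^-1 mod q."""
    inv_n = pow(n, -1, q)
    a = ntt(A, pow(w, -1, q))
    return [(x * inv_n) % q for x in a]

a = [1, 2, 3, 4]
assert intt(ntt(a, w)) == a
# Pointwise multiplication in the NTT domain is cyclic convolution mod q --
# the property that turns polynomial multiplication into O(n log n) work.
b = [5, 6, 7, 8]
prod = intt([x * y % q for x, y in zip(ntt(a, w), ntt(b, w))])
```

The "recomposable functional unit" discussed above is essentially a way to vary how many of these butterfly-style operations run in parallel per chiplet.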
Edge Computing for AI and ML: Enhancing Performance and Privacy in Data Analysis
Centralised cloud-computing paradigms are encountering difficulties with latency, bandwidth, privacy, and security due to the exponential growth of data volumes produced by sensors and Internet of Things (IoT) devices. One potential answer to these constraints is edge computing, which moves computation and storage closer to the data sources. With this paradigm change, data privacy is improved, network congestion is decreased, and real-time processing becomes possible. Aiming to improve the efficiency and confidentiality of data-analysis applications powered by artificial intelligence (AI) and machine learning (ML), this article investigates the potential of edge computing. We provide a thorough analysis of the latest developments in edge-computing frameworks, algorithms, and architectures that allow for safe and fast training and inference of AI/ML models at the edge. We also discuss the main obstacles and promising directions for future research. Our work lays the groundwork for future intelligent edge systems by demonstrating the substantial advantages of edge computing in facilitating low-latency, energy-efficient, and privacy-preserving AI/ML applications.
 
Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security
Data explosion poses many challenges to state-of-the-art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025, an increase of over 150% compared to the data expected to be generated in 2023. However, while system manufacturers consistently develop devices with larger storage spaces and provide alternative storage capacity in the cloud at affordable rates, another key challenge is how to effectively process large volumes of stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of such data is distributed computing, whose application requires deep understanding. This dissertation explores the potential impact of applying efficient distributed-computing concepts to long-standing challenges in (i) a widely used data-intensive scientific application, (ii) applying homomorphic encryption to data-intensive workloads found in outsourced databases, and (iii) the security of tokenized incentive mechanisms for federated learning (FL) systems.
The first part of the dissertation tackles the microelectrode array (MEA) parameterization problem from an orthogonal viewpoint enlightened by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state of the practice in time, scalability, and memory usage.
The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases.
The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating the existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching boosts state-of-the-art HE schemes by orders of magnitude.
The third part discusses our work on leveraging the security benefits of blockchains to enhance the fairness and reliability of tokenized incentive mechanisms for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular datasets, and a variety of scales to validate its effectiveness.
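As background for the secure-aggregation theme above, the toy sketch below illustrates the classic pairwise-masking construction (a generic textbook idea, not the dissertation's caching or auditing protocol): each client perturbs its update with random masks that cancel only in the server-side sum, so the aggregate is exact while individual updates stay hidden.

```python
import secrets

Q = 2**31  # public modulus for masked updates

def masked_updates(updates):
    """Pairwise-mask each client's update so only the sum is recoverable.

    Client i adds mask m[i][j] for each j > i and subtracts m[j][i] for each
    j < i; every mask appears once with + and once with -, so the masks
    cancel when the server sums all submissions."""
    n = len(updates)
    masks = {(i, j): secrets.randbelow(Q)
             for i in range(n) for j in range(i + 1, n)}
    out = []
    for i, u in enumerate(updates):
        v = u
        for j in range(n):
            if i < j:
                v = (v + masks[(i, j)]) % Q
            elif j < i:
                v = (v - masks[(j, i)]) % Q
        out.append(v)
    return out

updates = [3, 7, 5]                        # toy scalar model updates
server_sum = sum(masked_updates(updates)) % Q
print(server_sum)  # 15 -- individual updates stay hidden
```

Real protocols add dropout recovery and key agreement on top of this skeleton; a blockchain-based audit layer, as in the third part above, targets the orthogonal problem of clients submitting dishonest (e.g., Gaussian-noise) updates.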
GPU-based Private Information Retrieval for On-Device Machine Learning Inference
On-device machine learning (ML) inference can enable the use of private user
data on user devices without revealing them to remote servers. However, a pure
on-device solution to private ML inference is impractical for many applications
that rely on embedding tables that are too large to be stored on-device. In
particular, recommendation models typically use multiple embedding tables, each
on the order of 1-10 GB, making them impractical to store on-device.
To overcome this barrier, we propose the use of private information retrieval
(PIR) to efficiently and privately retrieve embeddings from servers without
sharing any private information. As off-the-shelf PIR algorithms are usually
too computationally intensive to directly use for latency-sensitive inference
tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR
with the downstream ML application to obtain further speedup. Our GPU
acceleration strategy substantially improves system throughput over an
optimized CPU PIR implementation, and our PIR-ML co-design provides a further
throughput improvement at fixed model quality. Together, for various on-device
ML applications such as recommendation and language modeling, our system on a
single V100 GPU can serve queries at a rate well beyond a CPU-based baseline
while maintaining model accuracy.
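The PIR primitive itself is easiest to illustrate in the two-server information-theoretic setting (a classic construction shown here as background; the GPU system above is a different, computational PIR scheme): the client sends each server a random-looking selection vector, and XORing the two answers recovers exactly the wanted record without either server learning the index.

```python
import secrets

def pir_query(db_size, index):
    """Two queries whose XOR is the unit vector e_index; each alone is uniform."""
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2

def pir_answer(db, query):
    """Server XORs together the records selected by the query bits."""
    acc = 0
    for rec, bit in zip(db, query):
        if bit:
            acc ^= rec
    return acc

db = [0x11, 0x22, 0x33, 0x44]   # toy stand-in for an embedding table
q1, q2 = pir_query(len(db), 2)
# Each server sees only a uniformly random bit vector and learns nothing
# about which record the client wants.
print(pir_answer(db, q1) ^ pir_answer(db, q2))  # 51 (0x33)
```

Note that each server still touches the whole database per query; that linear scan is the computational cost that motivates GPU acceleration and PIR-ML co-design in the first place.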