CrypTFlow2: Practical 2-Party Secure Inference

Author: Agrawal Nitin
Ball Marshall
Beaver Donald
Blakley G. R.
Boemer Fabian
Brakerski Zvika
Brassard Gilles
Chandran Nishanth
Chi-Chih Yao Andrew
Couteau Geoffroy
Dathathri Roshan
Demmler Daniel
Dessouky Ghada
Escudero Daniel
Garay Juan A.
Gilad-Bachrach Ran
Goldreich Oded
Gueron Shay
Guo C.
Hazay Carmit
He Kaiming
Huang Gao
Hubara Itay
Ishai Yuval
Jacob Benoit
Juvekar Chiraag
Kolesnikov Vladimir
Kumar Nishant
Liu Jian
Mishra Pratyush
Mohassel Payman
Nagel Markus
Niklas Bü
Riazi M. Sadegh
Rouhani Bita Darvish
Wagh Sameer
Zheng Wenting
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 18/08/2020

We present CrypTFlow2, a cryptographic framework for secure inference over realistic Deep Neural Networks (DNNs) using secure 2-party computation. CrypTFlow2 protocols are both correct -- i.e., their outputs are bitwise equivalent to the cleartext execution -- and efficient -- they outperform the state-of-the-art protocols in both latency and scale. At the core of CrypTFlow2, we have new 2PC protocols for secure comparison and division, designed carefully to balance round and communication complexity for secure inference tasks. Using CrypTFlow2, we present the first secure inference over ImageNet-scale DNNs like ResNet50 and DenseNet121. These DNNs are at least an order of magnitude larger than those considered in the prior work of 2-party DNN inference. Even on the benchmarks considered by prior work, CrypTFlow2 requires an order of magnitude less communication and 20x-30x less time than the state-of-the-art.

Crossref

Cryptology ePrint Archive

Compiler and runtime systems for homomorphic encryption and graph processing on distributed and heterogeneous architectures

Author: Dathathri Roshan
Publication date: 09/10/2020

Distributed and heterogeneous architectures are tedious to program because devices such as CPUs, GPUs, and FPGAs provide different programming abstractions and may have disjoint memories, even if they are on the same machine. In this thesis, I present compiler and runtime systems that make it easier to develop efficient programs for privacy-preserving computation and graph processing applications on such architectures. Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations on encrypted data without requiring a secret key. Recent cryptographic advances have pushed FHE into the realm of practical applications. However, programming these applications remains a huge challenge, as it requires cryptographic domain expertise to ensure correctness, security, and performance. This thesis introduces a domain-specific compiler for fully-homomorphic deep neural network (DNN) inferencing as well as a general-purpose language and compiler for fully-homomorphic computation: 1. I present CHET, a domain-specific optimizing compiler, that is designed to make the task of programming DNN inference applications using FHE easier. CHET automates many laborious and error prone programming tasks including encryption parameter selection to guarantee security and accuracy of the computation, determining efficient data layouts, and performing scheme-specific optimizations. Our evaluation of CHET on a collection of popular DNNs shows that CHET-generated programs outperform expert-tuned ones by an order of magnitude. 2. I present a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs, while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. 
EVA is designed to also work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET onto EVA. Due to the novel optimizations in EVA, its programs are on average ~ 5.3x faster than those generated by the unmodified version of CHET. These languages and compilers enable a wider adoption of FHE. Applications in several areas like machine learning, bioinformatics, and security need to process and analyze very large graphs. Distributed clusters are essential in processing such graphs in reasonable time. I present a novel approach to building distributed graph analytics systems that exploits heterogeneity in processor types, partitioning policies, and programming models. The key to this approach is Gluon, a domain-specific communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters in the bulk-synchronous parallel (BSP) model and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies. We also extend Gluon to support lock-free, non-blocking, bulk-asynchronous execution by introducing the bulk-asynchronous parallel (BASP) model. Our experiments were done on CPU clusters with up to 256 multi-core, multi-socket hosts and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ~ 2.6x on the average. Gluon's BASP-style execution is on average ~ 1.5x faster than its BSP-style execution for graph applications on real-world large-diameter graphs at scale. 
The D-Galois and D-IrGL systems built using Gluon scale well and are faster than Gemini, the state-of-the-art distributed CPU-only graph analytics system, by factors of ~ 3.9x and ~ 4.9x on average using distributed CPUs and distributed GPUs respectively. The Gluon-based D-IrGL system for distributed GPUs is also on average ~ 12x faster than Lux, the only other distributed GPU-only graph analytics system. The Gluon-based D-IrGL system was one of the first distributed GPU graph analytics systems and is the only asynchronous one.

Field of study: Computer Science

Texas ScholarWorks

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Author: Bondhugula Uday
Dathathri Roshan
Ramashekar Thejas
Reddy Chandan
Publication venue: IEEE

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs, or a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.

Open Access Repository of IISc Research Publications

Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

Author: Chen Xuhao
Dathathri Roshan
Gill Gurbinder
Hoang Loc
Pingali Keshav
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/10/2022

DSpace@MIT

Optimizing ordered graph algorithms with GraphIt

Author: Abeydeera Maleen
Alistarh Dan
Baghdadi Riyadh
Blelloch Guy E.
Dathathri Roshan
Dhulipala Laxman
Fidel Adam
Gonzalez Joseph E.
Grossman Samuel
Ham Tae Jun
Hassaan Muhammad Amber
Jeffrey M. C.
Kyrola Aapo
Meng Ke
Mukkara A.
Pai Sreepathi
Prabhakaran Vijayan
Ragan-Kelley Jonathan
Shun Julian
Subramanian S.
Zhang Mingxing
Zhang Yunming
Zhu Xiaowei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/11/2020

Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses together different rounds of ordered algorithms to reduce synchronization overhead, resulting in 1.2×-3× speedup over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3× speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms.
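
To make the idea of processing vertices in a dynamic priority order more concrete, here is a minimal sequential sketch of a bucketed single-source shortest-path routine in the spirit of delta-stepping. It is an illustration only and is not GraphIt's language, API, or generated code: the function name `bucketed_sssp`, the `delta` parameter, and the adjacency-list input format are all assumptions made for this example, and bucket fusion itself is not implemented.

```python
import math
from collections import defaultdict


def bucketed_sssp(graph, source, delta=1):
    """Shortest paths with priority buckets (hypothetical helper).

    graph: dict mapping vertex -> list of (neighbor, non-negative weight).
    Vertices are grouped into buckets by distance // delta, and the
    lowest-numbered non-empty bucket is always processed first.
    """
    dist = defaultdict(lambda: math.inf)
    dist[source] = 0
    buckets = defaultdict(set)
    buckets[0].add(source)

    while buckets:
        b = min(buckets)            # lowest-priority-value bucket
        frontier = buckets.pop(b)   # one "round" of the ordered algorithm
        for u in frontier:
            if dist[u] // delta != b:
                continue            # stale entry: u moved to another bucket
            for v, w in graph.get(u, []):
                nd = dist[u] + w
                if nd < dist[v]:
                    dist[v] = nd
                    # v may land in the current bucket again, which
                    # triggers another round on the same bucket index.
                    buckets[int(nd // delta)].add(v)
    return dict(dist)


example = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
result = bucketed_sssp(example, "a", delta=2)
```

In this sketch, each `pop` of a bucket stands in for one synchronized round. Repeated rounds on the same bucket index are the kind of step that, per the abstract, bucket fusion is meant to merge in order to reduce synchronization overhead.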

DSpace@MIT

Crossref

CuSP

Author: Boldi Paolo
Boman E. G.
Buono D.
Chen Rong
Dang Hoang-Vu
Dathathri Roshan
Gill Gurbinder
Giraph Apache
Gonzalez Joseph E.
Huang Jiewen
Johan
Karypis George
Kumar Vipin
Leskovec Jure
Malewicz Grzegorz
Martella C.
Mayer Christian
Meusel Robert
Nguyen Donald
Petroni Fabio
Project The Lemur
Que X.
Slota G. M.
Stanton Isabelle
Stanzione Dan
Tsourakakis Charalampos
Wang L.
Xie Cong
Zhu Xiaowei
Çatalyürek Ümit V.
Publication venue: Association for Computing Machinery (ACM)

Crossref