MPCFormer: fast, performant and private Transformer inference with MPC

Abstract

Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions for Transformers can increase the inference latency by more than 60x or significantly compromise the quality of inference results. In this paper, we design the framework MPCFORMER using secure multi-party computation (MPC) and Knowledge Distillation (KD). It can be used in tandem with many specifically designed MPC-friendly approximations and trained Transformer models. MPCFORMER significantly speeds up Transformer model inference in MPC settings while achieving similar ML performance to the input model. We evaluate MPCFORMER in various MPC settings. On the IMDb dataset, we achieve performance similar to BERT-Base while being 5.3x faster. On the GLUE benchmark, we achieve 97% of BERT-Base's performance with a 2.2x speedup. We show that MPCFORMER remains effective with different trained Transformer weights such as RoBERTa-Base and with larger models including BERT-Large. In particular, we achieve performance similar to BERT-Large while being 5.93x faster on the IMDb dataset.
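The abstract does not spell out the MPC-friendly approximations, but the general recipe it refers to is replacing non-linearities that are expensive under secret sharing (the exponential in softmax, the erf in GELU) with low-degree polynomials, then using knowledge distillation from the original trained Transformer to recover the lost accuracy. Below is a minimal PyTorch sketch of what such replacements could look like; the specific polynomial forms, constants, and class names are illustrative assumptions, not necessarily the approximations chosen in the paper.

```python
import torch
import torch.nn as nn


class QuadGELU(nn.Module):
    """Illustrative quadratic stand-in for GELU: only additions and
    multiplications, which map to cheap MPC primitives."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.125 * x ** 2 + 0.25 * x + 0.5


class QuadSoftmax(nn.Module):
    """Illustrative softmax replacement that avoids exp(): square a shifted
    input and normalize, so attention weights stay non-negative and sum to 1."""

    def __init__(self, dim: int = -1, shift: float = 5.0):
        super().__init__()
        self.dim = dim
        self.shift = shift  # shift constant is an assumption for this sketch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = (x + self.shift) ** 2
        return q / q.sum(dim=self.dim, keepdim=True)
```

In this sketch, the approximated (student) model would be initialized from the trained (teacher) Transformer and then distilled layer by layer so that the cheaper polynomial operations regain most of the teacher's task performance before the model is deployed under MPC.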
