International Association for Cryptologic Research (IACR)
Abstract
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server.
In this paper, we propose NEXUS the first non-interactive protocol for secure transformer inference, where the client is only required to submit an encrypted input and await the encrypted result from the server. Central to NEXUS are two innovative techniques: SIMD ciphertext compression/decompression, and SIMD slots folding. Consequently, our approach achieves a speedup of 2.8Ã and a remarkable bandwidth reduction of 368.6Ã, compared to the state-of-the-art solution presented in S&P \u2724