PUMA: Secure Inference of LLaMA-7B in Five Minutes
With ChatGPT as a representative example, many companies have begun to provide
services based on large Transformer models. However, using such a service
inevitably leaks users' prompts to the model provider. Prior works have
studied secure inference for Transformer models using secure multi-party
computation (MPC), where model parameters and clients' prompts are kept secret.
Despite this, these frameworks are still limited in terms of model performance,
efficiency, and deployment. To address these limitations, we propose the
framework PUMA to enable fast and secure Transformer model inference. Our
framework designs high-quality approximations for expensive functions, such as
GeLU and Softmax, which significantly reduce the cost of secure inference while
preserving the model performance. Additionally, we design secure Embedding and
LayerNorm procedures that faithfully implement the desired functionality
without undermining the Transformer architecture. PUMA is about 2x faster than
the state-of-the-art MPC framework MPCFORMER (ICLR 2023) and achieves accuracy
similar to that of plaintext models without fine-tuning (which previous works
failed to achieve).
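The cost savings come from replacing functions that are expensive to evaluate under MPC, such as GeLU, with cheaper piecewise approximations. The sketch below illustrates the idea in plaintext Python only: the breakpoints and the middle-segment formula are illustrative stand-ins, not PUMA's actual fitted polynomials, and the secret-sharing machinery is omitted entirely.

```python
import math

def gelu_exact(x):
    # Exact GeLU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_piecewise(x):
    # Piecewise approximation: GeLU is ~0 for very negative inputs and
    # ~x for large inputs, so only the middle segment needs a fitted
    # low-degree approximation. The tanh formula below is a stand-in
    # for such a fit; the breakpoints (-4, 4) are illustrative.
    if x < -4.0:
        return 0.0
    if x > 4.0:
        return x
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```

Restricting the costly part of the computation to a bounded interval is what makes a low-degree polynomial fit feasible, and polynomial evaluation maps naturally onto MPC's cheap additions and multiplications.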
One more thing: PUMA can evaluate LLaMA-7B in around 5 minutes to generate one
token. To the best of our knowledge, this is the first time a model of this
parameter size has been evaluated under MPC. PUMA has been open-sourced in the
GitHub repository of SecretFlow-SPU.
Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy
When receiving machine learning services from the cloud, the provider does
not need to receive all features; in fact, only a subset of the features is
necessary for the target prediction task. Discerning this subset is the key
problem of this work. We address it with a gradient-based
perturbation-maximization method that discovers this subset in the input
feature space with respect to the functionality of the prediction model used by
the provider. After identifying the subset, our framework, Cloak, suppresses
the rest of the features using utility-preserving constant values that are
discovered through a separate gradient-based optimization process. We show that
Cloak does not necessarily require collaboration from the service provider
beyond its normal service, and can be applied in scenarios where we only have
black-box access to the service provider's model. We theoretically guarantee
that Cloak's optimizations reduce the upper bound of the Mutual Information
(MI) between the data and the sifted representations that are sent out.
Experimental results show that Cloak reduces the mutual information between the
input and the sifted representations by 85.01% with only a negligible reduction
in utility (1.42%). In addition, we show that Cloak greatly diminishes
adversaries' ability to learn and infer non-conducive features. Comment: This paper was presented at the 2021 Web Conference (WWW 2021).
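The black-box setting the abstract describes can be illustrated with numerical gradients: a feature whose gradient is (near) zero does not affect the prediction, so it can be replaced by a constant before the input leaves the client. This is a toy plaintext sketch with a hypothetical linear scorer, not the paper's actual optimization (which maximizes perturbation subject to utility and learns utility-preserving fill constants).

```python
def score(x):
    # Stand-in for the provider's model; the weights are hypothetical.
    # Features 1 and 3 carry zero weight, so they are non-essential.
    w = [0.9, 0.0, 0.5, 0.0]
    return sum(wi * xi for wi, xi in zip(w, x))

def numerical_grad(f, x, eps=1e-5):
    # Central-difference gradient; needs only query access to f,
    # matching the black-box scenario.
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grads.append((f(xp) - f(xm)) / (2.0 * eps))
    return grads

def sift(x, grads, threshold=1e-3, fill=0.0):
    # Suppress features whose gradient magnitude is below the
    # threshold; `fill` stands in for a learned constant value.
    return [xi if abs(g) > threshold else fill
            for xi, g in zip(x, grads)]
```

In this toy case the sifted representation reveals only the two features the model actually uses, while the prediction is unchanged; the paper's contribution is doing this systematically while provably bounding the mutual information of what is sent out.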