Model Extraction Warning in MLaaS Paradigm
Cloud vendors are increasingly offering machine learning services as part of
their platform and services portfolios. These services enable the deployment of
machine learning models on the cloud that are offered on a pay-per-query basis
to application developers and end users. However, recent work has shown that the
hosted models are susceptible to extraction attacks: adversaries may issue
queries to steal the model, undermining future query revenue or compromising the privacy of
the training data. In this work, we present a cloud-based extraction monitor
that can quantify the extraction status of models by observing the query and
response streams of both individual and colluding adversarial users. We present
a novel technique that uses information gain to measure how quickly users learn
the model as their number of queries grows. Additionally, we present an
alternate technique that maintains intelligent query summaries to measure the
learning rate relative to the coverage of the input feature space in the
presence of collusion. Both approaches have low computational overhead
and can easily be offered as services to model owners, warning them of possible
extraction attacks. We present performance results for these
approaches on decision tree models deployed on the BigML MLaaS platform, using
open-source datasets and different adversarial attack strategies.
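The abstract describes the monitoring idea only at a high level; the sketch below illustrates one way a query monitor might quantify extraction status for a deployed decision tree, by weighting how much of the tree's leaf structure an adversary's query stream has already exposed. It is a minimal illustration using a scikit-learn tree and leaf-coverage weighting as a stand-in for the paper's information-gain metric, not the authors' implementation.

```python
# Minimal sketch: estimate how much of a deployed decision tree an observed
# query stream has exposed (a stand-in for the paper's information-gain metric).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

class ExtractionMonitor:
    def __init__(self, model):
        self.model = model
        tree = model.tree_
        is_leaf = tree.children_left == -1
        counts = tree.n_node_samples[is_leaf].astype(float)
        leaf_ids = np.where(is_leaf)[0]
        # Weight each leaf by its share of training mass: a rough proxy for
        # how much of the model's decision logic that leaf represents.
        self.leaf_weight = dict(zip(leaf_ids, counts / counts.sum()))
        self.seen_leaves = set()

    def observe(self, queries):
        """Record a batch of user queries and return an extraction-status score."""
        for leaf in self.model.apply(queries):
            self.seen_leaves.add(leaf)
        return sum(self.leaf_weight[leaf] for leaf in self.seen_leaves)

X, y = load_iris(return_X_y=True)
deployed = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
monitor = ExtractionMonitor(deployed)

rng = np.random.default_rng(0)
for step in range(5):
    batch = X[rng.choice(len(X), size=20)]      # stand-in for an adversary's queries
    print(f"after batch {step}: ~{monitor.observe(batch):.0%} of model mass exposed")
```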
Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
Decision-based adversarial attacks construct inputs that fool a
machine-learning model into making targeted mispredictions, using only
hard-label queries. For the most part, these attacks have been applied directly
to isolated neural network models. However, in practice, machine learning
models are just one component of a much larger system. We find that adding a
single preprocessor in front of a classifier makes state-of-the-art
query-based attacks as much as seven times less effective against the full
prediction pipeline than against the machine learning model alone. Hence,
attacks that are unaware of this invariance inevitably waste a large number of
queries to re-discover or overcome it. We, therefore, develop techniques to
first reverse-engineer the preprocessor and then use this extracted information
to attack the end-to-end system. Our extraction method requires only a few
hundred queries to learn the preprocessors used by most publicly available
model pipelines, and our preprocessor-aware attacks recover the same efficacy
as just attacking the model alone. The code can be found at
https://github.com/google-research/preprocessor-aware-black-box-attack.
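To make the invariance concrete, here is a toy sketch: a quantization preprocessor in front of a hard-label classifier erases sub-grid perturbations (so unaware attacks waste queries), and knowing the grid spacing lets the attacker perturb efficiently again. The classifier, preprocessor, and probe are illustrative stand-ins, not the paper's pipeline or extraction method.

```python
import numpy as np

def classifier(x):
    # Toy hard-label "model": sign of a fixed linear score.
    w = np.linspace(-1, 1, x.size)
    return int(x.ravel() @ w > 0)

def quantize(x, levels=256):
    # Preprocessor: 8-bit quantization, common in deployed image pipelines.
    return np.round(x * (levels - 1)) / (levels - 1)

def pipeline(x):
    # The attacker only observes hard labels from the whole pipeline.
    return classifier(quantize(x))

x = quantize(np.random.default_rng(0).random(32))   # input already on the 8-bit grid

# A preprocessor-unaware attack wastes queries: perturbations smaller than the
# quantization step are erased before they ever reach the classifier.
tiny = x + 1e-4 * np.random.default_rng(1).standard_normal(x.size)
assert np.array_equal(quantize(tiny), quantize(x))
assert pipeline(tiny) == pipeline(x)

# The quantity worth reverse-engineering is the grid spacing: the smallest
# perturbation the preprocessor does not erase. (For brevity this probe calls
# the preprocessor directly; a real black-box attack must infer the same
# threshold from hard-label responses alone.)
step = 1.0
while np.any(quantize(np.full_like(x, step)) != 0):
    step /= 2
print(f"estimated grid spacing ~ {2 * step:.5f} (true: {1 / 255:.5f})")

# A preprocessor-aware attack then perturbs in multiples of the recovered
# spacing, so no query is spent below the preprocessor's resolution.
aware = x.copy()
aware[0] += 2 * step
assert np.any(quantize(aware) != quantize(x))
```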
Evaluating Superhuman Models with Consistency Checks
If machine learning models were to achieve superhuman abilities at various
reasoning or decision-making tasks, how would we go about evaluating such
models, given that humans would necessarily be poor proxies for ground truth?
In this paper, we propose a framework for evaluating superhuman models via
consistency checks. Our premise is that while the correctness of superhuman
decisions may be impossible to evaluate, we can still surface mistakes if the
model's decisions fail to satisfy certain logical, human-interpretable rules.
We instantiate our framework on three tasks where the correctness of decisions
is hard to evaluate, due either to superhuman model abilities or to otherwise
missing ground truth: evaluating chess positions, forecasting future events,
and making legal judgments. We show that regardless of a model's (possibly
superhuman) performance on these tasks, we can discover logical inconsistencies
in decision making. For example: a chess engine assigning opposing valuations
to semantically identical boards; GPT-4 forecasting that sports records will
evolve non-monotonically over time; or an AI judge assigning bail to a
defendant only after we add a felony to their criminal record. Code and data are available at
https://github.com/ethz-spylab/superhuman-ai-consistenc
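As a concrete illustration of a consistency check in this spirit, the sketch below flags a forecaster whose predicted world records get worse over time, a logical violation that can be surfaced without any ground truth. The `forecast_record` callable and the numbers are hypothetical stand-ins, not the paper's code or data.

```python
from typing import Callable, Sequence

def monotonicity_violations(
    forecast_record: Callable[[int], float],
    years: Sequence[int],
) -> list[tuple[int, int]]:
    """World records only improve, so a forecast for a later year must not be
    worse than the forecast for an earlier one. Returns the (earlier, later)
    year pairs on which the model contradicts itself."""
    preds = {year: forecast_record(year) for year in years}
    return [
        (earlier, later)
        for i, earlier in enumerate(years)
        for later in years[i + 1:]
        if preds[later] > preds[earlier]   # record got worse over time: inconsistent
    ]

# Hypothetical marathon world-record forecasts (in minutes) from some model.
fake_forecasts = {2025: 120.5, 2030: 119.0, 2035: 119.8, 2040: 118.2}
print(monotonicity_violations(fake_forecasts.get, sorted(fake_forecasts)))
# -> [(2030, 2035)]: an inconsistency surfaced without any ground truth.
```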
Considerations for Differentially Private Learning with Large-Scale Public Pretraining
The performance of differentially private machine learning can be boosted
significantly by leveraging the transfer learning capabilities of non-private
models pretrained on large public datasets. We critically review this approach.
We primarily question whether the use of large Web-scraped datasets should be
viewed as differential-privacy-preserving. We caution that publicizing these
models pretrained on Web data as "private" could lead to harm and erode the
public's trust in differential privacy as a meaningful definition of privacy.
Beyond the privacy considerations of using public data, we further question
the utility of this paradigm. We scrutinize whether existing machine learning
benchmarks are appropriate for measuring the ability of pretrained models to
generalize to sensitive domains, which may be poorly represented in public Web
data. Finally, we note that pretraining has been especially impactful for the
largest available models, which are too large for end users to run on their
own devices. Thus, deploying such models today could be a
net loss for privacy, as it would require (private) data to be outsourced to a
more compute-powerful third party.
We conclude by discussing potential paths forward for the field of private
learning, as public pretraining becomes more popular and powerful.
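For readers unfamiliar with the paradigm under critique, the sketch below shows the typical recipe: freeze a (non-privately) pretrained public encoder and fine-tune only a small head on private data with DP-SGD. The random features, hyperparameters, and hand-rolled clipping-and-noise loop are illustrative assumptions, and privacy accounting is omitted entirely; this is not a recommended or production implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, n_classes = 64, 10
clip_norm, noise_mult, lr = 1.0, 1.1, 0.1    # illustrative values only

# Stand-in for features produced by a frozen encoder pretrained on public data.
features = torch.randn(256, feat_dim)        # "private" examples, as features
labels = torch.randint(0, n_classes, (256,))

head = torch.nn.Linear(feat_dim, n_classes)  # only this head is trained privately

def dp_sgd_step(x_batch, y_batch):
    """One DP-SGD step: clip each example's gradient, then add Gaussian noise.
    (Accounting for the resulting privacy budget is omitted here.)"""
    summed = [torch.zeros_like(p) for p in head.parameters()]
    for x, y in zip(x_batch, y_batch):
        head.zero_grad()
        loss = F.cross_entropy(head(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in head.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (norm.item() + 1e-12))
        for s, g in zip(summed, grads):
            s += scale * g
    with torch.no_grad():
        for p, s in zip(head.parameters(), summed):
            noise = noise_mult * clip_norm * torch.randn_like(s)
            p -= lr * (s + noise) / len(x_batch)

for start in range(0, len(features), 32):
    dp_sgd_step(features[start:start + 32], labels[start:start + 32])
```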