52,974 research outputs found
Learning using Local Membership Queries
We introduce a new model of membership query (MQ) learning, where the
learning algorithm is restricted to query points that are \emph{close} to
random examples drawn from the underlying distribution. The learning model is
intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where
the queries are allowed to be arbitrary points).
Membership query algorithms are not popular among machine learning
practitioners. Apart from the obvious difficulty of adaptively querying
labelers, it has also been observed that querying \emph{unnatural} points leads
to increased noise from human labelers (Lang and Baum, 1992). This motivates
our study of learning algorithms that make queries that are close to examples
generated from the data distribution.
We restrict our attention to functions defined on the -dimensional Boolean
hypercube and say that a membership query is local if its Hamming distance from
some example in the (random) training data is at most . We show the
following results in this model:
(i) The class of sparse polynomials (with coefficients in R) over
is polynomial time learnable under a large class of \emph{locally smooth}
distributions using -local queries. This class also includes the
class of -depth decision trees.
(ii) The class of polynomial-sized decision trees is polynomial time
learnable under product distributions using -local queries.
(iii) The class of polynomial size DNF formulas is learnable under the
uniform distribution using -local queries in time
.
(iv) In addition we prove a number of results relating the proposed model to
the traditional PAC model and the PAC+MQ model
Active classification with comparison queries
We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their label-class. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query); or which one of two restaurants did she
like more (a comparison query).
We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bit-description of the input
examples, it is possible to reveal all the labels of a sample of size using
approximately queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, queries are required.
Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
\emph{inference dimension}, that captures the query complexity when each
additional query is determined by examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.Comment: 23 pages (not including references), 1 figure. The new version
contains a minor fix in the proof of Lemma 4.
MLCapsule: Guarded Offline Deployment of Machine Learning as a Service
With the widespread use of machine learning (ML) techniques, ML as a service
has become increasingly popular. In this setting, an ML model resides on a
server and users can query it with their data via an API. However, if the
user's input is sensitive, sending it to the server is undesirable and
sometimes even legally not possible. Equally, the service provider does not
want to share the model by sending it to the client for protecting its
intellectual property and pay-per-query business model.
In this paper, we propose MLCapsule, a guarded offline deployment of machine
learning as a service. MLCapsule executes the model locally on the user's side
and therefore the data never leaves the client. Meanwhile, MLCapsule offers the
service provider the same level of control and security of its model as the
commonly used server-side execution. In addition, MLCapsule is applicable to
offline applications that require local execution. Beyond protecting against
direct model access, we couple the secure offline deployment with defenses
against advanced attacks on machine learning models such as model stealing,
reverse engineering, and membership inference
- …