Learning using Local Membership Queries
We introduce a new model of membership query (MQ) learning, where the
learning algorithm is restricted to query points that are \emph{close} to
random examples drawn from the underlying distribution. The learning model is
intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where
the queries are allowed to be arbitrary points).
Membership query algorithms are not popular among machine learning
practitioners. Apart from the obvious difficulty of adaptively querying
labelers, it has also been observed that querying \emph{unnatural} points leads
to increased noise from human labelers (Lang and Baum, 1992). This motivates
our study of learning algorithms that make queries that are close to examples
generated from the data distribution.
We restrict our attention to functions defined on the n-dimensional Boolean
hypercube and say that a membership query is local if its Hamming distance from
some example in the (random) training data is at most O(\log n). We show the
following results in this model (a minimal locality check is sketched after the
list below):
(i) The class of sparse polynomials (with coefficients in R) over {0,1}^n
is polynomial time learnable under a large class of \emph{locally smooth}
distributions using O(\log n)-local queries. This class also includes the
class of O(\log n)-depth decision trees.
(ii) The class of polynomial-sized decision trees is polynomial time
learnable under product distributions using O(\log n)-local queries.
(iii) The class of polynomial-size DNF formulas is learnable under the
uniform distribution using O(\log n)-local queries in time
n^{O(\log \log n)}.
(iv) In addition, we prove a number of results relating the proposed model to
the traditional PAC model and the PAC+MQ model.
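For concreteness, the locality condition above admits a one-line check. The
following minimal Python sketch (the helper names are ours, not the paper's)
tests whether a candidate query is r-local, where r plays the role of the
O(\log n) bound:

    # Check whether a candidate membership query is r-local, i.e. within
    # Hamming distance r of some example in the random training sample.
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def is_local_query(query, sample, r):
        # query: a tuple of 0/1 bits; sample: a list of such tuples.
        return any(hamming(query, x) <= r for x in sample)

    # Example with n = 8: the query differs from the sample point in 2
    # coordinates, so it is 3-local; its bitwise complement (distance 8)
    # would be rejected.
    sample = [(0, 1, 1, 0, 0, 1, 0, 1)]
    print(is_local_query((0, 1, 0, 0, 0, 1, 1, 1), sample, 3))  # True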
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
f : {0,1}^n -> {-1,1}. Place the most influential variable x_i of
f at the root, and recurse on the subfunctions f_{x_i=0} and f_{x_i=1} on the
left and right subtrees respectively; terminate once the tree is an
\epsilon-approximation of f. We analyze the quality of this heuristic (a code
sketch follows the bounds below), obtaining near-matching upper and lower bounds:
Upper bound: For every f with decision tree size s and every
\epsilon \in (0, 1/2), this heuristic builds a decision tree of size
at most s^{O(\log(s/\epsilon) \log(1/\epsilon))}.
Lower bound: For every \epsilon \in (0, 1/2) and s <= 2^{\tilde{O}(\sqrt{n})},
there is an f with decision tree size s such that
this heuristic builds a decision tree of size s^{\tilde{\Omega}(\log s)}.
We also obtain upper and lower bounds for monotone functions:
s^{O(\sqrt{\log s}/\epsilon)} and s^{\tilde{\Omega}((\log s)^{1/4})}
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
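The heuristic itself is short enough to state as code. Below is a brute-force
Python sketch (our simplification, not the paper's implementation): influence
is computed exactly over the cube, and each branch stops once its restriction
is \epsilon-close to a constant, a per-branch variant of the global stopping
rule above that still guarantees overall error at most \epsilon:

    from itertools import product

    def influence(f, i, n):
        # Inf_i(f): fraction of inputs x on which flipping bit i changes f(x).
        flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:])
                    for x in product((0, 1), repeat=n))
        return flips / 2 ** n

    def dist_to_constant(f, c, n):
        # Fraction of the cube on which f disagrees with the constant c.
        return sum(f(x) != c for x in product((0, 1), repeat=n)) / 2 ** n

    def restrict(f, i, b):
        # The subfunction f_{x_i = b}; it still reads an n-bit input but
        # ignores coordinate i, so i is never chosen again (influence 0).
        return lambda x, f=f, i=i, b=b: f(x[:i] + (b,) + x[i + 1:])

    def build_tree(f, n, eps):
        # Best constant (majority) approximation of f on this subcube.
        maj = 1 if sum(f(x) for x in product((0, 1), repeat=n)) >= 0 else -1
        if dist_to_constant(f, maj, n) <= eps:
            return ('leaf', maj)
        i = max(range(n), key=lambda j: influence(f, j, n))
        return ('split', i,
                build_tree(restrict(f, i, 0), n, eps),
                build_tree(restrict(f, i, 1), n, eps))

    # Example: an exact tree (eps = 0) for AND(x_0, x_1) over {0,1}^3,
    # with outputs in {-1, +1}. Splits on x_0, then x_1, as expected.
    f = lambda x: 1 if x[0] == 1 and x[1] == 1 else -1
    print(build_tree(f, 3, 0.0))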
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
Learning pseudo-Boolean k-DNF and Submodular Functions
We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be
represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a
natural generalization of DNF representation for functions with integer range.
Each term in such a formula has an associated integral constant. We show that
an analog of Håstad's switching lemma holds for pseudo-Boolean k-DNFs if all
constants associated with the terms of the formula are bounded.
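To fix intuition, here is a toy Python evaluator for such a formula (the
representation is ours; we assume the max-of-terms semantics in which a term
contributes its associated constant when all of its literals are satisfied,
and 0 otherwise):

    # Evaluate a pseudo-Boolean DNF: each term is (constant, literals),
    # where literals maps a variable index to its required bit value.
    def eval_pb_dnf(terms, x):
        best = 0
        for const, literals in terms:
            if all(x[i] == b for i, b in literals.items()):
                best = max(best, const)
        return best

    # Example: a pseudo-Boolean 2-DNF over 4 variables with constants in
    # {0, 1, 2}: f(x) = max(2 * [x_0 AND x_1], 1 * [NOT x_2]).
    terms = [(2, {0: 1, 1: 1}), (1, {2: 0})]
    print(eval_pb_dnf(terms, (1, 1, 1, 0)))  # -> 2
    print(eval_pb_dnf(terms, (0, 1, 0, 0)))  # -> 1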
This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to
pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership
queries under the uniform distribution for submodular functions of the form
f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k
\log k / \epsilon)}, 1/\epsilon and log(1/\delta) and works even in the
agnostic setting. The line of previous work on learning submodular functions
[Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi,
Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for
learning submodular functions in this setting, for fixed epsilon and delta.
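The Fourier-based approach this generalizes can be illustrated in a few lines.
The sketch below is not the paper's algorithm (which, like Mansour's, uses
membership queries to locate large coefficients in the Kushilevitz-Mansour
style); it is the basic low-degree version, estimating all coefficients up to
degree d from uniform samples and rounding the resulting polynomial to the
integer range:

    import itertools, random

    def chi(S, x):
        # Fourier character chi_S(x) = (-1)^{sum_{i in S} x_i} on {0,1}^n.
        return (-1) ** sum(x[i] for i in S)

    def estimate_coeff(f, S, n, m=4000):
        # Empirical estimate of the coefficient E_x[f(x) chi_S(x)], x uniform.
        total = 0
        for _ in range(m):
            x = tuple(random.randint(0, 1) for _ in range(n))
            total += f(x) * chi(S, x)
        return total / m

    def low_degree_learner(f, n, d):
        # Estimate every coefficient of degree at most d; the hypothesis is
        # the rounded low-degree polynomial.
        sets = [S for r in range(d + 1)
                for S in itertools.combinations(range(n), r)]
        coeffs = {S: estimate_coeff(f, S, n) for S in sets}
        return lambda x: round(sum(c * chi(S, x) for S, c in coeffs.items()))

    # Example: f(x) = x_0 OR x_1 has exact Fourier degree 2, so with enough
    # samples the degree-2 hypothesis agrees with f with high probability.
    f = lambda x: max(x[0], x[1])
    h = low_degree_learner(f, 4, 2)
    print(all(h(x) == f(x) for x in itertools.product((0, 1), repeat=4)))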
Our learning algorithm implies a property tester for submodularity of
functions f: {0,1}^n -> {0, ..., k} with query complexity polynomial in n for
k = O((\log n / \log \log n)^{1/2}) and constant proximity parameter \epsilon.