Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
f : {0,1}^n → {±1}. Place the most influential variable x_i of
f at the root, and recurse on the subfunctions f_{x_i=0} and f_{x_i=1} on the
left and right subtrees respectively; terminate once the tree is an
ε-approximation of f. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
Upper bound: For every f with decision tree size s and every
ε ∈ (0, 1/2), this heuristic builds a decision tree of size
at most s^{O(log(s/ε) log(1/ε))}.
Lower bound: For every ε ∈ (0, 1/2) and s ≤ 2^{Õ(√n)}, there is an f with decision tree size s such that
this heuristic builds a decision tree of size s^{Ω̃(log s)}.
We also obtain upper and lower bounds for monotone functions: s^{O(√(log s)/ε)}
and s^{Ω̃((log s)^{1/4})} respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
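The heuristic analyzed in this abstract can be sketched concretely. The following is our own illustrative Python (exponential-time, truth-table inputs with 0/1 outputs rather than the paper's ±1 convention; the per-branch stopping rule, stop once the subfunction is ε-close to a constant, is one natural reading of the termination condition, not the paper's exact procedure):

```python
from itertools import product

def influence(f, n, i):
    """Pr over uniform x in {0,1}^n that flipping bit i changes f(x)."""
    flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:])
                for x in product((0, 1), repeat=n))
    return flips / 2 ** n

def build_tree(f, n, eps, fixed=None):
    """Top-down heuristic: split on the most influential free variable and
    recurse on both restrictions; a branch stops once its subfunction is
    eps-close to a constant (the leaf then outputs the majority value)."""
    fixed = fixed or {}
    def g(x):  # f restricted according to `fixed`
        return f(tuple(fixed.get(i, x[i]) for i in range(n)))
    points = list(product((0, 1), repeat=n))
    ones = sum(g(x) for x in points)
    maj = 1 if 2 * ones >= len(points) else 0
    err = (len(points) - ones if maj else ones) / len(points)
    free = [i for i in range(n) if i not in fixed]
    if err <= eps or not free:
        return ("leaf", maj)
    i = max(free, key=lambda j: influence(g, n, j))
    return ("node", i,
            build_tree(f, n, eps, {**fixed, i: 0}),
            build_tree(f, n, eps, {**fixed, i: 1}))

def evaluate(tree, x):
    """Evaluate a tree of ("leaf", bit) / ("node", i, left, right) nodes."""
    while tree[0] == "node":
        tree = tree[2 + x[tree[1]]]
    return tree[1]
```

With ε = 0 the sketch recovers the target exactly (e.g. majority of 3 bits); the paper's bounds concern how large the resulting tree gets relative to the optimal decision tree size s.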
A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers
We prove a strong composition theorem for junta complexity and show how such
theorems can be used to generically boost the performance of property testers.
The ε-approximate junta complexity of a function f is the
smallest integer r such that f is ε-close to a function that
depends only on r variables. A strong composition theorem states that if f
has large ε-approximate junta complexity, then g∘f has even
larger ε'-approximate junta complexity, even for ε' ≫ ε. We develop a fairly complete understanding of this behavior,
proving that the junta complexity of g∘f is characterized by that of f
along with the multivariate noise sensitivity of g. For the important
case of symmetric functions g, we relate their multivariate noise sensitivity
to the simpler and well-studied case of univariate noise sensitivity.
We then show how strong composition theorems yield boosting algorithms for
property testers: with a strong composition theorem for any class of functions,
a large-distance tester for that class is immediately upgraded into one for
small distances. Combining our contributions yields a booster for junta
testers, and with it new implications for junta testing. This is the first
boosting-type result in property testing, and we hope that the connection to
composition theorems adds compelling motivation to the study of both topics.Comment: 44 pages, 1 figure, FOCS 202
Decision Tree Heuristics Can Fail, Even in the Smoothed Setting
Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996).
Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees.
We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
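The classic failure mode cited above (Kearns and Mansour, STOC 1996) is easy to see in miniature: on a parity target, an impurity-based splitting criterion such as Gini gain (the kind used by CART) assigns zero gain to every variable, relevant or irrelevant alike. A short illustrative Python check, our own code rather than anything from the paper:

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_gain(f, n, i):
    """Impurity reduction from splitting on variable i, under the
    uniform distribution on {0,1}^n."""
    pts = list(product((0, 1), repeat=n))
    left = [f(x) for x in pts if x[i] == 0]
    right = [f(x) for x in pts if x[i] == 1]
    all_labels = left + right
    return gini(all_labels) - (len(left) * gini(left)
                               + len(right) * gini(right)) / len(pts)

# Target: parity of the first two of three variables. Every coordinate,
# relevant or not, gets exactly zero Gini gain.
par = lambda x: x[0] ^ x[1]
```

Since every split looks equally useless to the greedy criterion, the learner's first choices are arbitrary; the constructions in this paper show that this kind of myopia persists even after smoothing the target.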
A Query-Optimal Algorithm for Finding Counterfactuals
We design an algorithm for finding counterfactuals with strong theoretical
guarantees on its performance. For any monotone model f : X^d → {0,1} and
instance x*, our algorithm makes S(f)^{O(Δ_f(x*))} · log d queries to f and returns an optimal counterfactual for
x*: a nearest instance x' to x* for which f(x') ≠ f(x*). Here S(f) is the sensitivity of f, a discrete analogue of the
Lipschitz constant, and Δ_f(x*) is the distance from x* to
its nearest counterfactuals. The previous best known query complexity was
d^{O(Δ_f(x*))}, achievable by brute-force local search. We
further prove a lower bound of S(f)^{Ω(Δ_f(x*))} + Ω(log d) on the query complexity of any algorithm, thereby showing that the
guarantees of our algorithm are essentially optimal.
Comment: 22 pages, ICML 202
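The brute-force local-search baseline the abstract compares against can be sketched directly. This is our own illustrative Python (binary features assumed for simplicity), not the paper's query-optimal algorithm:

```python
from itertools import combinations

def brute_force_counterfactual(f, x):
    """Search instances at Hamming distance 1, 2, ... from x until the
    model's prediction flips; the first hit is a nearest counterfactual.
    Query cost grows like d^{O(Delta)}, the baseline the paper improves on."""
    d = len(x)
    fx = f(x)
    for dist in range(1, d + 1):
        for flip in combinations(range(d), dist):
            y = tuple(1 - x[i] if i in flip else x[i] for i in range(d))
            if f(y) != fx:
                return y
    return None  # f is constant on the hypercube
```

For a monotone model such as "at least 2 of 3 features set", the nearest counterfactual of the all-zeros instance sits at Hamming distance 2, and the search finds it after probing all distance-1 neighbors first.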