Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
$f : \{\pm 1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of
$f$ at the root, and recurse on the subfunctions $f_{x_i = -1}$ and $f_{x_i = 1}$ on the
left and right subtrees respectively; terminate once the tree is an
$\varepsilon$-approximation of $f$. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
Upper bound: For every $f$ with decision tree size $s$ and every
$\varepsilon \in (0, \tfrac{1}{2})$, this heuristic builds a decision tree of size
at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
Lower bound: For every $\varepsilon \in (0, \tfrac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that
this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
We also obtain upper and lower bounds for monotone functions:
$s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
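The heuristic itself is easy to state in code. Below is a minimal Python sketch, assuming query access to $f$ on all of $\{-1,1\}^n$, so that influences and approximation error are computed exactly by enumeration (feasible only for small $n$); the function and variable names are illustrative, not from the paper.

```python
from itertools import product

def influence(f, n, i, restriction):
    """Probability (over uniform free inputs) that flipping x_i changes f."""
    free = [j for j in range(n) if j not in restriction]
    flips = total = 0
    for bits in product([-1, 1], repeat=len(free)):
        x = dict(zip(free, bits)); x.update(restriction)
        y = dict(x); y[i] = -y[i]
        total += 1
        flips += f(x) != f(y)
    return flips / total

def build_tree(f, n, eps, restriction=None):
    """Top-down heuristic: split on the most influential variable, and stop
    once the majority constant eps-approximates the restricted function."""
    restriction = restriction or {}
    free = [j for j in range(n) if j not in restriction]
    vals = []
    for bits in product([-1, 1], repeat=len(free)):
        x = dict(zip(free, bits)); x.update(restriction)
        vals.append(f(x))
    p = sum(v == 1 for v in vals) / len(vals)
    maj = 1 if p >= 0.5 else -1
    if min(p, 1 - p) <= eps or not free:
        return maj  # leaf labeled with the majority value
    i = max(free, key=lambda j: influence(f, n, j, restriction))
    return (i,
            build_tree(f, n, eps, {**restriction, i: -1}),
            build_tree(f, n, eps, {**restriction, i: 1}))

def eval_tree(tree, x):
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = left if x[i] == -1 else right
    return tree
```

With `eps = 0` the heuristic reproduces $f$ exactly; larger `eps` trades accuracy for a smaller tree.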
Generating Compact Tree Ensembles via Annealing
Tree ensembles are flexible predictive models that can capture relevant
variables and to some extent their interactions in a compact and interpretable
manner. Most algorithms for obtaining tree ensembles are based on versions of
boosting or Random Forest. Previous work showed that boosting algorithms
exhibit a cyclic behavior of selecting the same tree again and again due to the
way the loss is optimized. At the same time, Random Forest is not based on loss
optimization and obtains a more complex and less interpretable model. In this
paper we present a novel method for obtaining compact tree ensembles by growing
a large pool of trees in parallel with many independent boosting threads and
then selecting a small subset and updating their leaf weights by loss
optimization. We allow for the trees in the initial pool to have different
depths which further helps with generalization. Experiments on real datasets
show that the obtained model usually has a smaller loss than boosting, which is
also reflected in a lower misclassification error on the test set.
Comment: Comparison with Random Forest included in the results section.
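As a rough illustration of the select-then-reweight idea (not the paper's exact procedure): given precomputed prediction vectors for a pool of trees, one can pick a small subset by simulated annealing and refit the selected trees' weights by gradient descent on the squared loss. All names, hyperparameters, and the annealing schedule here are illustrative assumptions.

```python
import math
import random

def fit_weights(sel, preds, y, steps=200, lr=0.05):
    """Refit the selected trees' weights by gradient descent on squared loss."""
    m = len(y)
    w = {i: 0.0 for i in sel}
    for _ in range(steps):
        resid = [sum(w[i] * preds[i][j] for i in sel) - y[j] for j in range(m)]
        for i in sel:
            grad = 2.0 / m * sum(resid[j] * preds[i][j] for j in range(m))
            w[i] -= lr * grad
    return w

def sq_loss(w, sel, preds, y):
    m = len(y)
    return sum((sum(w[i] * preds[i][j] for i in sel) - y[j]) ** 2
               for j in range(m)) / m

def anneal_select(preds, y, k, iters=300, t0=1.0, seed=0):
    """Simulated annealing over size-k subsets of the pool (assumes k < len(preds))."""
    rng = random.Random(seed)
    sel = set(rng.sample(range(len(preds)), k))
    cur = sq_loss(fit_weights(sel, preds, y), sel, preds, y)
    best, best_sel = cur, set(sel)
    for it in range(iters):
        temp = t0 * (1 - it / iters) + 1e-9
        swap_out = rng.choice(sorted(sel))
        swap_in = rng.choice([i for i in range(len(preds)) if i not in sel])
        cand = (sel - {swap_out}) | {swap_in}
        loss = sq_loss(fit_weights(cand, preds, y), cand, preds, y)
        # accept improvements always, worse moves with annealed probability
        if loss < cur or rng.random() < math.exp((cur - loss) / temp):
            sel, cur = cand, loss
            if loss < best:
                best, best_sel = loss, set(sel)
    return best_sel, best
```

Because the candidate trees are fixed, only their (pre-computed) predictions are needed, so each annealing step is cheap.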
Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture
Given $f : \{-1,1\}^n \to \{-1,1\}$, define the \emph{spectral
distribution} of $f$ to be the distribution on subsets of $[n]$ in which the
set $S$ is sampled with probability $\hat{f}(S)^2$. Then the Fourier
Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that
there is some absolute constant $C$ such that $\mathbf{H}[\hat{f}^2] \le C \cdot \mathbf{Inf}[f]$. Here,
$\mathbf{H}[\hat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and
$\mathbf{Inf}[f]$ is the total influence of $f$. This conjecture is one
of the major open problems in the analysis of Boolean functions, and settling
it would have several interesting consequences.
Previous results on the FEI conjecture have been largely through direct
calculation. In this paper we study a natural interpretation of the conjecture,
which states that there exists a communication protocol which, given a subset $S$
of $[n]$ distributed as $\hat{f}^2$, can communicate the value of $S$ using
at most $C \cdot \mathbf{Inf}[f]$ bits in expectation.
Using this interpretation, we are able to show the following results:
1. First, if $f$ is computable by a read-$k$ decision tree, then
$\mathbf{H}[\hat{f}^2] \le O(k) \cdot \mathbf{Inf}[f]$.
2. Next, if $f$ has $\mathbf{Inf}[f] \ge 1$ and is computable by a
decision tree with expected depth $d$, then $\mathbf{H}[\hat{f}^2] \le O(d) \cdot \mathbf{Inf}[f]$.
3. Finally, we give a new proof of the main theorem of O'Donnell and Tan
(ICALP 2013), i.e. that their FEI conjecture composes.
In addition, we show that natural improvements to our decision tree results
would be sufficient to prove the FEI conjecture in its entirety. We believe
that our methods give more illuminating proofs than previous results about the
FEI conjecture.
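For small $n$, both sides of the conjectured inequality can be computed exactly by brute force over the truth table. The sketch below uses only the standard definitions (nothing specific to this paper):

```python
import math
from itertools import combinations, product

def fourier_coeffs(f, n):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i], x uniform over {-1,1}^n."""
    inputs = list(product([-1, 1], repeat=n))
    return {
        S: sum(f(x) * math.prod(x[i] for i in S) for x in inputs) / len(inputs)
        for r in range(n + 1) for S in combinations(range(n), r)
    }

def spectral_entropy(coeffs):
    """Shannon entropy of the spectral distribution {hat f(S)^2}."""
    return -sum(c * c * math.log2(c * c) for c in coeffs.values() if c != 0.0)

def total_influence(coeffs):
    """Inf[f] = sum over S of |S| * hat f(S)^2."""
    return sum(len(S) * c * c for S, c in coeffs.items())
```

For example, parity on three bits has a point-mass spectral distribution, so its entropy is 0 while its influence is 3; majority on three bits has entropy 2 and influence 1.5.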
Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses
In this work, we adapt a method based on multiple hypothesis tracking (MHT)
that has been shown to give state-of-the-art vessel segmentation results in
interactive settings, for the purpose of extracting trees. Regularly spaced
tubular templates are fit to image data forming local hypotheses. These local
hypotheses are used to construct the MHT tree, which is then traversed to make
segmentation decisions. However, some critical parameters in this method are
scale-dependent and have an adverse effect when tracking structures of varying
dimensions. We propose to use statistical ranking of local hypotheses in
constructing the MHT tree, which yields a probabilistic interpretation of
scores across scales and helps alleviate the scale-dependence of MHT
parameters. This enables our method to track trees starting from a single seed
point. Our method is evaluated on chest CT data to extract airway trees and
coronary arteries. In both cases, we show that our method performs
significantly better than the original MHT method.
Comment: Accepted for publication at the International Journal of Medical Physics and Practice.
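The ranking idea can be illustrated in the abstract: raw template-fit scores computed at different scales are not directly comparable, but their percentile ranks against per-scale reference (background) score distributions are. A toy sketch with made-up scores, not the paper's actual MHT implementation:

```python
from bisect import bisect_left

def percentile_rank(reference, score):
    """Fraction of reference scores strictly below `score` (empirical CDF value)."""
    ref = sorted(reference)
    return bisect_left(ref, score) / len(ref)

def best_hypothesis(hypotheses, references):
    """hypotheses: list of (scale, raw_score); references: scale -> background scores.
    Ranks each hypothesis within its own scale, making scores comparable."""
    ranked = [(percentile_rank(references[scale], raw), scale, raw)
              for scale, raw in hypotheses]
    return max(ranked)  # (rank, scale, raw_score) of the top-ranked hypothesis
```

A raw score of 8 that beats 80% of its scale's background distribution then outranks a raw score of 50 that only beats half of its own, which is the scale-invariance the abstract describes.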
Stratification Trees for Adaptive Randomization in Randomized Controlled Trials
This paper proposes an adaptive randomization procedure for two-stage
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad hoc choices of stratification. We
conclude by applying our method to the study in Karlan and Wood (2017), where
we estimate stratification trees using the first wave of their experiment.
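To make the object concrete, here is a toy sketch of a stratification tree and the stratum-weighted difference-in-means estimator it suggests; the tree shape, probabilities, and data are all hypothetical, and the paper's framework accommodates richer within-stratum assignment mechanisms than the Bernoulli draws used here.

```python
import random

def stratum(tree, x):
    """Internal node: (feature_index, threshold, left, right); leaf: (stratum_id, p_treat)."""
    while len(tree) == 4:
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def assign(tree, xs, seed=0):
    """Bernoulli treatment assignment with each leaf's probability (one simple mechanism)."""
    rng = random.Random(seed)
    out = []
    for x in xs:
        sid, p_treat = stratum(tree, x)
        out.append((sid, int(rng.random() < p_treat)))
    return out

def ate_estimate(strata, d, y):
    """Within-stratum difference in means, weighted by stratum share."""
    n = len(y)
    est = 0.0
    for s in set(strata):
        idx = [i for i in range(n) if strata[i] == s]
        treated = [y[i] for i in idx if d[i] == 1]
        control = [y[i] for i in idx if d[i] == 0]
        if treated and control:
            est += len(idx) / n * (sum(treated) / len(treated)
                                   - sum(control) / len(control))
    return est
```

The leaves carry both the stratum label and its treatment-assignment probability, which is exactly the structure the abstract describes: the tree jointly encodes which covariates to stratify on, how to split them, and the assignment probabilities within each stratum.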