
    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms---which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
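
    For concreteness, here is a minimal Python sketch of the heuristic described above, assuming $f$ is given as a full $\pm 1$ truth table over $\{0,1\}^n$. Variable indices in the recursive calls refer to positions within the current restriction, and the per-leaf stopping rule is a simplification of the paper's global $\varepsilon$-approximation criterion.

```python
import numpy as np

def influences(tt, n):
    """Influence of each variable: Pr over uniform x that flipping x_i changes f(x)."""
    idx = np.arange(2 ** n)
    return np.array([np.mean(tt != tt[idx ^ (1 << i)]) for i in range(n)])

def restrict(tt, n, i, b):
    """Truth table of the subfunction f_{x_i = b} over the remaining n-1 variables."""
    sub = np.empty(2 ** (n - 1), dtype=tt.dtype)
    for j in range(2 ** (n - 1)):
        low = j & ((1 << i) - 1)      # bits of j below position i
        high = (j >> i) << (i + 1)    # bits of j at or above position i, shifted up
        sub[j] = tt[high | (b << i) | low]
    return sub

def top_down(tt, n, eps):
    """Greedy top-down construction: split on the most influential variable and recurse.
    A branch becomes a constant leaf once that constant eps-approximates the restricted
    function (a per-leaf simplification of the global stopping rule in the abstract)."""
    p = np.mean(tt == 1)
    if min(p, 1 - p) <= eps or n == 0:
        return 1 if p >= 0.5 else -1
    i = int(np.argmax(influences(tt, n)))   # index relative to the current restriction
    return (i,
            top_down(restrict(tt, n, i, 0), n - 1, eps),
            top_down(restrict(tt, n, i, 1), n - 1, eps))

# Example: majority of 3 bits, encoded as a +/-1 truth table indexed by x in {0,1}^3.
n = 3
tt = np.array([1 if bin(x).count("1") >= 2 else -1 for x in range(2 ** n)])
print(top_down(tt, n, eps=0.1))
```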

    Generating Compact Tree Ensembles via Annealing

    Tree ensembles are flexible predictive models that can capture relevant variables and, to some extent, their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior, repeatedly selecting the same tree, due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and produces a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow the trees in the initial pool to have different depths, which further helps with generalization. Experiments on real datasets show that the obtained model usually has a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set. Comment: Comparison with Random Forest included in the results section.
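
    A rough sketch of the pool-then-select idea, using scikit-learn: several independent boosting runs of varying depth supply the tree pool, and an L1-penalized regression over per-tree predictions stands in for the paper's annealing-based subset selection and leaf-weight updates. The dataset, pool sizes, and the Lasso penalty are arbitrary illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow a large pool of trees from several independent boosting runs of varying depth.
pool = []
for depth, seed in [(2, 0), (3, 1), (4, 2), (5, 3)]:
    gb = GradientBoostingRegressor(n_estimators=100, max_depth=depth,
                                   subsample=0.7, random_state=seed).fit(X_tr, y_tr)
    pool.extend(gb.estimators_[:, 0])          # the individual regression trees

# Per-tree predictions form the design matrix for re-weighting the pool.
P_tr = np.column_stack([t.predict(X_tr) for t in pool])
P_te = np.column_stack([t.predict(X_te) for t in pool])

# Sparse loss optimization over per-tree weights: an illustrative proxy for the
# paper's annealing-based subset selection; alpha trades pool size for accuracy.
sel = Lasso(alpha=1.0, max_iter=10000).fit(P_tr, y_tr)
kept = np.flatnonzero(sel.coef_)
print(f"kept {kept.size} of {len(pool)} trees, "
      f"test MSE = {np.mean((sel.predict(P_te) - y_te) ** 2):.1f}")
```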

    Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture

    Given $f:\{-1,1\}^n \rightarrow \{-1,1\}$, define the \emph{spectral distribution} of $f$ to be the distribution on subsets of $[n]$ in which the set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2] \leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and $\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one of the major open problems in the analysis of Boolean functions, and settling it would have several interesting consequences. Previous results on the FEI conjecture have been obtained largely through direct calculation. In this paper we study a natural interpretation of the conjecture, which states that there exists a communication protocol which, given a subset $S$ of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using at most $C\cdot\operatorname{Inf}[f]$ bits in expectation. Using this interpretation, we are able to show the following results: 1. First, if $f$ is computable by a read-$k$ decision tree, then $\operatorname{H}[\widehat{f}^2] \leq 9k\cdot \operatorname{Inf}[f]$. 2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2] \leq 12d\cdot \operatorname{Inf}[f]$. 3. Finally, we give a new proof of the main theorem of O'Donnell and Tan (ICALP 2013), i.e. that their FEI$^+$ conjecture composes. In addition, we show that natural improvements to our decision tree results would be sufficient to prove the FEI conjecture in its entirety. We believe that our methods give more illuminating proofs than previous results about the FEI conjecture.
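
    The quantities in the conjecture can be computed by brute force for small $n$. The sketch below, using a toy majority function as an assumed example, only illustrates the definitions of the spectral distribution, its entropy, and total influence; it says nothing about the communication-protocol argument.

```python
import itertools
import numpy as np

def fourier_spectrum(f, n):
    """All Fourier coefficients hat{f}(S) = E_x[f(x) * chi_S(x)] over x in {-1,1}^n."""
    xs = np.array(list(itertools.product([-1, 1], repeat=n)))
    vals = np.array([f(x) for x in xs])
    coeffs = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), k) for k in range(n + 1)):
        chi = np.prod(xs[:, list(S)], axis=1) if S else np.ones(len(xs))
        coeffs[S] = np.mean(vals * chi)
    return coeffs

def spectral_entropy_and_influence(coeffs):
    """H[hat{f}^2], the Shannon entropy of the spectral distribution, and
    Inf[f] = sum_S |S| * hat{f}(S)^2, the total influence."""
    weights = {S: c ** 2 for S, c in coeffs.items() if c ** 2 > 1e-12}
    H = -sum(w * np.log2(w) for w in weights.values())
    inf = sum(len(S) * w for S, w in weights.items())
    return H, inf

# Example: majority on 3 bits.
maj3 = lambda x: 1 if sum(x) > 0 else -1
H, inf = spectral_entropy_and_influence(fourier_spectrum(maj3, 3))
print(f"H[f^2] = {H:.3f}, Inf[f] = {inf:.3f}")  # FEI asks H <= C * Inf for a universal C
```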

    Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses

    In this work, we adapt a method based on multiple hypothesis tracking (MHT), which has been shown to give state-of-the-art vessel segmentation results in interactive settings, to the task of extracting trees. Regularly spaced tubular templates are fit to image data, forming local hypotheses. These local hypotheses are used to construct the MHT tree, which is then traversed to make segmentation decisions. However, some critical parameters in this method are scale-dependent and have an adverse effect when tracking structures of varying dimensions. We propose to use statistical ranking of local hypotheses in constructing the MHT tree, which yields a probabilistic interpretation of scores across scales and helps alleviate the scale-dependence of MHT parameters. This enables our method to track trees starting from a single seed point. Our method is evaluated on chest CT data to extract airway trees and coronary arteries. In both cases, we show that our method performs significantly better than the original MHT method. Comment: Accepted for publication at the International Journal of Medical Physics and Practice.
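
    Purely as an illustration of the ranking idea (not the authors' implementation), one can convert raw template-fit scores into percentile ranks against an empirical background score distribution, so that hypotheses found at different scales are compared on a common probabilistic footing. The function and threshold below are hypothetical.

```python
import numpy as np

def rank_hypotheses(fit_scores, background_scores):
    """Empirical-CDF ranks of template-fit scores against a background (null) sample,
    making scores from different scales comparable on a [0, 1] probabilistic scale."""
    bg = np.sort(np.asarray(background_scores))
    return np.searchsorted(bg, fit_scores) / bg.size

# Example: keep only hypotheses whose rank exceeds an (assumed) threshold of 0.95.
background = np.random.default_rng(0).normal(size=10_000)   # stand-in null scores
candidates = np.array([0.2, 1.8, 2.5, -0.3])
print(rank_hypotheses(candidates, background) > 0.95)
```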

    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
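
    A simplified sketch of the two-wave idea, assuming scikit-learn and synthetic data: a shallow regression tree fit on the pilot wave plays the role of the stratification tree, and within-leaf Neyman allocation sets the treatment shares. The paper's actual procedure searches over trees to minimize the ATE estimator's asymptotic variance and pairs this with cross-validation, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# --- First wave (pilot): covariates X1, random treatment D1, outcomes Y1. ---
n1 = 500
X1 = rng.normal(size=(n1, 3))
D1 = rng.integers(0, 2, n1)
Y1 = X1[:, 0] + D1 * (1 + X1[:, 1]) + rng.normal(size=n1)

# Use the pilot to define strata: a shallow tree on the covariates whose leaves
# play the role of the strata (an illustrative proxy for the paper's search over
# stratification trees).
strat_tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50).fit(X1, Y1)
leaf1 = strat_tree.apply(X1)

# Within each stratum, pick the treatment share by Neyman allocation
# p = sigma_1 / (sigma_1 + sigma_0), estimated from the pilot arms.
probs = {}
for leaf in np.unique(leaf1):
    m = leaf1 == leaf
    s1, s0 = Y1[m & (D1 == 1)].std(), Y1[m & (D1 == 0)].std()
    probs[leaf] = s1 / (s1 + s0)

# --- Second wave: stratified block randomization with the chosen shares. ---
n2 = 1000
X2 = rng.normal(size=(n2, 3))
leaf2 = strat_tree.apply(X2)
D2 = np.zeros(n2, dtype=int)
for leaf, p in probs.items():
    idx = np.flatnonzero(leaf2 == leaf)
    n_treat = int(round(p * idx.size))            # block design: fix the treated count
    D2[rng.choice(idx, size=n_treat, replace=False)] = 1

print({int(k): round(float(v), 2) for k, v in probs.items()})
```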