
    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms---which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
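
    For concreteness, here is a minimal Python sketch of the heuristic described above, assuming $f$ is given as a full $\pm 1$ truth table over $\{0,1\}^n$. Variable indices in the recursive calls refer to positions within the current restriction, and the per-leaf stopping rule is a simplification of the paper's global $\varepsilon$-approximation criterion.

```python
import numpy as np

def influences(tt, n):
    """Influence of each variable: Pr over uniform x that flipping x_i changes f(x)."""
    idx = np.arange(2 ** n)
    return np.array([np.mean(tt != tt[idx ^ (1 << i)]) for i in range(n)])

def restrict(tt, n, i, b):
    """Truth table of the subfunction f_{x_i = b} over the remaining n-1 variables."""
    sub = np.empty(2 ** (n - 1), dtype=tt.dtype)
    for j in range(2 ** (n - 1)):
        low = j & ((1 << i) - 1)      # bits of j below position i
        high = (j >> i) << (i + 1)    # bits of j at or above position i, shifted up
        sub[j] = tt[high | (b << i) | low]
    return sub

def top_down(tt, n, eps):
    """Greedy top-down construction: split on the most influential variable and recurse.
    A branch becomes a constant leaf once that constant eps-approximates the restricted
    function (a per-leaf simplification of the global stopping rule in the abstract)."""
    p = np.mean(tt == 1)
    if min(p, 1 - p) <= eps or n == 0:
        return 1 if p >= 0.5 else -1
    i = int(np.argmax(influences(tt, n)))   # index relative to the current restriction
    return (i,
            top_down(restrict(tt, n, i, 0), n - 1, eps),
            top_down(restrict(tt, n, i, 1), n - 1, eps))

# Example: majority of 3 bits, encoded as a +/-1 truth table indexed by x in {0,1}^3.
n = 3
tt = np.array([1 if bin(x).count("1") >= 2 else -1 for x in range(2 ** n)])
print(top_down(tt, n, eps=0.1))
```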

    Generating Compact Tree Ensembles via Annealing

    Tree ensembles are flexible predictive models that can capture relevant variables and, to some extent, their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior, repeatedly selecting the same tree, due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and produces a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow the trees in the initial pool to have different depths, which further helps with generalization. Experiments on real datasets show that the obtained model usually has a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set. Comment: Comparison with Random Forest included in the results section.
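
    A rough sketch of the pool-then-select idea, using scikit-learn: several independent boosting runs of varying depth supply the tree pool, and an L1-penalized regression over per-tree predictions stands in for the paper's annealing-based subset selection and leaf-weight updates. The dataset, pool sizes, and the Lasso penalty are arbitrary illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow a large pool of trees from several independent boosting runs of varying depth.
pool = []
for depth, seed in [(2, 0), (3, 1), (4, 2), (5, 3)]:
    gb = GradientBoostingRegressor(n_estimators=100, max_depth=depth,
                                   subsample=0.7, random_state=seed).fit(X_tr, y_tr)
    pool.extend(gb.estimators_[:, 0])          # the individual regression trees

# Per-tree predictions form the design matrix for re-weighting the pool.
P_tr = np.column_stack([t.predict(X_tr) for t in pool])
P_te = np.column_stack([t.predict(X_te) for t in pool])

# Sparse loss optimization over per-tree weights: an illustrative proxy for the
# paper's annealing-based subset selection; alpha trades pool size for accuracy.
sel = Lasso(alpha=1.0, max_iter=10000).fit(P_tr, y_tr)
kept = np.flatnonzero(sel.coef_)
print(f"kept {kept.size} of {len(pool)} trees, "
      f"test MSE = {np.mean((sel.predict(P_te) - y_te) ** 2):.1f}")
```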

    Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture

    Given $f:\{-1,1\}^n \rightarrow \{-1,1\}$, define the \emph{spectral distribution} of $f$ to be the distribution on subsets of $[n]$ in which the set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2] \leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and $\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one of the major open problems in the analysis of Boolean functions, and settling it would have several interesting consequences. Previous results on the FEI conjecture have been obtained largely through direct calculation. In this paper we study a natural interpretation of the conjecture, which states that there exists a communication protocol which, given a subset $S$ of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using at most $C\cdot\operatorname{Inf}[f]$ bits in expectation. Using this interpretation, we are able to show the following results: 1. First, if $f$ is computable by a read-$k$ decision tree, then $\operatorname{H}[\widehat{f}^2] \leq 9k\cdot \operatorname{Inf}[f]$. 2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2] \leq 12d\cdot \operatorname{Inf}[f]$. 3. Finally, we give a new proof of the main theorem of O'Donnell and Tan (ICALP 2013), i.e. that their FEI$^+$ conjecture composes. In addition, we show that natural improvements to our decision tree results would be sufficient to prove the FEI conjecture in its entirety. We believe that our methods give more illuminating proofs than previous results about the FEI conjecture.
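
    The quantities in the conjecture can be computed by brute force for small $n$. The sketch below, using a toy majority function as an assumed example, only illustrates the definitions of the spectral distribution, its entropy, and total influence; it says nothing about the communication-protocol argument.

```python
import itertools
import numpy as np

def fourier_spectrum(f, n):
    """All Fourier coefficients hat{f}(S) = E_x[f(x) * chi_S(x)] over x in {-1,1}^n."""
    xs = np.array(list(itertools.product([-1, 1], repeat=n)))
    vals = np.array([f(x) for x in xs])
    coeffs = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), k) for k in range(n + 1)):
        chi = np.prod(xs[:, list(S)], axis=1) if S else np.ones(len(xs))
        coeffs[S] = np.mean(vals * chi)
    return coeffs

def spectral_entropy_and_influence(coeffs):
    """H[hat{f}^2], the Shannon entropy of the spectral distribution, and
    Inf[f] = sum_S |S| * hat{f}(S)^2, the total influence."""
    weights = {S: c ** 2 for S, c in coeffs.items() if c ** 2 > 1e-12}
    H = -sum(w * np.log2(w) for w in weights.values())
    inf = sum(len(S) * w for S, w in weights.items())
    return H, inf

# Example: majority on 3 bits.
maj3 = lambda x: 1 if sum(x) > 0 else -1
H, inf = spectral_entropy_and_influence(fourier_spectrum(maj3, 3))
print(f"H[f^2] = {H:.3f}, Inf[f] = {inf:.3f}")  # FEI asks H <= C * Inf for a universal C
```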

    Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses

    In this work, we adapt a method based on multiple hypothesis tracking (MHT), which has been shown to give state-of-the-art vessel segmentation results in interactive settings, to the task of extracting trees. Regularly spaced tubular templates are fit to image data, forming local hypotheses. These local hypotheses are used to construct the MHT tree, which is then traversed to make segmentation decisions. However, some critical parameters in this method are scale-dependent and have an adverse effect when tracking structures of varying dimensions. We propose to use statistical ranking of local hypotheses in constructing the MHT tree, which yields a probabilistic interpretation of scores across scales and helps alleviate the scale-dependence of MHT parameters. This enables our method to track trees starting from a single seed point. Our method is evaluated on chest CT data to extract airway trees and coronary arteries. In both cases, we show that our method performs significantly better than the original MHT method. Comment: Accepted for publication at the International Journal of Medical Physics and Practice.
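
    Purely as an illustration of the ranking idea (not the authors' implementation), one can convert raw template-fit scores into percentile ranks against an empirical background score distribution, so that hypotheses found at different scales are compared on a common probabilistic footing. The function and threshold below are hypothetical.

```python
import numpy as np

def rank_hypotheses(fit_scores, background_scores):
    """Empirical-CDF ranks of template-fit scores against a background (null) sample,
    making scores from different scales comparable on a [0, 1] probabilistic scale."""
    bg = np.sort(np.asarray(background_scores))
    return np.searchsorted(bg, fit_scores) / bg.size

# Example: keep only hypotheses whose rank exceeds an (assumed) threshold of 0.95.
background = np.random.default_rng(0).normal(size=10_000)   # stand-in null scores
candidates = np.array([0.2, 1.8, 2.5, -0.3])
print(rank_hypotheses(candidates, background) > 0.95)
```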

    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
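
    A simplified sketch of the two-wave idea, assuming scikit-learn and synthetic data: a shallow regression tree fit on the pilot wave plays the role of the stratification tree, and within-leaf Neyman allocation sets the treatment shares. The paper's actual procedure searches over trees to minimize the ATE estimator's asymptotic variance and pairs this with cross-validation, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# --- First wave (pilot): covariates X1, random treatment D1, outcomes Y1. ---
n1 = 500
X1 = rng.normal(size=(n1, 3))
D1 = rng.integers(0, 2, n1)
Y1 = X1[:, 0] + D1 * (1 + X1[:, 1]) + rng.normal(size=n1)

# Use the pilot to define strata: a shallow tree on the covariates whose leaves
# play the role of the strata (an illustrative proxy for the paper's search over
# stratification trees).
strat_tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50).fit(X1, Y1)
leaf1 = strat_tree.apply(X1)

# Within each stratum, pick the treatment share by Neyman allocation
# p = sigma_1 / (sigma_1 + sigma_0), estimated from the pilot arms.
probs = {}
for leaf in np.unique(leaf1):
    m = leaf1 == leaf
    s1, s0 = Y1[m & (D1 == 1)].std(), Y1[m & (D1 == 0)].std()
    probs[leaf] = s1 / (s1 + s0)

# --- Second wave: stratified block randomization with the chosen shares. ---
n2 = 1000
X2 = rng.normal(size=(n2, 3))
leaf2 = strat_tree.apply(X2)
D2 = np.zeros(n2, dtype=int)
for leaf, p in probs.items():
    idx = np.flatnonzero(leaf2 == leaf)
    n_treat = int(round(p * idx.size))            # block design: fix the treated count
    D2[rng.choice(idx, size=n_treat, replace=False)] = 1

print({int(k): round(float(v), 2) for k, v in probs.items()})
```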