4,054 research outputs found

    Optimal Sparse Decision Trees

    Full text link
    Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980's. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality.Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canad

    Building Combined Classifiers

    Get PDF
    This chapter covers different approaches that may be taken when building an ensemble method, through studying specific examples of each approach from research conducted by the authors. A method called Negative Correlation Learning illustrates a decision level combination approach with individual classifiers trained co-operatively. The Model level combination paradigm is illustrated via a tree combination method. Finally, another variant of the decision level paradigm, with individuals trained independently instead of co-operatively, is discussed as applied to churn prediction in the telecommunications industry

    Asymmetric Pruning for Learning Cascade Detectors

    Full text link
    Cascade classifiers are one of the most important contributions to real-time object detection. Nonetheless, there are many challenging problems arising in training cascade detectors. One common issue is that the node classifier is trained with a symmetric classifier. Having a low misclassification error rate does not guarantee an optimal node learning goal in cascade classifiers, i.e., an extremely high detection rate with a moderate false positive rate. In this work, we present a new approach to train an effective node classifier in a cascade detector. The algorithm is based on two key observations: 1) Redundant weak classifiers can be safely discarded; 2) The final detector should satisfy the asymmetric learning objective of the cascade architecture. To achieve this, we separate the classifier training into two steps: finding a pool of discriminative weak classifiers/features and training the final classifier by pruning weak classifiers which contribute little to the asymmetric learning criterion (asymmetric classifier construction). Our model reduction approach helps accelerate the learning time while achieving the pre-determined learning objective. Experimental results on both face and car data sets verify the effectiveness of the proposed algorithm. On the FDDB face data sets, our approach achieves the state-of-the-art performance, which demonstrates the advantage of our approach.Comment: 14 page

    Anytime Point-Based Approximations for Large POMDPs

    Full text link
    The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks

    Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

    Full text link
    Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to nd the top-k matches ac- cording to a ranking function over edge and node weights. For users, it is di cult to select value k . We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, re- turn as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continues until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.Comment: To appear in WWW 201

    Learning to predict under a budget

    Get PDF
    Prediction-time budgets in machine learning applications can arise due to monetary or computational costs associated with acquiring information; they also arise due to latency and power consumption costs in evaluating increasingly more complex models. The goal in such budgeted prediction problems is to learn decision systems that maintain high prediction accuracy while meeting average cost constraints during prediction-time. Such decision systems can potentially adapt to the input examples, predicting most of them at low cost while allocating more budget for the few "hard" examples. In this thesis, I will present several learning methods to better trade-off cost and error during prediction. The conceptual contribution of this thesis is to develop a new paradigm of bottom-up approach instead of the traditional top-down approach. A top-down approach attempts to build out the model by selectively adding the most cost-effective features to improve accuracy. In contrast, a bottom-up approach first learns a highly accurate model and then prunes or adaptively approximates it to trade-off cost and error. Training top-down models in case of feature acquisition costs leads to fundamental combinatorial issues in multi-stage search over all feature subsets. In contrast, we show that the bottom-up methods bypass many of such issues. To develop this theme, we first propose two top-down methods and then two bottom-up methods. The first top-down method uses margin information from training data in the partial feature neighborhood of a test point to either select the next best feature in a greedy fashion or to stop and make prediction. The second top-down method is a variant of random forest (RF) algorithm. We grow decision trees with low acquisition cost and high strength based on greedy mini-max cost-weighted impurity splits. Theoretically, we establish near-optimal acquisition cost guarantees for our algorithm. The first bottom-up method we propose is based on pruning RFs to optimize expected feature cost and accuracy. Given a RF as input, we pose pruning as a novel 0-1 integer program and show that it can be solved exactly via LP relaxation. We further develop a fast primal-dual algorithm that scales to large datasets. The second bottom-up method is adaptive approximation, which significantly generalizes the RF pruning to accommodate more models and other types of costs besides feature acquisition cost. We first train a high-accuracy, high-cost model. We then jointly learn a low-cost gating function together with a low-cost prediction model to adaptively approximate the high-cost model. The gating function identifies the regions of the input space where the low-cost model suffices for making highly accurate predictions. We demonstrate empirical performance of these methods and compare them to the state-of-the-arts. Finally, we study adaptive approximation in the on-line setting to obtain regret guarantees and discuss future work.2019-07-02T00:00:00

    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Get PDF
    Consider the following heuristic for building a decision tree for a function f:{0,1}n{±1}f : \{0,1\}^n \to \{\pm 1\}. Place the most influential variable xix_i of ff at the root, and recurse on the subfunctions fxi=0f_{x_i=0} and fxi=1f_{x_i=1} on the left and right subtrees respectively; terminate once the tree is an ε\varepsilon-approximation of ff. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: \circ Upper bound: For every ff with decision tree size ss and every ε(0,12)\varepsilon \in (0,\frac1{2}), this heuristic builds a decision tree of size at most sO(log(s/ε)log(1/ε))s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}. \circ Lower bound: For every ε(0,12)\varepsilon \in (0,\frac1{2}) and s2O~(n)s \le 2^{\tilde{O}(\sqrt{n})}, there is an ff with decision tree size ss such that this heuristic builds a decision tree of size sΩ~(logs)s^{\tilde{\Omega}(\log s)}. We also obtain upper and lower bounds for monotone functions: sO(logs/ε)s^{O(\sqrt{\log s}/\varepsilon)} and sΩ~(logs4)s^{\tilde{\Omega}(\sqrt[4]{\log s } )} respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms---which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime
    corecore