
    Inducing safer oblique trees without costs

    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms, or by changing the distribution of training examples so as to bias the learning process to take account of costs. A prerequisite for applying such algorithms is the availability of misclassification costs. Although these may be obtainable for some applications, reasonable estimates of misclassification costs are not easy to come by in the area of safety. This paper presents a new algorithm for applications where the costs of misclassification cannot be quantified, although misclassification in one class is known to be significantly more costly than in the other. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes, and then applies a modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated against one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1), and an algorithm that utilizes robust linear programming.
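    The safety-biased split idea can be sketched in a toy form: compute Fisher's discriminant direction for two 2-D classes, then shift the decision threshold from the usual midpoint toward the safe class so that borderline cases are flagged as unsafe. This is only an illustrative sketch; the function names and the fixed `margin` parameter are assumptions, not the paper's actual modification.

```python
def fisher_direction(safe, unsafe):
    """Fisher's discriminant direction for two 2-D point sets (pure Python).

    Returns w such that projecting x onto w best separates the two classes;
    w points from the safe mean toward the unsafe mean."""
    def mean(pts):
        n = len(pts)
        return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

    def scatter(pts, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            s[0][0] += dx * dx; s[0][1] += dx * dy
            s[1][0] += dy * dx; s[1][1] += dy * dy
        return s

    m0, m1 = mean(safe), mean(unsafe)
    s0, s1 = scatter(safe, m0), scatter(unsafe, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    d = [m1[0] - m0[0], m1[1] - m0[1]]  # direction between class means
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]

def safe_threshold(safe, unsafe, w, margin=0.25):
    """Midpoint threshold on the projection, shifted toward the safe class
    so that borderline cases land on the unsafe side (errs toward safety)."""
    proj = lambda p: w[0] * p[0] + w[1] * p[1]
    mu_safe = sum(map(proj, safe)) / len(safe)
    mu_unsafe = sum(map(proj, unsafe)) / len(unsafe)
    mid = 0.5 * (mu_safe + mu_unsafe)
    return mid + margin * (mu_safe - mid)  # move the cutoff toward the safe mean
```

    With well-separated clusters, the shifted threshold enlarges the region labeled unsafe, so points midway between the two class means are classified as unsafe.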

    Differential Evolution Algorithm in the Construction of Interpretable Classification Models

    This chapter describes the application of a differential evolution-based approach to induce oblique decision trees (DTs). This type of decision tree uses a linear combination of attributes to build oblique hyperplanes dividing the instance space, which typically yields trees that are more compact, and often more accurate, than traditional univariate decision trees. Since differential evolution (DE) is an efficient evolutionary algorithm (EA) designed for optimization problems with real-valued parameters, and since finding an optimal hyperplane is a computationally hard task, this metaheuristic (MH) is chosen to conduct an intelligent search for a near-optimal solution. Two methods are described in this chapter: one implements a recursive partitioning strategy to find the most suitable oblique hyperplane for each internal node of a decision tree, and the other conducts a global search for a near-optimal oblique decision tree. A statistical analysis of the experimental results suggests that these methods show better performance as decision tree induction procedures in comparison with other supervised learning approaches.
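    A minimal sketch of the single-node idea, assuming a two-feature binary problem: each DE candidate encodes hyperplane parameters (w1, w2, b), and its fitness is the weighted Gini impurity of the split it induces. The DE/rand/1/bin scheme and all parameter values below are illustrative defaults, not the chapter's exact configuration.

```python
import random

def split_gini(theta, X, y):
    """Weighted Gini impurity of the split w1*x1 + w2*x2 + b > 0 (0/1 labels)."""
    left, right = [], []
    for (x1, x2), label in zip(X, y):
        (right if theta[0] * x1 + theta[1] * x2 + theta[2] > 0 else left).append(label)
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2.0 * p * (1.0 - p)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def de_oblique_split(X, y, pop_size=20, gens=60, f=0.8, cr=0.9, seed=0):
    """DE/rand/1/bin search for near-optimal hyperplane parameters (w1, w2, b)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    fit = [split_gini(ind, X, y) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(3)  # guarantee at least one mutated gene
            trial = [pop[a][j] + f * (pop[b][j] - pop[c][j])
                     if (rng.random() < cr or j == j_rand) else pop[i][j]
                     for j in range(3)]
            trial_fit = split_gini(trial, X, y)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

    On data separable only by an oblique boundary (e.g. x1 + x2 > 3), this search drives the impurity toward zero where any single axis-parallel split cannot.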

    A survey of cost-sensitive decision tree induction algorithms

    The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including direct adaptations of accuracy-based methods as well as approaches that use genetic algorithms, anytime methods, boosting, and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.

    Fisher’s decision tree

    Univariate decision trees are classifiers currently used in many data mining applications. This type of classifier discovers partitions of the input space via hyperplanes that are orthogonal to the attribute axes, producing models that can be understood by human experts. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to the axes. In this paper we introduce Fisher’s Tree, a classifier that combines the dimensionality reduction of Fisher’s linear discriminant with the recursive decomposition strategy of decision trees to build an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data recursively. Fisher’s decision tree induces oblique trees whose accuracy, size, number of leaves, and training time are competitive with those of other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
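    Once the projection direction w is fixed, the "artificial attribute" step reduces to a univariate threshold search on the projected values z = w·x. A hedged sketch, assuming binary labels and an information-gain criterion (the helper names are invented for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def best_threshold(z, y):
    """Best information-gain cut on the artificial attribute z = w·x.

    Scans midpoints between consecutive sorted values and returns the
    threshold with the highest gain, exactly as a univariate split would."""
    pairs = sorted(zip(z, y))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid cut between equal attribute values
        t = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

    The oblique tree then recurses on each side of the chosen threshold, recomputing a new projection direction for every internal node.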

    CSNL: A cost-sensitive non-linear decision tree algorithm

    This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that non-linear decision nodes provide a better basis than axis-parallel decision nodes, and it utilizes discriminant analysis to construct non-linear decision trees that take account of the costs of misclassification. The performance of the algorithm is evaluated by applying it to seventeen data sets, and the results are compared with those obtained by two well-known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well as, if not better than, these algorithms on more than twelve of the data sets and is considerably faster. The use of bagging with CSNL further enhances its performance, showing the significant benefits of using non-linear decision nodes.
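    The cost-sensitive element of such trees can be illustrated by the standard expected-cost rule for labeling a leaf: rather than taking the majority class, pick the label that minimizes expected misclassification cost. Note this is the generic mechanism common to cost-sensitive trees, not CSNL's specific discriminant-based node construction:

```python
def expected_cost_label(counts, cost):
    """Choose the leaf label that minimizes expected misclassification cost.

    counts[c]  -- number of training examples of class c reaching the leaf
    cost[p][a] -- cost of predicting class p when the actual class is a
    """
    n = sum(counts.values())
    def exp_cost(pred):
        return sum(counts[actual] / n * cost[pred][actual] for actual in counts)
    return min(counts, key=exp_cost)
```

    With a cost matrix that penalizes missing the rare class heavily, a leaf dominated by the cheap class can still be labeled with the expensive one, which is exactly how costs bias the induced tree.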

    Oblique Decision Tree Algorithm with Minority Condensation for Class Imbalanced Problem

    In recent years, a significant issue in classification has been handling datasets containing an imbalanced number of instances in each class. Classifier modification is one of the well-known techniques for dealing with this issue. In this paper, an effective classification model based on an oblique decision tree is enhanced to work with imbalanced datasets; it is called the oblique minority condensed decision tree (OMCT). Initially, it selects the best axis-parallel hyperplane based on a decision tree algorithm using the minority entropy of instances within the minority inner fence. Then it perturbs this hyperplane along each axis to improve its minority entropy. Finally, it stochastically perturbs the hyperplane to escape local solutions. In the experimental results, OMCT significantly outperforms six state-of-the-art decision tree algorithms (CART, C4.5, OC1, AE, DCSM and ME) on 18 real-world datasets from UCI in terms of precision, recall and F1 score. Moreover, the decision trees produced by OMCT are significantly smaller than those of the other algorithms.
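    The axis-wise perturbation step can be sketched as a greedy coordinate search over the hyperplane coefficients. Plain weighted binary entropy is used below as a stand-in for the paper's minority entropy, and the step sizes and sweep count are illustrative assumptions:

```python
import math

def split_score(theta, X, y):
    """Weighted binary entropy of the split w1*x1 + w2*x2 + b > 0.
    (A plain impurity score, standing in for OMCT's minority entropy.)"""
    sides = [[], []]
    for (x1, x2), label in zip(X, y):
        above = theta[0] * x1 + theta[1] * x2 + theta[2] > 0
        sides[above].append(label)
    def h(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    n = len(y)
    return sum(len(side) / n * h(side) for side in sides)

def axis_perturb(theta, X, y, steps=(-1.0, -0.5, -0.1, 0.1, 0.5, 1.0), sweeps=3):
    """Greedy axis-by-axis perturbation of a hyperplane: try small changes to
    one coefficient at a time and keep any change that lowers the score."""
    theta = list(theta)
    best = split_score(theta, X, y)
    for _ in range(sweeps):
        for j in range(len(theta)):
            for d in steps:
                cand = list(theta)
                cand[j] += d
                score = split_score(cand, X, y)
                if score < best:  # keep only strict improvements
                    theta, best = cand, score
    return theta, best
```

    Starting from an axis-parallel split, the perturbation tilts the hyperplane one coefficient at a time; OMCT additionally applies a stochastic perturbation afterwards to escape local optima, which is omitted here for brevity.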

    Optimization algorithms for decision tree induction

    Decision trees are among the most commonly used machine learning models for solving classification and regression tasks due to their major advantage of being easy to interpret. However, their predictions are often not as accurate as those of other models. The most widely used approach for learning decision trees is to build them in a top-down manner by introducing splits on a single variable that minimize a certain splitting criterion. One possibility for improving this strategy, to induce smaller and more accurate decision trees, is to allow different types of splits which, for example, consider multiple features simultaneously.
However, finding such splits is usually much more complex, and effective optimization methods are needed to determine optimal solutions. An alternative to univariate splits for numerical features are oblique splits, which employ affine hyperplanes to divide the feature space. Unfortunately, the problem of determining such a split optimally is known to be NP-hard in general. Inspired by the underlying problem structure, two new heuristics are developed for finding near-optimal oblique splits. The first is a cross-entropy optimization method which iteratively samples points from the von Mises-Fisher distribution and updates its parameters based on the best-performing samples. The second is a simulated annealing algorithm that uses a pivoting strategy to explore the solution space. As general oblique splits employ all of the numerical features simultaneously, they are hard to interpret. As an alternative, this thesis proposes the usage of bivariate oblique splits, which correspond to lines in the subspace spanned by two features. They are capable of dividing the feature space much more efficiently than univariate splits while remaining fairly interpretable due to the restriction to only two features. A branch and bound method is presented to determine these bivariate oblique splits optimally. Furthermore, a branch and bound method to determine optimal cross-splits is presented. These splits can be viewed as combinations of two standard univariate splits on numeric attributes, and they are useful in situations where the data points cannot be separated well by single linear splits. The cross-splits can either be introduced directly to induce quaternary decision trees or, which is usually better, they can be used to provide a certain degree of foresight, in which case only the better of the two respective univariate splits is introduced. 
The developed lower bounds for impurity-based splitting criteria also motivate a simple but effective branch and bound algorithm for splits on nominal features. Due to the complexity of determining such splits optimally when the number of possible values for the feature is large, one previously had to use encoding schemes to transform the nominal features into numerical ones or rely on heuristics to find near-optimal nominal splits. The proposed branch and bound method may be a viable alternative for many practical applications. Lastly, a genetic algorithm is proposed as an alternative to the top-down induction strategy.
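    The cross-entropy heuristic for oblique splits can be sketched as follows. For simplicity, Gaussian perturbation of the current mean direction (with shrinking noise) is used below as a stand-in for the thesis's von Mises-Fisher sampling, with the shrinking scale playing the role of a rising concentration parameter; the quality of a direction is the best impurity over all thresholds on the projection:

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def direction_score(w, X, y):
    """Lowest weighted Gini over all cut positions on the projection w·x
    (the quality of the best oblique split along direction w)."""
    z = sorted((w[0] * a + w[1] * b, label) for (a, b), label in zip(X, y))
    labels = [label for _, label in z]
    n = len(labels)
    return min(
        (i / n) * gini(labels[:i]) + ((n - i) / n) * gini(labels[i:])
        for i in range(1, n)
    )

def ce_direction(X, y, n_samples=30, n_elite=5, iters=15, seed=1):
    """Cross-entropy search over unit split directions in 2-D."""
    rng = random.Random(seed)
    mu, sigma = [1.0, 0.0], 1.0
    for _ in range(iters):
        scored = []
        for _ in range(n_samples):
            # perturb the mean direction and renormalize to the unit circle
            v = [mu[0] + rng.gauss(0.0, sigma), mu[1] + rng.gauss(0.0, sigma)]
            norm = math.hypot(v[0], v[1]) or 1.0
            w = [v[0] / norm, v[1] / norm]
            scored.append((direction_score(w, X, y), w))
        scored.sort(key=lambda s: s[0])
        elite = [w for _, w in scored[:n_elite]]
        m = [sum(w[j] for w in elite) / n_elite for j in range(2)]
        norm = math.hypot(m[0], m[1]) or 1.0
        mu = [m[0] / norm, m[1] / norm]  # update toward the elite mean
        sigma *= 0.8  # concentrate sampling around the new mean
    return mu, direction_score(mu, X, y)
```

    On data separable along an oblique direction, the sampling distribution concentrates around that direction over the iterations; the real method's vMF sampling achieves the same effect directly on the unit sphere in higher dimensions.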

    evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R

    Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the "evtree" package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++, while the "partykit" (Hothorn and Zeileis 2011) package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. "evtree" is compared to "rpart" (Therneau and Atkinson 1997), the open-source CART implementation, and conditional inference trees ("ctree"; Hothorn, Hornik, and Zeileis 2006). The usefulness of "evtree" is illustrated in a textbook customer classification task and in a benchmark study of predictive accuracy, in which "evtree" achieved results at least similar to, and most of the time better than, those of the recursive algorithms "rpart" and "ctree".
    Keywords: machine learning, classification trees, regression trees, evolutionary algorithms, R