67,278 research outputs found

    A System for Induction of Oblique Decision Trees

    Full text link
    This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

    Rule-based Machine Learning Methods for Functional Prediction

    Full text link
    We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.Comment: See http://www.jair.org/ for any accompanying file

    Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules

    Full text link
    We describe a case study in the application of {\em symbolic machine learning} techniques for the discovery of linguistic rules and categories. A supervised rule induction algorithm is used to learn to predict the correct diminutive suffix given the phonological representation of Dutch nouns. The system produces rules which are comparable to rules proposed by linguists. Furthermore, in the process of learning this morphological task, the phonemes used are grouped into phonologically relevant categories. We discuss the relevance of our method for linguistics and language technology

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    Full text link
    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.Comment: See http://www.jair.org/ for any accompanying file

    Inducing safer oblique trees without costs

    Get PDF
    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming
    • …
    corecore