
    Classification Trees for Problems with Monotonicity Constraints

    For classification problems with ordinal attributes, the class attribute should often increase with some or all of the explanatory attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
    Keywords: classification tree; decision tree; monotone; monotonicity constraint; ordinal data
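    The monotonicity constraint described above can be made concrete with a small sketch: a dataset is monotone when no example with componentwise smaller (or equal) attribute values carries a larger class label. The function names below are illustrative, not from the surveyed papers.

    ```python
    # Sketch: test whether a labelled dataset satisfies the monotonicity
    # constraint, i.e. x <= x' componentwise implies label(x) <= label(x').
    def dominates(a, b):
        """True if every attribute of a is <= the matching attribute of b."""
        return all(ai <= bi for ai, bi in zip(a, b))

    def is_monotone(X, y):
        """True when no pair of examples violates monotonicity."""
        n = len(X)
        for i in range(n):
            for j in range(n):
                if dominates(X[i], X[j]) and y[i] > y[j]:
                    return False
        return True

    X = [(1, 1), (1, 2), (2, 2)]
    assert is_monotone(X, [0, 0, 1])       # class rises with the attributes
    assert not is_monotone(X, [1, 0, 0])   # (1,1) outranks (1,2): violation
    ```

    A tree that is monotone in this sense never predicts a lower class for an instance that dominates another on every attribute.
    
    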

    Derivation of Monotone Decision Models from Non-Monotone Data

    The objective of data mining is the extraction of knowledge from databases. In practice, one often encounters difficulties with models that are constructed purely by search, without incorporation of knowledge about the domain of application. In economic decision making, such as credit loan approval or risk analysis, one often requires models that are monotone with respect to the decision variables involved. If the model is obtained by a blind search through the data, it mostly does not have this property, even if the underlying database is monotone. In this paper, we present methods to enforce monotonicity of decision models. We propose measures to express the degree of monotonicity of the data and an algorithm to make datasets monotone. In addition, we show that monotone decision trees derived from cleaned data perform better than trees derived from raw data.
    Keywords: decision models; knowledge; decision theory; operational research; data mining
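    To make the two ingredients of the abstract tangible, here is a hedged sketch (not the paper's exact measures or algorithm): a degree-of-monotonicity score counting the fraction of comparable pairs that respect the constraint, and a naive relabelling that always yields a monotone dataset.

    ```python
    # Illustrative only: one plausible monotonicity measure and a simple
    # "cleaning" step, not the specific procedures proposed in the paper.
    def dominates(a, b):
        return all(ai <= bi for ai, bi in zip(a, b))

    def degree_of_monotonicity(X, y):
        """Fraction of comparable pairs (X[i] <= X[j], i != j) that
        already satisfy y[i] <= y[j]."""
        comparable = violations = 0
        for i in range(len(X)):
            for j in range(len(X)):
                if i != j and dominates(X[i], X[j]):
                    comparable += 1
                    if y[i] > y[j]:
                        violations += 1
        return 1.0 if comparable == 0 else 1 - violations / comparable

    def make_monotone(X, y):
        """Relabel each example with the maximum label among the examples
        it dominates (itself included); the result is always monotone,
        because a dominating point dominates a superset of examples."""
        return [max(y[j] for j in range(len(X)) if dominates(X[j], X[i]))
                for i in range(len(X))]

    X, y = [(1, 1), (1, 2), (2, 2)], [1, 0, 0]
    print(degree_of_monotonicity(X, y))      # 2 of 3 comparable pairs violate
    print(make_monotone(X, y))               # cleaned labels are monotone
    ```

    Trees can then be trained on the cleaned labels, mirroring the paper's comparison of models built from raw versus cleaned data.
    
    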

    Integrating Economic Knowledge in Data Mining Algorithms

    The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others of the correctness and effectiveness of knowledge induced from data. Current data mining techniques do not contribute much to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss, in particular, methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated with a hedonic price model.
    Keywords: knowledge; neural network; data mining; decision trees
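    One standard way to hard-wire a monotonicity constraint into a neural network, in the spirit of the abstract, is to restrict all weights to be non-negative and use increasing activations; the output is then guaranteed non-decreasing in every input. This sketch assumes that construction and is not the paper's specific model.

    ```python
    # Illustrative sketch: a one-hidden-layer network with non-negative
    # weights and an increasing activation (tanh) is monotone
    # non-decreasing in each input by construction.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = np.abs(rng.normal(size=(2, 4)))   # non-negative first-layer weights
    b1 = rng.normal(size=4)
    W2 = np.abs(rng.normal(size=(4, 1)))   # non-negative output weights

    def predict(x):
        h = np.tanh(x @ W1 + b1)           # increasing activation
        return float(h @ W2)

    # Increasing any input attribute can only increase the prediction,
    # e.g. a larger floor area never lowers a hedonic price estimate.
    assert predict(np.array([1.0, 1.0])) <= predict(np.array([1.0, 2.0]))
    ```

    During training, the same constraint is typically maintained by projecting the weights back to the non-negative orthant after each update.
    
    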

    FPTAS for Counting Monotone CNF

    A monotone CNF formula is a Boolean formula in conjunctive normal form in which every variable appears positively. We design a deterministic fully polynomial-time approximation scheme (FPTAS) for counting the number of satisfying assignments of a given monotone CNF formula when each variable appears in at most 5 clauses. Equivalently, this is also an FPTAS for counting set covers where each set contains at most 5 elements. If we allow variables to appear in up to 6 clauses (or sets to contain 6 elements), the problem becomes NP-hard to approximate. Thus, this gives a complete understanding of the approximability of counting for monotone CNF formulas. It is also an important step towards a complete characterization of the approximability of all bounded-degree Boolean #CSP problems. In addition, we study the hypergraph matching problem, which arises naturally towards a complete classification of bounded-degree Boolean #CSP problems, and show an FPTAS for counting 3D matchings of hypergraphs with maximum degree 4. Our main technique is correlation decay, a powerful tool for designing deterministic FPTAS for counting problems defined by local constraints among a number of variables. All previous uses of this design technique fall into two categories: each constraint involves at most two variables, as in independent set, coloring, and spin systems in general; or each variable appears in at most two constraints, as in matching, edge cover, and Holant problems in general. The CNF problems studied here have more complicated structures than these problems and require new design and proof techniques. As it turns out, the technique we developed for the CNF problem also works for the hypergraph matching problem. We believe that it may also find applications in other CSP or more general counting problems.
    Comment: 24 pages, 2 figures. Version 1 => 2: minor edits, highlighted the picture of set cover/packing and an implication of our previous result in 3D matching.
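    The quantity the FPTAS approximates can be computed exactly by brute force on toy instances, which makes the counting problem concrete. This exhaustive counter is exponential in the number of variables and is purely illustrative; the paper's contribution is avoiding exactly this blow-up via correlation decay.

    ```python
    # Brute-force count of satisfying assignments of a monotone CNF
    # formula: every literal is positive, so a clause is a set of
    # variable indices and is satisfied when any of them is True.
    from itertools import product

    def count_monotone_cnf(num_vars, clauses):
        """clauses: list of tuples of variable indices, all positive."""
        count = 0
        for assignment in product([False, True], repeat=num_vars):
            if all(any(assignment[v] for v in clause) for clause in clauses):
                count += 1
        return count

    # (x0 or x1) and (x1 or x2): of the 8 assignments, 5 satisfy it
    # (x1=True gives 4; x1=False forces x0=x2=True, giving 1 more).
    print(count_monotone_cnf(3, [(0, 1), (1, 2)]))  # → 5
    ```

    Under the set-cover view mentioned in the abstract, each variable is a set and each clause an element; a satisfying assignment is a (not necessarily minimal) cover.
    
    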

    Axiomatic Interpretability for Multiclass Additive Models

    Generalized additive models (GAMs) are favored in many regression and binary classification problems because they fit complex, nonlinear functions while remaining interpretable. In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full-complexity models such as gradient boosted trees. In the second part, we turn our attention to the interpretability of GAMs in the multiclass setting. Surprisingly, the natural interpretability of GAMs breaks down when there are more than two classes, and naive interpretation of multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we identify two axioms that any additive model must satisfy in order not to be visually misleading. We then develop a technique called Additive Post-Processing for Interpretability (API), which provably transforms a pre-trained additive model to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our learning algorithm, but on any multiclass additive model, including multiclass linear and logistic regression. We demonstrate the effectiveness of API on a 12-class infant mortality dataset.
    Comment: KDD 201
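    The interpretability problem the abstract alludes to stems from a non-identifiability: adding the same function to every class's additive score leaves the softmax probabilities unchanged, so the plotted per-class shape functions can look very different while encoding the same classifier. This toy demonstration of that invariance uses made-up numbers and is not the paper's API procedure.

    ```python
    # Multiclass additive models score each class as a sum of shape
    # functions and pass the scores through a softmax. Shifting every
    # class score by the same amount leaves the probabilities unchanged,
    # which is why raw per-class curves can be visually misleading.
    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()

    scores = np.array([1.0, -0.5, 2.0])   # additive scores for 3 classes
    shifted = scores + 3.7                # same offset added to each class

    assert np.allclose(softmax(scores), softmax(shifted))
    ```

    API exploits exactly this degree of freedom: it chooses, among the equivalent representations, one that satisfies the two interpretability axioms, so accuracy cannot change.
    
    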