Classification Trees for Problems with Monotonicity Constraints
For classification problems with ordinal attributes, very often the class attribute should increase with each or some of the explaining attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
Keywords: classification tree; decision tree; monotone; monotonicity constraint; ordinal data
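To make the notion of a "completely monotone" dataset concrete, here is a minimal sketch. It assumes the usual definition: whenever one example is component-wise less than or equal to another on the attributes, its class label must not be larger. The function names `dominates`, `violating_pairs`, and `is_monotone` are illustrative and do not come from the surveyed paper.

```python
from itertools import combinations

def dominates(x, y):
    """Return True if x <= y holds component-wise."""
    return all(a <= b for a, b in zip(x, y))

def violating_pairs(X, labels):
    """List index pairs (i, j) with X[i] <= X[j] but labels[i] > labels[j]."""
    pairs = []
    for i, j in combinations(range(len(X)), 2):
        if dominates(X[i], X[j]) and labels[i] > labels[j]:
            pairs.append((i, j))
        if dominates(X[j], X[i]) and labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs

def is_monotone(X, labels):
    """A dataset is monotone when no pair of examples violates the ordering."""
    return not violating_pairs(X, labels)

# Hypothetical toy data: two ordinal attributes, ordinal class label.
X = [(1, 1), (2, 2), (1, 3)]
print(is_monotone(X, [0, 1, 1]))      # no violations
print(violating_pairs(X, [1, 0, 1]))  # (1, 1) <= (2, 2) but label 1 > 0
```

The pairwise check is quadratic in the number of examples, which is fine for illustration but may need a smarter approach on large datasets.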
Derivation of Monotone Decision Models from Non-Monotone Data
The objective of data mining is the extraction of knowledge from databases. In practice, one often encounters difficulties with models that are constructed purely by search, without incorporation of knowledge about the domain of application. In economic decision making such as credit loan approval or risk analysis, one often requires models that are monotone with respect to the decision variables involved. If the model is obtained by a blind search through the data, it usually lacks this property, even if the underlying database is monotone. In this paper, we present methods to enforce monotonicity of decision models. We propose measures to express the degree of monotonicity of the data and an algorithm to make data sets monotone. In addition, it is shown that monotone decision trees derived from cleaned data perform better than trees derived from raw data.
Keywords: decision models; knowledge; decision theory; operational research; data mining
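The abstract does not spell out its measures, so the following is only one plausible way to quantify a degree of monotonicity, not necessarily the one the authors propose. It computes the fraction of comparable example pairs whose labels respect the attribute ordering. The name `degree_of_monotonicity` and the toy data are assumptions made for illustration.

```python
from itertools import combinations

def leq(a, b):
    """Component-wise a <= b."""
    return all(u <= v for u, v in zip(a, b))

def degree_of_monotonicity(X, labels):
    """Fraction of comparable pairs that are label-consistent (1.0 if none are comparable)."""
    comparable = 0
    consistent = 0
    for i, j in combinations(range(len(X)), 2):
        if leq(X[i], X[j]):
            lo, hi = i, j
        elif leq(X[j], X[i]):
            lo, hi = j, i
        else:
            continue  # incomparable pairs impose no constraint
        comparable += 1
        if X[lo] == X[hi]:
            # Identical attribute vectors must carry identical labels.
            consistent += labels[lo] == labels[hi]
        else:
            consistent += labels[lo] <= labels[hi]
    return consistent / comparable if comparable else 1.0

# Hypothetical toy data: one of the two comparable pairs is violated.
X = [(1, 1), (2, 2), (1, 3)]
print(degree_of_monotonicity(X, [1, 0, 1]))
```

A value of 1.0 corresponds to a fully monotone dataset; lower values indicate how much relabeling or cleaning a monotonization step would have to do.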
Integrating Economic Knowledge in Data Mining Algorithms
The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others about the correctness and effectiveness of knowledge induced from data. Current data mining techniques do not contribute much to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss, in particular, methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated with a hedonic price model.
Keywords: knowledge; neural network; data mining; decision trees
FPTAS for Counting Monotone CNF
A monotone CNF formula is a Boolean formula in conjunctive normal form where
each variable appears positively. We design a deterministic fully
polynomial-time approximation scheme (FPTAS) for counting the number of
satisfying assignments for a given monotone CNF formula when each variable
appears in at most clauses. Equivalently, this is also an FPTAS for
counting set covers where each set contains at most elements. If we allow
variables to appear in a maximum of clauses (or sets to contain
elements), it is NP-hard to approximate it. Thus, this gives a complete
understanding of the approximability of counting for monotone CNF formulas. It
is also an important step towards a complete characterization of the
approximability for all bounded degree Boolean #CSP problems. In addition, we
study the hypergraph matching problem, which arises naturally towards a
complete classification of bounded degree Boolean #CSP problems, and show an
FPTAS for counting 3D matchings of hypergraphs with maximum degree .
Our main technique is correlation decay, a powerful tool to design
deterministic FPTAS for counting problems defined by local constraints among a
number of variables. All previous uses of this design technique fall into two
categories: each constraint involves at most two variables, such as independent
set, coloring, and spin systems in general; or each variable appears in at most
two constraints, such as matching, edge cover, and holant problem in general.
The CNF problems studied here have more complicated structures than these
problems and require new design and proof techniques. As it turns out, the
technique we developed for the CNF problem also works for the hypergraph
matching problem. We believe that it may also find applications in other CSP or
more general counting problems.
Comment: 24 pages, 2 figures. version 1=>2: minor edits, highlighted the
picture of set cover/packing, and an implication of our previous result in 3D
matchin
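The equivalence between counting satisfying assignments of a monotone CNF formula and counting set covers can be checked on small instances by brute force. The sketch below is exponential-time and is in no way the paper's FPTAS. It only illustrates the correspondence: variables become sets, clauses become elements, and a variable's set contains the clauses it appears in. All function names are illustrative assumptions.

```python
from itertools import product

def count_monotone_cnf(num_vars, clauses):
    """Count assignments satisfying a monotone CNF; each clause is a tuple of variable indices."""
    count = 0
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[v] for v in clause) for clause in clauses):
            count += 1
    return count

def cnf_to_set_system(num_vars, clauses):
    """Map each variable to the set of clause indices in which it occurs."""
    sets = [set() for _ in range(num_vars)]
    for idx, clause in enumerate(clauses):
        for v in clause:
            sets[v].add(idx)
    return sets

def count_set_covers(num_elements, sets):
    """Count sub-collections of `sets` whose union is {0, ..., num_elements - 1}."""
    universe = set(range(num_elements))
    count = 0
    for chosen in product([False, True], repeat=len(sets)):
        covered = set()
        for pick, s in zip(chosen, sets):
            if pick:
                covered |= s
        if covered == universe:
            count += 1
    return count

# Hypothetical example: (x0 OR x1) AND (x1 OR x2)
clauses = [(0, 1), (1, 2)]
sets = cnf_to_set_system(3, clauses)
print(count_monotone_cnf(3, clauses), count_set_covers(len(clauses), sets))
```

Under this mapping, a bound on the number of clauses each variable appears in becomes the same bound on the size of each set, which is why the abstract states its result in both forms.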
Axiomatic Interpretability for Multiclass Additive Models
Generalized additive models (GAMs) are favored in many regression and binary
classification problems because they are able to fit complex, nonlinear
functions while still remaining interpretable. In the first part of this paper,
we generalize a state-of-the-art GAM learning algorithm based on boosted trees
to the multiclass setting, and show that this multiclass algorithm outperforms
existing GAM learning algorithms and sometimes matches the performance of full
complexity models such as gradient boosted trees.
In the second part, we turn our attention to the interpretability of GAMs in
the multiclass setting. Surprisingly, the natural interpretability of GAMs
breaks down when there are more than two classes. Naive interpretation of
multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we
identify two axioms that any additive model must satisfy in order to not be
visually misleading. We then develop a technique called Additive
Post-Processing for Interpretability (API), that provably transforms a
pre-trained additive model to satisfy the interpretability axioms without
sacrificing accuracy. The technique works not just on models trained with our
learning algorithm, but on any multiclass additive model, including multiclass
linear and logistic regression. We demonstrate the effectiveness of API on a
12-class infant mortality dataset.
Comment: KDD 201