Classification and Target Group Selection Based Upon Frequent Patterns
In this technical report, two new algorithms based upon frequent patterns are proposed. One is a classification method; the other is an algorithm for target group selection. In both algorithms, the collection of frequent patterns in the training set is constructed first. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The classification method uses this collection directly. Target group selection is a well-known problem in direct marketing; our selection algorithm is likewise based upon the collection of frequent patterns.
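The report's exact method is not reproduced in this abstract, but the core idea can be sketched. The Python below mines frequent patterns per class by brute force and classifies by total matched support; the scoring rule and all names are illustrative assumptions, not the report's algorithm.

```python
from itertools import combinations
from collections import Counter

def frequent_patterns(transactions, min_support, max_len=3):
    # Count every itemset of size <= max_len (brute force; a real miner
    # would prune Apriori-style or keep candidates in a trie).
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {p: c for p, c in counts.items() if c >= min_support}

def classify(instance, patterns_by_class):
    # Score each class by the total support of its frequent patterns
    # contained in the instance (one plausible rule, assumed here).
    items = set(instance)
    return max(patterns_by_class,
               key=lambda label: sum(
                   c for p, c in patterns_by_class[label].items()
                   if items.issuperset(p)))
```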
Repairing non-monotone ordinal data sets by changing class labels
Ordinal data sets often contain a certain amount of non-monotone noise. This paper proposes three algorithms for removing these non-monotonicities by relabeling the noisy instances. The first is a naive algorithm. The second refines the naive algorithm by minimizing the difference between the old and the new label. The third, a refinement of the second, is optimal in the sense that the number of unchanged instances is maximized. In addition, the runtime complexities of the algorithms are discussed.
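As a concrete illustration of the naive end of this spectrum, the sketch below repeatedly lowers the label of the dominated instance in any violating pair; it terminates because the label sum strictly decreases. This is an assumed reading of "naive relabeling", not the paper's exact first algorithm.

```python
import numpy as np

def naive_relabel(X, y):
    # Repair non-monotonicities: whenever X[i] <= X[j] componentwise
    # but y[i] > y[j], pull y[i] down to y[j]. Each change strictly
    # decreases the label sum, so the loop terminates.
    y = np.asarray(y).copy()
    changed = True
    while changed:
        changed = False
        for i in range(len(y)):
            for j in range(len(y)):
                if np.all(X[i] <= X[j]) and y[i] > y[j]:
                    y[i] = y[j]
                    changed = True
    return y
```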
Dilworth's Theorem Revisited, an Algorithmic Proof
Dilworth's theorem establishes a link between a minimal path cover and a maximal antichain in a digraph.
A new proof of Dilworth's theorem is given. Moreover, an algorithm is presented that finds both the path cover and the antichain considered in the theorem.
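The report's own algorithmic proof is not reproduced here. For orientation, a minimum vertex-disjoint path cover of a DAG can be computed by the standard split-graph matching construction (cover size = number of vertices minus maximum matching); the sketch below uses Kuhn's augmenting-path matching and reads the paths off the matching. Extracting the maximum antichain additionally needs a König-type vertex-cover argument, omitted here.

```python
def min_path_cover(n, arcs):
    """Minimum vertex-disjoint path cover of a DAG on vertices 0..n-1,
    via augmenting paths on the split bipartite graph (a standard
    construction, not necessarily the report's own algorithm)."""
    adj = [[] for _ in range(n)]
    for u, v in arcs:
        adj[u].append(v)
    pred = [-1] * n  # pred[v] = u if the matched arc u -> v is in the cover

    def augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if pred[v] == -1 or augment(pred[v], seen):
                    pred[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, [False] * n)

    succ = {u: v for v, u in enumerate(pred) if u != -1}
    paths = []
    for s in (v for v in range(n) if pred[v] == -1):  # path start points
        path = [s]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths  # len(paths) == n - size of the maximum matching
```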
On the use of simple classifiers for the initialisation of one-hidden-layer neural nets
In this report we discuss the use of two simple classifiers to initialise the input-to-hidden layer of a one-hidden-layer neural network. These classifiers divide the input space into convex regions that can be represented by membership functions. These functions are then used to determine the weights of the first layer of a feedforward network.
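One plausible instantiation of this idea, assuming a linear classifier as the "simple classifier" (the report's exact choices may differ): take the separating hyperplanes as the input-to-hidden weights, so the sigmoids of the hidden units act as soft membership functions for the half-spaces they cut out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def init_hidden_layer(X, y):
    # Fit a simple linear classifier and reuse its hyperplanes as the
    # first-layer weights of a one-hidden-layer network: one hidden
    # unit per hyperplane, activation ~ sigmoid(W @ x + b).
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    W = clf.coef_.copy()
    b = clf.intercept_.copy()
    return W, b  # training then mainly tunes the hidden-to-output weights
```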
Generating artificial data with monotonicity constraints
The monotonicity constraint is a common side condition imposed on modeling problems as diverse as hedonic pricing, personnel selection and credit rating. Experience tells us that it is not trivial to generate artificial data for supervised learning problems when the monotonicity constraint holds. This paper presents two algorithms for such learning problems. The first can be used to generate random monotone data sets without an underlying model; the second can be used to generate monotone decision tree models and makes use of the first. If needed, noise can be added to the generated data. Both algorithms are illustrated with an example.
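A minimal sketch of the model-free direction, assuming a thresholded monotone score as the labeling device (the paper's own procedures are not reproduced here): with non-negative weights, x <= x' componentwise implies a lower score and hence a lower ordinal label, so the generated set is monotone by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def monotone_dataset(n, d, k):
    # Random monotone data set with k ordered classes: labels are the
    # quantile bin of a monotone score w @ x with non-negative weights.
    X = rng.random((n, d))
    w = rng.random(d)                                  # w >= 0 => monotone
    score = X @ w
    cuts = np.quantile(score, np.linspace(0, 1, k + 1)[1:-1])
    y = np.searchsorted(cuts, score)                   # labels 0 .. k-1
    return X, y
```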
Monotone Decision Trees
EUR-FEW-CS-97-07, by R. Potharst, J.C. Bioch and T. Petter. In many classification problems the domains of the attributes and the classes are linearly ordered. Often, classification must preserve this ordering: this is called monotone classification. Since the known decision tree methods generate non-monotone trees, these methods are not suitable for monotone classification problems. In this report we provide a number of order-preserving tree-generation algorithms for multi-attribute classification problems with k linearly ordered classes.
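Order preservation has a simple operational test: a labeled set, or a tree's predictions on it, is monotone exactly when no dominated instance receives a higher label. A direct quadratic check:

```python
import numpy as np

def is_monotone(X, y):
    # Monotone classification property: X[i] <= X[j] componentwise
    # must imply y[i] <= y[j].
    for i in range(len(y)):
        for j in range(len(y)):
            if np.all(X[i] <= X[j]) and y[i] > y[j]:
                return False
    return True
```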
Improved customer choice predictions using ensemble methods
In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and outperform logistic regression.
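The paper's two marketing datasets are not public; the sketch below reproduces the shape of the comparison on synthetic data, using scikit-learn's bagged trees (one of several ensemble methods the paper considers) against a single tree and logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data for the paper's proprietary marketing sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = [
    ("single tree", DecisionTreeClassifier(random_state=0)),
    ("bagged trees", BaggingClassifier(DecisionTreeClassifier(),
                                       n_estimators=100, random_state=0)),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]
for name, model in models:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```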
Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well. A high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions for all models for both customer choice datasets are given to illustrate these concepts.
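The abstract does not specify the loss; for squared error the decomposition referred to here takes the standard form

```latex
\mathbb{E}_{D,\varepsilon}\big[(y-\hat f_D(x))^2\big]
 = \underbrace{\sigma^2}_{\text{noise}}
 + \underbrace{\big(\mathbb{E}_D[\hat f_D(x)]-f(x)\big)^2}_{\text{bias}^2}
 + \underbrace{\mathbb{E}_D\big[(\hat f_D(x)-\mathbb{E}_D[\hat f_D(x)])^2\big]}_{\text{variance}}
```

Averaging many trees fit to resampled training sets leaves the mean prediction, and hence the bias term, essentially unchanged while shrinking the variance term, which is the mechanism the paper points to.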
Neural Networks for Target Selection in Direct Marketing
Partly due to a growing interest in direct marketing, it has become an important application field for data mining. Many techniques have been applied to select the targets in commercial applications, such as statistical regression, regression trees and neural networks.
Pattern-Based Target Selection Applied to Fund Raising
This paper proposes a new algorithm for target selection. This algorithm collects all frequent patterns (equivalent to frequent item sets) in a training set. These patterns are stored efficiently using a compact data structure called a trie.
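A minimal Python sketch of such an itemset trie, where the count at the node reached by a sorted itemset is that itemset's support; the paper's exact layout may differ.

```python
from itertools import combinations

class PatternTrie:
    def __init__(self):
        self.children = {}
        self.count = 0

    def add_transaction(self, transaction, max_len=3):
        # Register every itemset of the transaction up to max_len items;
        # each itemset's final node is incremented once per transaction
        # containing it, so node.count is exactly its support.
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                node = self
                for item in itemset:
                    node = node.children.setdefault(item, PatternTrie())
                node.count += 1

    def support(self, itemset):
        node = self
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count
```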
Boosting the accuracy of hedonic pricing models
Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we build these models by means of boosted CART models. The method is explained in detail and applied to various datasets. Empirically, we find a substantial reduction of errors on out-of-sample data for two out of three datasets compared with a stepwise linear regression model. We interpret the boosted models by partial dependence plots and relative importance plots. This reveals some interesting nonlinearities and differences in attribute importance across the model types.
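The paper's datasets are not public; the sketch below shows the same pipeline shape with scikit-learn's gradient-boosted trees on a public housing dataset, including the relative-importance and partial-dependence readouts mentioned above.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Public housing data as a stand-in for the paper's hedonic-pricing sets.
data = fetch_california_housing()
X, y = data.data, data.target

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                  max_depth=3, random_state=0).fit(X, y)

# Relative importance of the attributes.
for name, imp in sorted(zip(data.feature_names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:12s} {imp:.3f}")

# Partial dependence of price on one attribute reveals nonlinearities.
PartialDependenceDisplay.from_estimator(model, X, features=[0])
```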