
    Classification and Target Group Selection Based Upon Frequent Patterns

    In this technical report, two new algorithms based upon frequent patterns are proposed. One is a classification method; the other is an algorithm for target group selection. In both algorithms, the collection of frequent patterns in the training set is constructed first. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The classification method directly utilizes this collection. Target group selection is a well-known problem in direct marketing; our selection algorithm is based upon the collection of frequent patterns.
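    Both algorithms start by collecting every frequent pattern in the training set. As a minimal sketch of that first step, assuming an Apriori-style level-wise search and an illustrative min_support threshold (the report's own construction and data structure may differ):

    ```python
    # Level-wise frequent-pattern mining: a pattern (itemset) is frequent
    # if it occurs in at least min_support transactions.
    from itertools import combinations

    def frequent_patterns(transactions, min_support):
        """Return every itemset occurring in >= min_support transactions."""
        transactions = [frozenset(t) for t in transactions]
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items}   # level 1: single items
        frequent = {}
        while current:
            counts = {c: sum(1 for t in transactions if c <= t) for c in current}
            level = {c: n for c, n in counts.items() if n >= min_support}
            frequent.update(level)
            # Join frequent k-itemsets into (k+1)-itemset candidates.
            keys = list(level)
            current = {a | b for a, b in combinations(keys, 2)
                       if len(a | b) == len(a) + 1}
        return frequent

    patterns = frequent_patterns([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}],
                                 min_support=2)
    # e.g. frozenset({'a'}) -> 3, frozenset({'a', 'b'}) -> 2, ...
    ```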

    Repairing non-monotone ordinal data sets by changing class labels

    Ordinal data sets often contain a certain amount of non-monotone noise. This paper proposes three algorithms for removing these non-monotonicities by relabeling the noisy instances. The first is a naive algorithm. The second is a refinement of the naive algorithm that minimizes the difference between the old and the new label. The third, a refinement of the second, is optimal in the sense that the number of unchanged instances is maximized. In addition, the runtime complexities are discussed.
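    A minimal sketch of the naive algorithm's idea, under the assumption that a non-monotonicity means one instance dominates another componentwise while carrying a strictly lower label, and that the repair simply raises the offending label; the paper's refined and optimal variants differ:

    ```python
    def dominates(x, y):
        """True if x is componentwise >= y."""
        return all(a >= b for a, b in zip(x, y))

    def naive_relabel(data):
        """data: list of (attribute_tuple, label); returns a monotone relabeling."""
        data = [list(d) for d in data]
        changed = True
        while changed:                      # labels only increase, so this terminates
            changed = False
            for i in range(len(data)):
                for j in range(len(data)):
                    (xi, li), (xj, lj) = data[i], data[j]
                    if dominates(xi, xj) and li < lj:   # monotonicity violation
                        data[i][1] = lj                 # raise the dominating point's label
                        changed = True
        return [tuple(d) for d in data]

    print(naive_relabel([((1, 1), 2), ((2, 2), 1)]))    # [((1, 1), 2), ((2, 2), 2)]
    ```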

    Dilworth's Theorem Revisited, an Algorithmic Proof

    Dilworth's theorem establishes a link between a minimal path cover and a maximal antichain in a digraph. A new proof of Dilworth's theorem is given. Moreover, an algorithm that finds both the path cover and the antichain considered in the theorem is presented.
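    One standard route to such an algorithm goes through bipartite matching: a minimum vertex-disjoint path cover of an n-vertex DAG has size n - M, where M is a maximum matching in the bipartite graph that mirrors the arcs. A minimal sketch of that construction, offered as an assumption about the flavour of the proof rather than the paper's actual algorithm:

    ```python
    def min_path_cover(n, arcs):
        """Vertices 0..n-1, arcs: list of (u, v) in a DAG; returns the cover size."""
        adj = [[] for _ in range(n)]
        for u, v in arcs:
            adj[u].append(v)
        match = [-1] * n            # match[v] = left vertex matched to right copy of v

        def augment(u, seen):       # Kuhn's augmenting-path step
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    if match[v] == -1 or augment(match[v], seen):
                        match[v] = u
                        return True
            return False

        matching = sum(1 for u in range(n) if augment(u, set()))
        return n - matching         # each matched arc merges two paths into one

    # A chain 0 -> 1 -> 2 plus an isolated vertex 3 needs two paths.
    print(min_path_cover(4, [(0, 1), (1, 2)]))   # 2
    ```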

    On the use of simple classifiers for the initialisation of one-hidden-layer neural nets

    In this report we discuss the use of two simple classifiers to initialise the input-to-hidden layer of a one-hidden-layer neural network. These classifiers divide the input space into convex regions that can be represented by membership functions. These functions are then used to determine the weights of the first layer of a feedforward network.
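    As an illustrative sketch of how such a region description can seed the first layer: an axis-parallel boundary x[j] > t becomes a steep sigmoid unit, a soft membership function for the corresponding half-space. The steepness k and the stump-style boundaries are assumptions, not the report's exact mapping:

    ```python
    import numpy as np

    def boundary_to_unit(j, t, n_features, k=10.0):
        """Weights and bias so that sigmoid(w @ x + b) approximates [x[j] > t]."""
        w = np.zeros(n_features)
        w[j] = k                    # steep slope along feature j
        b = -k * t                  # shift so the transition sits at x[j] = t
        return w, b

    def init_hidden_layer(boundaries, n_features):
        """boundaries: list of (feature_index, threshold) pairs."""
        units = [boundary_to_unit(j, t, n_features) for j, t in boundaries]
        W = np.array([w for w, _ in units])
        b = np.array([b for _, b in units])
        return W, b

    W, b = init_hidden_layer([(0, 0.5), (1, -1.0)], n_features=2)
    x = np.array([0.8, 0.0])
    h = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # both memberships close to 1
    ```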

    Generating artificial data with monotonicity constraints

    The monotonicity constraint is a common side condition imposed on modeling problems as diverse as hedonic pricing, personnel selection and credit rating. Experience tells us that it is not trivial to generate artificial data for supervised learning problems when the monotonicity constraint holds. Two algorithms for such learning problems are presented in this paper. The first can be used to generate random monotone data sets without an underlying model, and the second, which makes use of the first, can be used to generate monotone decision tree models. If needed, noise can be added to the generated data. Both algorithms are illustrated with an example.
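    A minimal sketch of the first algorithm's idea, assuming each new point gets a label drawn uniformly from the interval still permitted by the already-labeled points comparable with it; the attribute ranges and the dominance test are illustrative:

    ```python
    import random

    def dominates(x, y):
        return all(a >= b for a, b in zip(x, y))

    def random_monotone_dataset(n_points, n_attrs, n_classes, seed=0):
        rng = random.Random(seed)
        data = []
        for _ in range(n_points):
            x = tuple(rng.randint(0, 4) for _ in range(n_attrs))
            # Lowest label forced by points below x, highest by points above x.
            lo = max((l for y, l in data if dominates(x, y)), default=0)
            hi = min((l for y, l in data if dominates(y, x)), default=n_classes - 1)
            data.append((x, rng.randint(lo, hi)))  # lo <= hi since data stays monotone
        return data

    print(random_monotone_dataset(n_points=5, n_attrs=2, n_classes=3))
    ```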

    Monotone Decision Trees

    EUR-FEW-CS-97-07, by R. Potharst, J.C. Bioch and T. Petter. In many classification problems the domains of the attributes and the classes are linearly ordered. Often, classification must preserve this ordering: this is called monotone classification. Since the known decision tree methods generate non-monotone trees, these methods are not suitable for monotone classification problems. In this report we provide a number of order-preserving tree-generation algorithms for multi-attribute classification problems with k linearly ordered classes.
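    For concreteness, the property the generated trees must satisfy: a classifier f is monotone when x <= y componentwise implies f(x) <= f(y). The brute-force check below is an illustrative test harness, not one of the report's tree-generation algorithms:

    ```python
    from itertools import combinations

    def is_monotone(points, predict):
        """Check a classifier over a finite set of attribute vectors."""
        for x, y in combinations(points, 2):
            if all(a <= b for a, b in zip(x, y)) and predict(x) > predict(y):
                return False
            if all(a >= b for a, b in zip(x, y)) and predict(x) < predict(y):
                return False
        return True

    # A threshold rule on the attribute sum is monotone by construction.
    print(is_monotone([(0, 0), (1, 0), (1, 1)], lambda x: int(sum(x) >= 2)))  # True
    ```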

    Improved customer choice predictions using ensemble methods

    In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees, and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and to outperform logistic regression. Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well, while a high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions of all models for both customer choice datasets are given to illustrate these concepts.
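    The bias/variance decomposition for squared loss can be estimated by refitting a model on resampled training sets, as sketched below; the synthetic data and the scikit-learn tree are illustrative stand-ins for the paper's marketing datasets and models:

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 1))
    f_true = np.sin(3 * X[:, 0])                        # noise-free target
    X_test = np.linspace(-1, 1, 50).reshape(-1, 1)
    f_test = np.sin(3 * X_test[:, 0])

    preds = []
    for _ in range(100):                                # simulate fresh training sets
        idx = rng.integers(0, len(X), len(X))
        y = f_true[idx] + rng.normal(0, 0.3, len(idx))  # resample + fresh noise
        preds.append(DecisionTreeRegressor().fit(X[idx], y).predict(X_test))
    preds = np.array(preds)

    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"bias^2 = {bias2:.3f}, variance = {variance:.3f}")
    # Averaging the 100 trees (bagging) keeps the bias but shrinks the variance.
    ```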

    Neural Networks for Target Selection in Direct Marketing

    Partly due to a growing interest in direct marketing, it has become an important application field for data mining. Many techniques have been applied to select the targets in commercial applications, such as statistical regression, regression trees, and neural networks.

    Pattern-Based Target Selection Applied to Fund Raising

    This paper proposes a new algorithm for target selection. This algorithm collects all frequent patterns (equivalent to frequent item sets) in a training set. These patterns are stored efficiently using a compact data structure called a trie.
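    A minimal sketch of a pattern trie, assuming items are kept in a canonical order so that each itemset maps to exactly one root-to-node path and shared prefixes are stored once; the dict-of-dicts representation is an illustrative choice, not necessarily the paper's implementation:

    ```python
    class PatternTrie:
        def __init__(self):
            self.root = {}

        def insert(self, itemset, support):
            node = self.root
            for item in sorted(itemset):       # canonical order: one path per set
                node = node.setdefault(item, {})
            node["#"] = support                # store the pattern's support count

        def support(self, itemset):
            node = self.root
            for item in sorted(itemset):
                if item not in node:
                    return None
                node = node[item]
            return node.get("#")

    trie = PatternTrie()
    trie.insert({"a", "b"}, 42)
    print(trie.support({"b", "a"}))            # 42, regardless of insertion order
    ```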

    Boosting the accuracy of hedonic pricing models

    Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we build such models by means of boosted CART. The method is explained in detail and applied to various datasets. Empirically, we find a substantial reduction of errors on out-of-sample data for two out of three datasets, compared with a stepwise linear regression model. We interpret the boosted models by means of partial dependence plots and relative importance plots. These reveal some interesting nonlinearities and differences in attribute importance across the model types.
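    A minimal sketch of the boosted-CART approach with scikit-learn's gradient boosting, including a partial dependence query for interpretation; the synthetic price data and the hyperparameters are assumptions, not the paper's setup:

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import partial_dependence

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(500, 3))       # e.g. size, age, location score
    price = 100 * X[:, 0] + 20 * np.sqrt(X[:, 2]) + rng.normal(0, 5, 500)

    model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                      learning_rate=0.05).fit(X, price)

    # Partial dependence of the predicted price on feature 0, averaged over the
    # other attributes; feature_importances_ gives the relative importances.
    pd_result = partial_dependence(model, X, features=[0])
    print(model.feature_importances_)
    ```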