
    Classification and Target Group Selection Based Upon Frequent Patterns

    In this technical report, two new algorithms based upon frequent patterns are proposed. One is a classification method; the other is an algorithm for target group selection. In both algorithms, the collection of frequent patterns in the training set is constructed first. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The classification method directly utilizes this collection. Target group selection is a well-known problem in direct marketing; our selection algorithm is based upon the collection of frequent patterns.
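    Both algorithms start by collecting every frequent pattern in the training set. As a minimal sketch of that first step, assuming an Apriori-style level-wise search and an illustrative min_support threshold (the report's own construction and data structure may differ):

    ```python
    # Level-wise frequent-pattern mining: a pattern (itemset) is frequent
    # if it occurs in at least min_support transactions.
    from itertools import combinations

    def frequent_patterns(transactions, min_support):
        """Return every itemset occurring in >= min_support transactions."""
        transactions = [frozenset(t) for t in transactions]
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items}   # level 1: single items
        frequent = {}
        while current:
            counts = {c: sum(1 for t in transactions if c <= t) for c in current}
            level = {c: n for c, n in counts.items() if n >= min_support}
            frequent.update(level)
            # Join frequent k-itemsets into (k+1)-itemset candidates.
            keys = list(level)
            current = {a | b for a, b in combinations(keys, 2)
                       if len(a | b) == len(a) + 1}
        return frequent

    patterns = frequent_patterns([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}],
                                 min_support=2)
    # e.g. frozenset({'a'}) -> 3, frozenset({'a', 'b'}) -> 2, ...
    ```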

    Repairing non-monotone ordinal data sets by changing class labels

    Ordinal data sets often contain a certain amount of non-monotone noise. This paper proposes three algorithms for removing these non-monotonicities by relabeling the noisy instances. The first is a naive algorithm. The second is a refinement of the naive algorithm that minimizes the difference between the old and the new label. The third, a refinement of the second, is optimal in the sense that the number of unchanged instances is maximized. In addition, the runtime complexities are discussed.
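    A minimal sketch of the naive algorithm's idea, under the assumption that a non-monotonicity means one instance dominates another componentwise while carrying a strictly lower label, and that the repair simply raises the offending label; the paper's refined and optimal variants differ:

    ```python
    def dominates(x, y):
        """True if x is componentwise >= y."""
        return all(a >= b for a, b in zip(x, y))

    def naive_relabel(data):
        """data: list of (attribute_tuple, label); returns a monotone relabeling."""
        data = [list(d) for d in data]
        changed = True
        while changed:                      # labels only increase, so this terminates
            changed = False
            for i in range(len(data)):
                for j in range(len(data)):
                    (xi, li), (xj, lj) = data[i], data[j]
                    if dominates(xi, xj) and li < lj:   # monotonicity violation
                        data[i][1] = lj                 # raise the dominating point's label
                        changed = True
        return [tuple(d) for d in data]

    print(naive_relabel([((1, 1), 2), ((2, 2), 1)]))    # [((1, 1), 2), ((2, 2), 2)]
    ```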

    Dilworth's Theorem Revisited, an Algorithmic Proof

    Dilworth's theorem establishes a link between a minimal path cover and a maximal antichain in a digraph. A new proof of Dilworth's theorem is given. Moreover, an algorithm that finds both the path cover and the antichain considered in the theorem is presented.
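    One standard route to such an algorithm goes through bipartite matching: a minimum vertex-disjoint path cover of an n-vertex DAG has size n - M, where M is a maximum matching in the bipartite graph that mirrors the arcs. A minimal sketch of that construction, offered as an assumption about the flavour of the proof rather than the paper's actual algorithm:

    ```python
    def min_path_cover(n, arcs):
        """Vertices 0..n-1, arcs: list of (u, v) in a DAG; returns the cover size."""
        adj = [[] for _ in range(n)]
        for u, v in arcs:
            adj[u].append(v)
        match = [-1] * n            # match[v] = left vertex matched to right copy of v

        def augment(u, seen):       # Kuhn's augmenting-path step
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    if match[v] == -1 or augment(match[v], seen):
                        match[v] = u
                        return True
            return False

        matching = sum(1 for u in range(n) if augment(u, set()))
        return n - matching         # each matched arc merges two paths into one

    # A chain 0 -> 1 -> 2 plus an isolated vertex 3 needs two paths.
    print(min_path_cover(4, [(0, 1), (1, 2)]))   # 2
    ```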

    On the use of simple classifiers for the initialisation of one-hidden-layer neural nets

    In this report we discuss the use of two simple classifiers to initialise the input-to-hidden layer of a one-hidden-layer neural network. These classifiers divide the input space into convex regions that can be represented by membership functions. These functions are then used to determine the weights of the first layer of a feedforward network.
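    As an illustrative sketch of how such a region description can seed the first layer: an axis-parallel boundary x[j] > t becomes a steep sigmoid unit, a soft membership function for the corresponding half-space. The steepness k and the stump-style boundaries are assumptions, not the report's exact mapping:

    ```python
    import numpy as np

    def boundary_to_unit(j, t, n_features, k=10.0):
        """Weights and bias so that sigmoid(w @ x + b) approximates [x[j] > t]."""
        w = np.zeros(n_features)
        w[j] = k                    # steep slope along feature j
        b = -k * t                  # shift so the transition sits at x[j] = t
        return w, b

    def init_hidden_layer(boundaries, n_features):
        """boundaries: list of (feature_index, threshold) pairs."""
        units = [boundary_to_unit(j, t, n_features) for j, t in boundaries]
        W = np.array([w for w, _ in units])
        b = np.array([b for _, b in units])
        return W, b

    W, b = init_hidden_layer([(0, 0.5), (1, -1.0)], n_features=2)
    x = np.array([0.8, 0.0])
    h = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # both memberships close to 1
    ```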

    Generating artificial data with monotonicity constraints

    The monotonicity constraint is a common side condition imposed on modeling problems as diverse as hedonic pricing, personnel selection and credit rating. Experience tells us that it is not trivial to generate artificial data for supervised learning problems when the monotonicity constraint holds. Two algorithms for such learning problems are presented in this paper. The first can be used to generate random monotone data sets without an underlying model, and the second, which makes use of the first, can be used to generate monotone decision tree models. If needed, noise can be added to the generated data. Both algorithms are illustrated with an example.
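    A minimal sketch of the first algorithm's idea, assuming each new point gets a label drawn uniformly from the interval still permitted by the already-labeled points comparable with it; the attribute ranges and the dominance test are illustrative:

    ```python
    import random

    def dominates(x, y):
        return all(a >= b for a, b in zip(x, y))

    def random_monotone_dataset(n_points, n_attrs, n_classes, seed=0):
        rng = random.Random(seed)
        data = []
        for _ in range(n_points):
            x = tuple(rng.randint(0, 4) for _ in range(n_attrs))
            # Lowest label forced by points below x, highest by points above x.
            lo = max((l for y, l in data if dominates(x, y)), default=0)
            hi = min((l for y, l in data if dominates(y, x)), default=n_classes - 1)
            data.append((x, rng.randint(lo, hi)))  # lo <= hi since data stays monotone
        return data

    print(random_monotone_dataset(n_points=5, n_attrs=2, n_classes=3))
    ```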

    Monotone Decision Trees

    EUR-FEW-CS-97-07, by R. Potharst, J.C. Bioch and T. Petter. In many classification problems the domains of the attributes and the classes are linearly ordered. Often, classification must preserve this ordering: this is called monotone classification. Since the known decision tree methods generate non-monotone trees, these methods are not suitable for monotone classification problems. In this report we provide a number of order-preserving tree-generation algorithms for multi-attribute classification problems with k linearly ordered classes.
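    For concreteness, the property the generated trees must satisfy: a classifier f is monotone when x <= y componentwise implies f(x) <= f(y). The brute-force check below is an illustrative test harness, not one of the report's tree-generation algorithms:

    ```python
    from itertools import combinations

    def is_monotone(points, predict):
        """Check a classifier over a finite set of attribute vectors."""
        for x, y in combinations(points, 2):
            if all(a <= b for a, b in zip(x, y)) and predict(x) > predict(y):
                return False
            if all(a >= b for a, b in zip(x, y)) and predict(x) < predict(y):
                return False
        return True

    # A threshold rule on the attribute sum is monotone by construction.
    print(is_monotone([(0, 0), (1, 0), (1, 1)], lambda x: int(sum(x) >= 2)))  # True
    ```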

    Improved customer choice predictions using ensemble methods

    In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees, and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and to outperform logistic regression. Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well, while a high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions of all models for both customer choice datasets are given to illustrate these concepts.
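    The bias/variance decomposition for squared loss can be estimated by refitting a model on resampled training sets, as sketched below; the synthetic data and the scikit-learn tree are illustrative stand-ins for the paper's marketing datasets and models:

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 1))
    f_true = np.sin(3 * X[:, 0])                        # noise-free target
    X_test = np.linspace(-1, 1, 50).reshape(-1, 1)
    f_test = np.sin(3 * X_test[:, 0])

    preds = []
    for _ in range(100):                                # simulate fresh training sets
        idx = rng.integers(0, len(X), len(X))
        y = f_true[idx] + rng.normal(0, 0.3, len(idx))  # resample + fresh noise
        preds.append(DecisionTreeRegressor().fit(X[idx], y).predict(X_test))
    preds = np.array(preds)

    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"bias^2 = {bias2:.3f}, variance = {variance:.3f}")
    # Averaging the 100 trees (bagging) keeps the bias but shrinks the variance.
    ```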

    Neural Networks for Target Selection in Direct Marketing

    Partly due to a growing interest in direct marketing, it has become an important application field for data mining. Many techniques have been applied to select the targets in commercial applications, such as statistical regression, regression trees, and neural networks.

    Pattern-Based Target Selection Applied to Fund Raising

    This paper proposes a new algorithm for target selection. This algorithm collects all frequent patterns (equivalent to frequent item sets) in a training set. These patterns are stored efficiently using a compact data structure called a trie.
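    A minimal sketch of a pattern trie, assuming items are kept in a canonical order so that each itemset maps to exactly one root-to-node path and shared prefixes are stored once; the dict-of-dicts representation is an illustrative choice, not necessarily the paper's implementation:

    ```python
    class PatternTrie:
        def __init__(self):
            self.root = {}

        def insert(self, itemset, support):
            node = self.root
            for item in sorted(itemset):       # canonical order: one path per set
                node = node.setdefault(item, {})
            node["#"] = support                # store the pattern's support count

        def support(self, itemset):
            node = self.root
            for item in sorted(itemset):
                if item not in node:
                    return None
                node = node[item]
            return node.get("#")

    trie = PatternTrie()
    trie.insert({"a", "b"}, 42)
    print(trie.support({"b", "a"}))            # 42, regardless of insertion order
    ```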

    Boosting the accuracy of hedonic pricing models

    Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we build such models by means of boosted CART. The method is explained in detail and applied to various datasets. Empirically, we find a substantial reduction of errors on out-of-sample data for two out of three datasets, compared with a stepwise linear regression model. We interpret the boosted models by means of partial dependence plots and relative importance plots. These reveal some interesting nonlinearities and differences in attribute importance across the model types.
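    A minimal sketch of the boosted-CART approach with scikit-learn's gradient boosting, including a partial dependence query for interpretation; the synthetic price data and the hyperparameters are assumptions, not the paper's setup:

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import partial_dependence

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(500, 3))       # e.g. size, age, location score
    price = 100 * X[:, 0] + 20 * np.sqrt(X[:, 2]) + rng.normal(0, 5, 500)

    model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                      learning_rate=0.05).fit(X, price)

    # Partial dependence of the predicted price on feature 0, averaged over the
    # other attributes; feature_importances_ gives the relative importances.
    pd_result = partial_dependence(model, X, features=[0])
    print(model.feature_importances_)
    ```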