
    Ensembles of probability estimation trees for customer churn prediction

    Customer churn prediction is one of the most important elements of a company's Customer Relationship Management (CRM) strategy. In this study, two strategies are investigated to increase the lift performance of ensemble classification models, i.e. (i) using probability estimation trees (PETs) instead of standard decision trees as base classifiers; and (ii) implementing alternative fusion rules based on lift weights for the combination of ensemble members' outputs. Experiments are conducted for four popular ensemble strategies on five real-life churn data sets. In general, the results demonstrate how lift performance can be substantially improved by using alternative base classifiers and fusion rules. However, the effect varies for the different ensemble strategies. In particular, the results indicate an increase of lift performance of (i) Bagging by implementing C4.4 base classifiers, (ii) the Random Subspace Method (RSM) by using lift-weighted fusion rules, and (iii) AdaBoost by implementing both.
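
    As a rough illustration (not the paper's implementation), the sketch below bags unpruned probability estimation trees with scikit-learn (version 1.2 or later for the estimator argument) and scores the ensemble by top-decile lift, the metric the abstract targets; X_train, y_train, X_test and y_test are assumed NumPy arrays with 0/1 churn labels.

        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        def top_decile_lift(y_true, churn_scores):
            # lift = churn rate among the 10% highest-scored customers,
            # divided by the churn rate in the whole test set
            n_top = max(1, len(y_true) // 10)
            top = np.argsort(churn_scores)[::-1][:n_top]
            return y_true[top].mean() / y_true.mean()

        # Unpruned trees keep fine-grained leaf class frequencies, which
        # is the core idea behind C4.4-style probability estimation trees.
        pet = DecisionTreeClassifier(criterion="entropy", max_depth=None)
        ensemble = BaggingClassifier(estimator=pet, n_estimators=100)
        # ensemble.fit(X_train, y_train)
        # lift = top_decile_lift(y_test, ensemble.predict_proba(X_test)[:, 1])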

    Vote-boosting ensembles

    Vote-boosting is a sequential ensemble learning method in which the individual classifiers are built on different weighted versions of the training data. To build a new classifier, the weight of each training instance is determined in terms of the degree of disagreement among the current ensemble predictions for that instance. For low class-label noise levels, especially when simple base learners are used, emphasis should be placed on instances for which the disagreement rate is high. When more flexible classifiers are used and as the noise level increases, the emphasis on these uncertain instances should be reduced. In fact, at sufficiently high levels of class-label noise, the focus should be on instances on which the ensemble classifiers agree. The optimal type of emphasis can be automatically determined using cross-validation. An extensive empirical analysis using the beta distribution as emphasis function illustrates that vote-boosting is an effective method to generate ensembles that are both accurate and robust.
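
    A minimal sketch of the emphasis step, under stated assumptions: instance weights follow a beta-distribution emphasis function of the current ensemble's disagreement rate, as the abstract describes; the shape parameters a and b are illustrative, and votes is assumed to hold integer class labels.

        import numpy as np
        from scipy.stats import beta

        def vote_boosting_weights(votes, a=2.0, b=2.0):
            # votes: (n_classifiers, n_instances) array of integer labels
            # predicted by the ensemble members built so far
            n_clf, n_inst = votes.shape
            agree = np.array([np.bincount(votes[:, i]).max()
                              for i in range(n_inst)])
            disagreement = 1.0 - agree / n_clf
            # shifting mass toward high disagreement (larger a relative
            # to b) suits low label noise; the reverse suits high noise
            w = beta.pdf(disagreement, a, b) + 1e-12
            return w / w.sum()   # normalize to a weight distribution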

    Random Prism: An Alternative to Random Forests.

    Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve comparable, and sometimes higher, classification accuracy than decision tree classifiers when the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, designed to enhance Prism's classification accuracy by reducing overfitting.
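
    No standard Prism implementation is assumed here; the sketch below is a hypothetical bagging wrapper showing how any modular rule inducer exposing fit/predict (such as a Prism learner) could be combined by majority vote, the ensemble scheme the article builds toward. The class and factory names are illustrative only.

        import numpy as np

        class BaggedRuleEnsemble:
            def __init__(self, make_base, n_members=50, seed=0):
                self.make_base = make_base   # factory for a rule inducer
                self.n_members = n_members
                self.rng = np.random.default_rng(seed)
                self.members = []

            def fit(self, X, y):
                n = len(X)
                for _ in range(self.n_members):
                    idx = self.rng.integers(0, n, n)   # bootstrap replica
                    member = self.make_base()
                    member.fit(X[idx], y[idx])         # induce modular rules
                    self.members.append(member)
                return self

            def predict(self, X):
                votes = np.stack([m.predict(X) for m in self.members])
                # majority vote across members counteracts the overfitting
                # of any single ruleset
                return np.apply_along_axis(
                    lambda col: np.bincount(col).argmax(), 0, votes)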

    Optimization of Signal Significance by Bagging Decision Trees

    An algorithm for optimization of signal significance or any other classification figure of merit suited for analysis of high energy physics (HEP) data is described. This algorithm trains decision trees on many bootstrap replicas of training data with each tree required to optimize the signal significance or any other chosen figure of merit. New data are then classified by a simple majority vote of the built trees. The performance of this algorithm has been studied using a search for the radiative leptonic decay B->gamma l nu at BaBar and shown to be superior to that of all other attempted classifiers including such powerful methods as boosted decision trees. In the B->gamma e nu channel, the described algorithm increases the expected signal significance from 2.4 sigma obtained by an original method designed for the B->gamma l nu analysis to 3.0 sigma.
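
    For concreteness, the usual signal-significance figure of merit can be written as S / sqrt(S + B); the helper below (names are illustrative, not from the paper's code) evaluates it for a candidate selection.

        import numpy as np

        def signal_significance(y, selected, signal_label=1):
            # S = selected signal events, B = selected background events
            s = np.sum((y == signal_label) & selected)
            b = np.sum((y != signal_label) & selected)
            return s / np.sqrt(s + b) if s + b > 0 else 0.0

        # Each tree is grown to maximize this (or another chosen figure of
        # merit); new events are then classified by majority vote over the
        # trees trained on bootstrap replicas.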

    Improving adaptive bagging methods for evolving data streams

    We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up adaptation to change in Adaptive-Size Hoeffding Tree (ASHT) Bagging, we add an error change detector for each classifier. We test our improvements in an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
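
    As a toy stand-in for the per-classifier detector the authors add (ADWIN itself uses adaptive windows with statistical guarantees), the sketch below flags a change when a member's recent error rate departs from a reference window; the window size and threshold are illustrative.

        from collections import deque

        class ErrorChangeDetector:
            # Toy substitute for ADWIN: signals drift when the recent
            # error rate departs from a reference window by more than a
            # fixed threshold.
            def __init__(self, window=500, threshold=0.1):
                self.ref = deque(maxlen=window)
                self.recent = deque(maxlen=window)
                self.threshold = threshold

            def update(self, error):
                # error: 1 if the member misclassified the instance, else 0
                if len(self.ref) < self.ref.maxlen:
                    self.ref.append(error)
                    return False
                self.recent.append(error)
                if len(self.recent) < self.recent.maxlen:
                    return False
                drift = abs(sum(self.recent) / len(self.recent)
                            - sum(self.ref) / len(self.ref)) > self.threshold
                if drift:
                    # reset: recent behaviour becomes the new reference
                    self.ref = self.recent
                    self.recent = deque(maxlen=self.ref.maxlen)
                return drift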

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
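
    The article's examples use R packages; as a rough Python equivalent with scikit-learn, the snippet below fits a random forest and ranks impurity-based variable importances (which, as the recursive-partitioning literature warns, can be biased toward variables with many possible splits, so permutation importance is a common alternative).

        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier

        data = load_breast_cancer()
        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(data.data, data.target)
        ranked = sorted(zip(data.feature_names, rf.feature_importances_),
                        key=lambda pair: -pair[1])
        for name, importance in ranked[:5]:
            print(f"{name}: {importance:.3f}")   # top-5 importances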

    Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars, and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars. [Abridged]
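
    A sketch of the selection step described above, under assumptions: p_galaxy holds the per-object galaxy probability from the classifier and y_true the reference labels; raising the threshold trades completeness for efficiency (purity).

        import numpy as np

        def select_galaxies(p_galaxy, y_true, threshold):
            sel = p_galaxy >= threshold
            is_gal = (y_true == "galaxy")
            completeness = np.sum(sel & is_gal) / np.sum(is_gal)     # recall
            efficiency = np.sum(sel & is_gal) / max(np.sum(sel), 1)  # purity
            return sel, completeness, efficiency

        # The paper tunes such cuts in the full 3-class probability space
        # (galaxy / star / nsng) rather than on one probability alone.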

    CSNL: A cost-sensitive non-linear decision tree algorithm

    This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification. The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well-known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well as, if not better than, these algorithms in more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance, showing the significant benefits of using nonlinear decision nodes.
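
    The cost-sensitive decision rule underlying algorithms such as CSNL and MetaCost can be sketched as choosing the class with the lowest expected misclassification cost; the cost matrix below is illustrative only, not taken from the article.

        import numpy as np

        # cost[i][j] = cost of predicting class j when the true class is i
        cost = np.array([[0.0, 1.0],
                         [5.0, 0.0]])   # a missed class-1 case is 5x worse

        def min_cost_class(class_probs):
            # expected cost of predicting j is sum_i P(i) * cost[i, j]
            return int(np.argmin(class_probs @ cost))

        print(min_cost_class(np.array([0.8, 0.2])))   # -> 1, despite P(0) = 0.8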