12 research outputs found
Hierarchical Priors for Bayesian CART Shrinkage
The Bayesian CART (classification and regression tree) approach proposed by Chipman, George and McCulloch (1998) entails putting a prior distribution on the set of all CART models and then using stochastic search to select a model. The main thrust of this paper is to propose a new class of hierarchical priors which enhance the potential of this Bayesian approach. These priors indicate a preference for smooth local mean structure, resulting in tree models which shrink predictions from adjacent terminal node towards each other. Past methods for tree shrinkage have searched for trees without shrinking, and applied shrinkage to the identified tree only after the search. By using hierarchical priors in the stochastic search, the proposed method searches for shrunk trees that fit well and improves the tree through shrinkage of predictions
Classic and Bayesian Tree-Based Methods
Tree-based methods are nonparametric techniques and machine-learning methods for data prediction and exploratory modeling. These models are one of valuable and powerful tools among data mining methods and can be used for predicting different types of outcome (dependent) variable: (e.g., quantitative, qualitative, and time until an event occurs (survival data)). Tree model is called classification tree/regression tree/survival tree based on the type of outcome variable. These methods have some advantages over against traditional statistical methods such as generalized linear models (GLMs), discriminant analysis, and survival analysis. Some of these advantages are: without requiring to determine assumptions about the functional form between outcome variable and predictor (independent) variables, invariant to monotone transformations of predictor variables, useful for dealing with nonlinear relationships and high-order interactions, deal with different types of predictor variable, ease of interpretation and understanding results without requiring to have statistical experience, robust to missing values, outliers, and multicollinearity. Several classic and Bayesian tree algorithms are proposed for classification and regression trees, and in this chapter, we provide a review of these algorithms and appropriate criteria for determining the predictive performance of them
Online Learning with Bayesian Classification Trees
Randomized classification trees are among the most popular machine learning tools and found successful applications in many areas. Although this classifier was originally designed as offline learning algorithm, there has been an increased interest in the last years to provide an online variant. In this paper, we propose an online learning algorithm for classification trees that adheres to Bayesian principles. In contrast to state-of-the-art approaches that produce large forests with complex trees, we aim at constructing small ensembles consisting of shallow trees with high generalization capabilities. Experiments on benchmark machine learning and body part recognition datasets show superior performance over state-of-the-art approaches
Bayesian CART models for insurance claims frequency
Accuracy and interpretability of a (non-life) insurance pricing model are
essential qualities to ensure fair and transparent premiums for policy-holders,
that reflect their risk. In recent years, the classification and regression
trees (CARTs) and their ensembles have gained popularity in the actuarial
literature, since they offer good prediction performance and are relatively
easily interpretable. In this paper, we introduce Bayesian CART models for
insurance pricing, with a particular focus on claims frequency modelling.
Additionally to the common Poisson and negative binomial (NB) distributions
used for claims frequency, we implement Bayesian CART for the zero-inflated
Poisson (ZIP) distribution to address the difficulty arising from the
imbalanced insurance claims data. To this end, we introduce a general MCMC
algorithm using data augmentation methods for posterior tree exploration. We
also introduce the deviance information criterion (DIC) for the tree model
selection. The proposed models are able to identify trees which can better
classify the policy-holders into risk groups. Some simulations and real
insurance data will be discussed to illustrate the applicability of these
models.Comment: 46 page
Bayesian CART models for insurance claims frequency
The accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders, that reflect their risk. In recent years, classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easy to interpret. In this paper, we introduce Bayesian CART models for insurance pricing, with a particular focus on claims frequency modelling. In addition to the common Poisson and negative binomial (NB) distributions used for claims frequency, we implement Bayesian CART for the zero-inflated Poisson (ZIP) distribution to address the difficulty arising from the imbalanced insurance claims data. To this end, we introduce a general MCMC algorithm using data augmentation methods for posterior tree exploration. We also introduce the deviance information criterion (DIC) for tree model selection. The proposed models are able to identify trees which can better classify the policy-holders into risk groups. Simulations and real insurance data will be used to illustrate the applicability of these models
Regional-scale eutrophication models: a Bayesian treed model approach
Utilizing Bayesian hierarchical techniques, regional-scale eutrophication models were developed for use in the Total Maximum Daily Load (TMDL) process. The Bayesian tree-based (BTREED) approach allows association of multiple environmental stressors with biological responses, and quantification of uncertainty sources in the water quality model. Simple parametric models are often inadequate for describing complex datasets; the BTREED approach partitions the dataset, and describes the localized subsets of data with linear models, thereby providing a comprehensive representation of stressor and response interactions. Nutrient criteria data for lakes, ponds and reservoirs across the United States were obtained from the Environmental Protection Agency (U.S. EPA) National Nutrient Criteria Database. Model estimation was accomplished by randomly splitting the composite dataset into training and test sets, and using the training dataset in model estimation, and the test dataset to evaluate and validate the model. Mean squared error was reported for both training and test data of the highest log-likelihood models. The Bayesian approach to regional-scale eutrophication models is also beneficial from a decision-theoretic perspective. Predictions regarding the variable of interest are quantified by probability distributions, providing the decision maker with valuable information about the distribution of the biological response conditional on the stressors, and information about the model error
Recommended from our members
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning and are currently applied to a variety of data mining or knowledge discovery applications, particularly for classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited to the assumption that data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is the extensive simulations which study the impact of missing data on predictive accuracy of existing DTs which can cope with missing values, when missing values are in both the training and test sets or when they are in either of the two sets. All simulations are performed under missing completely at random, missing at random and informatively missing mechanisms and for different missing data patterns and proportions.
The proposal of a simple, novel, yet effective proposed procedure for training and testing using decision trees in the presence of missing data is the next contribution. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real world application domains. In this work, the proposed algorithm maintains (sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. Also, the proposed algorithm greatly improves in accuracy for IM data. Another major advantage of this method over multiple imputation is the important saving in computational resources due to it simplicity.
The next contribution is the proposal of three versions of simple probabilistic techniques that could be used for classifying incomplete vectors using decision trees based on complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve comparative quality to sophisticated algorithms like multiple imputation and therefore are applicable to all kinds of datasets.
Finally, novel uses of two proposed ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Experiments are used to evaluate and validate the success of the proposed ensemble methods with respect to individual missing data techniques in the form of empirical tests. EMIMIA attains the highest overall level of prediction accuracy
Decision trees and forests: a probabilistic perspective
Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation since they usually just specify an algorithm for training a model. We take a probabilistic approach where we cast the decision tree structures and the parameters associated with the nodes of a decision tree as a probabilistic model; given labeled examples, we can train the probabilistic model using a variety of approaches (Bayesian learning, maximum likelihood, etc). The probabilistic approach allows us to encode prior assumptions about tree structures and share statistical strength between node parameters; furthermore, it offers a principled mechanism to obtain probabilistic predictions which is crucial for applications where uncertainty quantification is important. Existing work on Bayesian decision trees relies on Markov chain Monte Carlo which can be computationally slow and suffer from poor mixing. We propose a novel sequential Monte Carlo algorithm that computes a particle approximation to the posterior over trees in a top-down fashion. We also propose a novel sampler for Bayesian additive regression trees by combining the above top-down particle filtering algorithm with the Particle Gibbs (Andrieu et al., 2010) framework. Finally, we propose Mondrian forests (MFs), a computationally efficient hybrid solution that is competitive with non-probabilistic counterparts in terms of speed and accuracy, but additionally produces well-calibrated uncertainty estimates. MFs use the Mondrian process (Roy and Teh, 2009) as the randomization mechanism and hierarchically smooth the node parameters within each tree (using a hierarchical probabilistic model and approximate Bayesian updates), but combine the trees in a non-Bayesian fashion. MFs can be grown in an incremental/online fashion and remarkably, the distribution of online MFs is the same as that of batch MFs