Particle Gibbs for Bayesian Additive Regression Trees
Additive regression trees are flexible non-parametric models and popular
off-the-shelf tools for real-world non-linear regression. In application
domains, such as bioinformatics, where there is also demand for probabilistic
predictions with measures of uncertainty, the Bayesian additive regression
trees (BART) model, introduced by Chipman et al. (2010), is increasingly
popular. As data sets have grown in size, however, the standard
Metropolis-Hastings algorithms used to perform inference in BART are proving
inadequate. In particular, these Markov chains make local changes to the trees
and suffer from slow mixing when the data are high-dimensional or the best
fitting trees are more than a few layers deep. We present a novel sampler for
BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a
top-down particle filtering algorithm for Bayesian decision trees
(Lakshminarayanan et al., 2013). Rather than making local changes to individual
trees, the PG sampler proposes a complete tree to fit the residual. Experiments
show that the PG sampler outperforms existing samplers in many settings.
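
The "residual" that each tree is fit to can be illustrated with a minimal sketch. It assumes the usual sum-of-trees setup, in which tree j is updated against the targets minus the predictions of all other trees. The function name `partial_residual` and the toy stumps are hypothetical, and the PG tree proposal itself is not shown.

```python
def partial_residual(y, X, trees, j):
    """Targets minus the predictions of every tree except tree j."""
    return [
        y_i - sum(tree(x_i) for k, tree in enumerate(trees) if k != j)
        for x_i, y_i in zip(X, y)
    ]

# Toy ensemble (illustrative only): each "tree" is a stump written as a function.
trees = [
    lambda x: 1.0 if x[0] < 0.5 else 2.0,
    lambda x: -0.5 if x[0] < 0.2 else 0.5,
    lambda x: 0.0,
]
X = [(0.1,), (0.4,), (0.9,)]
y = [0.5, 1.5, 2.5]

# The residual that a newly proposed tree 0 would have to explain.
residual_for_tree_0 = partial_residual(y, X, trees, 0)
print(residual_for_tree_0)
```

A sampler in this style would cycle over j, replacing tree j with a new tree drawn conditionally on this residual.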
Decision trees and forests: a probabilistic perspective
Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation since they usually just specify an algorithm for training a model. We take a probabilistic approach where we cast the decision tree structures and the parameters associated with the nodes of a decision tree as a probabilistic model; given labeled examples, we can train the probabilistic model using a variety of approaches (Bayesian learning, maximum likelihood, etc.). The probabilistic approach allows us to encode prior assumptions about tree structures and share statistical strength between node parameters; furthermore, it offers a principled mechanism to obtain probabilistic predictions, which is crucial for applications where uncertainty quantification is important. Existing work on Bayesian decision trees relies on Markov chain Monte Carlo, which can be computationally slow and suffer from poor mixing. We propose a novel sequential Monte Carlo algorithm that computes a particle approximation to the posterior over trees in a top-down fashion. We also propose a novel sampler for Bayesian additive regression trees by combining the above top-down particle filtering algorithm with the Particle Gibbs (Andrieu et al., 2010) framework. Finally, we propose Mondrian forests (MFs), a computationally efficient hybrid solution that is competitive with non-probabilistic counterparts in terms of speed and accuracy, but additionally produces well-calibrated uncertainty estimates. MFs use the Mondrian process (Roy and Teh, 2009) as the randomization mechanism and hierarchically smooth the node parameters within each tree (using a hierarchical probabilistic model and approximate Bayesian updates), but combine the trees in a non-Bayesian fashion.
MFs can be grown in an incremental/online fashion and, remarkably, the distribution of online MFs is the same as that of batch MFs.
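
As a rough illustration of the randomization mechanism mentioned above, the following sketch samples a Mondrian process partition of an axis-aligned box. It assumes the standard generative description: the time to the next cut is exponential with rate equal to the sum of the box's side lengths, the cut dimension is chosen in proportion to side length, and the cut location is uniform. The function name `sample_mondrian` and the `budget` parameter are hypothetical. This is not the Mondrian forest algorithm, which additionally restricts cuts to the range of the data and smooths node parameters.

```python
import random

def sample_mondrian(box, budget, rng):
    """Return the leaf boxes of one Mondrian sample.

    box is a list of (low, high) pairs, one per dimension, with positive lengths.
    """
    lengths = [hi - lo for lo, hi in box]
    # Waiting time until the next cut: exponential with rate = sum of side lengths.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf
    # Pick the cut dimension with probability proportional to its side length.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[dim]
    cut = rng.uniform(lo, hi)
    left, right = list(box), list(box)
    left[dim] = (lo, cut)
    right[dim] = (cut, hi)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))

leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0, rng=random.Random(0))
total_area = sum((x1 - x0) * (y1 - y0) for (x0, x1), (y0, y1) in leaves)
print(len(leaves), total_area)
```

The leaves always tile the original box, so their areas sum to the area of the unit square up to floating-point error.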
Statistical Function Tagging and Grammatical Relations of Myanmar Sentences
This paper describes context-free grammar (CFG) based grammatical relations
for Myanmar sentences, combined with a corpus-based function tagging system. Part
of the challenge of statistical function tagging for Myanmar sentences comes
from the fact that Myanmar has free phrase order and a complex morphological
system. Function tagging is a pre-processing step to show grammatical relations
of Myanmar sentences. In the task of function tagging, which tags the function
of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the
possible function tags of a word. We apply a context-free grammar (CFG) to find
the grammatical relations of the function tags. We also create a
function-annotated tagged corpus for Myanmar and propose grammar rules for Myanmar
sentences. Experiments show that our analysis achieves good results on
simple sentences and complex sentences.
Comment: 16 pages, 7 figures, 8 tables, AIAA-2011 (India). arXiv admin note:
text overlap with arXiv:0912.1820 by other author
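
The Naive Bayes disambiguation step described in the abstract above can be sketched as follows. This is a generic Naive Bayes classifier with add-one smoothing, not the paper's system: the feature names (`pos`, `chunk`), the tag labels, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count tags and per-tag feature values from (features, tag) pairs."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)  # (tag, feature name) -> value counts
    vocab = defaultdict(set)            # feature name -> values seen in training
    for feats, tag in examples:
        tag_counts[tag] += 1
        for name, value in feats.items():
            feat_counts[(tag, name)][value] += 1
            vocab[name].add(value)
    return tag_counts, feat_counts, vocab

def predict_nb(model, feats):
    """Return the tag with the highest smoothed log posterior score."""
    tag_counts, feat_counts, vocab = model
    n = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, count in tag_counts.items():
        score = math.log(count / n)  # log prior
        for name, value in feats.items():
            # Add-one (Laplace) smoothing over the values seen for this feature.
            num = feat_counts[(tag, name)][value] + 1
            den = count + len(vocab[name])
            score += math.log(num / den)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

# Hypothetical toy corpus: features are a word's POS tag and chunk label.
train = [
    ({"pos": "noun", "chunk": "NC"}, "Subj"),
    ({"pos": "noun", "chunk": "NC"}, "Subj"),
    ({"pos": "noun", "chunk": "NC"}, "Obj"),
    ({"pos": "particle", "chunk": "PC"}, "Obj"),
    ({"pos": "verb", "chunk": "VC"}, "Verb"),
]
model = train_nb(train)
predicted = predict_nb(model, {"pos": "verb", "chunk": "VC"})
print(predicted)
```

In a pipeline of the kind the abstract describes, the predicted function tags would then be passed to the CFG rules to derive grammatical relations.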