A very simple safe-Bayesian random forest
Random forests work by averaging the predictions of several de-correlated trees. We present a conceptually radical approach to generating a random forest: randomly sampling many trees from a prior distribution and then forming a weighted ensemble of their predictive probabilities. Our approach uses priors that allow decision trees to be sampled before looking at the data, and a power likelihood to explore the space spanned by combinations of decision trees. While each tree performs Bayesian inference to compute its predictions, our aggregation procedure uses the power likelihood rather than the likelihood and is therefore, strictly speaking, not Bayesian. Nonetheless, we refer to it as a Bayesian random forest with a built-in safety: it retains good predictive performance even if the underlying probabilistic model is wrong. We demonstrate empirically that our safe-Bayesian random forest outperforms MCMC- and SMC-based Bayesian decision trees in terms of speed and accuracy, achieves performance competitive with entropy- or Gini-optimised random forests, and is very simple to construct.
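A minimal sketch of the idea described in the abstract: sample trees from a prior without consulting the data, perform simple Bayesian (Dirichlet-multinomial) inference at the leaves, and weight the trees by a tempered (power) likelihood. The tree prior, the leaf model, the temperature eta, and all function names here are illustrative assumptions, not the authors' exact construction.

import numpy as np

rng = np.random.default_rng(0)

def sample_tree(n_features, depth, rng):
    # Sample a tree from a simple prior: at each internal node pick a feature
    # and a threshold uniformly at random (no data is consulted).
    if depth == 0:
        return {"leaf": True, "counts": None}
    return {"leaf": False,
            "feature": rng.integers(n_features),
            "threshold": rng.uniform(0.0, 1.0),   # assumes features scaled to [0, 1]
            "left": sample_tree(n_features, depth - 1, rng),
            "right": sample_tree(n_features, depth - 1, rng)}

def leaf_of(tree, x):
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def fit_leaves(tree, X, y, n_classes, alpha=1.0):
    # Bayesian inference at the leaves: Dirichlet prior plus observed class counts.
    def init(t):
        if t["leaf"]:
            t["counts"] = np.full(n_classes, alpha)
        else:
            init(t["left"]); init(t["right"])
    init(tree)
    for xi, yi in zip(X, y):
        leaf_of(tree, xi)["counts"][yi] += 1

def predict_tree(tree, x):
    c = leaf_of(tree, x)["counts"]
    return c / c.sum()

def power_log_likelihood(tree, X, y, eta=0.5):
    # Tempered (power) likelihood, used only to weight trees in the ensemble.
    return eta * sum(np.log(predict_tree(tree, xi)[yi]) for xi, yi in zip(X, y))

def safe_bayesian_forest(X, y, n_classes, n_trees=100, depth=4, eta=0.5):
    trees = [sample_tree(X.shape[1], depth, rng) for _ in range(n_trees)]
    for t in trees:
        fit_leaves(t, X, y, n_classes)
    logw = np.array([power_log_likelihood(t, X, y, eta) for t in trees])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    def predict(x):
        # Weighted ensemble of the trees' predictive probabilities.
        return sum(wi * predict_tree(t, x) for wi, t in zip(w, trees))
    return predict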
Scalable Gaussian process structured prediction for grid factor graph applications
Structured prediction is an important and well-studied problem with many applications across machine learning. GPstruct is a recently proposed structured prediction model that offers appealing properties such as being kernelised, non-parametric, and supporting Bayesian inference (Bratières et al. 2013). The model places a Gaussian process prior over energy functions which describe relationships between input variables and structured output variables. However, the memory demand of GPstruct is quadratic in the number of latent variables, and training runtime scales cubically. This prevents GPstruct from being applied to problems involving grid factor graphs, which are prevalent in computer vision and spatial statistics applications. Here we explore a scalable approach to learning GPstruct models based on ensemble learning, with weak learners (predictors) trained on subsets of the latent variables and bootstrap data, which can easily be distributed. We show experiments with 4M latent variables on image segmentation. Our method outperforms widely used conditional random field models trained with pseudo-likelihood. Moreover, in image segmentation problems it improves over recent state-of-the-art marginal optimisation methods in terms of predictive performance and uncertainty calibration. Finally, it generalises well across all training set sizes.
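A minimal sketch of the ensemble scheme described above: each weak learner is trained on a bootstrap resample of the data restricted to a random subset of the latent (output) variables, and per-variable predictions are averaged over the learners that cover each variable. The weak-learner placeholder and the averaging rule are assumptions standing in for the full GPstruct training and prediction machinery.

import numpy as np

rng = np.random.default_rng(0)

def train_weak_learner(X_sub, Y_sub):
    # Placeholder for training one weak learner on its latent-variable subset.
    # Here it simply memorises class frequencies per latent variable (assumption).
    n_classes = int(Y_sub.max()) + 1
    return np.stack([np.bincount(col, minlength=n_classes) / len(col) for col in Y_sub.T])

def train_ensemble(X, Y, n_learners=10, subset_frac=0.25):
    n_data, n_latent = Y.shape
    learners = []
    for _ in range(n_learners):
        boot = rng.integers(n_data, size=n_data)                    # bootstrap rows
        latent_idx = rng.choice(n_latent,
                                size=max(1, int(subset_frac * n_latent)),
                                replace=False)                      # latent-variable subset
        model = train_weak_learner(X[boot], Y[boot][:, latent_idx])
        learners.append((latent_idx, model))                        # trivially distributable
    return learners

def predict_marginals(learners, n_latent, n_classes):
    # Average the per-variable marginal predictions of all learners covering each variable.
    votes = np.zeros((n_latent, n_classes))
    counts = np.zeros(n_latent)
    for latent_idx, probs in learners:
        votes[latent_idx] += probs
        counts[latent_idx] += 1
    return votes / np.maximum(counts, 1)[:, None]

Because each (bootstrap sample, latent subset) pair is trained independently, the loop in train_ensemble can be farmed out across machines, which is the property that makes the approach scale to millions of latent variables.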
Run-times for different values of , analysing the yeast microarray data set.
Each point is the average of 10 runs, with the error bars denoting the standard error on the mean. The horizontal dashed line shows the results for the full BHC method.
Adjusted Rand index scores for different values of , analysing the synthetic data set.
Each point is the average of 10 runs, with the error bars denoting the standard error on the mean. The horizontal dashed line shows the result for the full BHC method.
BHI scores for different values of , analysing the yeast microarray data set.
Each point is the average of 10 runs, with the error bars denoting the standard error on the mean. The horizontal dashed line shows the results for the full BHC method. Shown are the results for the different gene ontologies: Biological Process (red), Molecular Function (green), Cellular Component (blue) and the logical-OR of all three (black). The BHI scores were all generated using the org.Sc.sgd.db annotation R package.
Run-times for different values of , analysing the synthetic data set.
Each point is the average of 10 runs, with the error bars denoting the standard error on the mean. The horizontal dashed line shows the result for the full BHC method.
Flow chart showing the randomised BHC algorithm.
The main loop is the randomised part of the algorithm; it is applied recursively until the remaining gene subsets are small enough, at which point the greedy version of BHC completes the tree and the algorithm terminates (see the sketch below).
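A minimal sketch of the recursion summarised in the flow chart: subsample the current gene set, use the subsample to split it, recurse on the parts, and fall back to greedy BHC once a subset is small enough. The helpers greedy_bhc() and split_by_subsample() are placeholders standing in for the full BHC machinery; in particular, the split shown here is a stand-in, whereas the real algorithm derives the split from a BHC tree built on the subsample.

import numpy as np

rng = np.random.default_rng(0)

def greedy_bhc(genes, data):
    # Placeholder for the standard greedy BHC agglomeration on a small gene set.
    return {"leaf_set": list(genes)}

def split_by_subsample(genes, data, m):
    # Placeholder: draw a random subsample of size m; the real algorithm builds a
    # BHC tree on this subsample and uses its top-level split to assign every gene
    # to a left or right group. Here we simply split at random for illustration.
    sample = rng.choice(np.asarray(genes), size=m, replace=False)  # the randomised step
    order = rng.permutation(len(genes))
    shuffled = np.asarray(genes)[order]
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def randomised_bhc(genes, data, m):
    # Recursive randomised BHC: subsample-and-split until subsets are small,
    # then finish each subset with the greedy version of BHC.
    if len(genes) <= m:
        return greedy_bhc(genes, data)
    left, right = split_by_subsample(genes, data, m)
    return {"left": randomised_bhc(left, data, m),
            "right": randomised_bhc(right, data, m)}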
Speed-up factor as a function of the number of genes, , relative to the full BHC method, using (subsets of) the synthetic data.
Shown are the results for (red), (green) and (blue). The horizontal dashed line shows the full BHC result.
Run-time as a function of the number of genes, , using (subsets of) the synthetic data.
Shown are the results for (red), (green) and (blue), as well as for the full BHC method (black).