Bayesian Neural Tree Models for Nonparametric Regression
Frequentist and Bayesian methods differ in many aspects, but share some basic
optimal properties. In real-life classification and regression problems,
situations exist in which a model based on one of the methods is preferable
based on some subjective criterion. Nonparametric classification and regression
techniques, such as decision trees and neural networks, have frequentist
(classification and regression trees (CART) and artificial neural networks) as
well as Bayesian (Bayesian CART and Bayesian neural networks) approaches to
learning from data. In this work, we present two hybrid models combining the
Bayesian and frequentist versions of CART and neural networks, which we call
the Bayesian neural tree (BNT) models. Both models exploit the architecture of
decision trees and have fewer parameters to tune than advanced
neural networks. Such models can simultaneously perform feature selection and
prediction, are highly flexible, and generalize well in settings with a limited
number of training observations. We study the consistency of the proposed
models, and derive the optimal value of an important model parameter. We also
provide illustrative examples using a wide variety of real-life regression data
sets.
Massively-Parallel Feature Selection for Big Data
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for
feature selection (FS) in Big Data settings (high dimensionality and/or sample
size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix
both in terms of rows (samples, training examples) and columns
(features). By employing the concepts of p-values of conditional independence
tests and meta-analysis techniques, PFBP manages to rely only on computations
local to a partition while minimizing communication costs. Then, it employs
powerful and safe (asymptotically sound) heuristics to make early, approximate
decisions, such as Early Dropping of features from consideration in subsequent
iterations, Early Stopping of consideration of features within the same
iteration, or Early Return of the winner in each iteration. PFBP provides
asymptotic guarantees of optimality for data distributions faithfully
representable by a causal network (Bayesian network or maximal ancestral
graph). Our empirical analysis confirms a super-linear speedup of the algorithm
with increasing sample size and linear scalability with respect to the number of
features and processing cores, while dominating other competitive algorithms in
its class.
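The idea of combining partition-local p-values through meta-analysis can be illustrated with a minimal sketch. The abstract does not say which combination rule PFBP uses, so Fisher's method is an assumption chosen here for illustration only, and the function name `fisher_combine` is hypothetical. The sketch also assumes the local tests are independent, which holds when the row partitions are disjoint.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Illustrative assumption: the source abstract only mentions
    "meta-analysis techniques"; this is not claimed to be PFBP's rule.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); we keep X / 2.
    half_stat = -sum(math.log(p) for p in p_values)

    # Under the null, X follows a chi-square law with 2k degrees of freedom.
    # For even degrees of freedom the survival function has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one feature, one per row partition.
local_p_values = [0.05, 0.05]
combined = fisher_combine(local_p_values)
```

Each worker only needs to report one p-value per candidate feature, so the data exchanged per feature is a single number per partition rather than the partition's rows.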
Bioinformatics tools in predictive ecology: Applications to fisheries
This article is made available through the Brunel Open Access Publishing Fund. Copyright © 2012 Tucker et al. There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’ with an application to fisheries data. In particular, the focus is on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse.