Random model trees: an effective and scalable regression method
We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric prediction. An extensive empirical investigation shows that Random Model Trees produce predictive performance that is competitive with state-of-the-art methods such as Gaussian Processes Regression or Additive Groves of Regression Trees. The training and optimization of Random Model Trees scale better than Gaussian Processes Regression to larger datasets, and enjoy a constant advantage over Additive Groves of one to two orders of magnitude.
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, current state-of-the-art techniques lack
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning-to-rank methods, as well as rank aggregation
approaches, for combining all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest to the adequacy of the proposed approaches.
Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
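The rank aggregation idea in the abstract above can be illustrated with a Borda count, one well-known data fusion rule. The abstract does not say which fusion techniques were tested, so the choice of Borda count, the function name `borda_aggregate`, and the three example rankings are all assumptions made for illustration only.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several ranked lists of candidates with a Borda count.

    Each ranking lists candidates from most to least expert. A candidate at
    position i (0-based) of a list with n entries earns n - i points from that
    list; a candidate missing from a list earns nothing from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical rankings from three estimators of expertise
# (textual content, citation graph, profile information).
text_rank = ["alice", "bob", "carol"]
graph_rank = ["bob", "alice", "carol"]
profile_rank = ["alice", "carol", "bob"]

combined = borda_aggregate([text_rank, graph_rank, profile_rank])
```

Unlike the supervised learning-to-rank framework, a rule of this kind needs no relevance judgments for training, which is the usual reason for considering rank aggregation alongside it.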
Particle Gibbs for Bayesian Additive Regression Trees
Additive regression trees are flexible non-parametric models and popular
off-the-shelf tools for real-world non-linear regression. In application
domains, such as bioinformatics, where there is also demand for probabilistic
predictions with measures of uncertainty, the Bayesian additive regression
trees (BART) model, introduced by Chipman et al. (2010), is increasingly
popular. As data sets have grown in size, however, the standard
Metropolis-Hastings algorithms used to perform inference in BART are proving
inadequate. In particular, these Markov chains make local changes to the trees
and suffer from slow mixing when the data are high-dimensional or the best
fitting trees are more than a few layers deep. We present a novel sampler for
BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a
top-down particle filtering algorithm for Bayesian decision trees
(Lakshminarayanan et al., 2013). Rather than making local changes to individual
trees, the PG sampler proposes a complete tree to fit the residual. Experiments
show that the PG sampler outperforms existing samplers in many settings
Sample Efficient Policy Search for Optimal Stopping Domains
Optimal stopping problems consider the question of deciding when to stop an
observation-generating process in order to maximize a return. We examine the
problem of simultaneously learning and planning in such domains, when data is
collected directly from the environment. We propose GFSE, a simple and flexible
model-free policy search method that reuses data for sample efficiency by
leveraging problem structure. We bound the sample complexity of our approach to
guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on three domains taken from diverse fields.
Comment: To appear in IJCAI-201
Regression Tree Predictive Filter
Many algorithms have been developed to predict future samples of a signal. These algorithms, such as the recursive least squares predictive filter, rely on the assumption that the system generating the signal can be modeled as a linear system of equations, and they perform poorly when used to predict signals generated by non-linear systems. To predict a non-linear signal, non-linear methods must be used. Regression trees are a simple, inherently non-linear form of machine learning that can predict an output from a set of given inputs. The goal of this capstone project was to develop an algorithm for a regression trees predictive filter capable of predicting a non-linear signal. As this capstone was also an engineering design project, a further goal was to make the algorithm part of a software system that allows the parameters of the algorithm to be changed for testing. This paper details how the algorithm was developed as well as its results. For certain non-linear input signals, the regression trees predictive filter predicted better than a traditional linear predictive filter. It was also shown that the regression trees predictive filter was able to adapt to a non-linear signal generated by a changing system. In testing on the changing non-linear signal, the filter was compared to a system that resets its prediction model rather than adapting it as the regression trees predictive filter does. The regression trees predictive filter performed better than this resetting system, which shows that it can adapt to a changing system by learning from it.
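As a rough illustration of the idea in the abstract above, the sketch below predicts the next sample of a signal from its previous samples with a small regression tree. It is not the capstone's algorithm: the lag-window formulation, the variance-reduction splitting rule, and the names `make_lagged`, `fit_tree`, and `predict` are assumptions chosen to keep the example self-contained.

```python
def make_lagged(signal, order):
    """Build (previous `order` samples, next sample) training pairs."""
    X = [list(signal[i - order:i]) for i in range(order, len(signal))]
    y = [signal[i] for i in range(order, len(signal))]
    return X, y


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=4, min_leaf=2):
    """Recursively fit a regression tree; each leaf stores the mean target."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [(r, t) for r, t in zip(X, y) if r[feature] <= threshold]
            right = [(r, t) for r, t in zip(X, y) if r[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Choose the split that minimizes the total squared error.
            cost = sse([t for _, t in left]) + sse([t for _, t in right])
            if best is None or cost < best[0]:
                best = (cost, left, right, feature, threshold)
    if best is None:
        return leaf
    _, left, right, feature, threshold = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": fit_tree([r for r, _ in left], [t for _, t in left],
                         depth + 1, max_depth, min_leaf),
        "right": fit_tree([r for r, _ in right], [t for _, t in right],
                          depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the stored mean."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


# Toy non-linear signal: each sample is (previous + 1) mod 3.
signal = [0, 1, 2] * 10
X, y = make_lagged(signal, order=1)
tree = fit_tree(X, y)
next_sample = predict(tree, signal[-1:])
```

The toy signal is deliberately one that no single linear map of the previous sample reproduces, which is the situation the abstract describes. Refitting the tree on a sliding window of recent samples would be one simple way to follow a changing system, though the abstract does not state how its filter adapts.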
A Survey on Approximation Mechanism Design without Money for Facility Games
In a facility game one or more facilities are placed in a metric space to
serve a set of selfish agents whose addresses are their private information. In
a classical facility game, each agent wants to be as close to a facility as
possible, and the cost of an agent can be defined as the distance between her
location and the closest facility. In an obnoxious facility game, each agent
wants to be far away from all facilities, and her utility is the distance from
her location to the facility set. The objective of each agent is to minimize
her cost or maximize her utility. An agent may lie if, by doing so, more
benefit can be obtained. We are interested in social choice mechanisms that do
not utilize payments. The game designer aims at a mechanism that is
strategy-proof, in the sense that no agent can benefit by misreporting her
address, or, even better, group strategy-proof, in the sense that no coalition
of agents can all benefit by lying. Meanwhile, it is desirable for the
mechanism to be approximately optimal with respect to a chosen objective
function. Several models for such approximation mechanism design without money
for facility games have been proposed. In this paper we briefly review these
models and related results for both deterministic and randomized mechanisms,
and meanwhile we present a general framework for approximation mechanism design
without money for facility games
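To make strategy-proofness concrete, the sketch below uses a standard textbook example that is not taken from the abstract: a single facility on a line, placed at the median of the reported addresses. The function names, the three agent locations, and the grid of candidate misreports are hypothetical. The brute-force check confirms that, on this instance, no agent lowers her cost by lying.

```python
def median_mechanism(reported):
    """Place one facility at the (lower) median of the reported locations."""
    ordered = sorted(reported)
    return ordered[(len(ordered) - 1) // 2]


def cost(true_location, facility):
    """Classical facility game: cost is the distance to the facility."""
    return abs(true_location - facility)


def best_misreport_gain(agent, true_locations, candidate_reports):
    """Largest cost reduction the agent can get by reporting a candidate.

    A result of zero or less means no candidate misreport helps the agent.
    """
    honest_cost = cost(true_locations[agent], median_mechanism(true_locations))
    gains = []
    for lie in candidate_reports:
        reports = list(true_locations)
        reports[agent] = lie
        lied_cost = cost(true_locations[agent], median_mechanism(reports))
        gains.append(honest_cost - lied_cost)
    return max(gains)


# Hypothetical instance: three agents on a line.
true_locations = [0.0, 2.0, 10.0]
facility = median_mechanism(true_locations)

# Try every integer misreport between -5 and 15 for each agent.
candidates = [float(x) for x in range(-5, 16)]
gains = [best_misreport_gain(i, true_locations, candidates)
         for i in range(len(true_locations))]
```

A check over a finite grid of reports is evidence for one instance, not a proof; the general argument is that an agent can only move the median away from her own location by misreporting.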
Above- and belowground tree biomass models for three mangrove species in Tanzania: a nonlinear mixed effects modelling approach
Key message: Tested on data from Tanzania, both existing species-specific and common biomass models developed elsewhere showed large, statistically significant prediction errors. Species-specific and common above- and belowground biomass models for three mangrove species were therefore developed. The species-specific models fitted the data better than the common models. The former are recommended for accurate estimation of biomass stored in mangrove forests of Tanzania.
Context: Mangroves are essential for climate change mitigation through carbon storage and sequestration. Biomass models are important tools for quantifying biomass and carbon stock. While numerous aboveground biomass models exist, very few studies have focused on belowground biomass, and among these, mangroves of Africa are hardly represented, if at all.
Aims: The aims of the study were to develop above- and belowground biomass models and to evaluate the predictive accuracy of existing aboveground biomass models developed for mangroves in other regions and neighboring countries when applied to data from Tanzania.
Methods: Data were collected through destructive sampling of 120 trees (aboveground biomass); of these, 30 trees were also sampled for belowground biomass. The data originated from four sites along the Tanzanian coastline covering three dominant species: Avicennia marina (Forssk.) Vierh, Sonneratia alba J. Smith, and Rhizophora mucronata Lam. The biomass models were developed through mixed modelling, leading to fixed effects/common models and random effects/species-specific models.
Results: Both the above- and belowground biomass models improved when random effects (species) were considered. Including total tree height as a predictor variable, in addition to diameter at breast height, further improved the predictive accuracy of the models. The tests of existing models from other regions on our data generally showed large and significant prediction errors for aboveground tree biomass.
Conclusion: Inclusion of random effects resulted in improved goodness of fit for both above- and belowground biomass models. Species-specific models are therefore recommended for accurate biomass estimation of mangrove forests in Tanzania for both management and ecological applications. For belowground biomass of S. alba, however, the fixed effects/common model is recommended.