
    Random model trees: an effective and scalable regression method

    We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric prediction. An extensive empirical investigation shows that Random Model Trees produce predictive performance competitive with state-of-the-art methods such as Gaussian Process Regression and Additive Groves of Regression Trees. The training and optimization of Random Model Trees scales better than Gaussian Process Regression to larger datasets, and enjoys a constant-factor speed advantage over Additive Groves of one to two orders of magnitude.
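    The abstract describes Random Model Trees only at a high level. As a loose, minimal sketch of the general idea — randomized split points, simple linear models in the leaves, and predictions averaged over the ensemble — assuming scalar inputs and making no claim to match the authors' actual algorithm:

```python
import random

def fit_model_tree(data, depth=3, rng=None):
    # data: list of (x, y) pairs with scalar x. Build a randomized model
    # tree: each internal node splits at a uniformly random point in the
    # local x-range; each leaf holds a least-squares line y = a + b*x
    # fitted to the local data.
    rng = rng or random.Random()
    if depth == 0 or len(data) < 4:
        xs = [x for x, _ in data]
        ys = [y for _, y in data]
        n = len(data)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        b = sum((x - mx) * (y - my) for x, y in data) / var if var else 0.0
        return ('leaf', my - b * mx, b)
    split = rng.uniform(min(x for x, _ in data), max(x for x, _ in data))
    left = [p for p in data if p[0] <= split]
    right = [p for p in data if p[0] > split]
    if not left or not right:
        return fit_model_tree(data, 0, rng)
    return ('node', split,
            fit_model_tree(left, depth - 1, rng),
            fit_model_tree(right, depth - 1, rng))

def predict_tree(tree, x):
    if tree[0] == 'leaf':
        return tree[1] + tree[2] * x
    _, split, left, right = tree
    return predict_tree(left if x <= split else right, x)

def predict_ensemble(trees, x):
    # the ensemble prediction is the plain average over the trees
    return sum(predict_tree(t, x) for t in trees) / len(trees)

# toy target y = x^2 on [0, 1): piecewise-linear leaves approximate it well
data = [(x / 100, (x / 100) ** 2) for x in range(100)]
trees = [fit_model_tree(data, rng=random.Random(i)) for i in range(25)]
```

    The randomness in the splits is what decorrelates the trees; averaging then smooths out each tree's individual approximation error.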

    Learning to Rank Academic Experts in the DBLP Dataset

    Expert finding is an information retrieval task concerned with the search for the most knowledgeable people with respect to a specific topic, where the search is based on documents that describe people's activities. The task takes a user query as input and returns a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches. Comment: Expert Systems, 2013. arXiv admin note: text overlap with arXiv:1302.041
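    As a tiny illustration of the rank-aggregation side, here is a Borda count, one of the simpler data fusion rules that such a framework could use; the expert names and rankings below are invented, and the article's actual fusion techniques may differ:

```python
def borda_aggregate(rankings):
    # rankings: list of ranked lists of expert ids, best first. Each list
    # awards n-1 points to its top expert, n-2 to the next, and so on;
    # experts are returned sorted by total points, highest first.
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, expert in enumerate(ranking):
            scores[expert] = scores.get(expert, 0) + (n - 1 - pos)
    return sorted(scores, key=lambda e: -scores[e])

# three hypothetical estimators (text, citation graph, profile) each
# rank the same four experts for one query:
text    = ["ana", "bob", "eve", "dan"]
graph   = ["bob", "ana", "dan", "eve"]
profile = ["ana", "eve", "bob", "dan"]
fused = borda_aggregate([text, graph, profile])
```

    Here "ana" wins because she is ranked first by two of the three estimators, even though the citation-graph estimator prefers "bob".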

    Particle Gibbs for Bayesian Additive Regression Trees

    Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression. In application domains, such as bioinformatics, where there is also demand for probabilistic predictions with measures of uncertainty, the Bayesian additive regression trees (BART) model, introduced by Chipman et al. (2010), is increasingly popular. As data sets have grown in size, however, the standard Metropolis-Hastings algorithms used to perform inference in BART are proving inadequate. In particular, these Markov chains make local changes to the trees and suffer from slow mixing when the data are high-dimensional or the best fitting trees are more than a few layers deep. We present a novel sampler for BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a top-down particle filtering algorithm for Bayesian decision trees (Lakshminarayanan et al., 2013). Rather than making local changes to individual trees, the PG sampler proposes a complete tree to fit the residual. Experiments show that the PG sampler outperforms existing samplers in many settings.
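    The key structural idea — each tree in the additive sum is refitted to the residual of all the other trees — can be sketched with a plain (non-Bayesian) backfitting loop over one-split stumps. This illustrates only the residual-fitting structure that the PG sampler exploits, not the Particle Gibbs algorithm itself:

```python
def fit_stump(xs, rs):
    # fit a one-split regression stump to residuals rs: try every midpoint
    # between consecutive (sorted) xs and keep the split minimising SSE
    best = None
    for i in range(1, len(xs)):
        s = (xs[i - 1] + xs[i]) / 2
        left = [r for x, r in zip(xs, rs) if x <= s]
        right = [r for x, r in zip(xs, rs) if x > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x <= s else mr

def backfit(xs, ys, n_trees=5, sweeps=10):
    # additive model: prediction is the SUM of the trees. Each pass,
    # tree j is replaced by a whole new tree fitted to the residual of
    # the other trees -- the quantity a proposed tree must fit in BART.
    trees = [lambda x: 0.0] * n_trees
    for _ in range(sweeps):
        for j in range(n_trees):
            resid = [y - sum(t(x) for k, t in enumerate(trees) if k != j)
                     for x, y in zip(xs, ys)]
            trees[j] = fit_stump(xs, resid)
    return lambda x: sum(t(x) for t in trees)

# a two-step staircase is exactly representable by a sum of stumps
xs = [float(i) for i in range(10)]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
f = backfit(xs, ys)
```

    In BART the per-tree refit is a posterior sample rather than a greedy least-squares fit, but the "whole tree against the residual" loop has the same shape.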

    Sample Efficient Policy Search for Optimal Stopping Domains

    Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on three domains taken from diverse fields. Comment: To appear in IJCAI-201
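    GFSE itself is not specified in the abstract. As a generic illustration of model-free policy search for an optimal stopping problem — a threshold policy for i.i.d. uniform offers, with every candidate threshold evaluated on the same sampled trajectories so the data is reused across policies — note that the environment, policy class, and grid below are all assumptions for the sketch:

```python
import random

def simulate_offers(horizon, rng):
    # one trajectory: a sequence of i.i.d. offers in [0, 1]
    return [rng.random() for _ in range(horizon)]

def policy_value(offers, threshold):
    # stop at the first offer >= threshold; if none qualifies before the
    # end of the horizon, we are forced to accept the final offer
    for o in offers[:-1]:
        if o >= threshold:
            return o
    return offers[-1]

def search_threshold(trajectories, grid):
    # evaluate every candidate threshold on the SAME sampled trajectories
    # (the data-reuse idea) and keep the best empirical value
    best_t, best_v = None, float('-inf')
    for t in grid:
        v = sum(policy_value(tr, t) for tr in trajectories) / len(trajectories)
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v

rng = random.Random(0)
trajs = [simulate_offers(10, rng) for _ in range(2000)]
t_star, v_star = search_threshold(trajs, [i / 20 for i in range(20)])
```

    Because each trajectory is evaluated under every policy, the sample cost is paid once rather than once per candidate — the kind of structure a PAC bound over the whole policy class can exploit.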

    Regression Tree Predictive Filter

    Many algorithms have been developed to predict future samples of a signal. These algorithms, such as the recursive least squares predictive filter, rely on the assumption that the system generating the signal can be modeled as a linear system of equations, and they perform poorly when used to predict signals generated by non-linear systems. To predict a non-linear signal, non-linear methods must be used. Regression trees are a simple form of machine learning that is non-linear in nature and can predict an output from a set of given inputs. The goal of this capstone project was to develop an algorithm for a regression tree predictive filter capable of predicting a non-linear signal. As this capstone was also an engineering design project, a further goal was to make the algorithm part of a software system that allows the parameters of the algorithm to be changed for testing. This paper details how the algorithm was developed as well as its results. It was found that, for certain non-linear input signals, the regression tree predictive filter predicted better than a traditional linear predictive filter. It was also shown that the regression tree predictive filter was able to adapt to a non-linear signal generated by a changing system. In testing on the changing non-linear signal, the filter was compared to a system that reset its prediction model rather than adapting it, and the regression tree predictive filter had better performance than this resetting system. This shows that the regression tree predictive filter can adapt to a system in such a way that it learns from it.
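    A predictive filter of this kind can be sketched by embedding the signal into (previous sample, next sample) pairs and fitting a small piecewise-constant regression tree. The project's actual algorithm and parameters are not given in the abstract, so this is an illustrative stand-in, using the chaotic logistic map as the non-linear test signal:

```python
def fit_tree(pts, depth):
    # pts: list of (x_prev, x_next) pairs; build a piecewise-constant
    # regression tree by recursive median splits on x_prev
    ys = [y for _, y in pts]
    mean = sum(ys) / len(ys)
    if depth == 0 or len(pts) < 4:
        return ('leaf', mean)
    xs = sorted(x for x, _ in pts)
    split = xs[len(xs) // 2]          # median split keeps the tree balanced
    left = [p for p in pts if p[0] <= split]
    right = [p for p in pts if p[0] > split]
    if not left or not right:
        return ('leaf', mean)
    return ('node', split, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict(tree, x):
    while tree[0] == 'node':
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]

# non-linear signal: logistic map x[n+1] = 3.8 x[n] (1 - x[n]),
# which a linear predictor cannot track
sig = [0.3]
for _ in range(500):
    sig.append(3.8 * sig[-1] * (1 - sig[-1]))

pairs = list(zip(sig[:-1], sig[1:]))   # (previous sample, next sample)
tree = fit_tree(pairs[:400], depth=6)
errs = [abs(predict(tree, x) - y) for x, y in pairs[400:]]
```

    With enough leaves the tree approximates the non-linear map piece by piece, which is the sense in which a regression tree can serve as a non-linear predictive filter.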

    A Survey on Approximation Mechanism Design without Money for Facility Games

    In a facility game one or more facilities are placed in a metric space to serve a set of selfish agents whose addresses are their private information. In a classical facility game, each agent wants to be as close to a facility as possible, and the cost of an agent can be defined as the distance between her location and the closest facility. In an obnoxious facility game, each agent wants to be far away from all facilities, and her utility is the distance from her location to the facility set. The objective of each agent is to minimize her cost or maximize her utility. An agent may lie if, by doing so, more benefit can be obtained. We are interested in social choice mechanisms that do not utilize payments. The game designer aims at a mechanism that is strategy-proof, in the sense that no agent can benefit by misreporting her address, or, even better, group strategy-proof, in the sense that no coalition of agents can all benefit by lying. Meanwhile, it is desirable for the mechanism to be approximately optimal with respect to a chosen objective function. Several models for such approximation mechanism design without money for facility games have been proposed. In this paper we briefly review these models and related results for both deterministic and randomized mechanisms, and we present a general framework for approximation mechanism design without money for facility games.
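    A concrete instance of a strategy-proof, money-free mechanism for the classical single-facility game on a line is the well-known median mechanism, which also minimizes the total distance (social cost); the agent locations below are just an example:

```python
def median_mechanism(reports):
    # place the facility at a median of the reported locations; on a line
    # this is strategy-proof and minimises the sum of distances
    xs = sorted(reports)
    return xs[(len(xs) - 1) // 2]

def social_cost(facility, locations):
    return sum(abs(x - facility) for x in locations)

true_locs = [0.0, 2.0, 3.0, 7.0, 10.0]
f = median_mechanism(true_locs)

# no misreport by the agent at 7.0 can pull the facility closer to her:
# reporting above the median leaves it fixed, and reporting below the
# median can only drag it further away from her true location
best_dist = min(abs(median_mechanism([0.0, 2.0, 3.0, r, 10.0]) - 7.0)
                for r in [i / 2 for i in range(21)])
```

    The intuition is that an agent can only move the median by reporting past it in the wrong direction, which never helps her.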

    Above- and belowground tree biomass models for three mangrove species in Tanzania: a nonlinear mixed effects modelling approach

    Key message: Tested on data from Tanzania, both existing species-specific and common biomass models developed elsewhere revealed statistically significant, large prediction errors. Species-specific and common above- and belowground biomass models for three mangrove species were therefore developed. The species-specific models fitted the data better than the common models and are recommended for accurate estimation of the biomass stored in mangrove forests of Tanzania.
    Context: Mangroves are essential for climate change mitigation through carbon storage and sequestration. Biomass models are important tools for quantifying biomass and carbon stock. While numerous aboveground biomass models exist, very few studies have focused on belowground biomass, and among these, the mangroves of Africa are hardly or not represented.
    Aims: The aims of the study were to develop above- and belowground biomass models and to evaluate the predictive accuracy of existing aboveground biomass models, developed for mangroves in other regions and neighboring countries, when applied to data from Tanzania.
    Methods: Data were collected through destructive sampling of 120 trees (aboveground biomass); among these, 30 trees were also sampled for belowground biomass. The data originated from four sites along the Tanzanian coastline covering three dominant species: Avicennia marina (Forssk.) Vierh., Sonneratia alba J. Smith, and Rhizophora mucronata Lam. The biomass models were developed through mixed modelling, leading to fixed-effects (common) models and random-effects (species-specific) models.
    Results: Both the above- and belowground biomass models improved when random effects (species) were considered. Inclusion of total tree height as a predictor variable, in addition to diameter at breast height alone, further improved the predictive accuracy. Tests of existing models from other regions on our data generally showed large and significant prediction errors for aboveground tree biomass.
    Conclusion: Inclusion of random effects resulted in improved goodness of fit for both above- and belowground biomass models. Species-specific models are therefore recommended for accurate biomass estimation of mangrove forests in Tanzania, for both management and ecological applications. For belowground biomass of S. alba, however, the fixed-effects (common) model is recommended.
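    Allometric biomass models of this kind are commonly fitted by least squares on the log scale. A minimal sketch with a diameter-only power model B = a·D^b on synthetic data — the coefficients, the noise level, and the omission of tree height and random species effects are all simplifications, not the paper's fitted models:

```python
import math
import random

def fit_allometric(dbh, biomass):
    # fit ln(B) = ln(a) + b ln(D) by simple least squares on the log
    # scale, then back-transform the intercept to recover a
    lx = [math.log(d) for d in dbh]
    ly = [math.log(b) for b in biomass]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    slope = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sxx
    return math.exp(my - slope * mx), slope   # (a, b)

# synthetic "trees": B = 0.25 * D^2.4 with multiplicative lognormal noise
# (invented values, chosen only to make the recovery visible)
rng = random.Random(1)
dbh = [5 + 40 * rng.random() for _ in range(120)]
bio = [0.25 * d ** 2.4 * math.exp(rng.gauss(0, 0.1)) for d in dbh]
a, b = fit_allometric(dbh, bio)
```

    Adding ln(H) as a second regressor, or a species-level random effect on the coefficients as in the paper's mixed models, extends this same log-linear structure.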