
    Consistency of random forests

    Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity.

    1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully applied to many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)], 3
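    As a rough illustration of the aggregation principle described above (not Breiman's exact algorithm), the following sketch trains several randomized regression trees on bootstrap samples and averages their predictions. The toy data, hyper-parameters and use of scikit-learn's DecisionTreeRegressor are assumptions made for the example.

```python
# Minimal sketch: bagged, feature-randomized regression trees whose
# predictions are aggregated by averaging. Illustrative only; not the
# original Breiman implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy additive regression data: y = f1(x1) + f2(x2) + noise (assumed example).
n, d = 500, 5
X = rng.uniform(size=(n, d))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

def fit_forest(X, y, n_trees=100, max_features="sqrt"):
    """Fit n_trees randomized trees, each on a bootstrap resample."""
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))       # bootstrap sample
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])                         # randomized tree
        trees.append(tree)
    return trees

def predict_forest(trees, X_new):
    """Aggregate by averaging the individual tree predictions."""
    return np.mean([t.predict(X_new) for t in trees], axis=0)

trees = fit_forest(X, y)
print(predict_forest(trees, X[:3]))
```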

    Computational convergence of the path integral for real dendritic morphologies

    Neurons are characterised by a morphological structure unique amongst biological cells, the core of which is the dendritic tree. The vast number of dendritic geometries, combined with heterogeneous properties of the cell membrane, continue to challenge scientists in predicting neuronal input-output relationships, even in the case of sub-threshold dendritic currents. The Green’s function obtained for a given dendritic geometry provides this functional relationship for passive or quasi-active dendrites and can be constructed by a sum-over-trips approach based on a path integral formalism. In this paper, we introduce a number of efficient algorithms for the realisation of the sum-over-trips framework and investigate the convergence of these algorithms on different dendritic geometries. We demonstrate that the convergence of the trip sampling methods strongly depends on dendritic morphology as well as the biophysical properties of the cell membrane. For real morphologies, the number of trips needed to guarantee a small convergence error might become very large and strongly affect computational efficiency. As an alternative, we introduce a highly efficient matrix method which can be applied to arbitrary branching structures.
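    To give a concrete flavour of the sum-over-trips construction, the sketch below evaluates the Green's function of a passive, unbranched, semi-infinite cable with a sealed end, where the sum reduces to the direct trip plus a single reflected trip (method of images). Parameter names and values are illustrative assumptions; the sketch does not reproduce the paper's trip-sampling or matrix algorithms for full branched morphologies.

```python
# Hedged illustration of the sum-over-trips idea on the simplest geometry:
# a semi-infinite passive cable with a sealed end at x = 0, where the
# response is the direct trip plus one reflected trip.
import numpy as np

def G_inf(x, t, D=1.0, tau=1.0):
    """Green's function of the passive cable equation on an infinite cable
    (diffusion constant D, membrane time constant tau; assumed units)."""
    return np.exp(-t / tau) * np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

def G_sealed(x, y, t, D=1.0, tau=1.0):
    """Semi-infinite cable, sealed end at x = 0: sum over the two trips
    connecting the injection point y to the measurement point x."""
    return G_inf(x - y, t, D, tau) + G_inf(x + y, t, D, tau)

# Response at x = 0.5 to an impulse injected at y = 1.0, at a few times.
ts = np.linspace(0.01, 5.0, 5)
print([float(G_sealed(0.5, 1.0, t)) for t in ts])
```

    On branched structures each trip picks up node-dependent coefficients and the number of trips grows with the morphology, which is why convergence of truncated or sampled sums becomes the central computational question.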

    Conformal Prediction: a Unified Review of Theory and New Challenges

    In this work we provide a review of basic ideas and novel developments about Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense also in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.
    Comment: arXiv admin note: text overlap with arXiv:0706.3188, arXiv:1604.04173, arXiv:1709.06233, arXiv:1203.5422 by other authors
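    As a minimal worked instance of the framework reviewed here, the sketch below implements split conformal prediction with absolute-residual nonconformity scores; under exchangeability of calibration and test points the resulting interval covers the true response with probability at least 1 - alpha. The linear model, synthetic data and miscoverage level are assumptions chosen for illustration.

```python
# Split conformal prediction: calibrate residuals on held-out data and
# return a finite-sample-valid prediction interval for a new point.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic regression data (illustrative assumption).
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=400)

# Split into a proper training set and a calibration set.
X_tr, y_tr, X_cal, y_cal = X[:200], y[:200], X[200:], y[200:]

model = LinearRegression().fit(X_tr, y_tr)

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.sort(np.abs(y_cal - model.predict(X_cal)))

alpha = 0.1                               # target miscoverage level
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample-valid rank
q = scores[k - 1]                         # conformal quantile of the residuals

# Prediction set for a new point: an interval of half-width q around the fit.
x_new = rng.normal(size=(1, 3))
y_hat = model.predict(x_new)[0]
print("90% prediction interval:", (y_hat - q, y_hat + q))
```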