
    Risk bounds for purely uniformly random forests

    Random forests, introduced by Leo Breiman in 2001, are a very effective statistical method. The complex mechanism of the method makes theoretical analysis difficult. Therefore, a simplified version of random forests, called purely random forests, which can be handled more easily in theory, has been considered. In this paper we introduce a variant of this kind of random forest, which we call purely uniformly random forests. In the context of regression problems with a one-dimensional predictor space, we show that both random trees and random forests reach the minimax rate of convergence. In addition, we prove that, compared to random trees, random forests improve accuracy by reducing the estimator variance by a factor of three fourths.
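    The construction described above lends itself to a short illustration. The following is a minimal sketch, not the authors' code: each tree partitions [0, 1] with cut points drawn uniformly at random, independently of the data, predicts with leaf means, and the forest averages many such trees. The function names and the parameters k and n_trees are illustrative choices.

        import numpy as np

        def fit_purely_uniform_tree(x, y, k, rng):
            # Partition [0, 1] with k cut points drawn uniformly at random,
            # independently of the data; store the mean response in each cell.
            edges = np.concatenate(([0.0], np.sort(rng.uniform(size=k)), [1.0]))
            leaf = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, k)
            means = np.full(k + 1, y.mean())          # fallback for empty cells
            for j in range(k + 1):
                if np.any(leaf == j):
                    means[j] = y[leaf == j].mean()
            return edges, means

        def predict_tree(edges, means, x_new):
            leaf = np.clip(np.searchsorted(edges, x_new, side="right") - 1, 0, len(means) - 1)
            return means[leaf]

        def purely_uniform_forest_predict(x, y, x_new, n_trees=100, k=10, seed=0):
            rng = np.random.default_rng(seed)
            trees = [fit_purely_uniform_tree(x, y, k, rng) for _ in range(n_trees)]
            # Averaging over independently grown random trees reduces the
            # variance of the estimator, which is the effect the paper quantifies.
            return np.mean([predict_tree(e, m, x_new) for e, m in trees], axis=0)

        # toy usage on a smooth one-dimensional regression function
        rng = np.random.default_rng(1)
        x = rng.uniform(size=500)
        y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=500)
        print(purely_uniform_forest_predict(x, y, np.linspace(0.05, 0.95, 10)))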

    Critical random forests

    Let $F(N,m)$ denote a random forest on a set of $N$ vertices, chosen uniformly from all forests with $m$ edges. Let $F(N,p)$ denote the forest obtained by conditioning the Erdős–Rényi graph $G(N,p)$ to be acyclic. We describe scaling limits for the largest components of $F(N,p)$ and $F(N,m)$ in the critical window $p = N^{-1} + O(N^{-4/3})$ or $m = N/2 + O(N^{2/3})$. Aldous described a scaling limit for the largest components of $G(N,p)$ within the critical window in terms of the excursion lengths of a reflected Brownian motion with time-dependent drift. Our scaling limit for critical random forests is of a similar nature, but now based on a reflected diffusion whose drift depends on space as well as on time.
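    The critical window can also be explored numerically. Below is a small Monte Carlo sketch, purely illustrative and not from the paper: it draws $F(N,p)$ by rejection, i.e. it samples the Erdős–Rényi graph $G(N, 1/N)$ and keeps only acyclic samples, then records the size of the largest component, which at criticality is expected to live on the $N^{2/3}$ scale. The choice of $N$, the sample count, and the rejection-sampling approach are assumptions made for this sketch.

        import networkx as nx
        import numpy as np

        def largest_components_in_critical_forest(N=100, n_samples=200, seed=0):
            rng = np.random.default_rng(seed)
            sizes = []
            while len(sizes) < n_samples:
                G = nx.gnp_random_graph(N, 1.0 / N, seed=int(rng.integers(1 << 31)))
                if nx.is_forest(G):            # condition G(N, p) to be acyclic
                    sizes.append(max(len(c) for c in nx.connected_components(G)))
            return np.array(sizes)

        sizes = largest_components_in_critical_forest()
        # Compare the observed largest-component sizes with the N^{2/3} scale.
        print(sizes.mean(), sizes.mean() / 100 ** (2 / 3))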

    Consistency of random forests

    Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity. 1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully involved in many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)] and ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)].
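    The sparse additive setting the paper studies is easy to reproduce in a toy experiment. The following is a minimal sketch, an assumed setup rather than the paper's own experiments: the response is an additive function of only two out of twenty features, and a fitted random forest concentrates its splits, and hence its feature importances, on the relevant coordinates.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        n, d = 2000, 20                    # 20 features, only 2 of them relevant
        X = rng.uniform(size=(n, d))
        y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

        forest = RandomForestRegressor(n_estimators=300, random_state=0)
        forest.fit(X, y)

        # Importances of the two informative features dominate the 18 noise features.
        print(np.round(forest.feature_importances_, 3))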

    Random Prism: An Alternative to Random Forests.

    Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable, and sometimes higher, classification accuracy than decision tree classifiers if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce this overfitting; however, there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, built to enhance Prism’s classification accuracy by reducing overfitting.
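    The general ensemble mechanism described above can be sketched compactly: train many base classifiers on bootstrap samples and combine their predictions by majority vote. In the sketch below the base learner is a stand-in (scikit-learn's DecisionTreeClassifier); in Random Prism it would be a Prism-style modular rule inducer, which is not reproduced here. All names and parameters are illustrative, and integer class labels are assumed.

        import numpy as np
        from sklearn.base import clone
        from sklearn.tree import DecisionTreeClassifier

        def fit_bagged_ensemble(X, y, base=DecisionTreeClassifier(), n_members=25, seed=0):
            rng = np.random.default_rng(seed)
            members = []
            for _ in range(n_members):
                idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
                members.append(clone(base).fit(X[idx], y[idx]))
            return members

        def predict_majority(members, X):
            votes = np.array([m.predict(X) for m in members])
            # majority vote across base classifiers, one column per test point
            return np.array([np.bincount(col).argmax() for col in votes.T])

        # toy usage with integer class labels
        from sklearn.datasets import make_classification
        X, y = make_classification(n_samples=500, random_state=0)
        ensemble = fit_bagged_ensemble(X, y)
        print((predict_majority(ensemble, X) == y).mean())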