Risk bounds for purely uniformly random forests
Random forests, introduced by Leo Breiman in 2001, are a very effective
statistical method. The complexity of the mechanism makes theoretical
analysis difficult, so a simplified version of random forests, called
purely random forests, which is easier to handle theoretically, has been
considered. In this paper we introduce a variant of this kind of random
forest, which we call purely uniformly random forests. In the context of
regression problems with a one-dimensional predictor space, we show that both
random trees and random forests reach the minimax rate of convergence. In
addition, we prove that, compared to random trees, random forests improve
accuracy by reducing the estimator variance by a factor of three fourths.
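As a rough illustration of the idea (a minimal sketch, not the paper's construction or notation), a purely uniformly random tree for one-dimensional regression draws its split points uniformly at random on [0, 1], independently of the data, and the forest simply averages the trees' predictions:

```python
import random
import statistics

def purely_uniform_tree(xs, ys, n_cuts, rng):
    """One purely uniformly random tree on [0, 1]: the cut points are
    drawn uniformly at random, independently of the training data."""
    cuts = sorted(rng.random() for _ in range(n_cuts))
    edges = [0.0] + cuts + [1.0]
    # Predict the mean response in each cell; empty cells fall back
    # to the global mean (one of several reasonable conventions).
    global_mean = statistics.fmean(ys)
    means = []
    for lo, hi in zip(edges, edges[1:]):
        cell = [y for x, y in zip(xs, ys) if lo <= x < hi]
        means.append(statistics.fmean(cell) if cell else global_mean)

    def predict(x):
        for (lo, hi), m in zip(zip(edges, edges[1:]), means):
            if lo <= x < hi:
                return m
        return means[-1]  # x == 1.0 lands in the last cell
    return predict

def purely_uniform_forest(xs, ys, n_trees, n_cuts, rng):
    """Average n_trees independent purely uniformly random trees."""
    trees = [purely_uniform_tree(xs, ys, n_cuts, rng) for _ in range(n_trees)]
    return lambda x: statistics.fmean(t(x) for t in trees)
```

Because the partitions are independent of the data, each tree is an unbiased-in-randomization estimator of the same quantity, and averaging them reduces the variance coming from the random partitioning, which is the effect the abstract quantifies.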
Critical random forests
Let F(N, m) denote a random forest on a set of N vertices, chosen
uniformly from all forests with m edges, and let G(N, p) denote the
Erdős–Rényi graph with edge probability p; we consider the forest obtained
by conditioning G(N, p) to be acyclic. We describe scaling limits for the
largest components of F(N, m) and of the conditioned graph, in the critical
window m = N/2 + O(N^{2/3}) or p = 1/N + O(N^{-4/3}). Aldous described a
scaling limit for the largest components of G(N, p) within the critical
window in terms of the excursion lengths of a reflected Brownian motion with
time-dependent drift. Our scaling limit for critical random forests is of a
similar nature, but now based on a reflected diffusion whose drift depends
on space as well as on time.
Consistency of random forests
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45
(2001) 5--32] that combines several randomized decision trees and aggregates
their predictions by averaging. Despite its wide usage and outstanding
practical performance, little is known about the mathematical properties of the
procedure. This disparity between theory and practice stems from the
difficulty of simultaneously analyzing both the randomization process and the
highly data-dependent tree structure. In the present paper, we take a step
forward in forest exploration by proving a consistency result for Breiman's
[Mach. Learn. 45 (2001) 5--32] original algorithm in the context of additive
regression models. Our analysis also sheds an interesting light on how random
forests can nicely adapt to sparsity.

1. Introduction. Random forests are an
ensemble learning method for classification and regression that constructs a
number of randomized decision trees during the training phase and predicts by
averaging the results. Since its publication in the seminal paper of Breiman
(2001), the procedure has become a major data analysis tool that performs well
in practice in comparison with many standard methods. What has greatly
contributed to the popularity of forests is the fact that they can be applied
to a wide range of prediction problems and have few parameters to tune. Aside
from being simple to use, the method is generally recognized for its accuracy
and its ability to deal with small sample sizes, high-dimensional feature
spaces and complex data structures. The random forest methodology has been
successfully involved in many practical problems, including air quality
prediction (winning code of the EMC data science global hackathon in 2012, see
http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al.
(2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)],
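The mechanism described above (randomized trees grown on resampled data, with predictions averaged) can be sketched in a few lines of pure Python. This is a bagging-style illustration using depth-one trees, with a random subset of candidate split points standing in for Breiman's per-node feature subsampling; it illustrates the averaging principle, not the original algorithm.

```python
import random
import statistics

def fit_stump(xs, ys, rng, n_candidates=5):
    """Depth-one regression tree: try a few randomly chosen thresholds
    (a stand-in for Breiman's 'mtry' randomization) and keep the one
    with the smallest sum of squared errors."""
    best_sse, best_split = float("inf"), None
    for _ in range(n_candidates):
        t = rng.choice(xs)
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best_split = sse, (t, lm, rm)
    if best_split is None:  # degenerate sample: predict the overall mean
        m = statistics.fmean(ys)
        return lambda x: m
    t, lm, rm = best_split
    return lambda x: lm if x <= t else rm

def random_forest(xs, ys, n_trees, rng):
    """Grow each tree on a bootstrap sample and average the predictions."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], rng))
    return lambda x: statistics.fmean(t(x) for t in trees)
```

The two sources of randomness that the consistency analysis must handle jointly are both visible here: the bootstrap resampling and the random choice of candidate splits, on top of split selection that depends on the data itself.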
Random Prism: An Alternative to Random Forests.
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism rulesets achieve classification accuracy comparable to, and sometimes higher than, decision tree classifiers when the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first ensemble learner based on the Prism family of algorithms, developed to enhance Prism's classification accuracy by reducing overfitting.
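As a toy illustration of combining modular rule-based base classifiers (a hypothetical sketch, not the article's Random Prism design or the actual PrismTCS induction procedure), rules can be represented directly as condition/class pairs, which need not assemble into any decision tree, and an ensemble can combine base rule sets by majority vote:

```python
from collections import Counter

# A modular classification rule is a (conditions, predicted_class) pair,
# where conditions maps attribute name -> required value. A base
# classifier is (ordered rule list, default class).

def classify(rules, default, instance):
    """Apply the first matching rule; fall back to the default class."""
    for conds, label in rules:
        if all(instance.get(attr) == val for attr, val in conds.items()):
            return label
    return default

def ensemble_predict(classifiers, instance):
    """Majority vote over the base classifiers' predictions."""
    votes = Counter(classify(rules, default, instance)
                    for rules, default in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical base classifiers induced on different samples/features:
base1 = ([({"outlook": "sunny"}, "no")], "yes")
base2 = ([({"humidity": "high"}, "no")], "yes")
base3 = ([], "yes")
classifiers = [base1, base2, base3]
```

Because each rule list is modular, a base classifier can cover only part of the instance space and defer to its default elsewhere; the vote across differently trained base classifiers is what the ensemble relies on to dampen overfitting.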
- …