Critical random forests

Author: Martin James
Yeo Dominic
Publication date: 01/01/2018

Let $F(N,m)$ denote a random forest on a set of $N$ vertices, chosen uniformly from all forests with $m$ edges. Let $F(N,p)$ denote the forest obtained by conditioning the Erdős–Rényi graph $G(N,p)$ to be acyclic. We describe scaling limits for the largest components of $F(N,p)$ and $F(N,m)$ in the critical window $p=N^{-1}+O(N^{-4/3})$ or $m=N/2+O(N^{2/3})$. Aldous described a scaling limit for the largest components of $G(N,p)$ within the critical window in terms of the excursion lengths of a reflected Brownian motion with time-dependent drift. Our scaling limit for critical random forests is of a similar nature, but now based on a reflected diffusion whose drift depends on space as well as on time.
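The definition of $F(N,p)$ in the abstract above can be illustrated by rejection sampling: draw $G(N,p)$ and discard it if it contains a cycle. The following is a minimal sketch for small $N$ only, not the method of the paper; the function names `find` and `sample_acyclic_gnp` and the `max_tries` cap are assumptions made for illustration.

```python
import random
from collections import Counter

def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def sample_acyclic_gnp(n, p, rng, max_tries=100_000):
    """Rejection-sample G(n, p) conditioned to be acyclic.

    Returns the component sizes of the accepted forest, largest first.
    """
    for _ in range(max_tries):
        parent = list(range(n))
        acyclic = True
        for u in range(n):
            for v in range(u + 1, n):
                if rng.random() < p:
                    ru, rv = find(parent, u), find(parent, v)
                    if ru == rv:
                        # This edge would close a cycle: reject the whole sample.
                        acyclic = False
                        break
                    parent[ru] = rv
            if not acyclic:
                break
        if acyclic:
            sizes = Counter(find(parent, x) for x in range(n))
            return sorted(sizes.values(), reverse=True)
    raise RuntimeError("no acyclic sample found within max_tries")

if __name__ == "__main__":
    n = 30
    print(sample_acyclic_gnp(n, 1.0 / n, random.Random(0)))
```

Aborting at the first cycle does not bias the result, because a sample containing that edge would be rejected regardless of the remaining edge draws. The acceptance probability shrinks as $N$ grows, so this approach is only practical for small graphs.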

arXiv.org e-Print Archive

Oxford University Research Archive

Risk bounds for purely uniformly random forests

Author: Genuer Robin
Publication date: 15/06/2010

Random forests, introduced by Leo Breiman in 2001, are a very effective statistical method. The complex mechanism of the method makes theoretical analysis difficult. Therefore, a simplified version of random forests, called purely random forests, which can be handled theoretically more easily, has been considered. In this paper we introduce a variant of this kind of random forest, which we call purely uniformly random forests. In the context of regression problems with a one-dimensional predictor space, we show that both random trees and random forests reach the minimax rate of convergence. In addition, we prove that, compared to random trees, random forests improve accuracy by reducing the estimator variance by a factor of three fourths.
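To make the idea of a purely random forest on a one-dimensional predictor concrete, here is a minimal sketch. It assumes one plausible construction that the abstract does not spell out: each tree partitions [0, 1] with `k` split points drawn uniformly at random, independently of the data, and predicts the average response in each cell; the forest averages its trees. The names `fit_tree`, `predict_tree`, and `predict_forest`, and the choice to predict 0.0 in an empty cell, are illustrative assumptions.

```python
import bisect
import random

def fit_tree(xs, ys, k, rng):
    """Fit one tree: k uniform split points on [0, 1], cell-wise mean of y."""
    splits = sorted(rng.random() for _ in range(k))
    sums = [0.0] * (k + 1)
    counts = [0] * (k + 1)
    for x, y in zip(xs, ys):
        cell = bisect.bisect_right(splits, x)  # index of the cell containing x
        sums[cell] += y
        counts[cell] += 1
    # Assumption: an empty cell predicts 0.0.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return splits, means

def predict_tree(tree, x):
    splits, means = tree
    return means[bisect.bisect_right(splits, x)]

def predict_forest(trees, x):
    # The forest estimate is the plain average of the tree estimates.
    return sum(predict_tree(t, x) for t in trees) / len(trees)

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(500)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    trees = [fit_tree(xs, ys, k=10, rng=rng) for _ in range(50)]
    print(predict_forest(trees, 0.5))
```

Because the split points ignore the data, each tree is a simple histogram-type estimator, which is what makes this family easier to analyse than Breiman's original procedure.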

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Consistency of random forests

Author: Biau Gérard
Scornet Erwan
Vert Jean-Philippe
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/08/2015

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5–32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty of simultaneously analyzing both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's [Mach. Learn. 45 (2001) 5–32] original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity.

1. Introduction. Random forests are an ensemble learning method for classification and regression that constructs a number of randomized decision trees during the training phase and predicts by averaging the results. Since its publication in the seminal paper of Breiman (2001), the procedure has become a major data analysis tool that performs well in practice in comparison with many standard methods. What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune. Aside from being simple to use, the method is generally recognized for its accuracy and its ability to deal with small sample sizes, high-dimensional feature spaces and complex data structures. The random forest methodology has been successfully applied to many practical problems, including air quality prediction (winning code of the EMC data science global hackathon in 2012, see http://www.kaggle.com/c/dsg-hackathon), chemoinformatics [Svetnik et al. (2003)], ecology [Prasad, Iverson and Liaw (2006), Cutler et al. (2007)], 3

arXiv.org e-Print Archive

HAL-MINES ParisTech

Hal-Diderot

Random Forests and Networks Analysis

Author: Avena L.
Castell F.
Gaudilliere A.
Melot C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/11/2017

D. Wilson [Wi] in the 1990s described a simple and efficient algorithm based on loop-erased random walks to sample uniform spanning trees and, more generally, weighted trees or forests spanning a given graph. This algorithm provides a powerful tool for analyzing structures on networks and, along this line of thinking, in recent works [AG1, AG2, ACGM1, ACGM2] we focused on applications of spanning rooted forests on finite graphs. The main conclusions are reviewed in this paper by collecting related theorems, algorithms, heuristics and numerical experiments. A first foundational part on determinantal structures and efficient sampling procedures is followed by four main applications: 1) a random-walk-based notion of well-distributed points in a graph; 2) how to describe metastable dynamics in finite settings by means of Markov intertwining dualities; 3) coarse-graining schemes for networks and associated processes; 4) wavelet-like pyramidal algorithms for graph signals.

Comment: Survey paper
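The loop-erased random walk idea mentioned in this abstract can be sketched as follows for the simplest case, a uniform spanning tree of a connected, unweighted, undirected graph. This is an illustrative sketch and not code from the survey; it does not cover the weighted or rooted-forest variants, and the function name `wilson_spanning_tree` and the adjacency-dictionary input format are assumptions.

```python
import random

def wilson_spanning_tree(adj, root, rng):
    """Sample a uniform spanning tree with Wilson's algorithm.

    adj maps each vertex to a list of its neighbours (connected, undirected).
    Returns a dict mapping every non-root vertex to its parent in the tree.
    """
    in_tree = {root}
    parent = {}
    for start in adj:
        # Random walk from `start` until it hits the current tree. Overwriting
        # next_step[v] on each revisit of v erases loops implicitly.
        next_step = {}
        v = start
        while v not in in_tree:
            next_step[v] = rng.choice(adj[v])
            v = next_step[v]
        # Retrace the loop-erased path and attach it to the tree.
        v = start
        while v not in in_tree:
            parent[v] = next_step[v]
            in_tree.add(v)
            v = next_step[v]
    return parent

if __name__ == "__main__":
    # A 4-cycle: 0 - 1 - 2 - 3 - 0.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(wilson_spanning_tree(adj, 0, random.Random(1)))
```

Storing only the most recent exit from each vertex is what performs the loop erasure: when the walk returns to a vertex, the earlier excursion is forgotten, so the retraced path is loop-free.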

arXiv.org e-Print Archive

HAL AMU