
    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and they provide descriptive variable importance measures reflecting the impact of each variable through both main effects and interactions. The aim of this work is to introduce the principles of standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
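    As a minimal sketch of the workflow the abstract describes, the snippet below fits a random forest and computes permutation variable importance using the party package, one of the freely available R implementations; the data set and the ntree and mtry values are illustrative choices, not taken from the paper.

        ## Minimal sketch: conditional inference random forest with
        ## permutation variable importance (party package).
        library(party)

        ## A built-in data set, purely for illustration.
        data("iris")

        ## ntree and mtry are illustrative values, not recommendations.
        set.seed(42)
        rf <- cforest(Species ~ ., data = iris,
                      controls = cforest_unbiased(ntree = 500, mtry = 2))

        ## Permutation-based importance: larger values indicate a stronger
        ## contribution of a variable, via main effects and interactions alike.
        varimp(rf)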

    Party on!


    Percolation-like Scaling Exponents for Minimal Paths and Trees in the Stochastic Mean Field Model

    In the mean field (or random link) model there are $n$ points and inter-point distances are independent random variables. For $0 < \ell < \infty$ and in the $n \to \infty$ limit, let $\delta(\ell) = 1/n \times$ (maximum number of steps in a path whose average step-length is $\leq \ell$). The function $\delta(\ell)$ is analogous to the percolation function in percolation theory: there is a critical value $\ell_* = e^{-1}$ at which $\delta(\cdot)$ becomes non-zero, and (presumably) a scaling exponent $\beta$ in the sense $\delta(\ell) \asymp (\ell - \ell_*)^\beta$. Recently developed probabilistic methodology (in some sense a rephrasing of the cavity method of Mezard-Parisi) provides a simple albeit non-rigorous way of writing down such functions in terms of solutions of fixed-point equations for probability distributions. Solving numerically gives convincing evidence that $\beta = 3$. A parallel study with trees instead of paths gives scaling exponent $\beta = 2$. The new exponents coincide with those found in a different context (comparing optimal and near-optimal solutions of mean-field TSP and MST) and reinforce the suggestion that these scaling exponents determine universality classes for optimization problems on random points.
    Comment: 19 pages
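    The fixed-point equations themselves are not given in the abstract, so the sketch below only illustrates the final step of such a numerical study: reading a scaling exponent off values of $\delta(\ell)$ near $\ell_* = e^{-1}$ via a log-log fit. The $\delta$ values here are synthetic, generated to follow the conjectured $\beta = 3$; they are not output of the cavity-method computation from the paper.

        ## Hedged sketch: estimating a scaling exponent beta from numerical
        ## values of delta(l) near the critical point l_* = exp(-1).
        l_star <- exp(-1)
        l      <- l_star + seq(0.01, 0.10, by = 0.01)

        ## Synthetic delta values following (l - l_star)^3 with small noise;
        ## stand-ins for the output of the fixed-point computation.
        set.seed(1)
        delta  <- (l - l_star)^3 * (1 + rnorm(length(l), sd = 0.02))

        ## Slope of log(delta) against log(l - l_star) estimates beta
        ## (about 3 for this synthetic data).
        fit <- lm(log(delta) ~ log(l - l_star))
        coef(fit)[2]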