
200,291 research outputs found

Analysis of purely random forests bias

Author: Arlot Sylvain
Genuer Robin
Publication date: 01/01/2014

Random forests are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced to shed light on the good performance of random forests. In this paper, we study the approximation error (the bias) of some purely random forest models in a regression framework, focusing in particular on the influence of the number of trees in the forest. Under some regularity assumptions on the regression function, we show that the bias of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees. Furthermore, our results allow us to derive a minimum number of trees sufficient to reach the same rate as an infinite forest. As a by-product of our analysis, we also show a link between the bias of purely random forests and the bias of some kernel estimators.
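To make the notion of approximation error concrete, here is a minimal one-dimensional sketch, not the paper's models or proofs. It assumes a target function `sin(2πx)`, split points drawn uniformly on [0, 1] independently of any data, and cell means computed on a fine grid as a noiseless stand-in for the large-sample limit. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def target(x):
    # Smooth regression function chosen only for illustration (an assumption).
    return math.sin(2.0 * math.pi * x)


def tree_approximation(grid, values, n_cells, rng):
    """Piecewise-constant approximation on a random partition of [0, 1].

    Split points are uniform and independent of the data, which is what makes
    the tree "purely random". Each cell predicts the mean of the target over
    the grid points that fall inside it.
    """
    edges = sorted(rng.random() for _ in range(n_cells - 1)) + [1.0]
    sums = [0.0] * n_cells
    counts = [0] * n_cells
    cell_of = []
    cell = 0
    for x, v in zip(grid, values):  # grid is sorted, so the cell index only grows
        while cell < n_cells - 1 and x >= edges[cell]:
            cell += 1
        cell_of.append(cell)
        sums[cell] += v
        counts[cell] += 1
    return [sums[c] / counts[c] for c in cell_of]


def squared_bias(approx, values):
    # Mean squared gap between an approximation and the target on the grid.
    return sum((a - v) ** 2 for a, v in zip(approx, values)) / len(values)


def compare(n_trees=200, n_cells=8, n_grid=1000, seed=0):
    rng = random.Random(seed)
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    values = [target(x) for x in grid]
    trees = [tree_approximation(grid, values, n_cells, rng) for _ in range(n_trees)]
    mean_tree_bias = sum(squared_bias(t, values) for t in trees) / n_trees
    # The forest averages the trees pointwise.
    forest = [sum(column) / n_trees for column in zip(*trees)]
    return mean_tree_bias, squared_bias(forest, values)


if __name__ == "__main__":
    tree_bias, forest_bias = compare()
    print(f"mean single-tree squared bias: {tree_bias:.5f}")
    print(f"forest squared bias:           {forest_bias:.5f}")
```

By convexity of the square, the forest value can never exceed the mean single-tree value in this setup. The sketch does not by itself demonstrate the faster rate claimed in the abstract; that would require varying `n_cells` and comparing how quickly each quantity shrinks.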

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Inserm

Enhancing random forests performance in microarray data classification

Author: DESSI NICOLETTA
MILIA GABRIELE
PES BARBARA
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Random forests are receiving increasing attention for classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier as well as on the choice of two critical parameters, i.e. the forest size and the number of features chosen at each split in growing trees. Results of our experiments suggest that parameters lower than popular default values can lead to effective and more parsimonious classification models. Growing few trees on small subsets of selected features, while randomly choosing a single variable at each split, results in classification performance that compares well with state-of-the-art studies.

Archivio istituzionale della ricerca - Università di Cagliari

Improved Weighted Random Forest for Classification Problems

Author: A Booth
A Cielen
DH Wolpert
G Brown
G James
H Byeon
H Kim
H Pham
HK Hong
IC Yeh
JP Donate
L Breiman
LI Kuncheva
LV Utkin
M Sunil Babu
MK Yöntem
N Hooda
P Peykani
R Alizadehsani
RJ Lyon
S Moro
SJ Winham
Z Zheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Several studies have shown that combining machine learning models in an appropriate way improves on the individual predictions made by the base models. The key to building a well-performing ensemble model lies in the diversity of the base models. Among the most common solutions for introducing diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement and generating many training data sets, while random forest additionally selects a random subset of features. This has made the random forest a winning candidate for many machine learning applications. However, assuming equal weights for all base decision trees does not seem reasonable, as the randomization of sampling and input feature selection may lead to different levels of decision-making ability across base decision trees. Therefore, we propose several algorithms that modify the weighting strategy of regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models yield significant improvements over regular random forest.
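The idea of replacing equal votes with performance-based weights can be sketched as follows. This is a simplified illustration and not the authors' algorithms: it assumes each tree's weight is its accuracy on a held-out set, normalised to sum to one, whereas the paper's "optimal" and stacking-based variants are not reproduced here. The function names and the toy predictions are hypothetical.

```python
from collections import defaultdict


def accuracy_weights(tree_predictions, labels):
    """Weight each tree by its held-out accuracy, normalised to sum to 1."""
    accuracies = [
        sum(p == y for p, y in zip(preds, labels)) / len(labels)
        for preds in tree_predictions
    ]
    total = sum(accuracies)
    if total == 0:
        # Every tree was wrong everywhere: fall back to equal weights.
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]


def weighted_vote(votes, weights):
    """Return the class with the largest total weight among the trees' votes."""
    scores = defaultdict(float)
    for vote, weight in zip(votes, weights):
        scores[vote] += weight
    return max(scores, key=scores.get)


# Toy held-out predictions from three hypothetical trees.
held_out_labels = [1, 0, 1, 1]
held_out_predictions = [
    [1, 0, 1, 1],  # tree 0: all correct
    [0, 1, 1, 0],  # tree 1: one correct
    [0, 0, 0, 0],  # tree 2: one correct
]
weights = accuracy_weights(held_out_predictions, held_out_labels)

# On a new sample the two weaker trees vote 0 and the strong tree votes 1.
print(weighted_vote([1, 0, 0], weights))
```

With equal weights the two weaker trees would outvote the stronger one; with accuracy-based weights the stronger tree's vote carries the decision.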

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

Comparison of the Representational Power of Random Forests, Binary Decision Diagrams, and Neural Networks

Author: Akutsu Tatsuya
Kumano So
Publication venue: 'MIT Press - Journals'
Publication date: 01/04/2022

In this letter, we compare the representational power of random forests, binary decision diagrams (BDDs), and neural networks in terms of the number of nodes. We assume that an axis-aligned function on a single variable is assigned to each edge in random forests and BDDs, and that the activation functions of neural networks are sigmoid, rectified linear unit, or similar functions. Based on existing studies, we show that for any random forest, there exists an equivalent depth-3 neural network with a linear number of nodes. We also show that for any BDD with balanced width, there exists an equivalent shallow neural network with a polynomial number of nodes. These results suggest that even shallow neural networks have the same or higher representational power than deep random forests and deep BDDs. We also show that in some cases, an exponential number of nodes is required to express a given random forest by a random forest with a much smaller number of trees, which suggests that many trees are required for random forests to represent some specific knowledge efficiently.
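The flavour of the tree-to-network correspondence can be illustrated with a small sketch. It is not the construction from the letter: it assumes hard threshold units rather than sigmoid or ReLU activations, handles a single tree rather than a forest, and uses a hypothetical tuple encoding for tree nodes. One unit is created per internal node and one per leaf, so the unit count is linear in the tree size.

```python
def step(z):
    # Hard threshold activation (an assumption; the letter considers sigmoid/ReLU).
    return 1.0 if z > 0 else 0.0


def tree_predict(node, x):
    # Nodes are ("split", feature, threshold, left, right) or ("leaf", value).
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = right if x[feature] > threshold else left
    return node[1]


def tree_to_network(root):
    """Collect split units and leaf units together with root-to-leaf paths."""
    splits = []  # (feature, threshold) for each internal node
    leaves = []  # (path, value); path holds (split index, -1 for left / +1 for right)

    def visit(node, path):
        if node[0] == "leaf":
            leaves.append((path, node[1]))
            return
        _, feature, threshold, left, right = node
        index = len(splits)
        splits.append((feature, threshold))
        visit(left, path + [(index, -1)])
        visit(right, path + [(index, +1)])

    visit(root, [])
    return splits, leaves


def network_predict(network, x):
    splits, leaves = network
    # Layer 1: one threshold unit per internal node, rescaled to -1/+1.
    signs = [2.0 * step(x[f] - t) - 1.0 for f, t in splits]
    # Layer 2: one unit per leaf, active only if every test on its path agrees.
    active = [
        step(sum(d * signs[i] for i, d in path) - len(path) + 0.5)
        for path, _ in leaves
    ]
    # Layer 3: linear read-out; exactly one leaf unit is active.
    return sum(a * value for a, (_, value) in zip(active, leaves))


# A small hypothetical tree on two features.
TREE = ("split", 0, 0.5,
        ("leaf", 1.0),
        ("split", 1, 0.3, ("leaf", 2.0), ("leaf", 3.0)))
NETWORK = tree_to_network(TREE)

print(tree_predict(TREE, [0.7, 0.9]), network_predict(NETWORK, [0.7, 0.9]))
```

A forest could be handled in the same spirit by building one such block per tree and averaging the read-outs, though that extension is not shown here.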

Kyoto University Research Information Repository

Fitting Prediction Rule Ensembles with R Package pre

Author: Fokkema Marjolein
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/02/2020

Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through application to a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forest and lasso regression on four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.
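How a fitted rule ensemble turns into a prediction can be sketched in a few lines. This is a Python illustration of the general form (an intercept plus a weighted sum of 0/1 rules) and does not reproduce the API of the R package pre or its fitting procedure. The variable names, cutoffs, and coefficients are hypothetical and are not taken from the paper's depression dataset.

```python
def rule(conditions):
    """Build a rule returning 1.0 when every (feature, op, cutoff) test holds."""
    def apply(x):
        for feature, op, cutoff in conditions:
            value = x[feature]
            if op == "<=" and not value <= cutoff:
                return 0.0
            if op == ">" and not value > cutoff:
                return 0.0
        return 1.0
    return apply


# Hypothetical sparse ensemble: only two rules carry nonzero coefficients.
INTERCEPT = 10.0
ENSEMBLE = [
    (2.5, rule([("stress", ">", 5.0)])),
    (-1.5, rule([("sleep_hours", ">", 7.0), ("age", "<=", 40.0)])),
]


def predict(x):
    # Prediction = intercept + sum of coefficients of the rules that fire.
    return INTERCEPT + sum(coef * r(x) for coef, r in ENSEMBLE)


print(predict({"stress": 6.0, "sleep_hours": 8.0, "age": 30.0}))
```

Because each rule is a short conjunction of threshold tests, a reader can inspect which rules fired for a given observation, which is the source of the interpretability the abstract refers to.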

arXiv.org e-Print Archive

Journal of Statistical Software

Leiden University Scholarly Publications