Narrowing the Gap: Random Forests In Theory and In Practice
Despite widespread interest and practical use, the theoretical properties of
random forests are still not well understood. In this paper we contribute to
this understanding in two ways. We present a new theoretically tractable
variant of random regression forests and prove that our algorithm is
consistent. We also provide an empirical evaluation, comparing our algorithm
and other theoretically tractable random forest models to the random forest
algorithm used in practice. Our experiments provide insight into the relative
importance of different simplifications that theoreticians have made to obtain
tractable models for analysis.
Comment: Under review by the International Conference on Machine Learning
(ICML) 201
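As a rough illustration of the kind of simplification the abstract alludes to (this is not the paper's algorithm): one common way to make random regression forests theoretically tractable is to choose splits independently of the response. A minimal purely random regression forest along those lines, with all names invented for this sketch, in plain Python:

```python
import random

def build_tree(X, y, depth, min_leaf=2):
    """Grow a purely random tree: each split uses a random feature and a
    random threshold, chosen independently of y -- a simplification often
    made to obtain consistency proofs."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf: mean response
    f = random.randrange(len(X[0]))
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    t = random.uniform(lo, hi)
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    if not left or not right:
        return sum(y) / len(y)
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth - 1, min_leaf),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth - 1, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, l, r = node
        node = l if x[f] <= t else r
    return node

def forest_predict(trees, x):
    """Average the predictions of all trees in the forest."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)
```

Because the splits ignore the response, each tree is weak on its own; the practical random forest algorithm instead optimizes splits on the data, which is exactly the gap the paper's experiments probe.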
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre are
described and illustrated through an application to a dataset on the prediction
of depression. Furthermore, the accuracy and sparsity of PREs are compared with
those of single trees, random forests, and lasso regression on four benchmark
datasets. Results indicate that pre derives ensembles with predictive accuracy
comparable to that of random forests, while using a smaller number of variables
for prediction.
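A hedged sketch of the structure the abstract describes (this is not the pre package's API; all names here are invented): a prediction rule ensemble is an intercept plus a sparse weighted sum of rule indicators, where each rule is a conjunction of simple conditions on the features.

```python
def rule_fires(rule, x):
    """A rule is a list of (feature_index, op, threshold) conditions;
    it fires only when every condition holds for the observation x."""
    return all((x[f] <= t) if op == "<=" else (x[f] > t)
               for f, op, t in rule)

def pre_predict(intercept, weighted_rules, x):
    """Predict as intercept plus the weights of all rules that fire.
    Sparsity comes from most candidate rules receiving weight zero."""
    return intercept + sum(w for rule, w in weighted_rules
                           if rule_fires(rule, x))

# Toy ensemble: two rules with hand-picked weights for illustration.
rules = [([(0, "<=", 5.0)], 2.0),   # "feature 0 <= 5" adds 2.0
         ([(1, ">", 0.0)], -0.5)]   # "feature 1 > 0" subtracts 0.5
```

In the methodology of Friedman and Popescu (2008), the rules are harvested from the paths of an ensemble of shallow trees and the weights are then fitted with a lasso penalty, which is what keeps the number of active rules small.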
New statistical method identifies cytokines that distinguish stool microbiomes
Regressing an outcome or dependent variable onto a set of input or independent variables allows the analyst to measure associations between the two, so that changes in the outcome can be described and predicted by changes in the inputs. While classical statistics offers many ways of doing this when the dependent variable has certain properties (e.g., a scalar, survival time, or count), little progress has been made on regression where the dependent variables are microbiome taxa counts without imposing extremely strict conditions on the data. In this paper, we propose and apply a new regression model combining the Dirichlet-multinomial distribution with recursive partitioning, providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine and microbiome taxa count data; it is applicable to any microbiome taxa counts and metadata, is automatically fitted, and is intuitively interpretable. The model can be applied to any microbiome or other compositional data, and software (R package HMP) is available through the R CRAN website.
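To make the node model concrete (this is a stdlib sketch, not the DM-RPart implementation; the function name is invented): the Dirichlet-multinomial log-likelihood that such a model would evaluate for a vector of taxa counts can be written with log-gamma terms.

```python
from math import lgamma, exp

def dm_loglik(counts, alpha):
    """Log-likelihood of taxa counts under a Dirichlet-multinomial with
    concentration parameters alpha:
    log[ n! / prod(x_i!) * G(A)/G(n+A) * prod G(x_i+a_i)/G(a_i) ],
    where n = sum of counts and A = sum of alpha."""
    n, A = sum(counts), sum(alpha)
    ll = lgamma(n + 1) + lgamma(A) - lgamma(n + A)
    for x, a in zip(counts, alpha):
        ll += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return ll

# Sanity check: with alpha = (1, 1) and one draw, both outcomes are
# equally likely, so the probability of observing (1, 0) is 0.5.
p = exp(dm_loglik([1, 0], [1.0, 1.0]))
```

The extra Dirichlet layer over a plain multinomial is what accommodates the overdispersion typical of microbiome counts; recursive partitioning then fits separate parameter vectors within each node.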
A Random Forests Approach to Assess Determinants of Central Bank Independence
A non-parametric, efficient statistical method, Random Forests, is implemented to select the determinants of Central Bank Independence (CBI) from a large database of economic, political, and institutional variables for OECD countries. It permits ranking all the determinants by their importance with respect to CBI and imposes no a priori assumptions on potential nonlinear relationships in the data. Collinearity issues are resolved, because correlated variables can be considered simultaneously.
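The importance ranking random forests provide is commonly the permutation kind: shuffle one feature's values and measure how much the prediction error grows. A generic stdlib sketch of that idea (illustrative only, not the paper's code; the function names are invented, and the fitted forest is stood in for by an arbitrary predict function):

```python
import random

def permutation_importance(predict, X, y, n_features, seed=0):
    """Score each feature by the increase in mean squared error after
    shuffling that feature's column, leaving the others intact."""
    rng = random.Random(seed)
    def mse(data):
        return sum((predict(x) - t) ** 2 for x, t in zip(data, y)) / len(y)
    base = mse(X)
    scores = {}
    for f in range(n_features):
        col = [x[f] for x in X]
        rng.shuffle(col)  # break the feature-outcome association
        X_perm = [x[:f] + [v] + x[f + 1:] for x, v in zip(X, col)]
        scores[f] = mse(X_perm) - base
    return scores
```

A feature the model never uses scores exactly zero, while shuffling a genuinely predictive determinant degrades the error, which is what makes the resulting ranking usable for variable selection.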
Decision Stream: Cultivating Deep Decision Trees
Various modifications of decision trees have been used extensively in recent
years due to their high efficiency and interpretability. Tree node splitting
based on relevant feature selection is a key step of decision tree learning
and, at the same time, their major shortcoming: recursive node partitioning
leads to a geometric reduction of the data quantity in the leaf nodes, which
causes excessive model complexity and data overfitting. In this paper, we
present a novel architecture, the Decision Stream, aimed at overcoming this
problem. Instead of building a tree structure during the learning process, we
propose merging nodes from different branches based on their similarity,
estimated with two-sample test statistics, which leads to the generation of a
deep directed acyclic graph of decision rules that can consist of hundreds of
levels. To evaluate the proposed solution, we test it on several common machine
learning problems: credit scoring, Twitter sentiment analysis, aircraft flight
control, MNIST and CIFAR image classification, and synthetic data
classification and regression. Our experimental results reveal that the
proposed approach significantly outperforms standard decision tree learning
methods on both regression and classification tasks, yielding a prediction
error decrease of up to 35%.
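The merging step the abstract describes can be sketched as follows (a minimal stdlib illustration, not the authors' implementation; the function names and the greedy merge order are assumptions of this sketch): compute a two-sample statistic between the responses in two nodes and pool them when it falls below a threshold.

```python
from math import sqrt

def t_statistic(a, b):
    """Welch's two-sample t statistic for the responses in two nodes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return abs(ma - mb) / sqrt(va / len(a) + vb / len(b))

def merge_similar_nodes(nodes, threshold=2.0):
    """Greedily pool node sample lists whose statistic falls below the
    threshold; pooled nodes keep the data volume in each level high,
    turning the tree into a directed acyclic graph."""
    merged = []
    for samples in nodes:
        for group in merged:
            if t_statistic(group, samples) < threshold:
                group.extend(samples)
                break
        else:
            merged.append(list(samples))
    return merged
```

Because statistically indistinguishable branches are fused instead of split further, each level retains enough data to keep growing, which is what allows the graph to reach hundreds of levels without the leaf-starvation that limits ordinary trees.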