Why Does Bagging Work? A Bayesian Account and its Implications
Abstract: The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works be…
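The bootstrap-and-vote procedure the abstract describes can be sketched in a few lines. The decision-stump base learner below is an illustrative stand-in, not the learner used in the paper:

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-split decision stump: pick the (feature, threshold, orientation)
    that minimizes misclassification on the given sample."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            err_plain = np.mean(pred != y)
            err_flip = np.mean((1 - pred) != y)
            err = min(err_plain, err_flip)
            if best is None or err < best[0]:
                best = (err, j, t, err_flip < err_plain)
    return best[1:]  # (feature index, threshold, flip flag)

def stump_predict(model, X):
    j, t, flip = model
    pred = (X[:, j] > t).astype(int)
    return 1 - pred if flip else pred

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bagging: fit one base model per bootstrap sample of the training data,
    then combine the models by uniform majority voting."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        # bootstrap sample: draw n indices with replacement
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = fit_stump(X_train[idx], y_train[idx])
        votes += stump_predict(model, X_test)
    return (votes / n_models > 0.5).astype(int)  # uniform vote
```

Each bootstrap sample yields a slightly different stump, and the uniform vote averages away much of the individual models' variance.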
An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years.
High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions.
The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application.
Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
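The variable importance idea mentioned above is, in its common permutation-based form, simple to sketch: shuffle one predictor column, and measure how much accuracy drops. The `predict` callable and accuracy scoring below are illustrative assumptions, not the specific measure of any one R package:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Permutation importance: how much does shuffling one column of X
    degrade predictive accuracy? Variables whose permutation hurts most
    (through main effects or interactions) score highest."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the link between feature j and y
            drops.append(baseline - np.mean(predict(Xp) == y))
        scores[j] = np.mean(drops)  # average accuracy drop over repeats
    return scores
```

A feature the model never uses scores exactly zero, while an informative feature produces a clear accuracy drop when permuted.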
An experiment with association rules and classification: post-bagging and conviction
In this paper we study a new technique we call post-bagging, which consists in resampling parts of a classification model rather than the data. We do this with a particular kind of model, large sets of classification association rules, in combination with ordinary best-rule and weighted voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.
Programa de Financiamento Plurianual de Unidades de I&D. Comunidade Europeia (CE), Fundo Europeu de Desenvolvimento Regional (FEDER). Fundação para a Ciência e a Tecnologia (FCT) - POSI/SRI/39630/2001/Class Project.
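The rule metrics compared in the abstract have standard definitions in terms of transaction counts. A minimal sketch for a rule A → B (the function name and count-based interface are my own):

```python
def rule_metrics(n, n_a, n_b, n_ab):
    """Standard association-rule metrics for A -> B, from raw counts:
    n transactions total, n_a containing A, n_b containing B, n_ab both."""
    supp_b = n_b / n
    confidence = n_ab / n_a                 # P(B | A)
    lift = confidence / supp_b              # how much A raises P(B)
    # conviction: (1 - supp(B)) / (1 - confidence); it compares the expected
    # frequency of "A without B" under independence to the observed one,
    # and diverges to infinity for exact rules (confidence = 1)
    conviction = float("inf") if confidence == 1 else (1 - supp_b) / (1 - confidence)
    return {"confidence": confidence, "lift": lift, "conviction": conviction}
```

For example, with 100 transactions, 40 containing A, 50 containing B and 30 containing both, the rule A → B has confidence 0.75, lift 1.5 and conviction 2.0.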
Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination
Systematic trading strategies are algorithmic procedures that allocate assets aiming to optimize a certain performance criterion. To obtain an edge in a highly competitive environment, the analyst needs to properly fine-tune the strategy, or discover how to combine weak signals in novel alpha-creating ways. Both aspects, namely fine-tuning and combination, have been extensively researched using several methods, but emerging techniques such as Generative Adversarial Networks can have an impact on both. Therefore, our work proposes the use of Conditional Generative Adversarial Networks (cGANs) for trading strategy calibration and aggregation. To this purpose, we provide a full methodology on: (i) the training and selection of a cGAN for time series data; (ii) how each sample is used for strategy calibration; and (iii) how all generated samples can be used for ensemble modelling. To provide evidence that our approach is well grounded, we designed an experiment with multiple trading strategies, encompassing 579 assets. We compared cGANs with an ensemble scheme and model validation methods, both suited for time series. Our results suggest that cGANs are a suitable alternative for strategy calibration and combination, providing outperformance when the traditional techniques fail to generate any alpha.
Hierarchical shrinkage priors for dynamic regressions with many predictors
This paper builds on a simple unified representation of shrinkage Bayes estimators based on hierarchical Normal-Gamma priors. Various popular penalized least squares estimators for shrinkage and selection in regression models can be recovered using this single hierarchical Bayes formulation. Using 129 U.S. macroeconomic quarterly variables for the period 1959–2010, I exhaustively evaluate the forecasting properties of Bayesian shrinkage in regressions with many predictors. Results show that for particular data series hierarchical shrinkage dominates factor model forecasts, and hence it becomes a valuable addition to existing methods for handling large-dimensional data.
Keywords: forecasting; shrinkage; factor model; variable selection; Bayesian LASSO
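For intuition on how a penalized least squares estimator falls out of a hierarchical Bayes formulation: the simplest member of this family, a plain Normal prior on the coefficients with no Gamma mixing, already yields ridge regression as its MAP estimate. A minimal sketch under that assumption (function name and interface are my own):

```python
import numpy as np

def ridge_map(X, y, sigma2=1.0, tau2=1.0):
    """MAP estimate under y ~ N(X beta, sigma2 * I), beta_i ~ N(0, tau2).
    This equals penalized least squares with penalty lam = sigma2 / tau2.
    Placing a Gamma mixing prior on the per-coefficient variances instead
    (the Normal-Gamma hierarchy) recovers LASSO-like estimators."""
    lam = sigma2 / tau2
    p = X.shape[1]
    # closed-form ridge solution: (X'X + lam I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

As the prior variance tau2 grows the penalty vanishes and the estimate approaches ordinary least squares; a tight prior shrinks the coefficients toward zero.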
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service
In this paper, we present machine learning approaches for characterizing and forecasting the short-term demand for on-demand ride-hailing services. We propose a spatio-temporal estimation of demand as a function of variable effects related to traffic, pricing and weather conditions. With respect to the methodology, a single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and an artificial neural network for regression have been adapted and systematically compared using various statistics, e.g. R-squared, Root Mean Square Error (RMSE), and slope. To better assess the quality of the models, they have been tested on a real case study using the data of DiDi Chuxing, the main on-demand ride-hailing service provider in China. In the current study, 199,584 time slots describing the spatio-temporal ride-hailing demand have been extracted with an aggregated time interval of 10 minutes. All the methods are trained and validated on the basis of two independent samples from this dataset. The results revealed that boosted decision trees provide the best prediction accuracy (RMSE = 16.41), while avoiding the risk of over-fitting, followed by the artificial neural network (20.09), random forest (23.50), bagged decision trees (24.29) and single decision tree (33.55).
Comment: Currently under review for journal publication.
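The two headline statistics of the comparison, RMSE and R-squared, have standard definitions; a minimal sketch (function names are my own):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares
    over total sum of squares around the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)
```

Lower RMSE is better (0 for a perfect fit, in the units of the target), while R-squared is 1 for a perfect fit and 0 for a model no better than predicting the mean.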