2,702 research outputs found

Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

Author: Buja Andreas
Mease David
Wyner Abraham J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 17/04/2008

The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided. [arXiv:0804.2752]

Comment: Published at http://dx.doi.org/10.1214/07-STS242B in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

A Comparative Study of Machine Learning Models for Tabular Data Through Challenge of Monitoring Parkinson's Disease Progression Using Voice Recordings

Author: Arabnia Hamid Reza
Giuntini Amy
Iman Mohammadreza
Rasheed Khaled
Publication date: 27/05/2020

People with Parkinson's disease must be regularly monitored by their physician to observe how the disease is progressing and potentially adjust treatment plans to mitigate the symptoms. Monitoring the progression of the disease through a voice recording captured by the patient at their own home can make the process faster and less stressful. Using a dataset of voice recordings of 42 people with early-stage Parkinson's disease over a time span of 6 months, we applied multiple machine learning techniques to find a correlation between the voice recording and the patient's motor UPDRS score. We approached this problem using both regression and classification techniques. Much of this paper is dedicated to mapping the voice data to motor UPDRS scores using regression techniques in order to obtain a more precise value for unknown instances. Through this comparative study of various machine learning methods, we found that some older machine learning methods, such as trees, outperform cutting-edge deep learning models on numerous tabular datasets.

Comment: Accepted at "HIMS'20 - The 6th Int'l Conf on Health Informatics and Medical Systems"; https://americancse.org/events/csce2020/conferences/hims2

arXiv.org e-Print Archive

Evidential Bagging: Combining Heterogeneous Classifiers in the Belief Functions Framework

Author: AP Dempster
B Efron
C Cortes
D Dubois
DH Wolpert
G Qu
G Shafer
Jérémie François
L Breiman
P Smets
P Vannoorenberghe
P Xu
R Polikar
RE Schapire
RR Yager
S Džeroski
T Denoeux
T Denœux
Y Freund
ZH Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/06/2018

In machine learning, Ensemble Learning methodologies are known to improve predictive accuracy and robustness. They consist of learning many classifiers whose outputs are finally combined according to different techniques. Bagging, or Bootstrap Aggregating, is one of the most famous Ensemble methodologies and is usually applied to the same classification base algorithm, i.e. the same type of classifier is learnt multiple times on bootstrapped versions of the initial learning dataset. In this paper, we propose a bagging methodology that involves different types of classifier. Classifiers' probabilistic outputs are used to build mass functions which are further combined within the belief functions framework. Three different ways of building mass functions are proposed; preliminary experiments on benchmark datasets showing the relevance of the approach are presented.
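
As a rough illustration of the standard bagging procedure the abstract describes (the same base classifier learnt on bootstrapped copies of the data, then combined), here is a minimal sketch. It is not the paper's evidential method: outputs are combined by plain majority vote rather than belief functions, and the one-dimensional threshold "stump" learner, the toy dataset, and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data, with replacement."""
    return [rng.choice(data) for _ in data]


def predict_stump(model, x):
    """Predict class 1 when x lies on the 'positive' side of the threshold."""
    threshold, polarity = model
    return 1 if polarity * (x - threshold) >= 0 else 0


def train_stump(sample):
    """Hypothetical base learner: pick the (threshold, polarity) pair
    with the fewest training errors on the given sample."""
    best_errors, best_model = None, None
    for threshold, _ in sample:
        for polarity in (1, -1):
            model = (threshold, polarity)
            errors = sum(1 for x, y in sample if predict_stump(model, x) != y)
            if best_errors is None or errors < best_errors:
                best_errors, best_model = errors, model
    return best_model


def bagging_fit(data, n_models=25, seed=0):
    """Train one stump per bootstrap replicate of the data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy (feature, label) pairs, invented for illustration only.
data = [(0.1, 0), (0.4, 0), (0.5, 0), (1.2, 0),
        (2.9, 1), (3.3, 1), (3.8, 1), (4.1, 1)]
models = bagging_fit(data)
```

Calling `bagging_predict(models, x)` returns the class chosen by most of the stumps. Using an odd `n_models` avoids tied votes in the two-class case.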

Crossref

HAL Descartes

Hal-Diderot

Isoelastic Agents and Wealth Updates in Machine Learning Markets

Author: Geras Krzysztof
Millin Jono
Storkey Amos
Publication date: 01/01/2012

Recently, prediction markets have shown considerable promise for developing flexible mechanisms for machine learning. In this paper, agents with isoelastic utilities are considered. It is shown that the costs associated with homogeneous markets of agents with isoelastic utilities produce equilibrium prices corresponding to alpha-mixtures, with a particular form of mixing component relating to each agent's wealth. We also demonstrate that wealth accumulation for logarithmic and other isoelastic agents (through payoffs on prediction of training targets) can implement both Bayesian model updates and mixture weight updates by imposing different market payoff structures. An iterative algorithm is given for market equilibrium computation. We demonstrate that inhomogeneous markets of agents with isoelastic utilities outperform state-of-the-art aggregate classifiers such as random forests, as well as single classifiers (neural networks, decision trees) on a number of machine learning benchmarks, and show that isoelastic combination methods are generally better than their logarithmic counterparts.

Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012

arXiv.org e-Print Archive

CiteSeerX

Edinburgh Research Explorer

Big data techniques in auditing research and practice: Current trends and future opportunities

Author: Gepp Adrian
Linnenluecke Martina K
O'Neill Terence J
Publication venue: 'Elsevier BV'
Publication date: 01/06/2018

Bond University Research Portal

Predictive modelling of survival and length of stay in critically ill patients using sequential organ failure scores

Author: Colpaert Kirsten
Couckuyt Ivo
De Turck Filip
Decruyenaere Johan
Dhaene Tom
Gadeyne Bram
Houthooft Rein
Ongenae Femke
Ruyssinck Joeri
Stijven Sean
van der Herten Joachim
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Ghent University Academic Bibliography

Collaborative Training in Sensor Networks: A graphical model approach

Author: Kulkarni Sanjeev R.
Poor H. Vincent
Zheng Haipeng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/07/2009

Graphical models have been widely applied in solving distributed inference problems in sensor networks. In this paper, the problem of coordinating a network of sensors to train a unique ensemble estimator under communication constraints is discussed. The information structure of graphical models with specific potential functions is employed, which converts the collaborative training task into a problem of local training plus global inference. Two important classes of graphical model inference algorithms, message-passing algorithms and sampling algorithms, are employed to tackle low-dimensional, parametrized problems and high-dimensional, non-parametrized problems, respectively. The efficacy of this approach is demonstrated by concrete examples.

arXiv.org e-Print Archive

Crossref