4,002 research outputs found
ICA as a preprocessing technique for classification
In this paper we propose the use of the independent component analysis (ICA) [1] technique to improve the classification rate of decision trees and multilayer perceptrons [2], [3]. Using ICA in the preprocessing stage makes the structure of both classifiers simpler and therefore improves their generalization properties. The hypothesis behind the proposed preprocessing is that ICA will transform the feature space into one whose components are independent and aligned with the axes, and which is therefore better adapted to the way a decision tree is constructed. Likewise, inferring the weights of a multilayer perceptron becomes much easier, because the gradient search in weight space follows independent trajectories. The result is that the classifiers are less complex and, on some databases, the error rate is lower. This idea is also applicable to regression.
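The preprocessing described above can be illustrated with a minimal scikit-learn sketch, assuming FastICA as the ICA implementation and a synthetic dataset, since the paper's actual databases are not specified here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FastICA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with correlated features (stand-in for the paper's databases)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

# Baseline: a decision tree on the raw features
baseline = DecisionTreeClassifier(random_state=0)

# ICA-preprocessed: rotate the features toward statistically independent,
# axis-aligned components before the tree's axis-parallel splits
ica_tree = make_pipeline(FastICA(n_components=10, random_state=0, max_iter=1000),
                         DecisionTreeClassifier(random_state=0))

base_acc = cross_val_score(baseline, X, y, cv=5).mean()
ica_acc = cross_val_score(ica_tree, X, y, cv=5).mean()
print(f"raw tree:   {base_acc:.3f}")
print(f"ICA + tree: {ica_acc:.3f}")
```

Whether the ICA variant wins depends on the dataset; the abstract claims lower error only "on some databases".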
Machine learning with the hierarchy‐of‐hypotheses (HoH) approach discovers novel patterns in studies on biological invasions
Research synthesis on simple yet general hypotheses and ideas is challenging in scientific disciplines that study highly context‐dependent systems, such as the medical, social, and biological sciences. This study shows that machine learning, a form of equation‐free statistical modeling from artificial intelligence, is a promising synthesis tool for discovering novel patterns and the sources of controversy around a general hypothesis. We apply a decision tree algorithm, assuming that evidence from various contexts can be adequately integrated in a hierarchically nested structure. As a case study, we analyzed 163 articles on a prominent hypothesis in invasion biology, the enemy release hypothesis. We explored, as a classification problem, whether any of the nine attributes describing each study can differentiate its conclusions. The results corroborated that machine learning can be useful for research synthesis, as the algorithm detected patterns that previous narrative reviews had already highlighted. Compared with a previous synthesis study that assessed the same evidence collection based on experts' judgement, the algorithm newly suggested that studies focusing on Asian regions mostly supported the hypothesis, implying that more detailed investigations in these regions could enhance our understanding of it. We suggest that machine learning algorithms can be a promising synthesis tool especially where studies (a) reformulate a general hypothesis from different perspectives, (b) use different methods or variables, or (c) report insufficient information for conducting meta‐analyses.
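The attribute-based classification of study conclusions described above can be sketched with a decision tree on a toy evidence table. The attribute names and rows below are hypothetical illustrations, not the paper's actual nine attributes or its 163 articles:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical evidence table: each row is one study, the columns are
# contextual attributes, and 'support' encodes the study's conclusion
studies = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Americas", "Asia", "Europe"],
    "method": ["field", "field", "lab", "field", "lab", "field"],
    "support": [1, 1, 0, 0, 1, 0],
})

# One-hot encode the categorical attributes and fit a shallow tree,
# so the learned splits form a readable, hierarchically nested rule set
X = pd.get_dummies(studies[["region", "method"]])
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, studies["support"])
print(export_text(tree, feature_names=list(X.columns)))
```

In this toy table the region attribute perfectly separates the conclusions, so the printed rules surface it at the root, mirroring how the algorithm flagged Asian-region studies in the actual synthesis.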
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosted Decision Trees (GBDTs) are popular machine learning algorithms, with dedicated implementations such as LightGBM and implementations in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees offline and greedily. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.
Comment: Technical report on an implementation of the Deep Neural Decision Forests algorithm. To accompany the implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179
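The core idea of making decision splits updatable online can be illustrated by relaxing a hard tree split into a steep sigmoid, so the threshold becomes a differentiable parameter. This is a generic soft-split sketch, not TreeGrad's exact parameterization:

```python
import numpy as np

def hard_split(x, feature, threshold):
    # Classic tree routing: 1.0 sends a sample to the right child, 0.0 left
    return (x[:, feature] > threshold).astype(float)

def soft_split(x, feature, threshold, temperature=0.1):
    # Differentiable relaxation of the same decision: a steep sigmoid,
    # so the threshold can be updated by gradient descent; as the
    # temperature shrinks, the soft routing approaches the hard one
    return 1.0 / (1.0 + np.exp(-(x[:, feature] - threshold) / temperature))

x = np.array([[0.2], [0.49], [0.51], [0.9]])
print(hard_split(x, 0, 0.5))                  # [0. 0. 1. 1.]
print(np.round(soft_split(x, 0, 0.5), 3))     # [0.047 0.475 0.525 0.982]
```

Samples far from the threshold are routed almost deterministically, while those near it receive intermediate weights that carry gradient signal back to the split point.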
Mixing hetero- and homogeneous models in weighted ensembles
The effectiveness of ensembling for improving classification performance is well documented. Broadly speaking, ensemble design can be expressed as a spectrum: at one end, a set of heterogeneous classifiers model the same data; at the other, homogeneous models derived from the same classification algorithm are diversified through data manipulation. The cross-validation accuracy weighted probabilistic ensemble is a heterogeneous weighted ensemble scheme that needs reliable estimates of error from its base classifiers. It estimates error through a cross-validation process and raises the estimates to a power to accentuate differences. We study the effects of retaining all models trained during cross-validation on the final ensemble's predictive performance, and on the base models' and resulting ensembles' variance and robustness across datasets and resamples. We find that augmenting the ensemble by retaining all trained models provides a consistent and significant improvement, despite reductions in the reliability of the base models' performance estimates.
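The weighting scheme described above, cross-validation accuracy raised to a power and then normalised, can be sketched in a few lines. The function name and the exponent value 4 are illustrative choices, not taken from the paper:

```python
import numpy as np

def cawpe_weights(cv_accuracies, alpha=4):
    # Raise each base classifier's cross-validation accuracy estimate to a
    # power alpha to accentuate differences between classifiers, then
    # normalise so the ensemble weights sum to one
    acc = np.asarray(cv_accuracies, dtype=float)
    w = acc ** alpha
    return w / w.sum()

accs = [0.70, 0.75, 0.90]
print(np.round(cawpe_weights(accs), 3))  # [0.198 0.261 0.541]
```

Note how exponentiation turns a modest accuracy gap (0.75 vs 0.90) into a large weight gap, which is exactly why the scheme depends on those accuracy estimates being reliable.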
Child Mortality in Mozambique: a Review of Recent Trends and Attributable Causes
Data regarding the main causes of death among children in Mozambique are patchy, outdated, and in many cases based on methodologies with underlying limitations that make them unreliable. More robust postmortem methodologies for studying the underlying causes of mortality, currently being introduced at a sentinel surveillance site in the country, will surely help improve our understanding of what is really killing children in this country.
Improving adaptive bagging methods for evolving data streams
We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging by using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up ASHT Bagging's adaptation to change, we add an error change detector to each classifier. We test our improvements in an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
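A per-classifier error change detector of the kind added to each ensemble member can be sketched as a simplified two-window comparison over the member's error stream. This is an illustrative stand-in, not the actual ADWIN algorithm:

```python
from collections import deque

class ErrorChangeDetector:
    """Simplified sketch of a per-classifier error change detector:
    compare the error rate in a recent sliding window against an older
    one, and flag change when recent error rises sharply. (Not ADWIN,
    which adapts its window size with statistical guarantees.)"""

    def __init__(self, window=50, threshold=0.2):
        self.old = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def add(self, error):
        # error: 1 if the ensemble member misclassified this example, else 0
        if len(self.recent) == self.recent.maxlen:
            self.old.append(self.recent.popleft())
        self.recent.append(error)
        if len(self.old) < self.old.maxlen:
            return False  # not enough history yet
        return (sum(self.recent) / len(self.recent)
                - sum(self.old) / len(self.old)) > self.threshold

# A member that is accurate for 40 steps, then starts failing after drift
det = ErrorChangeDetector(window=20, threshold=0.3)
drift = [det.add(e) for e in [0] * 40 + [1] * 25]
print(drift.index(True))  # first time step at which change is flagged
```

Once such a detector fires for a member, the ensemble can reset or replace that member instead of waiting for its error to slowly wash out of a global statistic.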
Dengue Fever Outbreak in a Recreation Club, Dhaka, Bangladesh
An outbreak of dengue fever occurred among employees of a recreation club in Bangladesh. Occupational transmission was characterized by a 12% attack rate, no dengue among family contacts, and Aedes vectors in club areas. Early recognition of the outbreak likely limited its impact.
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections, applied to the output space, to enhance tree-based ensemble methods in the context of multi-label classification. We show how the time complexity of learning can be reduced without affecting the computational complexity or accuracy of prediction. We also show that random output-space projections can be used to reach different bias-variance tradeoffs over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
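The output-space projection scheme can be sketched with scikit-learn: compress the label matrix with a random Gaussian projection, fit the forest on the compressed targets, and decode predictions back to label space. This is a simplified version using pseudo-inverse decoding; the paper's actual decoding strategy may differ:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=40, random_state=0)

# Random Gaussian projection of the 40-dimensional label space down to 10
m = 10
G = rng.normal(size=(Y.shape[1], m)) / np.sqrt(m)
Y_proj = Y @ G

# Fit the ensemble on the compressed targets: growing trees against 10
# outputs instead of 40 is what cuts the learning-stage cost
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, Y_proj)

# Decode: map predictions back through the pseudo-inverse, then threshold
Y_hat = (forest.predict(X) @ np.linalg.pinv(G)) > 0.5
hamming = np.mean(Y_hat != Y)
print(f"training-set Hamming loss: {hamming:.3f}")
```

The projection dimension m is the knob behind the bias-variance tradeoff mentioned above: smaller m means cheaper learning but lossier label reconstruction.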
Chains of infinite order, chains with memory of variable length, and maps of the interval
We show how to construct a topological Markov map of the interval whose invariant probability measure is the stationary law of a given stochastic chain of infinite order. In particular, we characterize the maps corresponding to stochastic chains with memory of variable length. The problem treated here is the converse of the classical construction of the Gibbs formalism for Markov expanding maps of the interval.
Computer aided diagnosis for cardiovascular diseases based on ECG signals : a survey
The interpretation of electrocardiography (ECG) signals is difficult, because even subtle changes in the waveform can indicate a serious heart disease. Furthermore, these waveform changes might not be present all the time. As a consequence, it takes years of training for a medical practitioner to become an expert in ECG-based cardiovascular disease diagnosis. That training is a major investment in a specific skill. Even with expert ability, signal interpretation takes time. In addition, human interpretation of ECG signals suffers from interoperator and intraoperator variability. ECG-based Computer-Aided Diagnosis (CAD) holds the promise of improving diagnostic accuracy and reducing cost: the same ECG signal will yield the same diagnostic support regardless of time and place. This paper introduces both the techniques used to realize CAD functionality and the methods used to assess that functionality. The survey aims to instill trust in CAD of cardiovascular diseases using ECG signals by providing both a conceptual overview of such systems and the necessary assessment methods.