3,388 research outputs found

    ICA as a preprocessing technique for classification

    In this paper we propose the use of the independent component analysis (ICA) [1] technique for improving the classification rate of decision trees and multilayer perceptrons [2], [3]. Using ICA in the preprocessing stage makes the structure of both classifiers simpler and therefore improves their generalization properties. The hypothesis behind the proposed preprocessing is that ICA transforms the feature space into one whose components are independent and aligned with the axes, and hence better suited to the way a decision tree is constructed. Inferring the weights of a multilayer perceptron also becomes easier, because the gradient search in weight space follows independent trajectories. The result is that the classifiers are less complex, and on some databases the error rate is lower. The idea is also applicable to regression.
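    A minimal sketch of the idea using scikit-learn, assuming FastICA as the ICA implementation and the iris data as a stand-in dataset (this is not the authors' exact pipeline):

```python
# Sketch: compare a decision tree on raw features vs. ICA-preprocessed
# features. FastICA and the iris dataset are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.decomposition import FastICA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Baseline: decision tree on the raw feature space.
tree = DecisionTreeClassifier(random_state=0)
print("raw features:", cross_val_score(tree, X, y, cv=5).mean())

# ICA-preprocessed: the tree now splits on (approximately) independent,
# axis-aligned components, which is the hypothesis behind the method.
ica_tree = make_pipeline(FastICA(n_components=4, random_state=0),
                         DecisionTreeClassifier(random_state=0))
print("ICA features:", cross_val_score(ica_tree, X, y, cv=5).mean())
```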

    Machine learning with the hierarchy‐of‐hypotheses (HoH) approach discovers novel patterns in studies on biological invasions

    Research synthesis on simple yet general hypotheses and ideas is challenging in scientific disciplines studying highly context‐dependent systems, such as the medical, social, and biological sciences. This study shows that machine learning, the equation‐free statistical modeling of artificial intelligence, is a promising synthesis tool for discovering novel patterns and the sources of controversy around a general hypothesis. We apply a decision tree algorithm, assuming that evidence from various contexts can be adequately integrated in a hierarchically nested structure. As a case study, we analyzed 163 articles that studied a prominent hypothesis in invasion biology, the enemy release hypothesis. Treating the synthesis as a classification problem, we explored whether any of the nine attributes that characterize each study could differentiate the studies' conclusions. The results corroborated that machine learning can be useful for research synthesis, as the algorithm detected patterns that had already been highlighted in previous narrative reviews. Compared with a previous synthesis study that assessed the same evidence collection based on expert judgement, the algorithm newly found that studies focusing on Asian regions mostly supported the hypothesis, suggesting that more detailed investigations in these regions could enhance our understanding of the hypothesis. We suggest that machine learning algorithms are a promising synthesis tool especially where studies (a) reformulate a general hypothesis from different perspectives, (b) use different methods or variables, or (c) report insufficient information for conducting meta‐analyses.
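    In the spirit of this approach, a shallow decision tree can classify study conclusions from categorical context attributes. The sketch below is purely illustrative: the attribute names and records are hypothetical placeholders, not the paper's actual 163-study dataset.

```python
# Illustrative sketch: a shallow decision tree over study-context
# attributes, predicting whether each study supported the hypothesis.
# All attribute names and rows are invented for demonstration.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

studies = pd.DataFrame({
    "region":  ["Asia", "Europe", "Asia", "N.America", "Europe", "Asia"],
    "habitat": ["terrestrial", "aquatic", "terrestrial",
                "aquatic", "terrestrial", "aquatic"],
    "method":  ["observational", "experimental", "observational",
                "experimental", "observational", "experimental"],
    "supported": [1, 0, 1, 0, 0, 1],  # did the study support the hypothesis?
})

X = studies[["region", "habitat", "method"]]
y = studies["supported"]

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(max_depth=3, random_state=0))
model.fit(X, y)

# Inspect the learned hierarchy of splits over study contexts.
tree = model.named_steps["decisiontreeclassifier"]
names = model.named_steps["onehotencoder"].get_feature_names_out()
print(export_text(tree, feature_names=list(names)))
```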

    Mixing hetero- and homogeneous models in weighted ensembles

    The effectiveness of ensembling for improving classification performance is well documented. Broadly speaking, ensemble design can be expressed as a spectrum: at one end, a set of heterogeneous classifiers model the same data; at the other, homogeneous models derived from the same classification algorithm are diversified through data manipulation. The cross-validation accuracy weighted probabilistic ensemble is a heterogeneous weighted ensemble scheme that needs reliable estimates of error from its base classifiers. It estimates error through a cross-validation process and raises the estimates to a power to accentuate differences. We study the effects of retaining all models trained during cross-validation on the final ensemble's predictive performance, and on the base models' and resulting ensembles' variance and robustness across datasets and resamples. We find that augmenting the ensemble through the retention of all trained models provides a consistent and significant improvement, despite reductions in the reliability of the base models' performance estimates.
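    A minimal sketch of the weighting scheme the abstract describes: estimate each base classifier's accuracy by cross-validation, raise it to a power to accentuate differences, and combine predicted probabilities with those weights. The exponent value, base learners, and dataset are assumptions, not the authors' exact configuration.

```python
# Sketch of a cross-validation-accuracy-weighted probabilistic ensemble.
# alpha=4 and the three base learners are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bases = [LogisticRegression(max_iter=1000),
         GaussianNB(),
         DecisionTreeClassifier(random_state=0)]
alpha = 4  # power that accentuates differences between accuracy estimates

# Weight each member by its cross-validated accuracy raised to alpha.
weights = np.array([cross_val_score(m, X_tr, y_tr, cv=10).mean() ** alpha
                    for m in bases])
weights /= weights.sum()

for m in bases:
    m.fit(X_tr, y_tr)

# Combine class-probability estimates with the CV-derived weights.
proba = sum(w * m.predict_proba(X_te) for w, m in zip(weights, bases))
print("ensemble accuracy:", (proba.argmax(axis=1) == y_te).mean())
```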

    TreeGrad: Transferring Tree Ensembles to Neural Networks

    Gradient Boosted Decision Trees (GBDTs) are popular machine learning algorithms with dedicated implementations such as LightGBM and others in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline and greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and we provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.
    Comment: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany the implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179
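    The core trick such tree-to-network conversions rely on is representing an axis-aligned split as a differentiable "routing" neuron. Below is a minimal sketch of a single split with two leaves, not the paper's full construction; the temperature parameter tau is an assumption.

```python
# Sketch: an axis-aligned split x[j] <= t becomes a sigmoid routing
# neuron, so the threshold t is differentiable and can be updated
# online. One split, two leaves; tau is an illustrative choice.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_split_predict(x, j, t, leaf_left, leaf_right, tau=0.1):
    """Soft version of `if x[j] <= t: left else right`.

    As tau -> 0 this recovers the hard decision-tree split; for tau > 0
    the leaf values are blended, so d(prediction)/dt exists and the
    threshold t can be trained by gradient descent.
    """
    p_left = sigmoid((t - x[j]) / tau)  # routing probability to left leaf
    return p_left * leaf_left + (1 - p_left) * leaf_right

x = np.array([0.3, 1.7])
print(soft_split_predict(x, j=1, t=1.5, leaf_left=-1.0, leaf_right=+1.0))
```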

    Improving adaptive bagging methods for evolving data streams

    We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up adaptation to change in ASHT Bagging, we add an error change detector for each classifier. We test our improvements in an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
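    The following is a simplified, self-contained sketch of the idea of attaching an error change detector to each ensemble member. It is a stand-in for ADWIN, not the actual ADWIN algorithm: it compares the error rate in a recent window against the long-run rate and signals change when the gap exceeds a threshold. The window size and threshold are assumptions.

```python
# Simplified per-member error change detector (ADWIN stand-in).
from collections import deque

class SimpleErrorChangeDetector:
    def __init__(self, window=100, threshold=0.15):
        self.recent = deque(maxlen=window)  # sliding window of 0/1 errors
        self.n = 0
        self.errors = 0
        self.threshold = threshold

    def update(self, error):
        """Feed one 0/1 error; return True if change is detected."""
        self.recent.append(error)
        self.n += 1
        self.errors += error
        if len(self.recent) < self.recent.maxlen:
            return False
        recent_rate = sum(self.recent) / len(self.recent)
        overall_rate = self.errors / self.n
        return recent_rate - overall_rate > self.threshold

# One detector per ensemble member: when a member's detector fires,
# that member would be reset or replaced, as in the proposed scheme.
detector = SimpleErrorChangeDetector()
stream_errors = [0] * 500 + [1] * 200  # error rate jumps after drift
fired_at = next((i for i, e in enumerate(stream_errors)
                 if detector.update(e)), None)
print("change detected at example", fired_at)
```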

    Computer aided diagnosis for cardiovascular diseases based on ECG signals : a survey

    The interpretation of electrocardiography (ECG) signals is difficult, because even subtle changes in the waveform can indicate a serious heart disease. Furthermore, these waveform changes might not be present all the time. As a consequence, it takes years of training for a medical practitioner to become an expert in ECG-based cardiovascular disease diagnosis. That training is a major investment in a specific skill. Even with expert ability, the signal interpretation takes time. In addition, human interpretation of ECG signals causes interoperator and intraoperator variability. ECG-based Computer-Aided Diagnosis (CAD) holds the promise of improving diagnosis accuracy and reducing cost: the same ECG signal will result in the same diagnosis support regardless of time and place. This paper introduces both the techniques used to realize the CAD functionality and the methods used to assess the established functionality. The survey aims to instill trust in CAD of cardiovascular diseases using ECG signals by introducing both a conceptual overview of the system and the necessary assessment methods.

    Random forests with random projections of the output space for high dimensional multi-label classification

    We adapt the idea of random projections, applied to the output space, to enhance tree-based ensemble methods in the context of multi-label classification. We show how the learning time complexity can be reduced without affecting the computational complexity and accuracy of predictions. We also show that random output space projections can be used to reach different bias-variance tradeoffs over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
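    A sketch of the general recipe (not the authors' exact algorithm or decoding scheme): project the binary label matrix into a low-dimensional space with a Gaussian random matrix, fit a multi-output regression forest on the projected targets, then decode predictions back via the projection's pseudo-inverse and threshold. The dataset, projection size, and 0.5 threshold are assumptions.

```python
# Sketch: random projection of the output space for multi-label
# random forests. Learning cost now scales with the projected
# dimension m instead of the full number of labels.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, Y = make_multilabel_classification(n_samples=500, n_labels=5,
                                      n_classes=40, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

rng = np.random.default_rng(0)
m = 10                                             # projected dim << 40
P = rng.normal(size=(Y.shape[1], m)) / np.sqrt(m)  # Gaussian projection

# Fit the forest on projected targets instead of the full label matrix.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_tr, Y_tr @ P)

# Decode: map low-dimensional predictions back by least squares.
Z_hat = forest.predict(X_te)
Y_hat = (Z_hat @ np.linalg.pinv(P) > 0.5).astype(int)

print("Hamming accuracy:", (Y_hat == Y_te).mean())
```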

    Gaussian Fluctuation in Random Matrices

    Let $N(L)$ be the number of eigenvalues, in an interval of length $L$, of a matrix chosen at random from the Gaussian Orthogonal, Unitary or Symplectic ensembles of ${\cal N}$ by ${\cal N}$ matrices, in the limit ${\cal N}\rightarrow\infty$. We prove that $[N(L) - \langle N(L)\rangle]/\sqrt{\log L}$ has a Gaussian distribution when $L\rightarrow\infty$. This theorem, which requires control of all the higher moments of the distribution, elucidates numerical and exact results on chaotic quantum systems and on the statistics of zeros of the Riemann zeta function.
    PACS nos. 05.45.+b, 03.65.-w
    Comment: 13 pages