Improved Weighted Random Forest for Classification Problems

Author: A Booth
A Cielen
DH Wolpert
G Brown
G James
H Byeon
H Kim
H Pham
HK Hong
IC Yeh
JP Donate
L Breiman
L Breiman
LI Kuncheva
LI Kuncheva
LV Utkin
M Sunil Babu
MK Yöntem
N Hooda
P Peykani
P Peykani
P Peykani
P Peykani
R Alizadehsani
RJ Lyon
S Moro
SJ Winham
Z Zheng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Several studies have shown that combining machine learning models in an appropriate way yields improvements over the individual predictions made by the base models. The key to building a well-performing ensemble model lies in the diversity of its base models. Two of the most common approaches for introducing diversity into decision trees are bagging and random forest. Bagging increases diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of the input features. This has made random forest a winning candidate for many machine learning applications. However, assigning equal weights to all base decision trees does not seem reasonable, as the randomization in sampling and input feature selection may lead to different levels of decision-making ability across the base trees. Therefore, we propose several algorithms that modify the weighting strategy of the regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to the regular random forest.
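The weighting idea described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' method: it assumes each base tree's predictions on a held-out validation set are already available, and it uses a simple accuracy-proportional weight (one possible performance-based scheme) instead of the optimized accuracy- or AUC-based weights the abstract mentions. The helper names `bootstrap_indices`, `feature_subset`, `accuracy_weights` and `weighted_vote` are hypothetical.

```python
import random
from collections import defaultdict


def bootstrap_indices(n, rng):
    """Bagging: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def feature_subset(n_features, m, rng):
    """Random-forest style: choose m distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), m))


def accuracy_weights(tree_preds, y_val):
    """Weight each tree by its held-out accuracy, normalized to sum to 1.

    tree_preds[t][i] is tree t's predicted label for validation example i.
    Accuracy-proportional weights are an illustrative assumption, not the
    optimized weights from the paper.
    """
    accs = [sum(p == y for p, y in zip(preds, y_val)) / len(y_val)
            for preds in tree_preds]
    total = sum(accs)
    if total == 0:
        # No tree is ever right: fall back to the equal weights of a plain forest.
        return [1.0 / len(accs)] * len(accs)
    return [a / total for a in accs]


def weighted_vote(tree_preds, weights):
    """For each example, return the class with the largest summed tree weight."""
    combined = []
    for i in range(len(tree_preds[0])):
        score = defaultdict(float)
        for preds, w in zip(tree_preds, weights):
            score[preds[i]] += w
        combined.append(max(score, key=score.get))
    return combined


if __name__ == "__main__":
    # Three hypothetical trees evaluated on a 4-example validation set.
    val_preds = [[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
    y_val = [1, 1, 0, 0]
    w = accuracy_weights(val_preds, y_val)   # accuracies 1.0, 0.5, 0.0

    new_preds = [[1, 0], [0, 1], [0, 1]]     # the same trees on 2 new examples
    print(weighted_vote(new_preds, [1 / 3] * 3))  # equal weights → [0, 1]
    print(weighted_vote(new_preds, w))            # accuracy weights → [1, 0]
```

In the usage example the single accurate tree overrules the two weaker trees, which is exactly the case where equal weighting and performance-based weighting disagree. The weights should be estimated on data the trees did not train on (for example, out-of-bag samples), otherwise every tree looks equally accurate.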

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data

Author: Arien Crellin-Quick
Bailey
Ball
Barning
Blockeel
Blomme
Breiman
Breiman
Burman
Butler
Cesa-Bianchi
Cheeseman
Covey
Dan L. Starr
Doering
Eyer
Eyer
Eyer
Eyer
Flores
Freund
Friedman
Hastie
Ivezić
John M. Brewer
Joseph W. Richards
Joshua S. Bloom
Justin Higgins
Knerr
LSST Science Collaborations
Maxime Rischard
Millan-Gabet
Moffat
Nathaniel R. Butler
O'Keefe
Perryman
Press
Quinlan
Rachel Kennedy
Rebbapragada
Sesar
Stankov
Suchkov
Udalski
Vapnik
Walkowicz
Wasserman
Willemsen
Woźniak
Wu
Publication venue: IOP Publishing
Publication date: 10/01/2011

With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("features"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable-star classification, and show how to rigorously evaluate the classification results using cross-validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2%, and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%.
Comment: 23 pages, 9 figures
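A common way to estimate a period from sparse, unevenly sampled light curves is the Lomb-Scargle periodogram. The sketch below implements the classic (non-generalized) form in plain Python as an assumption for illustration; the paper's own period-finding and feature-extraction procedure may differ. `lomb_scargle` and `best_period` are hypothetical helper names, and the frequency grid is a choice left to the caller.

```python
import math


def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle power of unevenly sampled data (t, y) at each trial frequency."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]                  # centre the measurements
    var = sum(v * v for v in yc) / (n - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal for these times.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append((yc_c ** 2 / sum(v * v for v in c)
                      + yc_s ** 2 / sum(v * v for v in s)) / (2 * var))
    return power


def best_period(t, y, f_min, f_max, n_freq=2000):
    """Scan a uniform frequency grid and return the period of the highest peak."""
    step = (f_max - f_min) / (n_freq - 1)
    freqs = [f_min + k * step for k in range(n_freq)]
    power = lomb_scargle(t, y, freqs)
    k_best = max(range(n_freq), key=power.__getitem__)
    return 1.0 / freqs[k_best]


if __name__ == "__main__":
    # Synthetic, irregularly spaced observation times and a 3.7-day sinusoid.
    t = [0.7 * k + 0.4 * math.sin(1.3 * k) for k in range(80)]
    y = [math.sin(2 * math.pi * ti / 3.7) for ti in t]
    print(best_period(t, y, 0.05, 0.6, 3000))   # close to 3.7
```

The grid spacing should be small compared with the width of a periodogram peak, which is roughly one over the time baseline of the observations; too coarse a grid can step over the true peak entirely.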

arXiv.org e-Print Archive

Crossref

Anomalous pattern based clustering of mental tasks with subject independent learning – some preliminary results

Author: Amorim Renato
Gan John Q
Mirkin Boris
Publication venue: Sciedu Press
Publication date: 01/09/2012

In this paper we describe a new method for EEG signal classification in which the classification of one subject’s EEG signals is based on features learnt from another subject. The method is applied to power spectral density data and assigns class-dependent information weights to individual features. The informative features appear to be rather similar across subjects, supporting the view that there are subject-independent general brain patterns for the same mental task. Classification is done via clustering, using the intelligent k-means algorithm with the most informative features from a different subject. We experimentally compare our method with others.
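The intelligent k-means step mentioned above can be sketched as follows: anomalous-pattern clusters are extracted one at a time relative to the grand mean of the data, clusters smaller than a threshold are discarded, and the surviving centroids seed ordinary k-means, which also fixes the number of clusters. This is a minimal sketch assuming plain squared Euclidean distance; it omits the class-dependent feature weights described in the abstract, and the function names and the `min_size` threshold are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def anomalous_patterns(data, min_size=2, max_iter=100):
    """Extract anomalous clusters one at a time relative to the grand mean."""
    origin = mean_point(data)               # reference point stays fixed
    remaining = list(range(len(data)))
    centroids = []
    while remaining:
        # Seed with the remaining point farthest from the grand mean.
        start = max(remaining, key=lambda i: sq_dist(data[i], origin))
        c, members = list(data[start]), [start]
        for _ in range(max_iter):
            # Keep points closer to the tentative centroid than to the grand mean.
            new = [i for i in remaining
                   if sq_dist(data[i], c) < sq_dist(data[i], origin)] or [start]
            if new == members:
                break
            members = new
            c = mean_point([data[i] for i in members])
        if len(members) >= min_size:        # discard tiny (e.g. singleton) patterns
            centroids.append(c)
        taken = set(members)
        remaining = [i for i in remaining if i not in taken]
    return centroids


def kmeans(data, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            pts = [x for x, lab in zip(data, labels) if lab == k]
            if pts:
                centroids[k] = mean_point(pts)
    return labels, centroids


def intelligent_kmeans(data, min_size=2):
    """iK-means: anomalous-pattern centroids fix both k and the k-means start."""
    init = anomalous_patterns(data, min_size) or [mean_point(data)]
    return kmeans(data, [list(c) for c in init])


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels, centroids = intelligent_kmeans(data)
    print(len(centroids), labels)   # 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the number of clusters comes out of the anomalous-pattern stage rather than being supplied up front, an isolated outlier forms a singleton pattern and is dropped by the `min_size` threshold instead of claiming a centroid of its own.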

University of Essex Research Repository

Crossref

University of Hertfordshire Research Archive

Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests

Author: Joshua Huang
Mark Li
Qingyao Wu
Thanh-Tung Nguyen
Thuy Nguyen
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector