On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data
With the coming data deluge from synoptic surveys, there is a growing need
for frameworks that can quickly and automatically produce calibrated
classification probabilities for newly-observed variables based on a small
number of time-series measurements. In this paper, we introduce a methodology
for variable-star classification, drawing from modern machine-learning
techniques. We describe how to homogenize the information gleaned from light
curves by selection and computation of real-numbered metrics ("features"),
detail methods to robustly estimate periodic light-curve features, introduce
tree-ensemble methods for accurate variable star classification, and show how
to rigorously evaluate the classification results using cross validation. On a
25-class data set of 1542 well-studied variable stars, we achieve a 22.8%
overall classification error using the random forest classifier; this
represents a 24% improvement over the best previous classifier on these data.
This methodology is effective for identifying samples of specific science
classes: for pulsational variables used in Milky Way tomography we obtain a
discovery efficiency of 98.2% and for eclipsing systems we find an efficiency
of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is
superior to other machine-learned methods in terms of accuracy, speed, and
relative immunity to features with no useful class information; the RF
classifier can also be used to estimate the importance of each feature in
classification. Additionally, we present the first astronomical use of
hierarchical classification methods to incorporate a known class taxonomy in
the classifier, which further reduces the catastrophic error rate to 7.8%.
Excluding low-amplitude sources, our overall error rate improves to 14%, with a
catastrophic error rate of 3.5%.

Comment: 23 pages, 9 figures
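The core pipeline the abstract describes (real-valued features per light curve, a random forest classifier, cross-validated error estimates, and per-feature importances) can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' implementation: the features and class rule below are synthetic stand-ins for the paper's light-curve metrics.

```python
# Minimal sketch of feature-based random-forest classification with
# cross validation, in the spirit of the abstract. Synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in features (e.g. period, amplitude, skewness) for 300 "stars".
X = rng.normal(size=(300, 3))
# Toy class rule: only the first two features carry class information.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Rigorous evaluation via k-fold cross validation, as in the abstract.
scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy: %.2f" % scores.mean())

# The RF classifier also estimates each feature's importance; the third,
# uninformative feature should receive a low score.
clf.fit(X, y)
print("feature importances:", clf.feature_importances_)
```

The abstract's point about "relative immunity to features with no useful class information" shows up here: the uninformative third feature gets little importance, and accuracy is not degraded by its presence.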
Identifying hidden contexts
In this study we investigate how to identify hidden contexts from the data in classification tasks.
Contexts are artifacts in the data, which do not predict the class label directly.
For instance, in a speech recognition task speakers might have different accents, which do not directly discriminate between the spoken words.
Identifying hidden contexts is treated as a data preprocessing task, which can help to build more accurate classifiers tailored to particular contexts, and gives insight into the data structure.
We present three techniques for identifying hidden contexts; each hides the class label information from the input data and partitions the data using clustering techniques.
We form a collection of performance measures to ensure that the resulting contexts are valid.
We evaluate the performance of the proposed techniques on thirty real datasets.
We present a case study illustrating how the identified contexts can be used to build specialized, more accurate classifiers.
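The general idea, withholding class labels and partitioning the inputs by clustering so that each cluster becomes a candidate context, can be sketched as follows. This is a hedged illustration of the concept, not the paper's three techniques: the two synthetic "contexts" (think two speaker accents with shifted feature distributions) and the use of k-means are assumptions for the example.

```python
# Sketch: identify hidden contexts by clustering the features alone,
# with class labels deliberately hidden from the partitioning step.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic contexts with shifted feature distributions
# (e.g. two accents); class labels are not used below.
ctx_a = rng.normal(loc=0.0, size=(100, 2))
ctx_b = rng.normal(loc=5.0, size=(100, 2))
X = np.vstack([ctx_a, ctx_b])

# Partition the unlabeled inputs; each cluster is a candidate context.
contexts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("context sizes:", np.bincount(contexts))
```

A specialized classifier could then be trained on each recovered partition separately, which is the preprocessing use the abstract describes.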