    On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data

    With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("features"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable-star classification, and show how to rigorously evaluate the classification results using cross-validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%. Comment: 23 pages, 9 figures
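The abstract above quotes discovery efficiencies "at 95% purity". As a rough illustration only, the sketch below shows one way such a figure could be computed from classifier scores. It assumes that efficiency means the fraction of true class members recovered and purity means the fraction of the selected sample that truly belongs to the class; the paper's exact definitions may differ. The function name `efficiency_at_purity` and the toy inputs are hypothetical, and scores are assumed to be distinct.

```python
def efficiency_at_purity(scores, is_target, min_purity=0.95):
    """Best efficiency over score thresholds whose selection meets min_purity.

    scores     -- classifier score per object (higher = more likely a target)
    is_target  -- True/False per object: does it truly belong to the class?
    Assumed definitions (not taken from the paper):
      efficiency = true targets selected / all true targets
      purity     = true targets selected / all objects selected
    """
    n_target = sum(1 for t in is_target if t)
    if n_target == 0:
        return 0.0

    # Rank objects from highest to lowest score; each prefix of this
    # ranking is the sample selected by some score threshold.
    ranked = sorted(zip(scores, is_target), key=lambda p: p[0], reverse=True)

    best = 0.0
    true_pos = 0
    for n_selected, (_, target) in enumerate(ranked, start=1):
        if target:
            true_pos += 1
        purity = true_pos / n_selected
        if purity >= min_purity:
            best = max(best, true_pos / n_target)
    return best


# Toy data, invented for illustration.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_truth = [True, True, False, True, False]
strict = efficiency_at_purity(toy_scores, toy_truth, min_purity=0.95)
loose = efficiency_at_purity(toy_scores, toy_truth, min_purity=0.75)
```

On the toy data, a stricter purity requirement allows only the top two objects to be selected, so fewer true members are recovered than under the looser requirement.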

    Nearest Labelset Using Double Distances for Multi-label Classification

    Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this paper we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of Hamming loss, 0/1 loss, and multi-label accuracy, and ranks second after ECC on the F-measure.
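The prediction rule described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the abstract does not state: both distances are Euclidean, the new instance's position in label space is given by a vector of per-label scores from some separate model (`p_new`), and the tradeoff weight `theta` is fixed by hand, whereas the abstract says the weights are estimated by binomial regression. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_labelset(x_new, p_new, train_X, train_Y, theta=0.5):
    """Pick the training labelset minimizing a weighted sum of two distances.

    x_new   -- feature vector of the new instance
    p_new   -- assumed per-label scores in [0, 1] for the new instance
    train_X -- feature vectors of the training instances
    train_Y -- 0/1 labelsets of the training instances
    theta   -- hand-set weight on the feature-space distance (assumption;
               the abstract estimates the weights from data instead)
    """
    best_labelset = None
    best_score = float("inf")
    for x_i, y_i in zip(train_X, train_Y):
        d_feature = euclidean(x_new, x_i)   # distance in feature space
        d_label = euclidean(p_new, y_i)     # distance in label space
        score = theta * d_feature + (1.0 - theta) * d_label
        if score < best_score:
            best_score = score
            best_labelset = tuple(y_i)
    return best_labelset


# Toy data, invented for illustration.
train_X = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_Y = [(1, 0), (0, 1), (1, 1)]
predicted = nearest_labelset((0.9, 0.9), (0.1, 0.8), train_X, train_Y)
```

Because the candidate set is restricted to labelsets that occur in the training data, the prediction is always a label combination that has actually been observed, which is how the approach accounts for label dependencies implicitly.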

    Object Classification in Astronomical Multi-Color Surveys

    We present a photometric method for identifying stars, galaxies and quasars in multi-color surveys, which uses a library of >65000 color templates. The method aims to extract the information content of object colors in a statistically correct way and performs a classification as well as a redshift estimation for galaxies and quasars in a unified approach. For the redshift estimation, we use an advanced version of the MEV estimator which determines the redshift error from the redshift-dependent probability density function. The method was originally developed for the CADIS survey, where we checked its performance by spectroscopy. The method provides high reliability (6 errors among 151 objects with R<24), especially for quasar selection, and redshifts accurate within sigma ~ 0.03 for galaxies and sigma ~ 0.1 for quasars. We compare a few model surveys using the same telescope time but different sets of broad-band and medium-band filters. Their performance is investigated by Monte Carlo simulations as well as by analytic evaluation in terms of classification and redshift estimation. In practice, medium-band surveys show superior performance. Finally, we discuss the relevance of color calibration and derive important conclusions for the issues of library design and choice of filters. The calibration accuracy poses strong constraints on an accurate classification, and is most critical for surveys with few, broad and deeply exposed filters, but less severe for many, narrow and less deep filters. Comment: 21 pages including 10 figures. Accepted for publication in Astronomy & Astrophysics