On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data
With the coming data deluge from synoptic surveys, there is a growing need
for frameworks that can quickly and automatically produce calibrated
classification probabilities for newly-observed variables based on a small
number of time-series measurements. In this paper, we introduce a methodology
for variable-star classification, drawing from modern machine-learning
techniques. We describe how to homogenize the information gleaned from light
curves by selection and computation of real-numbered metrics ("features"),
detail methods to robustly estimate periodic light-curve features, introduce
tree-ensemble methods for accurate variable star classification, and show how
to rigorously evaluate the classification results using cross validation. On a
25-class data set of 1542 well-studied variable stars, we achieve a 22.8%
overall classification error using the random forest classifier; this
represents a 24% improvement over the best previous classifier on these data.
This methodology is effective for identifying samples of specific science
classes: for pulsational variables used in Milky Way tomography we obtain a
discovery efficiency of 98.2% and for eclipsing systems we find an efficiency
of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is
superior to other machine-learned methods in terms of accuracy, speed, and
relative immunity to features with no useful class information; the RF
classifier can also be used to estimate the importance of each feature in
classification. Additionally, we present the first astronomical use of
hierarchical classification methods to incorporate a known class taxonomy in
the classifier, which further reduces the catastrophic error rate to 7.8%.
Excluding low-amplitude sources, our overall error rate improves to 14%, with a
catastrophic error rate of 3.5%.
Comment: 23 pages, 9 figures
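The efficiency figures quoted "at 95% purity" in the abstract above can be illustrated with a short sketch. It assumes the usual definitions, which the abstract does not spell out: purity is the fraction of selected sources that truly belong to the target class, and efficiency is the fraction of true target-class sources that are selected. The function name `efficiency_at_purity` and the toy inputs are hypothetical and are not taken from the paper.

```python
def efficiency_at_purity(scores, is_target, min_purity=0.95):
    """Return (efficiency, threshold) for the best cut that meets min_purity.

    scores    : classifier probability that each source is in the target class
    is_target : True where the source really belongs to the target class
    Assumed definitions (not stated in the abstract):
      purity     = true positives / sources selected
      efficiency = true positives / all true target-class sources
    """
    n_target = sum(bool(t) for t in is_target)
    if n_target == 0:
        raise ValueError("no target-class sources in the sample")

    # Rank sources from most to least confident.
    ranked = sorted(zip(scores, is_target), key=lambda pair: -pair[0])

    best = (0.0, None)  # (efficiency, score threshold); None if no cut qualifies
    true_pos = 0
    for n_selected, (score, target) in enumerate(ranked, start=1):
        true_pos += bool(target)
        # Sources with equal scores must be selected together.
        if n_selected < len(ranked) and ranked[n_selected][0] == score:
            continue
        purity = true_pos / n_selected
        efficiency = true_pos / n_target
        if purity >= min_purity and efficiency > best[0]:
            best = (efficiency, score)
    return best


# Toy example: five sources, three of which are true class members.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_truth = [True, True, False, True, False]
strict = efficiency_at_purity(toy_scores, toy_truth, min_purity=0.95)
loose = efficiency_at_purity(toy_scores, toy_truth, min_purity=0.75)
```

On the toy data, the strict cut keeps only the two most confident sources, while relaxing the purity requirement lets the cut extend far enough to recover all three true members.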
Nearest Labelset Using Double Distances for Multi-label Classification
Multi-label classification is a type of supervised learning where an instance
may belong to multiple labels simultaneously. Predicting each label
independently has been criticized for not exploiting any correlation between
labels. In this paper we propose a novel approach, Nearest Labelset using
Double Distances (NLDD), that predicts the labelset observed in the training
data that minimizes a weighted sum of the distances in both the feature space
and the label space to the new instance. The weights specify the relative
tradeoff between the two distances. The weights are estimated from a binomial
regression of the number of misclassified labels as a function of the two
distances. Model parameters are estimated by maximum likelihood. NLDD only
considers labelsets observed in the training data, thus implicitly taking into
account label dependencies. Experiments on benchmark multi-label data sets show
that the proposed method on average outperforms other well-known approaches in
terms of Hamming loss, 0/1 loss, and multi-label accuracy and ranks second
after ECC on the F-measure.
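The prediction rule described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the new instance is represented in label space by a vector of per-label probabilities `p_new` (for example from independent binary classifiers), both distances are Euclidean, and the weights `w_feature` and `w_label` are supplied by hand rather than estimated by the binomial regression the abstract mentions. All names are hypothetical.

```python
import math


def nldd_predict(x_new, p_new, train_X, train_Y, w_feature=1.0, w_label=1.0):
    """Return the training labelset with the smallest weighted double distance.

    x_new   : feature vector of the new instance
    p_new   : assumed label-space representation of the new instance
              (per-label probabilities in [0, 1])
    train_X : feature vectors of the training instances
    train_Y : binary labelsets of the training instances
    """
    best_labelset, best_score = None, math.inf
    for x_i, y_i in zip(train_X, train_Y):
        d_feature = math.dist(x_new, x_i)  # distance in feature space
        d_label = math.dist(p_new, y_i)    # distance in label space
        score = w_feature * d_feature + w_label * d_label
        if score < best_score:
            best_labelset, best_score = tuple(y_i), score
    return best_labelset


# Toy training set with three instances and three labels.
train_X = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_Y = [(1, 0, 0), (1, 1, 0), (0, 0, 1)]

predicted = nldd_predict((0.9, 0.9), (0.9, 0.8, 0.1), train_X, train_Y)
label_only = nldd_predict((0.9, 0.9), (0.1, 0.1, 0.9), train_X, train_Y,
                          w_feature=0.0)
```

Because the candidate set is restricted to labelsets that occur in the training data, the sketch can never output a label combination that was not observed, which is how the method accounts for label dependencies implicitly.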
Object Classification in Astronomical Multi-Color Surveys
We present a photometric method for identifying stars, galaxies and quasars
in multi-color surveys, which uses a library of >65000 color templates. The
method aims to extract the information content of object colors in a
statistically correct way and performs a classification as well as a redshift
estimation for galaxies and quasars in a unified approach. For the redshift
estimation, we use an advanced version of the MEV estimator which determines
the redshift error from the redshift dependent probability density function.
The method was originally developed for the CADIS survey, where we checked
its performance by spectroscopy. The method provides high reliability (6 errors
among 151 objects with R<24), especially for quasar selection, and redshifts
accurate within sigma ~ 0.03 for galaxies and sigma ~ 0.1 for quasars.
We compare a few model surveys using the same telescope time but different
sets of broad-band and medium-band filters. Their performance is investigated
by Monte-Carlo simulations as well as by analytic evaluation in terms of
classification and redshift estimation. In practice, medium-band surveys show
superior performance. Finally, we discuss the relevance of color calibration
and derive important conclusions for the issues of library design and choice of
filters. The calibration accuracy poses strong constraints on an accurate
classification, and is most critical for surveys with few, broad and deeply
exposed filters, but less severe for many, narrow and less deep filters.
Comment: 21 pages including 10 figures. Accepted for publication in Astronomy
& Astrophysics
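A template-based classification with a probability-weighted redshift estimate, as outlined in the abstract above, might look like the sketch below. It rests on several assumptions that the abstract does not confirm: Gaussian color errors, equal prior weight for every template, and a redshift estimate taken as the mean of the probability density within the winning class, with the error taken as its standard deviation. Whether this matches the paper's MEV estimator in detail is not established here. The function name and the four-entry toy library are hypothetical.

```python
import math


def classify_and_estimate(colors, errors, templates):
    """Classify an object from its colors and estimate its redshift.

    colors, errors : measured colors and their 1-sigma uncertainties
    templates      : list of (class_name, redshift, template_colors)
    Returns (best_class, class_probability, z_mean, z_sigma).
    """
    # Gaussian likelihood of the measured colors under each template
    # (assumes independent errors and equal prior weight per template).
    likes = []
    for _, _, tmpl in templates:
        chi2 = sum(((c - t) / e) ** 2 for c, t, e in zip(colors, tmpl, errors))
        likes.append(math.exp(-0.5 * chi2))
    total = sum(likes)
    if total == 0.0:
        raise ValueError("no template is compatible with the measured colors")

    # Sum the normalized likelihoods per class.
    class_prob = {}
    for (cls, _, _), like in zip(templates, likes):
        class_prob[cls] = class_prob.get(cls, 0.0) + like / total
    best_class = max(class_prob, key=class_prob.get)

    # Redshift probability density restricted to the winning class.
    members = [(z, like) for (cls, z, _), like in zip(templates, likes)
               if cls == best_class]
    norm = sum(like for _, like in members)
    z_mean = sum(z * like for z, like in members) / norm
    z_var = sum(like * (z - z_mean) ** 2 for z, like in members) / norm
    return best_class, class_prob[best_class], z_mean, math.sqrt(z_var)


# Hypothetical two-color library; a real one would hold tens of thousands.
library = [
    ("star", 0.0, (0.5, 0.2)),
    ("galaxy", 0.3, (1.0, 0.8)),
    ("galaxy", 0.4, (1.2, 0.9)),
    ("quasar", 2.0, (0.1, 0.0)),
]
result = classify_and_estimate((1.1, 0.85), (0.1, 0.1), library)
```

In the toy case the measured colors sit midway between the two galaxy templates, so the estimate falls between their redshifts and the spread of the density supplies the error.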