Detecting Quasars in Large-Scale Astronomical Surveys
We present a classification-based approach to identify quasi-stellar radio
sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance
on a manually labeled training set. While reasonable results can already be
obtained via approaches working only on photometric data, our experiments
indicate that simple but problem-specific features extracted from spectroscopic
data can significantly improve the classification performance. Since our
approach works orthogonally to the existing classification schemes used to build
the spectroscopic catalogs, our classification results are well suited for a
mutual assessment of the approaches' accuracies.
Comment: 6 pages, 8 figures, published in the proceedings of the 2010 Ninth
International Conference on Machine Learning and Applications (ICMLA) of the
IEEE
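The abstract does not name the classifier or the exact features used. As a minimal sketch only, the code below assumes the commonly used SDSS color indices (u-g, g-r, r-i, i-z) as photometric features and a nearest-centroid rule as the classifier; neither choice is taken from the paper. It shows where extra spectroscopic feature values could be appended to the photometric ones. All function names and the toy magnitudes are hypothetical.

```python
import math
from statistics import fmean

# SDSS photometric bands, ordered from blue to red.
BANDS = ("u", "g", "r", "i", "z")


def colors(mags):
    """Return the adjacent-band colors u-g, g-r, r-i, i-z from a magnitude dict."""
    return [mags[a] - mags[b] for a, b in zip(BANDS, BANDS[1:])]


def features(mags, spectral=()):
    """Photometric colors, optionally extended with (hypothetical) spectroscopic features."""
    return colors(mags) + list(spectral)


def fit_centroids(X, y):
    """Nearest-centroid training: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is nearest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy magnitudes invented for illustration; they are not SDSS measurements.
train = [
    ({"u": 19.1, "g": 18.9, "r": 18.8, "i": 18.7, "z": 18.6}, "quasar"),
    ({"u": 19.0, "g": 18.8, "r": 18.8, "i": 18.6, "z": 18.6}, "quasar"),
    ({"u": 20.5, "g": 19.0, "r": 18.4, "i": 18.2, "z": 18.1}, "star"),
    ({"u": 20.8, "g": 19.2, "r": 18.5, "i": 18.2, "z": 18.0}, "star"),
]
model = fit_centroids([features(m) for m, _ in train], [t for _, t in train])
candidate = {"u": 19.2, "g": 19.0, "r": 18.9, "i": 18.8, "z": 18.7}
predicted = predict(model, features(candidate))
print(predicted)  # prints: quasar
```

Because colors (in magnitudes) and any appended spectroscopic quantities live on different scales, a real feature vector would normally be standardized before a distance-based rule like this one is applied.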
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization problem. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important for
further research.
Comment: 20 pages, 5 figures
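The sample-based family can be illustrated with importance weighting: each source observation x is weighted by w(x) = p_T(x) / p_S(x), so that a weighted average over source data estimates a target-domain expectation. The sketch below assumes a single numeric covariate and estimates the density ratio with shared histogram bins plus additive smoothing; the bin count, the smoothing constant, the toy data and the function names are illustrative assumptions, not taken from the review.

```python
from collections import Counter


def histogram_weights(source, target, n_bins=10, alpha=1.0):
    """Estimate w(x) = p_T(x) / p_S(x) for each source point using shared bins.

    alpha is additive (Laplace) smoothing, so bins that are empty in one
    sample never cause a zero density or a division by zero.
    """
    lo = min(min(source), min(target))
    hi = max(max(source), max(target))
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal inputs

    def bin_of(x):
        # Clamp so that x == hi falls into the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    src_counts = Counter(bin_of(x) for x in source)
    tgt_counts = Counter(bin_of(x) for x in target)
    n_s = len(source) + alpha * n_bins
    n_t = len(target) + alpha * n_bins
    weights = []
    for x in source:
        b = bin_of(x)
        p_s = (src_counts[b] + alpha) / n_s
        p_t = (tgt_counts[b] + alpha) / n_t
        weights.append(p_t / p_s)
    return weights


def weighted_mean_loss(losses, weights):
    """Self-normalized importance-weighted empirical risk."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


source = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]  # toy 1-D source covariates
target = [0.7, 0.8, 0.85, 0.9, 0.95]     # toy target covariates, shifted right
weights = histogram_weights(source, target, n_bins=2)
losses = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]  # hypothetical per-point source losses
risk = weighted_mean_loss(losses, weights)
print(risk)  # about 0.167, versus 0.667 unweighted
```

Histogram ratios degrade quickly as the number of covariates grows; in higher dimensions the same weights are often obtained from a probabilistic source-versus-target classifier instead.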
Stratified Learning: a general-purpose statistical method for improved learning under Covariate Shift
Covariate shift arises when the labelled training (source) data is not
representative of the unlabelled (target) data due to systematic differences in
the covariate distributions. A supervised model trained on the source data
subject to covariate shift may suffer from poor generalization on the target
data. We propose a novel, statistically principled and theoretically justified
method to improve learning under covariate shift conditions, based on
propensity score stratification, a well-established methodology in causal
inference. We show that the effects of covariate shift can be reduced or
altogether eliminated by conditioning on propensity scores. In practice, this
is achieved by fitting learners on subgroups ("strata") constructed by
partitioning the data based on the estimated propensity scores, leading to
balanced covariates and much-improved target prediction. We demonstrate the
effectiveness of our general-purpose method on contemporary research questions
in observational cosmology, and on additional benchmark examples, matching or
outperforming state-of-the-art importance weighting methods, widely studied in
the covariate shift literature. We obtain the best reported AUC (0.958) on the
updated "Supernovae photometric classification challenge" and improve upon
existing conditional density estimation of galaxy redshift from Sloan Digital Sky
Survey (SDSS) data.
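As a hedged illustration of the stratification idea, the sketch below assumes a single numeric covariate and takes the propensity score to be e(x) = P(target | x), estimated with a one-feature logistic regression fitted by gradient descent. The pooled source and target points are split into strata at quantiles of the estimated propensity, one learner is fitted per stratum on the labelled source points only, and each target point is predicted by the learner of its own stratum. The per-stratum learner here is simply the mean label; the number of strata, learning rate, epoch count, toy data and function names are arbitrary assumptions, and the paper's actual learners and settings are not reproduced.

```python
import math
from statistics import fmean, quantiles


def fit_propensity(xs, in_target, lr=0.5, epochs=2000):
    """Fit P(target | x) = sigmoid(a*x + b) by full-batch gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, in_target):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - t) * x
            grad_b += p - t
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda x: 1.0 / (1.0 + math.exp(-(a * x + b)))


def stratified_predict(src_x, src_y, tgt_x, n_strata=3):
    """Predict target labels with one mean-label learner per propensity stratum."""
    xs = src_x + tgt_x
    in_target = [0] * len(src_x) + [1] * len(tgt_x)
    e = fit_propensity(xs, in_target)
    # Stratum boundaries: n_strata - 1 quantiles of the pooled propensities.
    cuts = quantiles([e(x) for x in xs], n=n_strata)

    def stratum(x):
        return sum(e(x) > c for c in cuts)

    # One learner per stratum, trained only on the labelled source points in it.
    per_stratum = {}
    for x, y in zip(src_x, src_y):
        per_stratum.setdefault(stratum(x), []).append(y)
    models = {s: fmean(ys) for s, ys in per_stratum.items()}
    fallback = fmean(src_y)  # used if a stratum holds no source points
    return [models.get(stratum(x), fallback) for x in tgt_x]


src_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # toy labelled source covariates
src_y = list(src_x)                           # toy labels: y = x
tgt_x = [2.0, 2.5, 3.0, 3.5, 4.0]             # toy unlabelled target, shifted right
preds = stratified_predict(src_x, src_y, tgt_x)
print(preds)  # [2.25, 2.25, 3.0, 3.0, 3.0]
```

Within each stratum, source and target points have similar estimated propensity, which is the balancing property the abstract appeals to; with a single covariate, as here, this reduces to binning on x itself, while with many covariates the strata are still defined on the one-dimensional score.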