3 research outputs found

    Detecting Quasars in Large-Scale Astronomical Surveys

    Full text link
    We present a classification-based approach to identify quasi-stellar radio sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance on a manually labeled training set. While reasonable results can already be obtained via approaches working only on photometric data, our experiments indicate that simple but problem-specific features extracted from spectroscopic data can significantly improve the classification performance. Since our approach works orthogonal to existing classification schemes used for building the spectroscopic catalogs, our classification results are well suited for a mutual assessment of the approaches' accuracies.Comment: 6 pages, 8 figures, published in proceedings of 2010 Ninth International Conference on Machine Learning and Applications (ICMLA) of the IEE

    A review of domain adaptation without target labels

    Full text link
    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around on mapping, projecting and representing features such that a source classifier performs well on the target domain and inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.Comment: 20 pages, 5 figure

    Stratified Learning: a general-purpose statistical method for improved learning under Covariate Shift

    Get PDF
    Covariate shift arises when the labelled training (source) data is not representative of the unlabelled (target) data due to systematic differences in the covariate distributions. A supervised model trained on the source data subject to covariate shift may suffer from poor generalization on the target data. We propose a novel, statistically principled and theoretically justified method to improve learning under covariate shift conditions, based on propensity score stratification, a well-established methodology in causal inference. We show that the effects of covariate shift can be reduced or altogether eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners on subgroups ("strata") constructed by partitioning the data based on the estimated propensity scores, leading to balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on contemporary research questions in observational cosmology, and on additional benchmark examples, matching or outperforming state-of-the-art importance weighting methods, widely studied in the covariate shift literature. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge" and improve upon existing conditional density estimation of galaxy redshift from Sloan Data Sky Survey (SDSS) data
    corecore