
    A FLEXIBLE PROCEDURE FOR POSITIVE–UNLABELED LEARNING & PERIODS ESTIMATION FOR MIRAS USING MULTI-BAND LIGHT CURVES AND INVERSE PERIOD-LUMINOSITY RELATIONS

    This dissertation contains two independent projects: the first develops a general methodology for the Positive–Unlabeled (PU) learning problem, and the second builds a hierarchical Bayesian model for a specific astronomical problem, period estimation for Miras. In the first project, we consider PU learning, which involves two samples: a positive set P with observations from only one class and an unlabeled set U with observations from two classes. The goal is to classify the observations in U. Class mixture proportion estimation (MPE) in U is a key step in PU learning. Blanchard et al. (2010) show that MPE in PU learning is a generalization of the problem of estimating the proportion of true null hypotheses in multiple testing problems. Motivated by this idea, we propose a flexible framework: first reduce the problem to one dimension by constructing a probabilistic classifier trained on the P and U data sets, and then apply a one-dimensional mixture proportion method to the observation class probabilities. The flexibility of this framework lies in the freedom to choose the classifier and the one-dimensional MPE method. Using this framework, we propose two mixture proportion estimators: one adapts the ROC technique (Storey, 2002; Scott, 2015), and the other adapts isotonic regression (Patra and Sen, 2015). Theoretically, we prove the consistency of these two estimators. Empirically, we demonstrate that our proposed estimators have competitive performance on simulated waveform data and a protein signaling problem. The implementations of our estimators are free of tuning parameters. The second project presents an inverse Period-Luminosity relation (PLR) enhanced multi-band semi-parametric model (SP3) to efficiently recover periods for quasi-periodic variable stars such as Miras. Mira variables are promising distance indicators because oxygen-rich Miras follow a tight PLR in the near-infrared. However, Mira light curves are quasi-periodic, which makes period estimation challenging. In recent years, several methods have been developed to estimate periods for Miras. He et al. (2016) develop a single-band semi-parametric model based on Gaussian processes. Yuan et al. (2018) extend this model to the multi-band case. These two models are designed to fit observations of a single Mira (single-band or multi-band) and do not use the PLR. To borrow strength across light curves, our proposed SP3 model uses the inverse Period-Luminosity relation (iPLR) to adaptively supply a frequency prior to each light curve. This model outperforms existing methods on various simulated data sets.
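The second step of the PU framework in the first project (estimating the mixture proportion from one-dimensional classifier scores) can be illustrated with a minimal sketch. This is not the dissertation's ROC-based or isotonic estimator; it is a simple threshold-ratio bound that relies only on the identity U = alpha * P + (1 - alpha) * N, which implies U(S) / P(S) >= alpha for any score region S. The function names, the threshold grid, and the toy scores are all hypothetical, the classifier step is assumed to have already produced the scores, and no finite-sample correction is applied.

```python
def tail_fraction(scores, t):
    """Fraction of scores at or above threshold t."""
    return sum(s >= t for s in scores) / len(scores)


def mixture_proportion_bound(p_scores, u_scores, thresholds):
    """Smallest ratio U(S_t) / P(S_t) over thresholds, with S_t = {score >= t}.

    Since U = alpha * P + (1 - alpha) * N, each ratio is at least alpha at the
    population level, so the minimum serves as a rough estimate of alpha.
    """
    ratios = []
    for t in thresholds:
        p_tail = tail_fraction(p_scores, t)
        if p_tail > 0:  # skip thresholds where the ratio is undefined
            ratios.append(tail_fraction(u_scores, t) / p_tail)
    return min(1.0, min(ratios))


# Hypothetical classifier scores (assumed output of a model trained on P vs. U).
p_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5]
# Toy unlabeled set: 10 positives with the same scores plus 30 low-scoring negatives.
u_scores = p_scores + [i / 100 for i in range(10, 40)]

alpha_hat = mixture_proportion_bound(p_scores, u_scores, [0.3, 0.5, 0.7, 0.9])
print(alpha_hat)
```

In this constructed toy the true proportion of positives in U is 10/40, and the bound recovers it up to floating-point rounding because no negative scores reach 0.5. With real, noisy scores the minimum over thresholds tends to be biased downward, which is one reason practical estimators add corrections.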

    Weakly supervised learning via statistical sufficiency

    The thesis introduces a novel algorithmic framework for weakly supervised learning, namely, for any problem in between supervised and unsupervised learning from the standpoint of the labels. Weak supervision is the reality in many applications of machine learning where training is performed with partially missing, aggregated-level and/or noisy labels. The approach is grounded in the concept of statistical sufficiency and its transposition to loss functions. Our solution is problem-agnostic yet constructive, as it boils down to a simple two-step procedure. First, estimate a sufficient statistic for the labels from weak supervision. Second, plug the estimate into a (newly defined) linear-odd loss function and learn the model with any gradient-based solver, with a simple adaptation. We apply the same approach to several challenging learning problems: (i) learning from label proportions, (ii) learning with noisy labels for both linear classifiers and deep neural networks, and (iii) learning from feature-wise distributed datasets where the entity matching function is unknown.
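The two-step procedure above can be made concrete with a small sketch, assuming the logistic loss as an example of a loss whose odd part is linear: for l(z) = log(1 + exp(-z)) we have l(z) - l(-z) = -z. Under that assumption the empirical risk splits into a label-free term plus a term that depends on the labels only through the mean of y_i * x_i. The abstract does not give these details, so the function names and toy data below are hypothetical, and the statistic is computed from true labels here only to check the identity; in a weakly supervised setting it would be estimated from the weak labels.

```python
import math


def logistic(z):
    """Numerically stable log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def standard_risk(theta, X, y):
    """Usual empirical logistic risk, using every individual label."""
    return sum(logistic(yi * dot(theta, xi)) for xi, yi in zip(X, y)) / len(X)


def mean_operator(X, y):
    """Label statistic (1/n) * sum_i y_i * x_i, one entry per feature."""
    n = len(X)
    return [sum(yi * xi[j] for xi, yi in zip(X, y)) / n for j in range(len(X[0]))]


def factored_risk(theta, X, mu):
    """Same risk, touching the labels only through the statistic mu."""
    label_free = sum(
        0.5 * (logistic(dot(theta, xi)) + logistic(-dot(theta, xi))) for xi in X
    ) / len(X)
    return label_free - 0.5 * dot(theta, mu)


# Hypothetical toy data with labels in {-1, +1}.
X = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0], [0.1, 0.4]]
y = [1, -1, 1, -1]
theta = [0.7, -0.2]
mu = mean_operator(X, y)

print(standard_risk(theta, X, y), factored_risk(theta, X, mu))
```

The two printed values agree up to rounding, which is the sense in which an estimate of `mu` can stand in for the individual labels when minimizing the risk.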