Binary Classification from Positive-Confidence Data
Can we learn a binary classifier from only positive data, without any
negative data or unlabeled data? We show that if one can equip positive data
with confidence (positive-confidence), one can successfully learn a binary
classifier; we call this setting positive-confidence (Pconf) classification. Our work
is related to one-class classification, which aims at "describing" the positive
class with clustering-related methods; however, one-class classification offers
no principled way to tune hyper-parameters, and its aim is not to "discriminate"
between the positive and negative classes. For the Pconf classification
problem, we provide a simple empirical risk minimization framework that is
model-independent and optimization-independent. We theoretically establish the
consistency and an estimation error bound, and demonstrate the usefulness of
the proposed method for training deep neural networks through experiments.
Comment: NeurIPS 2018 camera-ready version (this paper was selected for
spotlight presentation).
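The core of the method is a rewrite of the classification risk as an expectation over positive data alone, weighted by (1 - r(x)) / r(x) where r(x) = p(y = +1 | x) is the positive confidence. A minimal sketch of the resulting empirical risk, assuming the logistic loss and a PyTorch model (the function name and interface are illustrative, not from the paper):

```python
# Minimal sketch of an empirical Pconf risk, assuming the logistic loss.
# `model` maps inputs to real-valued scores f(x); `conf` holds the positive
# confidences r(x) = p(y = +1 | x) attached to the positive inputs `x_pos`.
import torch
import torch.nn.functional as F

def pconf_risk(model, x_pos, conf):
    """Empirical Pconf risk, up to the constant class-prior factor."""
    out = model(x_pos).squeeze(-1)       # real-valued scores f(x)
    loss_pos = F.softplus(-out)          # logistic loss for label +1
    loss_neg = F.softplus(out)           # logistic loss for label -1
    weight = (1.0 - conf) / conf         # confidence weight (1 - r(x)) / r(x)
    return (loss_pos + weight * loss_neg).mean()
```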
Alternate Estimation of a Classifier and the Class-Prior from Positive and Unlabeled Data
We consider a problem of learning a binary classifier only from positive data
and unlabeled data (PU learning) and estimating the class-prior in unlabeled
data under the case-control scenario. Most of the recent methods of PU learning
require an estimate of the class-prior probability in unlabeled data, and it is
estimated in advance with another method. However, such a two-step approach
which first estimates the class prior and then trains a classifier may not be
the optimal approach since the estimation error of the class-prior is not taken
into account when a classifier is trained. In this paper, we propose a novel
unified approach to estimating the class-prior and training a classifier
alternately. Our proposed method is simple to implement and computationally
efficient. Through experiments, we demonstrate the practical usefulness of the
proposed method.
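A hedged sketch of the alternating scheme, with `fit_pu` standing in for any PU learning method that accepts a class-prior estimate; the prior-refresh rule shown here (averaging predicted positive probabilities over the unlabeled set) is an illustrative choice, not necessarily the authors' exact update:

```python
# Alternate between training a PU classifier under the current class-prior
# estimate and refreshing the prior from the classifier's outputs.
import numpy as np

def alternate_estimation(fit_pu, x_pos, x_unl, prior=0.5, n_rounds=10):
    for _ in range(n_rounds):
        clf = fit_pu(x_pos, x_unl, prior)           # train with current prior
        scores = clf.predict_proba(x_unl)[:, 1]     # p(y = +1 | x) on U
        prior = float(np.mean(scores))              # re-estimate the prior
    return clf, prior
```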
Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data
Most positive and unlabeled data is subject to selection biases. The labeled
examples can, for example, be selected from the positive set because they are
easier to obtain or more obviously positive. This paper investigates how
learning can be enabled in this setting. We propose and theoretically
analyze an empirical-risk-based method for incorporating the labeling
mechanism. Additionally, we investigate under which assumptions learning is
possible when the labeling mechanism is not fully understood and propose a
practical method to enable this. Our empirical analysis supports the
theoretical results and shows that taking into account the possibility of a
selection bias, even when the labeling mechanism is unknown, improves the
trained classifiers.
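One way to incorporate a labeling mechanism into an empirical risk, sketched below under the assumption that the labeling propensity e(x) = p(labeled | y = +1, x) is known or estimated: labeled examples receive inverse-propensity weights plus a correction term, while unlabeled examples are treated as negatives. This illustrates the general weighting idea, not necessarily the authors' exact estimator.

```python
# Propensity-weighted PU risk (assumed setup): `e_lab` are labeling
# propensities on the labeled examples, and `loss(scores, label)` is any
# elementwise classification loss.
import numpy as np

def weighted_pu_risk(scores_lab, scores_unl, e_lab, loss):
    w = 1.0 / e_lab                            # inverse-propensity weights
    pos = w * loss(scores_lab, +1)             # labeled examples as positives
    neg = (1.0 - w) * loss(scores_lab, -1)     # correction toward negatives
    unl = loss(scores_unl, -1)                 # unlabeled treated as negatives
    n = len(scores_lab) + len(scores_unl)
    return (np.sum(pos) + np.sum(neg) + np.sum(unl)) / n
```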
Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags
Multiple instance learning (MIL) is a variation of traditional supervised
learning problems where data (referred to as bags) are composed of sub-elements
(referred to as instances) and only bag labels are available. MIL has a variety
of applications such as content-based image retrieval, text categorization and
medical diagnosis. Most previous work on MIL assumes that the training bags
are fully labeled. However, it is often difficult to obtain enough labeled
bags in practical situations, while many unlabeled bags are
available. A learning framework called PU learning (positive and unlabeled
learning) can address this problem. In this paper, we propose a convex PU
learning method to solve an MIL problem. We experimentally show that the
proposed method achieves better performance with significantly lower
computational costs than an existing method for PU-MIL.
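For context, convex PU methods typically start from the unbiased PU risk estimator below (a sketch, with class prior \pi, loss \ell, and a bag-level decision function f; whether this is the exact objective used in the paper is an assumption):

```latex
% Unbiased PU risk over bags B: positives contribute two terms, unlabeled one.
\widehat{R}_{\mathrm{PU}}(f)
  = \pi \, \widehat{\mathbb{E}}_{\mathrm{P}}\big[\ell(f(B))\big]
  - \pi \, \widehat{\mathbb{E}}_{\mathrm{P}}\big[\ell(-f(B))\big]
  + \widehat{\mathbb{E}}_{\mathrm{U}}\big[\ell(-f(B))\big]
```

The objective remains convex in f when \ell(z) - \ell(-z) is linear in z (e.g., with the double hinge loss), which is what makes a convex formulation attainable.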
A Robust AUC Maximization Framework with Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification
Positive-unlabeled (PU) classification is a common scenario in real-world
applications such as healthcare, text classification, and bioinformatics, in
which we only observe a few samples labeled as "positive" together with a large
volume of "unlabeled" samples that may contain both positive and negative
samples. Building a robust classifier for the PU problem is very challenging,
especially for complex data where negative samples overwhelm the positives and
mislabeled samples or corrupted features exist. To address these three issues,
we propose
a robust learning framework that unifies AUC maximization (a robust metric for
biased labels), outlier detection (for excluding wrong labels), and feature
selection (for excluding corrupted features). The generalization error bounds
are provided for the proposed model that give valuable insight into the
theoretical performance of the method and lead to useful practical guidance,
e.g., to train a model, we find that including unlabeled samples suffices as
long as their number is comparable to the number of positive
samples in the training process. Empirical comparisons and two real-world
applications on surgical site infection (SSI) and EEG seizure detection are
also conducted to show the effectiveness of the proposed model.
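The AUC-maximization component can be illustrated with a pairwise surrogate over positive-unlabeled score pairs. A minimal sketch (squared hinge surrogate assumed; the full model's outlier-detection and feature-selection terms are omitted here):

```python
# Pairwise AUC surrogate on PU data: encourage every positive to score above
# every unlabeled sample by a margin (squared hinge loss assumed).
import numpy as np

def auc_surrogate(scores_pos, scores_unl, margin=1.0):
    diff = scores_pos[:, None] - scores_unl[None, :]    # all P-U score pairs
    return np.mean(np.maximum(0.0, margin - diff) ** 2)
```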
Mixture Proportion Estimation via Kernel Embedding of Distributions
Mixture proportion estimation (MPE) is the problem of estimating the weight
of a component distribution in a mixture, given samples from the mixture and
component. This problem constitutes a key part in many "weakly supervised
learning" problems like learning with positive and unlabelled samples, learning
with label noise, anomaly detection and crowdsourcing. While there have been
several methods proposed to solve this problem, to the best of our knowledge no
efficient algorithm with a proven convergence rate towards the true proportion
exists for this problem. We fill this gap by constructing a provably correct
algorithm for MPE, and derive convergence rates under certain assumptions on
the distribution. Our method is based on embedding distributions onto an RKHS,
and implementing it only requires solving a simple convex quadratic programming
problem a few times. We run our algorithm on several standard classification
datasets, and demonstrate that it performs comparably to or better than other
algorithms on most datasets.
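The quadratic program at the heart of such an approach can be expressed entirely through the Gram matrix. Below is a hedged numpy/scipy sketch of one projection step (the RBF kernel, the SLSQP solver, and the function names are illustrative; the algorithm's reweighting schedule and gradient-thresholding rule are omitted): it computes the squared RKHS distance from a target embedding, given as weights over the pooled samples, to the convex hull of the sample feature maps.

```python
# Sketch of the convex QP building block: squared RKHS distance from a target
# kernel mean embedding (weights `t` over the pooled samples X) to the convex
# hull of the sample feature maps.
import numpy as np
from scipy.optimize import minimize

def rbf_gram(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def hull_distance_sq(X, t, gamma=1.0):
    K = rbf_gram(X, X, gamma)
    c = K @ t
    n = len(X)
    obj = lambda w: w @ K @ w - 2.0 * w @ c    # ||Phi w - Phi t||^2 - t'Kt
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n),
                   bounds=[(0.0, None)] * n, constraints=cons)
    return res.fun + t @ K @ t
```

A typical target would put weight proportional to a candidate proportion on the mixture samples and the remainder on the component samples; scanning that proportion and thresholding the slope of the resulting distance curve is what yields the estimate.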
Estimating the class prior and posterior from noisy positives and unlabeled data
We develop a classification algorithm for estimating posterior distributions
from positive-unlabeled data that is robust to noise in the positive labels
and effective for high-dimensional data. In recent years, several algorithms
have been proposed to learn from positive-unlabeled data; however, many of
these contributions remain theoretical, performing poorly on real
high-dimensional data that is typically contaminated with noise. We build on
this previous work to develop two practical classification algorithms that
explicitly model the noise in the positive labels and utilize univariate
transforms built on discriminative classifiers. We prove that these univariate
transforms preserve the class prior, enabling estimation in the univariate
space and avoiding kernel density estimation for high-dimensional data. The
theoretical development and both parametric and nonparametric algorithms
proposed here constitute an important step towards widespread use of robust
classification algorithms for positive-unlabeled data.
Comment: Fixed a typo in the MSGMM update equations in the appendix; other
minor changes.
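The univariate-transform idea can be illustrated in a few lines: a discriminative classifier trained to separate positives from unlabeled data yields a one-dimensional score which, per the paper's theory, preserves the class prior, so prior estimation can run on scores rather than on the raw high-dimensional inputs. A sketch assuming scikit-learn's logistic regression (any downstream univariate prior estimator, here left abstract, would consume the returned scores):

```python
# Scores from a positive-vs-unlabeled discriminative classifier serve as a
# prior-preserving univariate transform of high-dimensional inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def univariate_scores(x_pos, x_unl):
    X = np.vstack([x_pos, x_unl])
    s = np.r_[np.ones(len(x_pos)), np.zeros(len(x_unl))]
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    return clf.predict_proba(x_pos)[:, 1], clf.predict_proba(x_unl)[:, 1]
```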
Learning from Positive and Unlabeled Data under the Selected At Random Assumption
For many interesting tasks, such as medical diagnosis and web page
classification, a learner only has access to some positively labeled examples
and many unlabeled examples. Learning from this type of data requires making
assumptions about the true distribution of the classes and/or the mechanism
that was used to select the positive examples to be labeled. The commonly made
assumptions, separability of the classes and positive examples being selected
completely at random, are very strong. This paper proposes a weaker
assumption: the positive examples are selected at random, conditioned on some
of the attributes. To learn under this assumption, an EM method is
proposed. Experiments show that our method is not only very capable of learning
under this assumption, but it also outperforms the state of the art for
learning under the selected completely at random assumption.
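A sketch in the spirit of the EM method described above, not the authors' exact algorithm: the E-step imputes P(y = +1 | x, s = 0) from the current classifier and propensity model, and the M-step refits both on the imputed soft labels. `fit_weighted` is a hypothetical helper; any probabilistic classifier accepting sample weights would do.

```python
# EM-style PU learning under a selected-at-random labeling mechanism.
# fit_weighted(X, y, w) -> model with .predict_proba (hypothetical helper).
import numpy as np

def sar_em(x, s, fit_weighted, n_iter=20):
    n = len(s)
    post = s.astype(float)                   # init: labeled => positive
    for _ in range(n_iter):
        # M-step, classifier: each point counts as positive with weight post
        X2 = np.vstack([x, x])
        y2 = np.r_[np.ones(n), np.zeros(n)]
        w2 = np.r_[post, 1.0 - post]
        f = fit_weighted(X2, y2, w2)
        # M-step, propensity: among (soft) positives, was the example labeled?
        e_model = fit_weighted(x, s, post)
        # E-step: posterior P(y=1 | x, s=0); labeled examples stay positive
        p = f.predict_proba(x)[:, 1]
        e = e_model.predict_proba(x)[:, 1]
        post = np.where(s == 1, 1.0, p * (1.0 - e) / (1.0 - p * e + 1e-12))
    return f, e_model
```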
Nonparametric semi-supervised learning of class proportions
The problem of developing binary classifiers from positive and unlabeled data
is often encountered in machine learning. A common requirement in this setting
is to approximate posterior probabilities of positive and negative classes for
a previously unseen data point. This problem can be decomposed into two steps:
(i) the development of accurate predictors that discriminate between positive
and unlabeled data, and (ii) the accurate estimation of the prior probabilities
of positive and negative examples. In this work we primarily focus on the
latter subproblem. We study nonparametric class prior estimation and formulate
this problem as an estimation of mixing proportions in two-component mixture
models, given a sample from one of the components and another sample from the
mixture itself. We show that estimation of mixing proportions is generally
ill-defined and propose a canonical form to obtain identifiability while
maintaining the flexibility to model any distribution. We use insights from
this theory to elucidate the optimization surface of the class priors and
propose an algorithm for estimating them. To address the problems of
high-dimensional density estimation, we provide practical transformations to
low-dimensional spaces that preserve class priors. Finally, we demonstrate the
efficacy of our method on univariate and multivariate data.
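The setup can be summarized as follows; the sup characterization below is the standard way identifiability breaks and is recovered in this literature, offered as context for the canonical form mentioned above (the paper's precise condition may differ):

```latex
% Two-component mixture: F is the unlabeled (mixture) distribution, F_1 the
% positive component, and alpha the class prior to be estimated.
F = \alpha F_1 + (1 - \alpha) F_0, \qquad \alpha \in [0, 1]
% Any alpha' <= alpha also admits a valid decomposition, so alpha is not
% identifiable without further structure; a standard remedy targets the
% largest consistent proportion:
\alpha^{*} = \sup \left\{ \alpha \in [0, 1] :
  F = \alpha F_1 + (1 - \alpha) F_0 \ \text{for some distribution } F_0 \right\}
```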
Identifying Different Definitions of Future in the Assessment of Future Economic Conditions: Application of PU Learning and Text Mining
The Economy Watcher Survey, which is a market survey published by the
Japanese government, contains assessments of current and future economic
conditions by people from various fields. Although this survey provides
insights regarding economic policy for policymakers, a clear definition of the
word "future" in future economic conditions is not provided. Hence, the
assessments respondents provide in the survey are simply based on their
interpretations of the meaning of "future." This motivated us to reveal the
different interpretations of the future in their judgments of future economic
conditions by applying weakly supervised learning and text mining. In our
research, we separate the assessments of future economic conditions into
economic conditions of the near and distant future using learning from positive
and unlabeled data (PU learning). Because the dataset includes data from
several periods, we devised a new architecture that enables neural networks to
conduct PU learning based on the idea of multi-task learning to efficiently
learn a classifier. Our empirical analysis confirmed that the proposed method
could separate the future economic conditions, and we interpreted the
classification results to obtain intuitions for policymaking.
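A hedged sketch of the multi-task idea: a shared encoder with one classification head per survey period, each head trained with a non-negative PU risk. The layer sizes, the sigmoid surrogate loss, and the nnPU-style risk (Kiryo et al.) are illustrative assumptions, not the authors' reported architecture.

```python
# Multi-task PU network: shared encoder, one head per survey period, each
# trained with a non-negative PU risk using the sigmoid surrogate loss.
import torch
import torch.nn as nn

class MultiTaskPU(nn.Module):
    def __init__(self, in_dim, n_periods, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, 1) for _ in range(n_periods)])

    def forward(self, x, period):
        return self.heads[period](self.shared(x)).squeeze(-1)

def nnpu_loss(f_pos, f_unl, prior):
    """Non-negative PU risk with the sigmoid surrogate loss."""
    loss = lambda z: torch.sigmoid(-z)
    pos = prior * loss(f_pos).mean()
    neg = loss(-f_unl).mean() - prior * loss(-f_pos).mean()
    return pos + torch.clamp(neg, min=0.0)
```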