Quantifying With Only Positive Training Data
Quantification is the research field that studies methods for counting the
number of data points that belong to each class in an unlabeled sample.
Traditionally, researchers in this field assume the availability of labelled
observations for all classes to induce a quantification model. However, we
often face situations where the number of classes is large or even unknown, or
we have reliable data for a single class. When inducing a multi-class
quantifier is infeasible, we are often concerned with estimates for a specific
class of interest. In this context, we have proposed a novel setting known as
One-class Quantification (OCQ). Meanwhile, Positive and Unlabeled Learning
(PUL), another branch of Machine Learning, has offered solutions that can be
applied to OCQ, even though quantification is not the focal point of PUL. This
article closes
the gap between PUL and OCQ and brings both areas together under a unified
view. We compare our method, Passive Aggressive Threshold (PAT), against PUL
methods and show that PAT is generally the fastest and most accurate algorithm.
PAT induces quantification models that can be reused to quantify different
samples of data. We additionally introduce Exhaustive TIcE (ExTIcE), an
improved version of the PUL algorithm Tree Induction for c Estimation (TIcE).
We show that ExTIcE quantifies more accurately than PAT and the other assessed
algorithms in scenarios where several negative observations are identical to
the positive ones.
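The task the abstract describes, estimating the fraction of positives in an unlabeled sample when only positive training data is available, can be illustrated with a naive threshold-and-count baseline. This is a rough sketch only, not the paper's PAT or ExTIcE algorithms: the distance-to-mean scorer, the 90%-quantile threshold rule, and the synthetic data are all illustrative assumptions.

```python
# Hedged sketch of one-class quantification via threshold-and-count.
# NOT the paper's PAT method; all names and rules below are illustrative.
import random

random.seed(42)

def train_scorer(positives):
    """Score a point by its distance to the mean of the positive data."""
    mu = sum(positives) / len(positives)
    return lambda x: abs(x - mu)

def quantify(scorer, unlabeled, threshold):
    """Classify-and-count: fraction of points scored 'positive-like'."""
    hits = sum(1 for x in unlabeled if scorer(x) <= threshold)
    return hits / len(unlabeled)

# Synthetic 1-D data: positives cluster near 0, negatives near 5.
positives = [random.gauss(0.0, 1.0) for _ in range(1000)]
sample = ([random.gauss(0.0, 1.0) for _ in range(300)] +
          [random.gauss(5.0, 1.0) for _ in range(700)])  # true prevalence 0.3

scorer = train_scorer(positives)
# Assumed rule: threshold so that ~90% of training positives fall inside it.
threshold = sorted(scorer(x) for x in positives)[int(0.9 * len(positives))]
estimate = quantify(scorer, sample, threshold)
print(f"estimated positive prevalence: {estimate:.2f}")
```

Note that plain classify-and-count systematically undercounts here (roughly 10% of true positives fall outside the threshold by construction); correcting such biases in the prevalence estimate is precisely what dedicated quantification methods address.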