8,237 research outputs found
Class Proportion Estimation with Application to Multiclass Anomaly Rejection
This work addresses two classification problems that fall under the heading
of domain adaptation, wherein the distributions of training and testing
examples differ. The first problem studied is that of class proportion
estimation, which is the problem of estimating the class proportions in an
unlabeled testing data set given labeled examples of each class. Compared to
previous work on this problem, our approach has the novel feature that it does
not require labeled training data from one of the classes. This property allows
us to address the second domain adaptation problem, namely, multiclass anomaly
rejection. Here, the goal is to design a classifier that has the option of
assigning a "reject" label, indicating that the instance did not arise from a
class present in the training data. We establish consistent learning strategies
for both of these domain adaptation problems, which to our knowledge are the
first of their kind. We also implement the class proportion estimation
technique and demonstrate its performance on several benchmark data sets.Comment: Accepted to AISTATS 2014. 15 pages. 2 figure
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficient
number of labeled examples for training. This paper investigates methods for
reducing annotation cost by `sample selection'. In this approach, during
training the learning program examines many unlabeled examples and selects for
labeling only those that are most informative at each stage. This avoids
redundantly labeling examples that contribute little new information. Our work
follows on previous research on Query By Committee, extending the
committee-based paradigm to the context of probabilistic classification. We
describe a family of empirical methods for committee-based sample selection in
probabilistic classification models, which evaluate the informativeness of an
example by measuring the degree of disagreement between several model variants.
These variants (the committee) are drawn randomly from a probability
distribution conditioned by the training set labeled so far. The method was
applied to the real-world natural language processing task of stochastic
part-of-speech tagging. We find that all variants of the method achieve a
significant reduction in annotation cost, although their computational
efficiency differs. In particular, the simplest variant, a two member committee
with no parameters to tune, gives excellent results. We also show that sample
selection yields a significant reduction in the size of the model used by the
tagger
- …