2,409 research outputs found
Large-Scale Multi-Label Learning with Incomplete Label Assignments
Multi-label learning deals with the classification problems where each
instance can be assigned with multiple labels simultaneously. Conventional
multi-label learning approaches mainly focus on exploiting label correlations.
It is usually assumed, explicitly or implicitly, that the label sets for
training instances are fully labeled without any missing labels. However, in
many real-world multi-label datasets, the label assignments for training
instances can be incomplete. Some ground-truth labels can be missed by the
labeler from the label set. This problem is especially typical when the number
instances is very large, and the labeling cost is very high, which makes it
almost impossible to get a fully labeled training set. In this paper, we study
the problem of large-scale multi-label learning with incomplete label
assignments. We propose an approach, called MPU, based upon positive and
unlabeled stochastic gradient descent and stacked models. Unlike prior works,
our method can effectively and efficiently consider missing labels and label
correlations simultaneously, and is very scalable, that has linear time
complexities over the size of the data. Extensive experiments on two real-world
multi-label datasets show that our MPU model consistently outperform other
commonly-used baselines
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce labeling costs for training a classifier by querying the expert to
provide the labels of most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
\textit{aggregated informativeness} method identifies the most informative
instances based on classifier uncertainty, and queries bags incorporating the
most information. The other proposed method, called \textit{cluster-based
aggregative sampling}, clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods
- …