Multi-Instance Multi-Label Learning
In this paper, we propose the MIML (Multi-Instance Multi-Label learning)
framework where an example is described by multiple instances and associated
with multiple class labels. Compared to traditional learning frameworks, the
MIML framework is more convenient and natural for representing complicated
objects which have multiple semantic meanings. To learn from MIML examples, we
propose the MimlBoost and MimlSvm algorithms based on a simple degeneration
strategy, and experiments show that solving problems involving complicated
objects with multiple semantic meanings in the MIML framework can lead to good
performance. Considering that the degeneration process may lose information, we
propose the D-MimlSvm algorithm which tackles MIML problems directly in a
regularization framework. Moreover, we show that even when we do not have
access to the real objects and thus cannot capture more information from real
objects by using the MIML representation, MIML is still useful. We propose the
InsDif and SubCod algorithms. InsDif works by transforming single-instances
into the MIML representation for learning, while SubCod works by transforming
single-label examples into the MIML representation for learning. Experiments
show that in some tasks they are able to achieve better performance than
learning the single-instances or single-label examples directly.
Comment: 64 pages, 10 figures; Artificial Intelligence, 201
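To make the "degeneration" idea above concrete, here is a minimal sketch of one way a MIML data set could be reduced to a simpler problem: each (bag, label) pair becomes one binary multi-instance example. The function name `degenerate_to_binary_bags`, the toy feature vectors, and the label names are illustrative assumptions, not taken from the paper, and the sketch covers only the data transformation, not the MimlBoost or MimlSvm learners themselves.

```python
from typing import List, Set, Tuple

Instance = Tuple[float, ...]          # one feature vector
Bag = List[Instance]                  # an example is a bag of instances
MimlExample = Tuple[Bag, Set[str]]    # a bag plus its set of class labels


def degenerate_to_binary_bags(
    examples: List[MimlExample], label_space: List[str]
) -> List[Tuple[Bag, str, int]]:
    """Turn every (bag, label) pair into one binary multi-instance example.

    The target is +1 when the label belongs to the bag's label set and
    -1 otherwise, so a multi-instance single-label learner can be applied.
    """
    pairs = []
    for bag, labels in examples:
        for label in label_space:
            pairs.append((bag, label, 1 if label in labels else -1))
    return pairs


# Hypothetical toy data: two bags described by 2-D instances.
examples = [
    ([(0.1, 0.9), (0.8, 0.2)], {"lion", "grass"}),
    ([(0.5, 0.5)], {"grass"}),
]
label_space = ["grass", "lion", "tree"]
pairs = degenerate_to_binary_bags(examples, label_space)
```

Each bag is repeated once per candidate label, so the transformed set has (number of bags) x (number of labels) entries. Treating the pairs independently is the kind of information loss that the abstract says motivates tackling MIML directly.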
Multi-Label Learning with Label Enhancement
The task of multi-label learning is to predict a set of relevant labels for
the unseen instance. Traditional multi-label learning algorithms treat each
class label as a logical indicator of whether the corresponding label is
relevant or irrelevant to the instance, i.e., +1 represents relevant to the
instance and -1 represents irrelevant to the instance. A label represented
by -1 or +1 is called a logical label. Logical labels cannot reflect differing
label importance. However, in real-world multi-label learning problems, the
importance of each possible label is generally different. In real
applications, it is difficult to obtain label importance information
directly. Thus we need a method to reconstruct the essential label importance
from the logical multilabel data. To solve this problem, we assume that each
multi-label instance is described by a vector of latent real-valued labels,
which can reflect the importance of the corresponding labels. Such a label is
called a numerical label. The process of reconstructing the numerical labels from
the logical multi-label data by utilizing the logical label information and
the topological structure in the feature space is called Label Enhancement. In
this paper, we propose a novel multi-label learning framework called LEMLL,
i.e., Label Enhanced Multi-Label Learning, which incorporates regression of the
numerical labels and label enhancement into a unified framework. Extensive
comparative studies validate that the performance of multi-label learning can
be improved significantly with label enhancement and LEMLL can effectively
reconstruct latent label importance information from logical multi-label data.
Comment: ICDM 201
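The distinction between logical and numerical labels can be illustrated with a small sketch. It is not the LEMLL algorithm. It is a simple neighbour-smoothing stand-in that uses the same two ingredients the abstract names: the logical labels and the topology of the feature space. The function name `enhance_labels`, the parameters `k` and `alpha`, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def enhance_labels(
    features: Sequence[Sequence[float]],
    logical: Sequence[Sequence[int]],
    k: int = 1,
    alpha: float = 0.5,
) -> List[List[float]]:
    """Blend each instance's logical labels (+1/-1) with the mean logical
    labels of its k nearest neighbours in feature space."""
    n = len(features)
    enhanced = []
    for i in range(n):
        # Indices of the k nearest neighbours by Euclidean distance.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        if not neighbours:
            # A single instance has no neighbours; keep its logical labels.
            enhanced.append([float(v) for v in logical[i]])
            continue
        row = []
        for c in range(len(logical[i])):
            neighbour_mean = sum(logical[j][c] for j in neighbours) / len(neighbours)
            row.append((1 - alpha) * logical[i][c] + alpha * neighbour_mean)
        enhanced.append(row)
    return enhanced


# Hypothetical toy data: two tight clusters, two candidate labels.
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
logical = [[1, -1], [1, 1], [-1, 1], [-1, 1]]
enhanced = enhance_labels(features, logical, k=1, alpha=0.5)
```

In this toy run the first two instances disagree on the second label, so its smoothed value falls between -1 and +1 for both, while labels on which neighbours agree keep their full magnitude. The output is a real-valued label vector of the kind the abstract calls numerical labels.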
Graph based Label Enhancement for Multi-instance Multi-label learning
Multi-instance multi-label (MIML) learning is widely applied in numerous
domains, such as image classification, where one image contains multiple
instances correlated with multiple logical labels simultaneously. The related
labels in existing MIML are all assumed to be logical labels of equal
significance. However, in practical MIML applications, the significance of each
label for the multiple instances in a bag (such as an image) differs
significantly. Ignoring label significance discards much of the semantic
information of the object, so MIML learns poorly in complex
scenes. To this end, this paper proposes a novel MIML
framework based on graph label enhancement, namely GLEMIML, to improve the
classification performance of MIML by leveraging label significance. GLEMIML
first recognizes the correlations among instances by establishing the graph and
then migrates the implicit information mined from the feature space to the
label space via nonlinear mapping, thus recovering the label significance.
Finally, GLEMIML is trained on the enhanced data through matching and
interaction mechanisms. GLEMIML (AvgRank: 1.44) can effectively improve the
performance of MIML by mining the label distribution mechanism and shows better
results than the SOTA method (AvgRank: 2.92) on multiple benchmark datasets.
Comment: 7 pages, 2 figure
Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning
In many real-world tasks, the concerned objects can be represented as a
multi-instance bag associated with a candidate label set, which consists of one
ground-truth label and several false positive labels. Multi-instance
partial-label learning (MIPL) is a learning paradigm to deal with such tasks
and has achieved favorable performance. The existing MIPL approach follows the
instance-space paradigm by assigning augmented candidate label sets of bags to
each instance and aggregating bag-level labels from instance-level labels.
However, this scheme may be suboptimal as global bag-level information is
ignored and the predicted labels of bags are sensitive to predictions of
negative instances. In this paper, we study an alternative scheme where a
multi-instance bag is embedded into a single vector representation.
Accordingly, an intuitive algorithm named DEMIPL, i.e., Disambiguated attention
Embedding for Multi-Instance Partial-Label learning, is proposed. DEMIPL
employs a disambiguation attention mechanism to aggregate a multi-instance bag
into a single vector representation, followed by a momentum-based
disambiguation strategy to identify the ground-truth label from the candidate
label set. Furthermore, we introduce a real-world MIPL dataset for colorectal
cancer classification. Experimental results on benchmark and real-world
datasets validate the superiority of DEMIPL against the compared MIPL and
partial-label learning approaches.
Comment: Accepted at NeurIPS 202
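The embedded-space idea above, where a whole bag is aggregated into a single vector, can be sketched with plain softmax attention pooling. This is a generic illustration, not DEMIPL's disambiguation attention mechanism or its momentum-based strategy. The function name `attention_pool`, the scoring vector `w` (which would be learned in practice), and the toy bag are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    bag: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Aggregate a bag of instances into one vector.

    Each instance is scored by a dot product with w, the scores are
    normalised with a softmax, and the embedding is the weighted sum
    of the instances.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in bag]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    embedding = [sum(a * x[d] for a, x in zip(weights, bag)) for d in range(dim)]
    return embedding, weights


# Hypothetical bag with two 2-D instances.
bag = [(1.0, 0.0), (0.0, 1.0)]
uniform_embedding, uniform_weights = attention_pool(bag, w=(0.0, 0.0))
focused_embedding, focused_weights = attention_pool(bag, w=(2.0, 0.0))
```

With a zero scoring vector every instance gets the same weight and the embedding is the bag mean. With a scoring vector that favours the first feature, the first instance dominates. A bag-level classifier then operates on the single embedding instead of on per-instance predictions.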
Large-Scale Multi-Label Learning with Incomplete Label Assignments
Multi-label learning deals with classification problems where each
instance can be assigned multiple labels simultaneously. Conventional
multi-label learning approaches mainly focus on exploiting label correlations.
It is usually assumed, explicitly or implicitly, that the label sets for
training instances are fully labeled without any missing labels. However, in
many real-world multi-label datasets, the label assignments for training
instances can be incomplete. Some ground-truth labels can be missed by the
labeler from the label set. This problem is especially typical when the number
of instances is very large and the labeling cost is very high, which makes it
almost impossible to get a fully labeled training set. In this paper, we study
the problem of large-scale multi-label learning with incomplete label
assignments. We propose an approach, called MPU, based upon positive and
unlabeled stochastic gradient descent and stacked models. Unlike prior works,
our method can effectively and efficiently consider missing labels and label
correlations simultaneously, and is highly scalable, with time complexity
linear in the size of the data. Extensive experiments on two real-world
multi-label datasets show that our MPU model consistently outperforms other
commonly-used baselines