244,302 research outputs found
Learning from Partial Labels
We address the problem of partially-labeled multiclass classification, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classifier that can disambiguate the partially-labeled training instances, and generalize to unseen data. We define an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set and achieve 6% error for character naming on 16 episodes of the TV series Lost
Learning from Partial Labels with Minimum Entropy
This paper introduces the minimum entropy regularizer for learning from partial labels. This learning problem encompasses the semi-supervised setting, where a decision rule is to be learned from labeled and unlabeled examples. The minimum entropy regularizer applies to diagnosis models, i.e. models of the posterior probabilities of classes. It is shown to include other approaches to the semi-supervised problem as particular or limiting cases. A series of experiments illustrates that the proposed criterion provides solutions taking advantage of unlabeled examples when the latter convey information. Even when the data are sampled from the distribution class spanned by a generative model, the proposed approach improves over the estimated generative model when the number of features is of the order of sample size. The performances are definitely in favor of minimum entropy when the generative model is slightly misspecified. Finally, the robustness of the learning scheme is demonstrated: in situations where unlabeled examples do not convey information, minimum entropy returns a solution discarding unlabeled examples and performs as well as supervised learning. Cet article introduit le régularisateur à entropie minimum pour l'apprentissage d'étiquettes partielles. Ce problème d'apprentissage incorpore le cadre non supervisé, où une règle de décision doit être apprise à partir d'exemples étiquetés et non étiquetés. Le régularisateur à entropie minimum s'applique aux modèles de diagnostics, c'est-à-dire aux modèles des probabilités postérieures de classes. Nous montrons comment inclure d'autres approches comme un cas particulier ou limité du problème semi-supervisé. Une série d'expériences montrent que le critère proposé fournit des solutions utilisant les exemples non étiquetés lorsque ces dernières sont instructives. Même lorsque les données sont échantillonnées à partir de la classe de distribution balayée par un modèle génératif, l'approche mentionnée améliore le modèle génératif estimé lorsque le nombre de caractéristiques est de l'ordre de la taille de l'échantillon. Les performances avantagent certainement l'entropie minimum lorsque le modèle génératif est légèrement mal spécifié. Finalement, la robustesse de ce cadre d'apprentissage est démontré : lors de situations où les exemples non étiquetés n'apportent aucune information, l'entropie minimum retourne une solution rejetant les exemples non étiquetés et est aussi performante que l'apprentissage supervisé.discriminant learning, semi-supervised learning, minimum entropy, apprentissage discriminant, apprentissage semi-supervisé, entropie minimum
Robust Representation Learning for Unreliable Partial Label Learning
Partial Label Learning (PLL) is a type of weakly supervised learning where
each training instance is assigned a set of candidate labels, but only one
label is the ground-truth. However, this idealistic assumption may not always
hold due to potential annotation inaccuracies, meaning the ground-truth may not
be present in the candidate label set. This is known as Unreliable Partial
Label Learning (UPLL) that introduces an additional complexity due to the
inherent unreliability and ambiguity of partial labels, often resulting in a
sub-optimal performance with existing methods. To address this challenge, we
propose the Unreliability-Robust Representation Learning framework (URRL) that
leverages unreliability-robust contrastive learning to help the model fortify
against unreliable partial labels effectively. Concurrently, we propose a dual
strategy that combines KNN-based candidate label set correction and
consistency-regularization-based label disambiguation to refine label quality
and enhance the ability of representation learning within the URRL framework.
Extensive experiments demonstrate that the proposed method outperforms
state-of-the-art PLL methods on various datasets with diverse degrees of
unreliability and ambiguity. Furthermore, we provide a theoretical analysis of
our approach from the perspective of the expectation maximization (EM)
algorithm. Upon acceptance, we pledge to make the code publicly accessible
- …