Single Shot Active Learning using Pseudo Annotators
Standard myopic active learning assumes that human annotations are always
obtainable whenever new samples are selected. This, however, is unrealistic in
many real-world applications where human experts are not readily available at
all times. In this paper, we consider the single shot setting: all the required
samples should be chosen in a single shot and no human annotation can be
exploited during the selection process. We propose a new method, Active
Learning through Random Labeling (ALRL), which replaces the single human
annotator with multiple so-called pseudo annotators. These
pseudo annotators always provide uniform and random labels whenever new
unlabeled samples are queried. This random labeling enables standard active
learning algorithms to also exhibit the exploratory behavior needed for single
shot active learning. The exploratory behavior is further enhanced by selecting
the most representative sample via minimizing nearest neighbor distance between
unlabeled samples and queried samples. Experiments on real-world datasets
demonstrate that the proposed method outperforms several state-of-the-art
approaches.
Comment: 12 pages, 8 figures, submitted to Pattern Recognition
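The representativeness criterion described above (minimizing the nearest-neighbor distance between unlabeled and queried samples) can be sketched as a greedy coverage selection. This is an illustrative reading, not the paper's exact algorithm; the function name and the farthest-point greedy strategy are assumptions.

```python
import numpy as np

def select_representative(X, k, rng=None):
    """Greedy single-shot selection sketch: repeatedly pick the pool point
    farthest from the current queried set, which shrinks the maximum
    nearest-neighbor distance between unlabeled and queried samples.
    (An assumed stand-in for ALRL's representativeness term.)"""
    rng = np.random.default_rng(rng)
    n = len(X)
    d = np.full(n, np.inf)          # distance to nearest queried sample
    queried = []
    first = int(rng.integers(n))    # seed the set with a random sample
    for _ in range(k):
        idx = first if not queried else int(np.argmax(d))
        queried.append(idx)
        d = np.minimum(d, np.linalg.norm(X - X[idx], axis=1))
    return queried
```

With two well-separated clusters, the second pick lands in the cluster the first pick missed, giving the coverage behavior the abstract describes.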
ALWOD: Active Learning for Weakly-Supervised Object Detection
Object detection (OD), a crucial vision task, remains challenged by the lack
of large training datasets with precise object localization labels. In this
work, we propose ALWOD, a new framework that addresses this problem by fusing
active learning (AL) with weakly and semi-supervised object detection
paradigms. Because the performance of AL critically depends on the model
initialization, we propose a new auxiliary image generator strategy that
utilizes an extremely small labeled set, coupled with a large weakly tagged set
of images, as a warm-start for AL. We then propose a new AL acquisition
function, another critical factor in AL success, that leverages the
student-teacher OD pair disagreement and uncertainty to effectively propose the
most informative images to annotate. Finally, to complete the AL loop, we
introduce a new labeling task delegated to human annotators, based on selection
and correction of model-proposed detections, which is both rapid and effective
in labeling the informative images. We demonstrate, across several challenging
benchmarks, that ALWOD significantly narrows the gap between the ODs trained on
few partially labeled but strategically selected image instances and those that
rely on the fully-labeled data. Our code is publicly available on
https://github.com/seqam-lab/ALWOD.
Comment: published in ICCV 2023
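An acquisition function combining student-teacher disagreement with uncertainty, as described above, might look like the following sketch. The IoU-based disagreement term and binary-entropy uncertainty term are plausible assumptions, not ALWOD's actual formula.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def acquisition_score(student, teacher):
    """Score an image for annotation: each student detection contributes
    (1 - best IoU against teacher detections) as disagreement, plus the
    binary entropy of its confidence as uncertainty. `student`/`teacher`
    are lists of (box, confidence). A hedged sketch of the idea only."""
    if not student:
        return 0.0
    scores = []
    for box, conf in student:
        best = max((box_iou(box, tb) for tb, _ in teacher), default=0.0)
        conf = float(np.clip(conf, 1e-6, 1 - 1e-6))
        entropy = -(conf * np.log(conf) + (1 - conf) * np.log(1 - conf))
        scores.append((1.0 - best) + entropy)
    return float(np.mean(scores))
```

Images where the student and teacher agree with high confidence score low, so the loop spends annotation effort where the pair is unsure or inconsistent.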
Learning to Annotate Part Segmentation with Gradient Matching
The success of state-of-the-art deep neural networks heavily relies on the
presence of large-scale labelled datasets, which are extremely expensive and
time-consuming to annotate. This paper focuses on tackling semi-supervised part
segmentation tasks by generating high-quality images with a pre-trained GAN and
labelling the generated images with an automatic annotator. In particular, we
formulate the annotator learning as a learning-to-learn problem. Given a
pre-trained GAN, the annotator learns to label object parts in a set of
randomly generated images such that a part segmentation model trained on these
synthetic images with their predicted labels obtains low segmentation error on
a small validation set of manually labelled images. We further reduce this
nested-loop optimization problem to a simple gradient matching problem and
efficiently solve it with an iterative algorithm. We show that our method can
learn annotators from a broad range of labelled images including real images,
generated images, and even analytically rendered images. Our method is
evaluated with semi-supervised part segmentation tasks and significantly
outperforms other semi-supervised competitors when the amount of labelled
examples is extremely limited.
Comment: ICLR 2022
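The gradient-matching reduction above can be illustrated on a toy linear model: instead of solving the nested loop, one minimizes the mismatch between the training gradient on synthetic (annotator-labelled) data and the gradient on the small labelled validation set. The cosine-similarity form below is an assumption for illustration; the paper's surrogate and model are different.

```python
import numpy as np

def grad_linear_mse(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(X)

def gradient_matching_loss(w, X_syn, y_syn, X_val, y_val):
    """1 - cosine similarity between the gradient on synthetic data with
    annotator-produced labels y_syn and the gradient on the labelled
    validation set. Driving this to zero aligns the two training signals,
    which is the intuition behind replacing the nested-loop objective.
    (Toy linear-model stand-in, not the paper's segmentation network.)"""
    g_syn = grad_linear_mse(w, X_syn, y_syn)
    g_val = grad_linear_mse(w, X_val, y_val)
    denom = np.linalg.norm(g_syn) * np.linalg.norm(g_val) + 1e-12
    return 1.0 - g_syn @ g_val / denom
```

When the annotator's labels match the validation labels the loss vanishes; systematically wrong labels push it toward its maximum of 2.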
Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
In long document controllable summarization, where labeled data is scarce,
pretrained models struggle to adapt to the task and effectively respond to user
queries. In this paper, we introduce Socratic pretraining, a question-driven,
unsupervised pretraining objective specifically designed to improve
controllability in summarization tasks. By training a model to generate and
answer relevant questions in a given context, Socratic pretraining enables the
model to more effectively adhere to user-provided queries and identify relevant
content to be summarized. We demonstrate the effectiveness of this approach
through extensive experimentation on two summarization domains, short stories
and dialogue, and multiple control strategies: keywords, questions, and factoid
QA pairs. Our pretraining method relies only on unlabeled documents and a
question generation system and outperforms pre-finetuning approaches that use
additional supervised data. Furthermore, our results show that Socratic
pretraining cuts task-specific labeled data requirements in half, is more
faithful to user-provided queries, and achieves state-of-the-art performance on
QMSum and SQuALITY.
Comment: To appear at ACL 2023
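The generate-and-answer objective above amounts to turning unlabeled documents into question-driven source/target pairs. The control tokens and target layout below are illustrative assumptions, not the paper's actual format.

```python
def socratic_example(document, questions, answers,
                     ask_token="<ask>", answer_token="<answer>"):
    """Build a question-driven pretraining pair in the spirit of Socratic
    pretraining: the model receives the document and must generate
    relevant questions together with their answers. The questions are
    assumed to come from an external question-generation system."""
    source = f"{ask_token} {document}"
    target = " ".join(
        f"{ask_token} {q} {answer_token} {a}"
        for q, a in zip(questions, answers)
    )
    return source, target
```

At fine-tuning time, a user-provided query can be prepended with the same control token, which is one way such pretraining could make the model responsive to queries.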
Semantic knowledge integration for learning from semantically imprecise data
Low availability of labeled training data often poses a fundamental limit to the accuracy of computer vision applications using machine learning methods. While these methods are improved continuously, e.g., through better neural network architectures, there cannot be a single methodical change that increases the accuracy on all possible tasks. This statement, known as the no free lunch theorem, suggests that we should consider aspects of machine learning other than learning algorithms for opportunities to escape the limits set by the available training data. In this thesis, we focus on two main aspects: the nature of the training data, where we introduce structure into the label set using concept hierarchies, and the learning paradigm, which we change in accordance with the requirements of real-world applications as opposed to more academic setups.

Concept hierarchies represent semantic relations, which are sets of statements such as "a bird is an animal." We propose a hierarchical classifier to integrate this domain knowledge into a pre-existing task, thereby increasing the information the classifier has access to. While the hierarchy's leaf nodes correspond to the original set of classes, the inner nodes are "new" concepts that do not exist in the original training data. However, we posit that such "imprecise" labels are valuable and should occur naturally, e.g., as an annotator's way of expressing their uncertainty. Furthermore, the increased number of concepts leads to more possible search terms when assembling a web-crawled dataset or using an image search. We propose CHILLAX, a method that learns from semantically imprecise training data while still offering precise predictions, allowing it to integrate seamlessly into a pre-existing application.
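The idea of an imprecise inner-node label constraining the admissible leaf classes can be sketched with a child-to-parent hierarchy dictionary. This is a minimal sketch of the constraint, assuming a flat dict representation; it is not CHILLAX's actual loss or data structure.

```python
from collections import defaultdict

def leaves_under(hierarchy, node):
    """All leaf classes under `node` in a child->parent hierarchy dict.
    A concept is a leaf if nothing lists it as its parent."""
    children = defaultdict(list)
    for child, parent in hierarchy.items():
        children[parent].append(child)
    if not children[node]:
        return {node}
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        if children[n]:
            stack.extend(children[n])
        else:
            out.add(n)
    return out

def admissible_mask(hierarchy, label, classes):
    """For an imprecise (inner-node) label such as "animal", mark which
    original leaf classes remain admissible -- the kind of weak
    constraint a learner for imprecise labels can exploit."""
    admissible = leaves_under(hierarchy, label)
    return [c in admissible for c in classes]
```

An annotator who writes "animal" when unsure whether an image shows a bird or a dog still rules out every class outside that subtree, so the label carries usable information.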
Learning Person Re-identification Models from Videos with Weak Supervision
Most person re-identification methods, being supervised techniques, suffer
from the burden of massive annotation requirement. Unsupervised methods
overcome this need for labeled data, but perform poorly compared to the
supervised alternatives. In order to cope with this issue, we introduce the
problem of learning person re-identification models from videos with weak
supervision. The weak nature of the supervision arises from the requirement of
video-level labels, i.e. person identities who appear in the video, in contrast
to the more precise frame-level annotations. Towards this goal, we propose a
multiple instance attention learning framework for person re-identification
using such video-level labels. Specifically, we first cast the video person
re-identification task into a multiple instance learning setting, in which
person images in a video are collected into a bag. The relations between videos
with similar labels can be utilized to identify persons. On top of that, we
introduce a co-person attention mechanism which mines the similarity
correlations between videos with person identities in common. The attention
weights are obtained based on all person images instead of person tracklets in
a video, making our learned model less affected by noisy annotations. Extensive
experiments demonstrate the superiority of the proposed method over the related
methods on two weakly labeled person re-identification datasets.
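The co-person attention idea above, where weights over all person images in a bag are driven by similarity to another video sharing an identity, can be sketched as attention pooling over feature vectors. The similarity-based scoring here is an assumed stand-in; the actual model learns these weights end to end.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def coperson_bag_embedding(bag, other_bag):
    """Attention-pooled embedding of a video "bag" of person-image
    features (one row per image). Each image is scored by its best cosine
    similarity to a bag from another video with an identity in common,
    so images of the shared person dominate the pooled embedding.
    (A rough reading of co-person attention, not the paper's model.)"""
    bag = bag / np.linalg.norm(bag, axis=1, keepdims=True)
    other = other_bag / np.linalg.norm(other_bag, axis=1, keepdims=True)
    scores = (bag @ other.T).max(axis=1)   # best match per image
    attn = softmax(scores)
    return attn @ bag
```

Because the attention spans every image in the bag rather than a single tracklet, a few noisy frames receive low weight instead of corrupting the representation, matching the robustness claim in the abstract.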