9,455 research outputs found
Weakly supervised segment annotation via expectation kernel density estimation
Since the labelling for the positive images/videos is ambiguous in weakly
supervised segment annotation, negative mining based methods that only use the
intra-class information emerge. In these methods, negative instances are
utilized to penalize unknown instances to rank their likelihood of being an
object, which can be considered as a voting in terms of similarity. However,
these methods 1) ignore the information contained in positive bags, 2) only
rank the likelihood but cannot generate an explicit decision function. In this
paper, we propose a voting scheme involving not only the definite negative
instances but also the ambiguous positive instances to make use of the extra
useful information in the weakly labelled positive bags. In the scheme, each
instance votes for its label with a magnitude arising from the similarity, and
the ambiguous positive instances are assigned soft labels that are iteratively
updated during the voting. It overcomes the limitations of voting using only
the negative bags. We also propose an expectation kernel density estimation
(eKDE) algorithm to gain further insight into the voting mechanism.
Experimental results demonstrate the superiority of our scheme beyond the
baselines.Comment: 9 pages, 2 figure
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
Recommended from our members
Semantic Concept Co-Occurrence Patterns for Image Annotation and Retrieval.
Describing visual image contents by semantic concepts is an effective and straightforward way to facilitate various high level applications. Inferring semantic concepts from low-level pictorial feature analysis is challenging due to the semantic gap problem, while manually labeling concepts is unwise because of a large number of images in both online and offline collections. In this paper, we present a novel approach to automatically generate intermediate image descriptors by exploiting concept co-occurrence patterns in the pre-labeled training set that renders it possible to depict complex scene images semantically. Our work is motivated by the fact that multiple concepts that frequently co-occur across images form patterns which could provide contextual cues for individual concept inference. We discover the co-occurrence patterns as hierarchical communities by graph modularity maximization in a network with nodes and edges representing concepts and co-occurrence relationships separately. A random walk process working on the inferred concept probabilities with the discovered co-occurrence patterns is applied to acquire the refined concept signature representation. Through experiments in automatic image annotation and semantic image retrieval on several challenging datasets, we demonstrate the effectiveness of the proposed concept co-occurrence patterns as well as the concept signature representation in comparison with state-of-the-art approaches
DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
In this paper, we study the task of detecting semantic parts of an object,
e.g., a wheel of a car, under partial occlusion. We propose that all models
should be trained without seeing occlusions while being able to transfer the
learned knowledge to deal with occlusions. This setting alleviates the
difficulty in collecting an exponentially large dataset to cover occlusion
patterns and is more essential. In this scenario, the proposal-based deep
networks, like RCNN-series, often produce unsatisfactory results, because both
the proposal extraction and classification stages may be confused by the
irrelevant occluders. To address this, [25] proposed a voting mechanism that
combines multiple local visual cues to detect semantic parts. The semantic
parts can still be detected even though some visual cues are missing due to
occlusions. However, this method is manually-designed, thus is hard to be
optimized in an end-to-end manner.
In this paper, we present DeepVoting, which incorporates the robustness shown
by [25] into a deep network, so that the whole pipeline can be jointly
optimized. Specifically, it adds two layers after the intermediate features of
a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the
evidence of local visual cues, and the second layer performs a voting mechanism
by utilizing the spatial relationship between visual cues and semantic parts.
We also propose an improved version DeepVoting+ by learning visual cues from
context outside objects. In experiments, DeepVoting achieves significantly
better performance than several baseline methods, including Faster-RCNN, for
semantic part detection under occlusion. In addition, DeepVoting enjoys
explainability as the detection results can be diagnosed via looking up the
voting cues
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
- …