Imagination Based Sample Construction for Zero-Shot Learning
Zero-shot learning (ZSL), which aims to recognize unseen classes with no
labeled training samples, efficiently tackles the problem of missing labeled
data in image retrieval. Currently, two types of methods are popular for
recognizing images of unseen classes in ZSL: probabilistic reasoning and
feature projection. Different from these existing methods, we propose a new
approach, sample construction, to deal with the problem of ZSL. Our proposed
method, called Imagination Based Sample Construction (IBSC), innovatively
constructs image samples of target classes in feature space by mimicking the
human associative cognition process. Based on an association between
attributes and features, target samples are constructed from different parts
of various samples. Furthermore, dissimilarity representation is employed to
select high-quality constructed samples, which are used as labeled data to train a
specific classifier for those unseen classes. In this way, zero-shot learning
is turned into a supervised learning problem. To the best of our knowledge,
this is the first work to construct samples for ZSL; our work can thus serve
as a baseline for future sample construction methods. Experiments on four
benchmark datasets show the superiority of our proposed method.
Comment: Accepted as a short paper in ACM SIGIR 201
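The construction step is easiest to see in code. Below is a minimal Python sketch of the idea under toy assumptions: a random class-attribute matrix, a hypothetical attribute-to-feature-dimension association (`assoc`), and no dissimilarity-based filtering; none of these names come from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy setup: 5 seen classes, 2 unseen classes, 10 binary attributes, 64-dim features.
n_seen, n_unseen, n_attr, d = 5, 2, 10, 64
X_seen = rng.normal(size=(500, d))                        # seen-class feature vectors
y_seen = rng.integers(0, n_seen, size=500)                # seen-class labels
A = rng.integers(0, 2, size=(n_seen + n_unseen, n_attr))  # class-attribute matrix
assoc = np.array_split(np.arange(d), n_attr)              # hypothetical attribute -> feature dims

def construct_samples(target, n_samples=50):
    """Stitch pseudo-samples for an unseen class together, one attribute at a time."""
    out = np.zeros((n_samples, d))
    for a in range(n_attr):
        # Seen classes sharing attribute value `a` with the target act as donors;
        # fall back to all seen classes if none share it (toy safeguard).
        donors = [c for c in range(n_seen) if A[c, a] == A[target, a]] or list(range(n_seen))
        pool = X_seen[np.isin(y_seen, donors)]
        picks = rng.integers(0, len(pool), size=n_samples)
        out[:, assoc[a]] = pool[picks][:, assoc[a]]       # copy only the associated dimensions
    return out

# Constructed samples serve as labeled data, turning ZSL into supervised learning.
X_new = np.vstack([construct_samples(n_seen + u) for u in range(n_unseen)])
y_new = np.repeat(np.arange(n_unseen), 50)
clf = LinearSVC().fit(X_new, y_new)
```

In the paper, the dissimilarity-representation filter would sit between construction and classifier training; the sketch keeps every constructed sample for brevity.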
TagBook: A Semantic Video Representation without Supervision for Event Detection
We consider the problem of event detection in video for scenarios where only
a few, or even zero, examples are available for training. For this challenging
setting, the prevailing solutions in the literature rely on a semantic video
representation obtained from thousands of pre-trained concept detectors.
Different from existing work, we propose a new semantic video representation
that is based on freely available social tagged videos only, without the need
for training any intermediate concept detectors. We introduce a simple
algorithm that propagates tags from a video's nearest neighbors, similar in
spirit to the ones used for image retrieval, but redesign it for video event
detection by including video source set refinement and varying the video tag
assignment. We call our approach TagBook and study its construction,
descriptiveness and detection performance on the TRECVID 2013 and 2014
multimedia event detection datasets and the Columbia Consumer Video dataset.
Despite its simple nature, the proposed TagBook video representation is
remarkably effective for few-example and zero-example event detection, even
outperforming very recent state-of-the-art alternatives building on supervised
representations.
Comment: Accepted for publication as a regular paper in the IEEE Transactions on Multimedia
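The propagation step lends itself to a compact sketch. Below is a minimal, illustrative Python version of nearest-neighbor tag propagation, with the source-set refinement and tag-assignment variants mentioned in the abstract omitted; all names here are assumptions, not the paper's API.

```python
import numpy as np

def tagbook(query_feat, source_feats, source_tags, vocab, k=100):
    """Return a TagBook vector for one query video.

    query_feat   : (d,) feature of the query video
    source_feats : (n, d) features of socially tagged source videos
    source_tags  : list of n tag sets, one per source video
    vocab        : ordered list of tags defining the representation
    """
    # Rank source videos by cosine similarity to the query.
    sims = source_feats @ query_feat / (
        np.linalg.norm(source_feats, axis=1) * np.linalg.norm(query_feat) + 1e-12)
    neighbors = np.argsort(-sims)[:k]

    # Each neighbor votes for the tags it carries.
    index = {t: i for i, t in enumerate(vocab)}
    votes = np.zeros(len(vocab))
    for n in neighbors:
        for t in source_tags[n]:
            if t in index:
                votes[index[t]] += 1.0
    return votes / k  # semantic video representation, usable for event detection
```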
Adaptive Tag Selection for Image Annotation
Not all tags are relevant to an image, and the number of relevant tags is
image-dependent. Although many methods have been proposed for image
auto-annotation, the question of how to determine the number of tags to be
selected per image remains open. The main challenge is that for a large tag
vocabulary, there is often a lack of ground truth data for acquiring optimal
cutoff thresholds per tag. In contrast to previous works that pre-specify the
number of tags to be selected, we propose in this paper adaptive tag selection.
The key insight is to divide the vocabulary into two disjoint subsets, namely a
seen set consisting of tags having ground truth available for optimizing their
thresholds and a novel set consisting of tags without any ground truth. Such a
division allows us to estimate how many tags shall be selected from the novel
set according to the tags that have been selected from the seen set. The
effectiveness of the proposed method is justified by our participation in the
ImageCLEF 2014 image annotation task. On a set of 2,065 test images with ground
truth available for 207 tags, the benchmark evaluation shows that compared to
the popular top-k strategy, which obtains an F-score of 0.122, adaptive tag
selection achieves a higher F-score of 0.223. Moreover, by treating the
underlying image annotation system as a black box, the new method can be used
as an easy plug-in to boost the performance of existing systems.
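A minimal Python sketch of the selection rule follows; the exact estimator linking seen-set selections to the novel-set cutoff is not given in the abstract, so the simple proportional rule below, and every name in it, is an assumption for illustration.

```python
def adaptive_select(scores, seen, thresholds, novel, ratio=1.0):
    """Select an image-dependent number of tags for one image.

    scores     : dict mapping tag -> relevance score for this image
    seen       : tags with ground truth; thresholds[t] was tuned per tag
    novel      : tags without any ground truth
    ratio      : assumed proportionality between novel- and seen-set counts
    """
    # Seen tags pass their individually optimized cutoffs.
    picked_seen = [t for t in seen if scores[t] >= thresholds[t]]

    # Let the seen-set count decide how many novel tags to keep.
    n_novel = round(ratio * len(picked_seen))
    picked_novel = sorted(novel, key=lambda t: -scores[t])[:n_novel]
    return picked_seen + picked_novel
```

Because the rule only consumes per-tag scores, it can wrap any underlying annotation system as a black box, as the abstract notes.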
3D Object Detection for Autonomous Driving: A Survey
Autonomous driving is regarded as one of the most promising remedies to
shield human beings from severe crashes. To this end, 3D object detection
serves as the core basis of such a perception system, especially for the sake
of path planning, motion prediction, and collision avoidance. Generally,
stereo or monocular images with corresponding 3D point clouds are already the
standard inputs for 3D object detection, among which point clouds are
increasingly prevalent as they provide accurate depth information. Despite
existing efforts, 3D object detection on point clouds is still in its infancy
due to the inherent sparseness and irregularity of point clouds, the
misalignment between the camera view and the LiDAR bird's-eye view that
complicates modality synergy, and occlusions and scale variations at long
distances. Recently, profound progress has been made in 3D object detection,
with a large body of literature investigating this vision task. As such, we
present a comprehensive review of the latest progress in this field, covering
all the main topics including sensors, fundamentals, and the recent
state-of-the-art detection methods with their pros and cons. Furthermore, we
introduce metrics and provide quantitative comparisons on popular public
datasets. Avenues for future work are judiciously identified after an
in-depth analysis of the surveyed works. Finally, we conclude this paper.
Comment: 3D object detection, Autonomous driving, Point cloud
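As a concrete anchor for the evaluation metrics such surveys discuss, below is a minimal Python sketch of 3D intersection-over-union for axis-aligned boxes. Benchmarks like KITTI evaluate oriented boxes, so this simplified function is an illustration only, not any benchmark's protocol.

```python
import numpy as np

def iou_3d(box_a, box_b):
    """3D IoU of two axis-aligned boxes given as arrays
    (x_min, y_min, z_min, x_max, y_max, z_max)."""
    lo = np.maximum(box_a[:3], box_b[:3])          # intersection lower corner
    hi = np.minimum(box_a[3:], box_b[3:])          # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if the boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

# Example: two unit cubes overlapping by half along x -> IoU = 1/3.
print(iou_3d(np.array([0., 0, 0, 1, 1, 1]), np.array([0.5, 0, 0, 1.5, 1, 1])))
```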
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval,
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and differences, and recognize their merits and limitations. For a
head-to-head comparison among the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.
Comment: To appear in ACM Computing Surveys
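As an illustration of what a tag relevance function can look like, here is a minimal Python sketch in the neighbor-voting style the survey covers: count a tag's votes among an image's visual neighbors and subtract the count its overall frequency would predict by chance. The names and the exact normalization are assumptions, not a specific surveyed method.

```python
def tag_relevance(tag, neighbor_tags, collection_tags, k):
    """Relevance of `tag` for an image with k visual neighbors.

    neighbor_tags   : tag sets of the image's k nearest visual neighbors
    collection_tags : tag sets of the whole collection (social-context prior)
    """
    votes = sum(tag in tags for tags in neighbor_tags)
    prior = sum(tag in tags for tags in collection_tags) / len(collection_tags)
    return votes - k * prior  # votes beyond what tag frequency alone predicts
```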