Learning Deep NBNN Representations for Robust Place Categorization
This paper presents an approach for semantic place categorization using data
obtained from RGB cameras. Previous studies on visual place recognition and
classification have shown that, by considering features derived from
pre-trained Convolutional Neural Networks (CNNs) in combination with part-based
classification models, high recognition accuracy can be achieved, even in the
presence of occlusions and severe viewpoint changes. Inspired by these works,
we propose to exploit local deep representations, representing images as sets of
regions and applying a Naïve Bayes Nearest Neighbor (NBNN) model for image
classification. As opposed to previous methods where CNNs are merely used as
feature extractors, our approach seamlessly integrates the NBNN model into a
fully-convolutional neural network. Experimental results show that the proposed
algorithm outperforms previous methods based on pre-trained CNN models and
that, when employed in challenging robot place recognition tasks, it is robust
to occlusions and to environmental and sensor changes.
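For orientation, the classic NBNN decision rule that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard rule (assign the class whose reference descriptors are, in total, closest to the image's local descriptors), not the paper's network-integrated variant. The function name, the toy 2-D descriptors, and the class labels are assumptions made for the example.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Classic NBNN rule: pick the class minimizing the sum, over the
    query's local descriptors, of the squared distance to that
    descriptor's nearest neighbor among the class's descriptors."""
    scores = {}
    for label, refs in class_descriptors.items():
        # Pairwise squared distances, shape (n_query, n_refs).
        diffs = query_descriptors[:, None, :] - refs[None, :, :]
        d2 = np.sum(diffs ** 2, axis=2)
        # Nearest-neighbor distance per query descriptor, then total.
        scores[label] = float(d2.min(axis=1).sum())
    best = min(scores, key=scores.get)
    return best, scores

# Toy data (hypothetical): two classes with 2-D local descriptors.
class_descriptors = {
    "kitchen": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "corridor": np.array([[5.0, 5.0], [6.0, 5.0]]),
}
query = np.array([[0.1, 0.0], [0.9, 0.1]])
label, scores = nbnn_classify(query, class_descriptors)
```

In practice the local descriptors would come from CNN activations over image regions; the brute-force distance computation here is only suitable for small examples.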
Direction-aware Spatial Context Features for Shadow Detection
Shadow detection is a fundamental and challenging task, since it requires an
understanding of global image semantics and there are various backgrounds
around shadows. This paper presents a novel network for shadow detection by
analyzing image context in a direction-aware manner. To achieve this, we first
formulate the direction-aware attention mechanism in a spatial recurrent neural
network (RNN) by introducing attention weights when aggregating spatial context
features in the RNN. By learning these weights through training, we can recover
direction-aware spatial context (DSC) for detecting shadows. This design is
developed into the DSC module and embedded in a CNN to learn DSC features at
different levels. Moreover, a weighted cross entropy loss is designed to make
the training more effective. We employ two common shadow detection benchmark
datasets and perform various experiments to evaluate our network. Experimental
results show that our network outperforms state-of-the-art methods and achieves
97% accuracy and a 38% reduction in balance error rate.
Comment: Accepted for oral presentation in CVPR 2018. The journal version of
this paper is arXiv:1805.0463
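The balance error rate mentioned in the abstract is commonly defined in the shadow detection literature as one minus the mean of the per-class recalls, expressed as a percentage. The sketch below assumes that definition; the abstract itself does not spell it out. The function name and the toy masks are illustrative, and the code assumes both shadow and non-shadow pixels are present in the ground truth.

```python
import numpy as np

def balance_error_rate(pred, truth):
    """BER in percent: 100 * (1 - 0.5 * (TP / N_pos + TN / N_neg)).
    Assumes the ground truth contains both classes."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)        # shadow pixels correctly detected
    tn = np.sum(~pred & ~truth)      # non-shadow pixels correctly rejected
    n_pos = np.sum(truth)
    n_neg = np.sum(~truth)
    return 100.0 * (1.0 - 0.5 * (tp / n_pos + tn / n_neg))

# Toy masks (hypothetical): one of two shadow pixels is missed.
ber = balance_error_rate([1, 0, 0, 0], [1, 1, 0, 0])
```

Because each class contributes equally regardless of its pixel count, this measure is less sensitive than plain accuracy to the usual imbalance between shadow and non-shadow regions.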
Visual Semantic Re-ranker for Text Spotting
Many current state-of-the-art methods for text recognition are based on
purely local information and ignore the semantic correlation between text and
its surrounding visual context. In this paper, we propose a post-processing
approach to improve the accuracy of text spotting by using the semantic
relation between the text and the scene. We initially rely on an off-the-shelf
deep neural network that provides a series of text hypotheses for each input
image. These text hypotheses are then re-ranked using the semantic relatedness
with the object in the image. As a result of this combination, the performance
of the original network is boosted with a very low computational cost. The
proposed framework can be used as a drop-in complement for any text-spotting
algorithm that outputs a ranking of word hypotheses. We validate our approach
on the ICDAR'17 shared task dataset.
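The re-ranking idea can be sketched as below. The abstract does not give the scoring formula, so the linear mix of recognizer score and cosine relatedness, the weight alpha, and the toy word vectors are all assumptions made for illustration, not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank(hypotheses, object_vec, embeddings, alpha=0.5):
    """Re-rank (word, recognizer_score) pairs by mixing the recognizer
    score with the word's relatedness to the detected object.
    alpha is a hypothetical mixing weight."""
    rescored = []
    for word, score in hypotheses:
        # Words without an embedding get zero relatedness.
        rel = cosine(embeddings[word], object_vec) if word in embeddings else 0.0
        rescored.append((word, (1 - alpha) * score + alpha * rel))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy example (hypothetical vectors): the recognizer prefers "bug",
# but the scene contains a vehicle-like object.
embeddings = {"bus": [1.0, 0.0], "bug": [0.0, 1.0]}
object_vec = [1.0, 0.1]
ranked = rerank([("bug", 0.55), ("bus", 0.45)], object_vec, embeddings)
```

Since the re-ranker only consumes a scored list of word hypotheses, it can sit behind any recognizer that exposes such a list, which matches the drop-in use the abstract describes.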
Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering
Many vision and language tasks require commonsense reasoning beyond
data-driven image and natural language processing. Here we adopt Visual
Question Answering (VQA) as an example task, where a system is expected to
answer a question in natural language about an image. Current state-of-the-art
systems have attempted to solve the task using deep neural architectures and
have achieved promising performance. However, the resulting systems are
generally opaque, and they struggle to understand questions for which extra knowledge
is required. In this paper, we present an explicit reasoning layer on top of a
set of penultimate neural network based systems. The reasoning layer enables
reasoning and answering questions where additional knowledge is required, and
at the same time provides an interpretable interface to the end users.
Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based
engine to reason over a basket of inputs: visual relations, the semantic parse
of the question, and background ontological knowledge from word2vec and
ConceptNet. Experimental analysis of the answers and the key evidential
predicates generated on the VQA dataset validates our approach.
Comment: 9 pages, 3 figures, AAAI 201
Attribute-Graph: A Graph based approach to Image Ranking
We propose a novel image representation, termed Attribute-Graph, to rank
images by their semantic similarity to a given query image. An Attribute-Graph
is an undirected fully connected graph, incorporating both local and global
image characteristics. The graph nodes characterise objects as well as the
overall scene context using mid-level semantic attributes, while the edges
capture the object topology. We demonstrate the effectiveness of
Attribute-Graphs by applying them to the problem of image ranking. We benchmark
the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets,
which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that
modelling images as Attribute-Graphs results in improved ranking performance
over existing techniques.
Comment: In IEEE International Conference on Computer Vision (ICCV) 201