Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
Discriminative localization is essential for the fine-grained image
classification task, which aims to recognize hundreds of subcategories within
the same basic-level category. The key differences among subcategories are
subtle and local, and they are reflected in the discriminative regions of
objects. Existing
methods generally adopt a two-stage learning framework: The first stage is to
localize the discriminative regions of objects, and the second is to encode the
discriminative features for training classifiers. However, these methods
generally have two limitations: (1) Separating the two stages of learning is
time-consuming. (2) Depending on object and part annotations for
discriminative localization learning requires labor-intensive labeling.
It is highly challenging to address these two important limitations
simultaneously. Existing methods only focus on one of them. Therefore, this
paper proposes a discriminative localization approach via saliency-guided
Faster R-CNN to address both limitations at the same time. Its
main novelties and advantages are: (1) An end-to-end network based on Faster R-CNN
is designed to simultaneously localize discriminative regions and encode
discriminative features, which accelerates classification. (2)
Saliency-guided localization learning is proposed to localize the
discriminative region automatically, avoiding labor-intensive labeling. Both
are jointly employed to simultaneously accelerate classification and
eliminate dependence on object and part annotations. Compared with
state-of-the-art methods on the widely used CUB-200-2011 dataset, our approach
achieves both the best classification accuracy and efficiency.
Comment: 9 pages, to appear in ACM MM 201
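The abstract does not say how the saliency map is turned into a localization target. As a minimal sketch, assuming that a saliency map is thresholded relative to its peak and the tightest box around the surviving pixels stands in for a hand-drawn annotation, the idea could look as follows. The function name `saliency_to_box`, the `rel_threshold` parameter, and the toy map are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def saliency_to_box(saliency: List[List[float]],
                    rel_threshold: float = 0.5) -> Optional[Box]:
    """Tightest box around pixels with saliency >= rel_threshold * max.

    Hypothetical heuristic for illustration only.
    """
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return None  # nothing salient, so no box
    cutoff = rel_threshold * peak
    xs, ys = [], []
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 saliency map (made-up values).
toy_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.9, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.2, 0.0],
]
print(saliency_to_box(toy_map))
```

A box obtained this way could serve as a pseudo-annotation for a detector, which is one plausible reading of "avoiding labor-intensive labeling"; the actual training procedure may differ.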
Learning Visual Importance for Graphic Designs and Data Visualizations
Knowing where people look and click on visual designs can provide clues about
how the designs are perceived, and where the most important or relevant content
lies. The most important content of a visual design can be used for effective
summarization or to facilitate retrieval from a database. We present automated
models that predict the relative importance of different elements in data
visualizations and graphic designs. Our models are neural networks trained on
human clicks and importance annotations on hundreds of designs. We collected a
new dataset of crowdsourced importance, and analyzed the predictions of our
models with respect to ground truth importance and human eye movements. We
demonstrate how such predictions of importance can be used for automatic design
retargeting and thumbnailing. User studies with hundreds of MTurk participants
validate that, with limited post-processing, our importance-driven applications
are on par with, or outperform, current state-of-the-art methods, including
natural image saliency. We also provide a demonstration of how our importance
predictions can be built into interactive design tools to offer immediate
feedback during the design process.
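One way to picture importance-driven thumbnailing is a search for the fixed-size crop that captures the most predicted importance. The sketch below is an assumption made for illustration: the abstract does not describe the cropping procedure, and `best_crop` and the toy importance map are hypothetical.

```python
from typing import List, Tuple


def best_crop(importance: List[List[float]],
              crop_h: int, crop_w: int) -> Tuple[int, int]:
    """Return (top, left) of the crop_h x crop_w window with the largest
    total importance, found by brute-force search."""
    rows, cols = len(importance), len(importance[0])
    if crop_h > rows or crop_w > cols:
        raise ValueError("crop is larger than the importance map")
    best_score, best_pos = float("-inf"), (0, 0)
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            score = sum(
                importance[y][x]
                for y in range(top, top + crop_h)
                for x in range(left, left + crop_w)
            )
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


# Toy 4x4 importance map (made-up values).
toy_importance = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.8],
    [0.0, 0.1, 0.7, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]
print(best_crop(toy_importance, 2, 2))
```

The brute-force search keeps the example short; on full-resolution maps a summed-area table would avoid recomputing each window sum.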
Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels
Computer vision algorithms for pixel-wise labeling tasks, such as semantic
segmentation and salient object detection, have seen a significant
increase in accuracy with the incorporation of deep learning. Deep segmentation
methods slightly modify and fine-tune pre-trained networks that have hundreds
of millions of parameters. In this work, we question the need to have such
memory-demanding networks for the specific task of salient object segmentation.
To this end, we propose a way to learn a memory-efficient network from scratch
by training it only on salient object detection datasets. Our method encodes
images to gridized superpixels that preserve both the object boundaries and the
connectivity rules of regular pixels. This representation allows us to use
convolutional neural networks that operate on regular grids. By using these
encoded images, we train a memory-efficient network using only 0.048% of the
number of parameters that other deep salient object detection networks have.
Our method achieves accuracy comparable to state-of-the-art deep salient
object detection methods and provides a faster and much more memory-efficient
alternative to them. Due to its easy deployment, such a network is preferable
for applications on memory-limited devices such as mobile phones and IoT
devices.
Comment: 6 pages, submitted to MMSP 201
Semantic Perceptual Image Compression using Deep Convolution Networks
It has long been considered a significant problem to improve the visual
quality of lossy image and video compression. Recent advances in computing
power together with the availability of large training data sets have increased
interest in the application of deep learning CNNs to address image recognition
and image processing tasks. Here, we present a powerful CNN tailored to the
specific task of semantic image understanding to achieve higher visual quality
in lossy compression. A modest increase in complexity is incorporated into the
encoder, which allows a standard, off-the-shelf JPEG decoder to be used. While
JPEG encoding may be optimized for generic images, the process is ultimately
unaware of the specific content of the image to be compressed. Our technique
makes JPEG content-aware by designing and training a model to identify multiple
semantic regions in a given image. Unlike object detection techniques, our
model does not require labeling of object positions and is able to identify
objects in a single pass. We present a new CNN architecture directed
specifically to image compression, which generates a map that highlights
semantically-salient regions so that they can be encoded at higher quality as
compared to background regions. By adding a complete set of features for every
class, and then taking a threshold over the sum of all feature activations, we
generate a map that highlights semantically-salient regions so that they can be
encoded at a better quality compared to background regions. Experiments are
presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset,
in which our algorithm achieves higher visual quality for the same compressed
size.
Comment: Accepted to Data Compression Conference, 11 pages, 5 figures
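The map construction described in the abstract (summing the feature activations of every class, then thresholding the sum) can be sketched as follows. This is only an illustration under stated assumptions: the per-class activation maps are taken as given, and the function name `semantic_mask`, the threshold value, and the toy inputs are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def semantic_mask(class_activations: List[Map2D],
                  threshold: float) -> List[List[int]]:
    """Sum per-class activation maps pixel by pixel, then threshold the sum.

    Returns a binary map: 1 marks a semantically salient pixel, 0 background.
    """
    rows, cols = len(class_activations[0]), len(class_activations[0][0])
    summed = [
        [sum(m[y][x] for m in class_activations) for x in range(cols)]
        for y in range(rows)
    ]
    return [[1 if value >= threshold else 0 for value in row] for row in summed]


# Two toy 2x3 activation maps, one per class (made-up values).
class_a = [[0.0, 0.6, 0.1],
           [0.0, 0.5, 0.0]]
class_b = [[0.0, 0.3, 0.7],
           [0.1, 0.0, 0.0]]
print(semantic_mask([class_a, class_b], threshold=0.4))
```

In an encoder, pixels marked 1 would be the candidates for higher-quality coding and pixels marked 0 for coarser coding; how the quality levels are assigned is not specified in the abstract.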
Part Detector Discovery in Deep Convolutional Neural Networks
Current fine-grained classification approaches often rely on a robust
localization of object parts to extract localized feature representations
suitable for discrimination. However, part localization is a challenging task
due to the large variation of appearance and pose. In this paper, we show how
pre-trained convolutional neural networks can be used for robust and efficient
object part discovery and localization without the necessity to actually train
the network on the current dataset. Our approach, called "part detector
discovery" (PDD), is based on analyzing the gradient maps of the network outputs
and finding activation centers spatially related to annotated semantic parts or
bounding boxes.
This allows us not only to obtain excellent performance on the CUB200-2011
dataset but also, in contrast to previous approaches, to perform detection and
bird classification jointly without requiring a given bounding box annotation
during testing and ground-truth parts during training. The code is available at
http://www.inf-cv.uni-jena.de/part_discovery and
https://github.com/cvjena/PartDetectorDisovery.
Comment: Accepted for publication at the Asian Conference on Computer Vision
(ACCV) 201
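To make "finding activation centers" in a gradient map concrete, the sketch below takes a precomputed 2D gradient map and returns the position of its largest absolute value. This is an assumed, simplified estimator chosen for illustration; the abstract does not state how the centers are computed, and `activation_center` and the toy map are hypothetical.

```python
from typing import List, Tuple


def activation_center(gradient_map: List[List[float]]) -> Tuple[int, int]:
    """Return (x, y) of the entry with the largest absolute gradient."""
    best_pos, best_mag = (0, 0), -1.0
    for y, row in enumerate(gradient_map):
        for x, grad in enumerate(row):
            if abs(grad) > best_mag:
                best_mag, best_pos = abs(grad), (x, y)
    return best_pos


# Toy 3x3 gradient map for one network output (made-up values).
toy_gradients = [
    [0.0, -0.1, 0.0],
    [0.2, -0.9, 0.1],
    [0.0, 0.3, 0.0],
]
print(activation_center(toy_gradients))
```

The absolute value is used so that strongly negative gradients also count as a response; a weighted centroid would be an alternative that is less sensitive to single-pixel noise.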
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
Traditional architectures for solving computer vision problems and the degree
of success they enjoyed have been heavily reliant on hand-crafted features.
However, of late, deep learning techniques have offered a compelling
alternative -- that of automatically learning problem-specific features. With
this new paradigm, every problem in computer vision is now being re-examined
from a deep learning perspective. Therefore, it has become important to
understand what kind of deep networks are suitable for a given problem.
Although general surveys of this fast-moving paradigm (i.e., deep networks)
exist, a survey specific to computer vision is missing. We specifically
consider one form of deep networks widely used in computer vision:
convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN
and then examine the broad variations proposed over time to suit different
applications. We hope that our recipe-style survey will serve as a guide,
particularly for novice practitioners intending to use deep learning techniques
for computer vision.
Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm