16,360 research outputs found
Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels
Computer vision algorithms with pixel-wise labeling tasks, such as semantic
segmentation and salient object detection, have gone through a significant
accuracy increase with the incorporation of deep learning. Deep segmentation
methods slightly modify and fine-tune pre-trained networks that have hundreds
of millions of parameters. In this work, we question the need to have such
memory demanding networks for the specific task of salient object segmentation.
To this end, we propose a way to learn a memory-efficient network from scratch
by training it only on salient object detection datasets. Our method encodes
images to gridized superpixels that preserve both the object boundaries and the
connectivity rules of regular pixels. This representation allows us to use
convolutional neural networks that operate on regular grids. By using these
encoded images, we train a memory-efficient network using only 0.048\% of the
number of parameters that other deep salient object detection networks have.
Our method shows comparable accuracy with the state-of-the-art deep salient
object detection methods and provides a faster and a much more memory-efficient
alternative to them. Due to its easy deployment, such a network is preferable
for applications in memory limited devices such as mobile phones and IoT
devices.Comment: 6 pages, submitted to MMSP 201
Semantic Perceptual Image Compression using Deep Convolution Networks
It has long been considered a significant problem to improve the visual
quality of lossy image and video compression. Recent advances in computing
power together with the availability of large training data sets has increased
interest in the application of deep learning cnns to address image recognition
and image processing tasks. Here, we present a powerful cnn tailored to the
specific task of semantic image understanding to achieve higher visual quality
in lossy compression. A modest increase in complexity is incorporated to the
encoder which allows a standard, off-the-shelf jpeg decoder to be used. While
jpeg encoding may be optimized for generic images, the process is ultimately
unaware of the specific content of the image to be compressed. Our technique
makes jpeg content-aware by designing and training a model to identify multiple
semantic regions in a given image. Unlike object detection techniques, our
model does not require labeling of object positions and is able to identify
objects in a single pass. We present a new cnn architecture directed
specifically to image compression, which generates a map that highlights
semantically-salient regions so that they can be encoded at higher quality as
compared to background regions. By adding a complete set of features for every
class, and then taking a threshold over the sum of all feature activations, we
generate a map that highlights semantically-salient regions so that they can be
encoded at a better quality compared to background regions. Experiments are
presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset,
in which our algorithm achieves higher visual quality for the same compressed
size.Comment: Accepted to Data Compression Conference, 11 pages, 5 figure
Deep Saliency with Encoded Low level Distance Map and High Level Features
Recent advances in saliency detection have utilized deep learning to obtain
high level features to detect salient regions in a scene. These advances have
demonstrated superior results over previous works that utilize hand-crafted low
level features for saliency detection. In this paper, we demonstrate that
hand-crafted features can provide complementary information to enhance
performance of saliency detection that utilizes only high level features. Our
method utilizes both high level and low level features for saliency detection
under a unified deep learning framework. The high level features are extracted
using the VGG-net, and the low level features are compared with other parts of
an image to form a low level distance map. The low level distance map is then
encoded using a convolutional neural network(CNN) with multiple 1X1
convolutional and ReLU layers. We concatenate the encoded low level distance
map and the high level features, and connect them to a fully connected neural
network classifier to evaluate the saliency of a query region. Our experiments
show that our method can further improve the performance of state-of-the-art
deep learning-based saliency detection methods.Comment: Accepted by IEEE Conference on Computer Vision and Pattern
Recognition(CVPR) 2016. Project page:
https://github.com/gylee1103/SaliencyEL
Automated Visual Fin Identification of Individual Great White Sharks
This paper discusses the automated visual identification of individual great
white sharks from dorsal fin imagery. We propose a computer vision photo ID
system and report recognition results over a database of thousands of
unconstrained fin images. To the best of our knowledge this line of work
establishes the first fully automated contour-based visual ID system in the
field of animal biometrics. The approach put forward appreciates shark fins as
textureless, flexible and partially occluded objects with an individually
characteristic shape. In order to recover animal identities from an image we
first introduce an open contour stroke model, which extends multi-scale region
segmentation to achieve robust fin detection. Secondly, we show that
combinatorial, scale-space selective fingerprinting can successfully encode fin
individuality. We then measure the species-specific distribution of visual
individuality along the fin contour via an embedding into a global `fin space'.
Exploiting this domain, we finally propose a non-linear model for individual
animal recognition and combine all approaches into a fine-grained
multi-instance framework. We provide a system evaluation, compare results to
prior work, and report performance and properties in detail.Comment: 17 pages, 16 figures. To be published in IJCV. Article replaced to
update first author contact details and to correct a Figure reference on page
Subitizing with Variational Autoencoders
Numerosity, the number of objects in a set, is a basic property of a given
visual scene. Many animals develop the perceptual ability to subitize: the
near-instantaneous identification of the numerosity in small sets of visual
items. In computer vision, it has been shown that numerosity emerges as a
statistical property in neural networks during unsupervised learning from
simple synthetic images. In this work, we focus on more complex natural images
using unsupervised hierarchical neural networks. Specifically, we show that
variational autoencoders are able to spontaneously perform subitizing after
training without supervision on a large amount images from the Salient Object
Subitizing dataset. While our method is unable to outperform supervised
convolutional networks for subitizing, we observe that the networks learn to
encode numerosity as basic visual property. Moreover, we find that the learned
representations are likely invariant to object area; an observation in
alignment with studies on biological neural networks in cognitive neuroscience
Salient object subitizing
We study the problem of salient object subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14 K everyday images which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained convolutional neural network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also provides significantly better than chance performance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA. (0910908 - US NSF; 1029430 - US NSF)https://arxiv.org/abs/1607.07525https://arxiv.org/pdf/1607.07525.pdfAccepted manuscrip
- …