Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
Deep Convolutional Neural Networks (DCNNs) have recently shown
state-of-the-art performance in high-level vision tasks such as image
classification and
object detection. This work brings together methods from DCNNs and
probabilistic graphical models for addressing the task of pixel-level
classification (also called "semantic image segmentation"). We show that
responses at the final layer of DCNNs are not sufficiently localized for
accurate object segmentation. This is due to the very invariance properties
that make DCNNs good for high-level tasks. We overcome this poor localization
property of deep networks by combining the responses at the final DCNN layer
with a fully connected Conditional Random Field (CRF). Qualitatively, our
"DeepLab" system is able to localize segment boundaries at a level of accuracy
beyond previous methods. Quantitatively, our method sets a new state of the
art on the PASCAL VOC 2012 semantic image segmentation task, reaching 71.6%
IOU accuracy on the test set. We show how these results can be obtained
efficiently: Careful network re-purposing and a novel application of the 'hole'
algorithm from the wavelet community allow dense computation of neural net
responses at 8 frames per second on a modern GPU.
Comment: 14 pages. Updated related work.
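The 'hole' (also called atrous or dilated) algorithm mentioned above spaces kernel taps apart to enlarge the receptive field without adding parameters or losing resolution. A minimal 1-D sketch of the idea (not the authors' implementation; names and shapes are illustrative):

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """1-D convolution with a dilation ('hole') rate: kernel taps are
    spaced `rate` samples apart, enlarging the receptive field without
    adding parameters or reducing resolution."""
    k = len(kernel)
    span = (k - 1) * rate + 1              # effective kernel extent
    out = np.empty(len(signal) - span + 1)
    for i in range(len(out)):
        taps = signal[i : i + span : rate]  # sample with holes
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(8, dtype=float)              # [0, 1, ..., 7]
k = np.array([1.0, 1.0, 1.0])
print(atrous_conv1d(x, k, rate=2))         # taps at offsets 0, 2, 4
```

With rate=1 this reduces to an ordinary convolution; larger rates let a small kernel cover a wide context, which is what enables dense DCNN responses at full image resolution.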
Dual Local-Global Contextual Pathways for Recognition in Aerial Imagery
Visual context is important for object recognition, yet it remains an open
problem in computer vision. With the advent of deep convolutional neural
networks (CNNs), the use of contextual information in such systems has started
to receive attention in the literature. At the same time, aerial imagery is
gaining momentum. While advances in deep learning have brought good progress
to aerial image analysis, the domain still poses great challenges. Aerial
images are often captured under poor lighting conditions and contain
low-resolution objects, frequently occluded by trees or taller buildings. In
this domain, in
particular, visual context could be of great help, but there are still very few
papers that consider context in aerial image understanding. Here we introduce
context as a complementary way of recognizing objects. We propose a dual-stream
deep neural network model that processes information along two independent
pathways, one for local and another for global visual reasoning. The two are
later combined in the final layers of processing. Our model learns to combine
local object appearance as well as information from the larger scene at the
same time and in a complementary way, such that together they form a powerful
classifier. We test our dual-stream network on the task of segmentation of
buildings and roads in aerial images and obtain state-of-the-art results on the
Massachusetts Buildings Dataset. We also introduce two new datasets, for
buildings and road segmentation, respectively, and study the relative
importance of local appearance vs. the larger scene, as well as their
performance in combination. While our local-global model could also be useful
in general recognition tasks, we clearly demonstrate the effectiveness of
visual context in conjunction with deep nets for aerial image understanding.
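The two-pathway design above — independent local and global streams fused in the final layers — can be sketched in a few lines. Everything here is a stand-in (random weights, a linear map plus ReLU per stream instead of a conv stack), not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def pathway(x, w):
    """One stream: a stand-in for a conv stack, here a single
    linear map followed by a ReLU."""
    return np.maximum(w @ x, 0.0)

# Hypothetical feature vectors: a local patch and the whole scene.
local_feats  = rng.standard_normal(16)    # local object appearance
global_feats = rng.standard_normal(64)    # larger-scene context

w_local  = rng.standard_normal((8, 16))
w_global = rng.standard_normal((8, 64))
w_head   = rng.standard_normal((2, 16))   # 2-way classifier head

# Process each stream independently, then fuse in the final layers.
fused = np.concatenate([pathway(local_feats,  w_local),
                        pathway(global_feats, w_global)])
logits = w_head @ fused
print(logits.shape)   # (2,)
```

The key design point is that neither stream sees the other until fusion, so the classifier can weigh appearance and context as complementary evidence.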
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
Recent leading approaches to semantic segmentation rely on deep convolutional
networks trained with human-annotated, pixel-level segmentation masks. Such
pixel-accurate supervision demands expensive labeling effort and limits the
performance of deep networks that usually benefit from more training data. In
this paper, we propose a method that achieves competitive accuracy but only
requires easily obtained bounding box annotations. The basic idea is to iterate
between automatically generating region proposals and training convolutional
networks. These two steps gradually recover segmentation masks for improving
the networks, and vice versa. Our method, called BoxSup, produces competitive
results supervised by boxes only, on par with strong baselines fully supervised
by masks under the same setting. By leveraging a large amount of bounding
boxes, BoxSup further unleashes the power of deep convolutional networks and
yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.
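The iteration described above — alternate between selecting a region proposal as a pseudo-mask and updating the network — can be illustrated on a toy grid. This is a schematic sketch, not BoxSup's actual proposal generation or training step (the "training" here simply adopts the selected mask):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Toy 8x8 image: one annotated box and two candidate region proposals.
box = np.zeros((8, 8), bool); box[2:6, 2:6] = True
cand_a = np.zeros((8, 8), bool); cand_a[2:6, 2:4] = True
cand_b = np.zeros((8, 8), bool); cand_b[3:5, 3:5] = True
candidates = [cand_a, cand_b]

pred = box.copy()   # initialise the prediction from the box itself
for _ in range(3):
    # Step 1: pick the proposal that best agrees with the current
    # network output (highest IoU against `pred`).
    best = max(candidates, key=lambda m: iou(m, pred))
    # Step 2: "train" on the pseudo-mask; a real system would run SGD
    # on the network, here we just adopt the selection.
    pred = best

print(iou(pred, cand_a))   # → 1.0
```

Each round the pseudo-masks and the network estimates pull each other toward a consistent segmentation, which is the mechanism the abstract refers to.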
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic
meaning of an image by exploiting certain manipulation techniques such as
copy-clone, object splicing, and removal, which can mislead viewers. At the
same time, identifying these manipulations is very challenging, as manipulated
regions are not visually apparent. This paper proposes a
high-confidence manipulation localization architecture which utilizes
resampling features, Long Short-Term Memory (LSTM) cells, and an
encoder-decoder network to segment manipulated regions from non-manipulated
ones.
Resampling features are used to capture artifacts like JPEG quality loss,
upsampling, downsampling, rotation, and shearing. The proposed network exploits
larger receptive fields (spatial maps) and frequency domain correlation to
analyze the discriminative characteristics between manipulated and
non-manipulated regions by incorporating the encoder and LSTM network.
Finally, the decoder network learns the mapping from low-resolution feature
maps to pixel-wise predictions for image tamper localization. With the
predicted mask provided by the final (softmax) layer of the proposed
architecture, end-to-end
training is performed to learn the network parameters through back-propagation
using ground-truth masks. Furthermore, a large image splicing dataset is
introduced to guide the training process. The proposed method is capable of
localizing image manipulations at pixel level with high precision, which is
demonstrated through rigorous experimentation on three diverse datasets.
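The final decoder stage described above — mapping low-resolution feature maps to per-pixel softmax predictions — can be sketched minimally. This stand-in uses nearest-neighbour upsampling in place of the learned decoder and omits the LSTM and resampling features entirely:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def upsample(feat, factor):
    """Nearest-neighbour upsampling, standing in for the learned
    decoder that maps low-resolution features to full resolution."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(1)
low_res = rng.standard_normal((4, 4, 2))   # H x W x 2 classes (tampered / pristine)
logits = upsample(low_res, 4)              # 16 x 16 x 2
mask = softmax(logits).argmax(-1)          # pixel-wise class decision
print(mask.shape)   # (16, 16)
```

Training end-to-end then amounts to back-propagating a per-pixel loss between this softmax output and the ground-truth mask.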
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
The deep Convolutional Neural Network (CNN) is a special type of neural
network that has shown exemplary performance in several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware
technology have accelerated the research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNN are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020).
The Biological Concept of Neoteny in Evolutionary Colour Image Segmentation - Simple Experiments in Simple Non-Memetic Genetic Algorithms
Neoteny, also spelled Paedomorphosis, can be defined in biological terms as
the retention by an organism of juvenile or even larval traits into later life.
In some species, all morphological development is retarded; the organism is
juvenilized but sexually mature. Such shifts of reproductive capability would
appear to have adaptive significance to organisms that exhibit it. In terms of
evolutionary theory, the process of paedomorphosis suggests that larval stages
and developmental phases of existing organisms may give rise, under certain
circumstances, to wholly new organisms. Although the present work does not
pretend to model or simulate the biological details of such a concept in any
way, these ideas were incorporated by a rather simple abstract computational
strategy, in order to allow (if possible) for faster convergence into simple
non-memetic Genetic Algorithms, i.e. without using local improvement procedures
(e.g. via Baldwin or Lamarckian learning). As a case-study, the Genetic
Algorithm was used for colour image segmentation by means of K-means
unsupervised clustering, guiding the evolutionary algorithm in its search for
an optimal or sub-optimal data partition. Average
results suggest that the use of neotenic strategies, employing juvenile
genotypes in later generations together with linear-dynamic mutation rates
instead of constant ones, can increase fitness values by 58% compared to
classical Genetic Algorithms, independently of the starting population's
characteristics in the search space. KEYWORDS: Genetic Algorithms, Artificial
Neoteny, Dynamic Mutation Rates, Faster Convergence, Colour Image Segmentation,
Classification, Clustering.
Comment: 12 pages, 3 figures, at
http://alfa.ist.utl.pt/~cvrm/staff/vramos/ref_35.htm
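The two ingredients the abstract names — reinjecting "juvenile" genotypes into later generations and a linearly decaying mutation rate — fit in a small GA skeleton. This is a toy one-max GA (selection plus mutation only, no crossover), not the authors' algorithm; all parameters are illustrative:

```python
import random

random.seed(0)
GENS, POP, L = 40, 20, 16
fitness = lambda g: sum(g)            # one-max: count of 1-bits

def mutate(g, rate):
    """Flip each bit independently with probability `rate`."""
    return [b ^ (random.random() < rate) for b in g]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
juveniles = [g[:] for g in pop[:2]]   # early ("juvenile") genotypes, kept aside

for gen in range(GENS):
    rate = 0.2 * (1 - gen / GENS)     # linear-dynamic mutation rate
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]          # truncation selection
    pop = [mutate(p, rate) for p in parents for _ in (0, 1)]
    if gen == GENS // 2:              # neoteny: reinject juveniles late
        pop[-2:] = [j[:] for j in juveniles]

print(max(map(fitness, pop)))
```

The decaying rate trades early exploration for late exploitation, while the reinjected juveniles restore diversity once the population has started to converge.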
Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation
Deep convolutional neural networks (DCNNs) trained on a large number of
images with strong pixel-level annotations have recently significantly pushed
the state of the art in semantic image segmentation. We study the more challenging
problem of learning DCNNs for semantic image segmentation from either (1)
weakly annotated training data such as bounding boxes or image-level labels or
(2) a combination of few strongly labeled and many weakly labeled images,
sourced from one or multiple datasets. We develop Expectation-Maximization (EM)
methods for semantic image segmentation model training under these weakly
supervised and semi-supervised settings. Extensive experimental evaluation
shows that the proposed techniques can learn models delivering competitive
results on the challenging PASCAL VOC 2012 image segmentation benchmark, while
requiring significantly less annotation effort. We share source code
implementing the proposed system at
https://bitbucket.org/deeplab/deeplab-public.
Comment: Accepted to ICCV 2015.
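The EM idea above can be illustrated with a toy version: treat per-pixel labels as latent variables, restrict them to the classes named in the weak image-level annotation, and alternate estimating labels (E-step) with refitting the model (M-step). This k-means-style sketch is only an analogy for the paper's DCNN training, with all quantities synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 classes, pixels carry 2-D features; only the image-level
# label set {0, 2} is known -- never the per-pixel ground truth.
means = rng.standard_normal((3, 2))          # current model: one mean per class
pixels = np.vstack([rng.normal(means[0], 0.1, (50, 2)),
                    rng.normal(means[2], 0.1, (50, 2))])
present = [0, 2]                             # weak, image-level annotation

for _ in range(5):
    # E-step: latent pixel labels, restricted to classes present in the image.
    d = ((pixels[:, None, :] - means[None, present, :]) ** 2).sum(-1)
    labels = np.array(present)[d.argmin(1)]
    # M-step: refit each present class's mean from its assigned pixels.
    for c in present:
        if (labels == c).any():
            means[c] = pixels[labels == c].mean(0)

print(sorted(set(labels.tolist())))   # only the weakly-given classes appear
```

Constraining the E-step to the image-level label set is what turns weak tags into usable per-pixel supervision.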
Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation
Pixel-level annotations are expensive and time consuming to obtain. Hence,
weak supervision using only image tags could have a significant impact in
semantic segmentation. Recently, CNN-based methods have been proposed that
fine-tune pre-trained networks using image tags. Without additional
information, this leads to poor localization accuracy. This problem, however,
can be alleviated by
making use of objectness priors to generate foreground/background masks.
Unfortunately, these priors either require pixel-level annotations/bounding
boxes or still yield inaccurate object boundaries. Here, we propose a novel
method to extract accurate masks from networks pre-trained for the task of
object recognition, thus forgoing external objectness modules. We first show
how foreground/background masks can be obtained from the activations of
higher-level convolutional layers of a network. We then show how to obtain
multi-class masks by the fusion of foreground/background ones with information
extracted from a weakly-supervised localization network. Our experiments
evidence that exploiting these masks in conjunction with a weakly-supervised
training loss yields state-of-the-art tag-based weakly-supervised semantic
segmentation results.
Comment: 14 pages, 11 figures, 8 tables. Accepted to IEEE Transactions on
Pattern Analysis and Machine Intelligence (IEEE TPAMI).
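The first step the abstract describes — deriving a foreground/background mask from the activations of a higher-level convolutional layer — can be shown on synthetic activations. The channel fusion here is a plain sum with a fixed threshold, a simplification of whatever the paper actually learns, and the multi-class fusion stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical activations of a higher-level conv layer: C x H x W.
acts = np.abs(rng.standard_normal((256, 14, 14)))
acts[:, 4:10, 4:10] += 2.0            # an object region fires strongly

# Fuse channels into a single saliency map, then threshold it to get a
# foreground/background mask -- no external objectness module involved.
heat = acts.sum(0)
heat = (heat - heat.min()) / (heat.max() - heat.min())
mask = heat > 0.5

print(mask[6, 6], mask[0, 0])   # object region vs. background corner
```

Because recognition networks already respond most strongly on object regions, even this crude fusion separates foreground from background without any extra annotation.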
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents futuristic challenges discussed in the cvpaper.challenge.
In 2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Salient Object Detection in Video using Deep Non-Local Neural Networks
Detection of salient objects in image and video is of great importance in
many computer vision applications. Although the state of the art in saliency
detection for still images has advanced substantially over the last few years,
there have been few improvements in video saliency detection. This paper
investigates the use of recently introduced non-local
neural networks in video salient object detection. Non-local neural networks
are applied to capture global dependencies and hence determine the salient
objects. The effect of non-local operations is studied separately on static and
dynamic saliency detection in order to exploit both appearance and motion
features. A novel deep non-local neural network architecture is introduced for
video salient object detection and tested on two well-known datasets, DAVIS
and FBMS. The experimental results show that the proposed algorithm outperforms
state-of-the-art video saliency detection methods.
Comment: Submitted to Journal of Visual Communication and Image Representation.
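A non-local operation of the kind referenced above computes each position's response as a weighted sum over all positions, so dependencies are global rather than limited to a local receptive field. A minimal embedded-Gaussian sketch with random (untrained) projections, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """Non-local block (embedded-Gaussian pairwise function): every
    position attends to every other position, then a residual
    connection adds the result back onto the input."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T)     # N x N pairwise affinities
    return x + attn @ g               # residual connection

rng = np.random.default_rng(3)
x = rng.standard_normal((49, 8))      # 7x7 positions flattened, 8 channels
w = lambda: rng.standard_normal((8, 8))
y = non_local(x, w(), w(), w())
print(y.shape)   # (49, 8)
```

Because the affinity matrix spans all position pairs, a single block can relate a salient object to context anywhere in the frame, which is what makes it attractive for saliency across video frames.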