Adaptive Affinity Fields for Semantic Segmentation
Semantic segmentation has made much progress with increasingly powerful
pixel-wise classifiers and with structural priors incorporated via Conditional
Random Fields (CRFs) or Generative Adversarial Networks (GANs). We propose a
simpler alternative that learns to verify the spatial structure of segmentation
during training only. Unlike existing approaches that enforce semantic labels
on individual pixels and match labels between neighbouring pixels, we propose
the concept of Adaptive Affinity Fields (AAF) to capture and match the semantic
relations between neighbouring pixels in the label space. We use adversarial
learning to select the optimal affinity field size for each semantic category.
It is formulated as a minimax problem, optimizing our segmentation neural
network in a best worst-case learning scenario. AAF is versatile in
representing structures as a collection of pixel-centric relations; it is
easier to train than a GAN and, requiring no run-time inference, more
efficient than a CRF. Our
extensive evaluations on PASCAL VOC 2012, Cityscapes, and GTA5 datasets
demonstrate its above-par segmentation performance and robust generalization
across domains.
Comment: To appear in European Conference on Computer Vision (ECCV) 2018
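The label-space relation matching at the heart of AAF can be illustrated with a minimal NumPy sketch. The function name, the fixed field size `k`, the KL comparison, and the hinge margin are simplifying assumptions for illustration; the paper selects the field size per class adversarially rather than fixing it:

```python
import numpy as np

def affinity_field_loss(prob, label, k=1):
    """Toy affinity-field loss: compare each pixel's predicted class
    distribution with its neighbour at horizontal offset k, and match
    that relation against the ground-truth label relation.

    prob  : (H, W, C) softmax probabilities
    label : (H, W) integer ground-truth labels
    k     : affinity field radius (fixed here; adversarially chosen
            per class in the paper)
    """
    eps = 1e-8
    p, q = prob[:, :-k, :], prob[:, k:, :]
    # KL divergence between neighbouring predicted distributions
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    same = label[:, :-k] == label[:, k:]
    margin = 1.0
    # pull distributions together inside a region, push them apart
    # (hinge) across a ground-truth boundary
    loss = np.where(same, kl, np.maximum(0.0, margin - kl))
    return float(loss.mean())
```

With identical neighbouring distributions the loss is zero inside regions and hits the full margin across boundaries, which is the intended pull/push behaviour.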
CaseNet: Content-Adaptive Scale Interaction Networks for Scene Parsing
Objects in an image exhibit diverse scales. Adaptive receptive fields are
expected to capture a suitable range of context for accurate pixel-level
semantic prediction on objects of diverse sizes. Recently, atrous convolution
with different dilation rates has been used to generate multi-scale features
through several branches, and these features are fused for prediction.
However, there is a lack of explicit interaction among the branches to
adaptively make full use of the context. In this paper, we propose a
Content-Adaptive Scale Interaction Network (CaseNet) to exploit the multi-scale
features for scene parsing. We build the CaseNet based on the classic Atrous
Spatial Pyramid Pooling (ASPP) module, followed by the proposed contextual
scale interaction (CSI) module, and the scale adaptation (SA) module.
Specifically, for each spatial position, the CSI module enables context
interaction among different scales through scale-aware non-local operations
across the scales, which facilitates the generation of flexible mixed
receptive fields instead of a traditional flat one. The SA module then
explicitly and softly selects the suitable scale for each spatial position
and each channel. Ablation studies demonstrate the effectiveness of the
proposed modules. We achieve state-of-the-art performance on three scene
parsing benchmarks: Cityscapes, ADE20K, and LIP.
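The atrous convolution underlying the ASPP-style branches can be sketched in a few lines of NumPy. This is a single-channel, valid-padding illustration of dilation only, not CaseNet's modules; the function name and interface are hypothetical:

```python
import numpy as np

def atrous_conv2d(x, w, rate):
    """Single-channel atrous (dilated) convolution with 'valid' padding.
    x: (H, W) input, w: (k, k) kernel, rate: dilation rate.
    A larger rate enlarges the receptive field without adding weights.
    """
    k = w.shape[0]
    span = (k - 1) * rate + 1          # effective receptive-field size
    H, W = x.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input with stride `rate` inside the window
            patch = x[i:i + span:rate, j:j + span:rate]
            out[i, j] = np.sum(patch * w)
    return out
```

An ASPP-style head would run several such branches with different rates over the same features and fuse the results; CaseNet's contribution is making that fusion content-adaptive.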
SEMEDA: Enhancing Segmentation Precision with Semantic Edge Aware Loss
While deep neural networks nowadays achieve impressive performance on
semantic segmentation tasks, they are usually trained by optimizing pixel-wise
losses such as cross-entropy. As a result, the predictions produced by such
networks usually struggle to accurately capture object boundaries and
exhibit holes inside the objects. In this paper, we propose a novel approach
to improve the structure of the predicted segmentation masks. We introduce a
semantic edge detection network that allows matching the predicted and ground
truth segmentation masks. This Semantic Edge-Aware strategy (SEMEDA) can be
combined with any backbone deep network in an end-to-end training framework.
Through thorough experimental validation on Pascal VOC 2012 and Cityscapes
datasets, we show that the proposed SEMEDA approach enhances the structure of
the predicted segmentation masks by enforcing sharp boundaries and avoiding
discontinuities inside objects, improving the segmentation performance. In
addition, our semantic edge-aware loss can be integrated into any popular
segmentation network without requiring any additional annotation and with
negligible computational load compared to standard pixel-wise cross-entropy
loss.
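The ground-truth semantic edges that an edge-aware loss would match can be derived directly from a label mask, as in this NumPy sketch. The neighbour-difference rule here is an illustrative stand-in; SEMEDA's edge detector is a learned network:

```python
import numpy as np

def semantic_edges(label):
    """Binary semantic-edge map from an integer label mask: a pixel is
    marked as an edge if its label differs from its right or bottom
    neighbour. These are the sharp boundaries an edge-aware loss
    encourages the predicted masks to reproduce.
    """
    edge = np.zeros_like(label, dtype=bool)
    edge[:, :-1] |= label[:, :-1] != label[:, 1:]   # horizontal transitions
    edge[:-1, :] |= label[:-1, :] != label[1:, :]   # vertical transitions
    return edge
```

Comparing such edge maps for predicted and ground-truth masks penalizes blurry boundaries and holes that a plain per-pixel cross-entropy barely notices.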
Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation
Semantic segmentation has achieved huge progress by adopting deep Fully
Convolutional Networks (FCNs). However, the performance of FCN-based models
relies heavily on large amounts of pixel-level annotations, which are
expensive and time-consuming to obtain. To address this problem, a practical
alternative is to learn to segment with weak supervision from bounding boxes.
How to make full use of the class-level and region-level supervision from
bounding boxes is the critical challenge for this weakly supervised learning
task. In this paper, we first
introduce a box-driven class-wise masking model (BCM) to remove irrelevant
regions of each class. Moreover, based on the pixel-level segment proposals
generated from the bounding box supervision, we can calculate the mean filling
rate of each class to serve as an important prior cue. We then propose a
filling-rate-guided adaptive loss (FR-Loss) to help the model ignore wrongly
labeled pixels in proposals. Unlike previous methods that directly train
models with fixed individual segment proposals, our method can adjust the
model's learning with global statistical information, which helps reduce the
negative impact of wrongly labeled proposals. We evaluate the proposed
method on the challenging PASCAL VOC 2012 benchmark and compare with other
methods. Extensive experimental results show that the proposed method is
effective and achieves state-of-the-art results.
Comment: Accepted by CVPR 2019
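The filling-rate prior is straightforward to compute once segment proposals are cropped to their bounding boxes. A minimal sketch, assuming a simplified interface; the `(mask, class_id)` pairing and function name are illustrative, not the paper's API:

```python
import numpy as np

def mean_filling_rates(box_masks, num_classes):
    """Per-class mean filling rate: the average fraction of a bounding
    box covered by its segment proposal.

    box_masks   : list of (mask, class_id), where mask is the binary
                  proposal cropped to its box
    num_classes : number of semantic classes
    """
    totals = np.zeros(num_classes)
    counts = np.zeros(num_classes)
    for mask, c in box_masks:
        totals[c] += mask.mean()   # fraction of the box the proposal fills
        counts[c] += 1
    return totals / np.maximum(counts, 1)
```

A proposal whose filling rate deviates strongly from its class mean is suspect, which is the statistical cue an FR-Loss-style weighting can exploit.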
Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
non-overlapping, meaningful regions according to human perception, and has
been a classic topic since the early days of computer vision. A great deal of
research has been conducted, resulting in many applications. However, while
many segmentation algorithms exist, only a few sparse and outdated surveys
are available, and an overview of recent achievements and open issues is
lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also the recent development in superpixel, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we also review the existing influential datasets and evaluation metrics.
Finally, we suggest some design flavors and research directions for future
research in image segmentation.
Comment: Submitted to Elsevier Journal of Visual Communication and Image Representation
Scene Parsing with Global Context Embedding
We present a scene parsing method that utilizes global context information
based on both parametric and non-parametric models. Compared to previous
methods that only exploit the local relationship between objects, we train a
context network based on scene similarities to generate feature representations
for global contexts. In addition, these learned features are utilized to
generate global and spatial priors for explicit class inference. We then
design modules to embed the feature representations and the priors into the
segmentation network as additional global context cues. We show that the
proposed method can eliminate false positives that are not compatible with the
global context representations. Experiments on both the MIT ADE20K and PASCAL
Context datasets show that the proposed method performs favorably against
existing methods.
Comment: Accepted in ICCV'17. Code available at https://github.com/hfslyc/GCPNe
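One simple way to let a scene-level prior suppress context-incompatible false positives is an additive log-prior on the per-pixel class logits. This is a hypothetical simplification for illustration, not the paper's embedding modules:

```python
import numpy as np

def apply_global_prior(logits, prior, alpha=1.0):
    """Bias per-pixel class logits with a scene-level class prior.

    logits : (H, W, C) raw class scores
    prior  : (C,) scene-level class probabilities (e.g. from a context
             network); classes implausible for the scene get prior ~ 0
    alpha  : strength of the prior (an assumed knob)
    """
    # adding the log-prior multiplies the post-softmax probabilities
    # by the prior, driving implausible classes toward zero
    return logits + alpha * np.log(prior + 1e-8)[None, None, :]
```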
Texture Fuzzy Segmentation using Skew Divergence Adaptive Affinity Functions
Digital image segmentation is the process of assigning distinct labels to
different objects in a digital image, and the fuzzy segmentation algorithm has
been successfully used in the segmentation of images from a wide variety of
sources. However, the traditional fuzzy segmentation algorithm fails to segment
objects that are characterized by textures whose patterns cannot be
successfully described by simple statistics computed over a very restricted
area. In this paper, we propose an extension of the fuzzy segmentation
algorithm that uses adaptive textural affinity functions to perform the
segmentation of such objects on bidimensional images. The adaptive affinity
functions compute their appropriate neighborhood size as they compute the
texture descriptors surrounding the seed spels (spatial elements), according to
the characteristics of the texture being processed. The algorithm then segments
the image with an appropriate neighborhood for each object. We performed
experiments on mosaic images that were composed using images from the Brodatz
database, and compared our results with the ones produced by a recently
published texture segmentation algorithm, demonstrating the applicability of
our method.
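The idea of an affinity function choosing its own neighbourhood size can be illustrated with a toy rule: grow the window around a seed spel until the local statistics stabilise. The mean-based stopping criterion below is a hypothetical stand-in for the paper's skew-divergence-based affinity functions:

```python
import numpy as np

def adaptive_window_stats(img, y, x, max_r=8, tol=1.0):
    """Grow a square window around seed spel (y, x) until the local mean
    stops changing by more than `tol`, then return the chosen radius and
    the (mean, std) texture descriptor for that neighbourhood.
    """
    prev = None
    for r in range(1, max_r + 1):
        win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        mean = win.mean()
        if prev is not None and abs(mean - prev) < tol:
            return r, (mean, win.std())   # statistics have stabilised
        prev = mean
    return max_r, (mean, win.std())       # fall back to the largest window
```

Fine textures stabilise at small radii while coarse patterns force the window to grow, which is the behaviour the adaptive affinity functions rely on.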
Branched Multi-Task Networks: Deciding What Layers To Share
In the context of multi-task learning, neural networks with branched
architectures have often been employed to jointly tackle the tasks at hand.
Such ramified networks typically start with a number of shared layers, after
which different tasks branch out into their own sequence of layers.
Understandably, as the number of possible network configurations is
combinatorially large, deciding what layers to share and where to branch out
becomes cumbersome. Prior works have either relied on ad hoc methods to
determine the level of layer sharing, which is suboptimal, or utilized neural
architecture search techniques to establish the network design, which is
considerably expensive. In this paper, we go beyond these limitations and
propose an approach to automatically construct branched multi-task networks, by
leveraging the employed tasks' affinities. Given a specific budget, i.e. number
of learnable parameters, the proposed approach generates architectures, in
which shallow layers are task-agnostic, whereas deeper ones gradually grow more
task-specific. Extensive experimental analysis across numerous, diverse
multi-tasking datasets shows that, for a given budget, our method consistently
yields networks with the highest performance, while for a certain performance
threshold it requires the fewest learnable parameters.
Comment: Accepted at BMVC 2020
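A task-affinity matrix of the kind used to decide layer sharing can be sketched as cosine similarity between per-task feature (or gradient) vectors at a shared layer; clustering tasks into branches under a parameter budget is a separate step. The names below are illustrative, not the paper's implementation:

```python
import numpy as np

def task_affinity(features):
    """Pairwise task affinity as cosine similarity.

    features : (T, D) matrix with one D-dimensional representation per
               task (e.g. averaged features or gradients at one shared
               layer). Returns a (T, T) affinity matrix; high values
               suggest the two tasks can keep sharing that layer.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return f @ f.T
```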
Manifold Alignment for Semantically Aligned Style Transfer
Given a content image and a style image, the goal of style transfer is to
synthesize an output image by transferring the target style to the content
image. Currently, most methods address the problem via global style transfer,
assuming styles can be represented by global statistics such as Gram matrices
or covariance matrices. In this paper, we make a different assumption
that local semantically aligned (or similar) regions between the content and
style images should share similar style patterns. Based on this assumption,
content features and style features are seen as two sets of manifolds and a
manifold alignment based style transfer (MAST) method is proposed. MAST is a
subspace learning method which learns a common subspace of the content and
style features. In the common subspace, content and style features with larger
feature similarity or the same semantic meaning are forced to be close. The
learned projection matrices are added with orthogonality constraints so that
the mapping can be bidirectional, which allows us to project the content
features into the common subspace, and then into the original style space. By
using a pre-trained decoder, promising stylized images are obtained. The method
is further extended to allow users to specify corresponding semantic regions
between content and style images or using semantic segmentation maps as
guidance. Extensive experiments show that the proposed MAST achieves appealing
results in style transfer.
Comment: 10 pages
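The global statistic that MAST argues is insufficient on its own, the Gram matrix, is one line of NumPy over a flattened feature map; it averages channel co-activations over all positions and therefore discards exactly the spatial, semantics-aligned structure MAST exploits:

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a flattened feature map.

    feat : (C, N) features, with N = H * W spatial positions.
    Returns (C, C) channel co-activations averaged over positions;
    all spatial arrangement is lost in the average.
    """
    return feat @ feat.T / feat.shape[1]
```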
Pixel-Adaptive Convolutional Neural Networks
Convolutions are the fundamental building blocks of CNNs. The fact that their
weights are spatially shared is one of the main reasons for their widespread
use, but it is also a major limitation, as it makes convolutions
content-agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet
effective modification of standard convolutions, in which the filter weights
are multiplied with a spatially-varying kernel that depends on learnable, local
pixel features. PAC is a generalization of several popular filtering techniques
and thus can be used for a wide range of use cases. Specifically, we
demonstrate state-of-the-art performance when PAC is used for deep joint image
upsampling. PAC also offers an effective alternative to fully-connected CRF
(Full-CRF), called PAC-CRF, which performs competitively, while being
considerably faster. In addition, we also demonstrate that PAC can be used as a
drop-in replacement for convolution layers in pre-trained networks, resulting
in consistent performance improvements.
Comment: CVPR 2019. Video introduction: https://youtu.be/gsQZbHuR64
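The core PAC operation, modulating a fixed kernel with a spatially varying function of guiding pixel features, can be sketched for a single channel using a Gaussian kernel on feature differences. The real PAC learns these features and acts on full feature maps; the names and the 'valid' padding here are simplifying assumptions:

```python
import numpy as np

def pac_filter(x, guide, w, sigma=1.0):
    """Single-channel pixel-adaptive convolution sketch.

    x     : (H, W) input
    guide : (H, W) guiding features (learned and multi-channel in PAC)
    w     : (k, k) spatially shared kernel
    At each position, w is reweighted by a Gaussian on the difference
    between the centre pixel's guide value and its neighbours'.
    """
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k]
            gpatch = guide[i:i + k, j:j + k]
            centre = guide[i + k // 2, j + k // 2]
            # content-adaptive modulation of the shared kernel
            adapt = np.exp(-0.5 * ((gpatch - centre) / sigma) ** 2)
            out[i, j] = np.sum(adapt * w * patch)
    return out
```

With a constant guide the modulation is 1 everywhere and PAC reduces to a standard convolution, which is why it can serve as a drop-in replacement.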