Object Segmentation using Pixel-wise Adversarial Loss
Recent deep learning based approaches have shown remarkable success on object
segmentation tasks. However, there is still room for further improvement.
Inspired by generative adversarial networks, we present a generic end-to-end
adversarial approach, which can be combined with a wide range of existing
semantic segmentation networks to improve their segmentation performance. The
key element of our method is to replace the commonly used binary adversarial
loss with a high-resolution pixel-wise loss. In addition, we train our
generator in a stochastic weight averaging fashion, which further enhances
the predicted label maps, leading to state-of-the-art results. We show
that this combination of pixel-wise adversarial training and weight averaging
leads to significant and consistent gains in segmentation performance compared
to the baseline models.
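As a rough sketch of the key element (a per-pixel binary cross-entropy in place of a single image-level real/fake score), the following NumPy toy may help. It is not the authors' implementation; the confidence maps, shapes, and function names are all hypothetical:

```python
import numpy as np

def pixelwise_adv_loss(conf_map, is_real):
    """Binary cross-entropy applied at every pixel of the discriminator's
    H x W confidence map and then averaged, instead of a single
    image-level real/fake score."""
    eps = 1e-8
    target = 1.0 if is_real else 0.0
    return float(-np.mean(target * np.log(conf_map + eps)
                          + (1.0 - target) * np.log(1.0 - conf_map + eps)))

rng = np.random.default_rng(0)
real_map = rng.uniform(0.6, 0.9, size=(8, 8))  # D's output on ground-truth maps
fake_map = rng.uniform(0.1, 0.4, size=(8, 8))  # D's output on predicted maps

d_loss = pixelwise_adv_loss(real_map, True) + pixelwise_adv_loss(fake_map, False)
g_loss = pixelwise_adv_loss(fake_map, True)    # generator tries to look real
print(round(d_loss, 3), round(g_loss, 3))
```

The per-pixel map lets the generator receive spatially localized feedback, rather than one scalar signal for the whole prediction.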
Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos
Human behavior understanding in videos is a complex, still-unsolved problem
that requires accurately modeling motion at both the local (pixel-wise dense
prediction) and global (aggregation of motion cues) levels. Current approaches
based on supervised learning require large amounts of annotated data, whose
scarce availability is one of the main limiting factors to the development of
general solutions. Unsupervised learning can instead leverage the vast amount
of videos available on the web and it is a promising solution for overcoming
the existing limitations. In this paper, we propose a GAN-based
framework that learns video representations and dynamics through a
self-supervision mechanism in order to perform dense and global prediction in
videos. Our approach synthesizes videos by 1) factorizing the process into the
generation of static visual content and motion, 2) learning a suitable
representation of a motion latent space in order to enforce spatio-temporal
coherency of object trajectories, and 3) incorporating motion estimation and
pixel-wise dense prediction into the training procedure. Self-supervision is
enforced by using motion masks produced by the generator, as a by-product of
its generation process, to supervise the discriminator network in performing
dense prediction. Performance evaluation, carried out on standard benchmarks,
shows that our approach is able to learn, in an unsupervised way, both local
and global video dynamics. The learned representations then support the
training of video object segmentation methods with considerably fewer (about
50%) annotations, giving performance comparable to the state of the art.
Furthermore, the proposed method achieves promising performance in generating
realistic videos, outperforming state-of-the-art approaches, especially on
motion-related metrics.
Adaptive Affinity Fields for Semantic Segmentation
Semantic segmentation has made much progress with increasingly powerful
pixel-wise classifiers and by incorporating structural priors via Conditional
Random Fields (CRFs) or Generative Adversarial Networks (GANs). We propose a
simpler alternative that learns to verify the spatial structure of segmentation
during training only. Unlike existing approaches that enforce semantic labels
on individual pixels and match labels between neighbouring pixels, we propose
the concept of Adaptive Affinity Fields (AAF) to capture and match the semantic
relations between neighbouring pixels in the label space. We use adversarial
learning to select the optimal affinity field size for each semantic category.
It is formulated as a minimax problem, optimizing our segmentation neural
network in a best worst-case learning scenario. AAF is versatile for
representing structures as a collection of pixel-centric relations, easier to
train than GANs, and more efficient than CRFs as it needs no run-time inference. Our
extensive evaluations on PASCAL VOC 2012, Cityscapes, and GTA5 datasets
demonstrate its above-par segmentation performance and robust generalization
across domains.
Comment: To appear in European Conference on Computer Vision (ECCV) 201
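As a toy illustration of affinity fields (not the paper's full AAF, which additionally selects the field size per category through adversarial learning), one can compute label-space affinities between neighbouring pixels and measure their mismatch. All function names and the small label maps below are hypothetical:

```python
import numpy as np

def affinity(labels, di, dj):
    """Binary affinity between each pixel and its neighbour at the
    non-negative offset (di, dj): 1 where the two labels agree, else 0."""
    h, w = labels.shape
    return (labels[:h - di, :w - dj] == labels[di:, dj:]).astype(float)

def affinity_mismatch(pred, gt, offsets=((0, 1), (1, 0), (1, 1))):
    """Mean disagreement between predicted and ground-truth affinity
    fields over a small set of neighbour offsets."""
    return float(np.mean([np.abs(affinity(pred, *o) - affinity(gt, *o)).mean()
                          for o in offsets]))

gt = np.array([[0, 0, 1],
               [0, 1, 1],
               [0, 1, 1]])
pred = np.array([[0, 0, 1],
                 [0, 0, 1],   # one mislabeled pixel shifts the boundary
                 [0, 1, 1]])
print(affinity_mismatch(gt, gt), round(affinity_mismatch(pred, gt), 3))
```

The point of matching affinities rather than pixel labels is that the loss penalizes broken object boundaries and region structure, not just individual misclassifications.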
No More Discrimination: Cross City Adaptation of Road Scene Segmenters
Despite the recent success of deep-learning based semantic segmentation,
deploying a pre-trained road scene segmenter to a city whose images are not
present in the training set would not achieve satisfactory performance due to
dataset biases. Instead of collecting a large number of annotated images of
each city of interest to train or refine the segmenter, we propose an
unsupervised learning approach to adapt road scene segmenters across different
cities. By utilizing Google Street View and its time-machine feature, we can
collect unannotated images for each road scene at different times, so that the
associated static-object priors can be extracted accordingly. By advancing a
joint global and class-specific domain adversarial learning framework,
adaptation of pre-trained segmenters to that city can be achieved without the
need of any user annotation or interaction. We show that our method improves
the performance of semantic segmentation in multiple cities across continents,
while it performs favorably against state-of-the-art approaches requiring
annotated training data.
Comment: 13 pages, 10 figures
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Semantic segmentation is a key problem for many computer vision tasks. While
approaches based on convolutional neural networks constantly break new records
on different benchmarks, generalizing well to diverse testing environments
remains a major challenge. In numerous real world applications, there is indeed
a large gap between data distributions in train and test domains, which results
in severe performance loss at run-time. In this work, we address the task of
unsupervised domain adaptation in semantic segmentation with losses based on
the entropy of the pixel-wise predictions. To this end, we propose two novel,
complementary methods using (i) entropy loss and (ii) adversarial loss
respectively. We demonstrate state-of-the-art performance in semantic
segmentation on two challenging "synthetic-2-real" set-ups and show that the
approach can also be used for detection.
Comment: Accepted in CVPR'19. Code is available at
https://github.com/valeoai/ADVEN
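A minimal NumPy sketch of an entropy loss of this kind, assuming per-pixel class logits; the normalization by log C and all shapes are illustrative, not the authors' exact formulation:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    """Mean normalized Shannon entropy of the per-pixel class distribution;
    minimizing it pushes target-domain predictions towards confident,
    low-entropy label maps."""
    p = softmax(logits)
    n_classes = logits.shape[-1]
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1) / np.log(n_classes)
    return float(ent.mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 5))  # hypothetical H x W x C logit map
print(round(entropy_loss(logits), 3))
```

Uniform predictions give a loss of 1 and one-hot predictions a loss near 0, so the gradient drives unlabeled target pixels towards decisive class assignments.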
Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes
Semantic segmentation, a pixel-level vision task, has developed rapidly
thanks to convolutional neural networks (CNNs). Training CNNs requires a large
amount of labeled data, but manually annotating data is difficult. To reduce
this labeling effort, several synthetic datasets have been released in recent
years. However, they still differ from real scenes, so a model trained on
synthetic data (the source domain) does not achieve good performance
on real urban scenes (the target domain). In this paper, we propose a weakly
supervised adversarial domain adaptation to improve the segmentation
performance from synthetic data to real scenes, which consists of three deep
neural networks. To be specific, a detection and segmentation ("DS" for short)
model focuses on detecting objects and predicting segmentation map; a
pixel-level domain classifier ("PDC" for short) tries to distinguish which
domain the image features come from; an object-level domain classifier ("ODC"
for short) discriminates which domain the objects come from and predicts their
classes. PDC and ODC are treated as the discriminators, and DS is considered
the generator. Through adversarial learning, DS is encouraged to learn
domain-invariant features. In experiments, our proposed method sets a new mIoU
record on this problem.
Comment: To appear at TI
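A schematic NumPy sketch of how the adversarial objective can be composed, assuming per-pixel domain probabilities from the PDC; the loss weight, placeholder values, and names are hypothetical, not the paper's training code:

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of probabilities p against a constant label y."""
    eps = 1e-8
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

rng = np.random.default_rng(0)
# hypothetical per-pixel probabilities (from PDC) that features are target-domain
pdc_src = rng.uniform(0.0, 0.4, size=(8, 8))
pdc_tgt = rng.uniform(0.6, 1.0, size=(8, 8))

# discriminator step: separate source (label 0) from target (label 1) features
pdc_loss = bce(pdc_src, 0.0) + bce(pdc_tgt, 1.0)

# generator (DS) step: supervised segmentation loss plus an adversarial term
# that rewards target features the PDC mistakes for source ones
seg_loss = 0.7   # placeholder value for the supervised loss
lam = 0.1        # hypothetical trade-off weight
ds_loss = seg_loss + lam * bce(pdc_tgt, 0.0)
print(round(pdc_loss, 3), round(ds_loss, 3))
```

The ODC would contribute an analogous object-level term; the alternating minimization of these two losses is what pushes DS towards domain-invariant features.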
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and codes for evaluation are publicly
available at \url{https://github.com/wenguanwang/SODsurvey}
Object Discovery with a Copy-Pasting GAN
We tackle the problem of object discovery, where objects are segmented for a
given input image, and the system is trained without using any direct
supervision whatsoever. A novel copy-pasting GAN framework is proposed, where
the generator learns to discover an object in one image by compositing it into
another image such that the discriminator cannot tell that the resulting image
is fake. After carefully addressing subtle issues, such as preventing the
generator from `cheating', this game results in the generator learning to
select objects, as copy-pasting objects is most likely to fool the
discriminator. The system is shown to work well on four very different
datasets, including ones with large object appearance variations and
challenging cluttered backgrounds.
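The copy-pasting game itself reduces to a mask-based composite; the toy images and mask below are hypothetical, and in the real system the mask is predicted by a generator network rather than fixed by hand:

```python
import numpy as np

def copy_paste(src, dst, mask):
    """Composite: pixels where mask is 1 are copied from the source image
    into the destination; the generator's job is to predict a mask that
    selects a whole object, so the composite looks real to a discriminator."""
    m = mask[..., None]              # broadcast the mask over colour channels
    return m * src + (1.0 - m) * dst

src = np.ones((4, 4, 3)) * 0.8       # toy "object" image
dst = np.zeros((4, 4, 3))            # toy background image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                 # hypothetical discovered object region

out = copy_paste(src, dst, mask)
print(out[2, 2, 0], out[0, 0, 0])    # 0.8 inside the mask, 0.0 outside
```

Only masks that cover coherent objects yield plausible composites, which is why fooling the discriminator indirectly teaches the generator to segment.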
Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach
We investigate a principled way to progressively mine discriminative object
regions using classification networks to address the weakly-supervised semantic
segmentation problems. Classification networks are only responsive to small and
sparse discriminative regions from the object of interest, which deviates from
the requirement of the segmentation task that needs to localize dense, interior
and integral regions for pixel-wise inference. To mitigate this gap, we propose
a new adversarial erasing approach for localizing and expanding object regions
progressively. Starting with a single small object region, our proposed
approach drives the classification network to sequentially discover new and
complementary object regions by erasing the currently mined regions in an
adversarial manner. These localized regions eventually constitute a dense and
complete object region for learning semantic segmentation. To further enhance
the quality of the discovered regions by adversarial erasing, an online
prohibitive segmentation learning approach is developed to collaborate with
adversarial erasing by providing auxiliary segmentation supervision modulated
by the more reliable classification scores. Despite its apparent simplicity,
the proposed approach achieves 55.0% and 55.7% mean Intersection-over-Union
(mIoU) scores on the PASCAL VOC 2012 val and test sets, setting a new state of
the art.
Comment: Accepted to appear in CVPR 2017 (oral)
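The erasing loop can be sketched as follows, using a hypothetical class-activation-map function in place of a trained classifier; the threshold, shapes, and toy CAM are illustrative only:

```python
import numpy as np

def mine_regions(cam_fn, image_mask, steps=3, thresh=0.5):
    """Iteratively take the most discriminative region from a (hypothetical)
    class-activation map, add it to the mined set, and erase it so the next
    pass must find a complementary region."""
    active = image_mask.copy()           # 1 = pixel still visible
    mined = np.zeros_like(image_mask)
    for _ in range(steps):
        cam = cam_fn(active)             # responds only on visible pixels
        region = (cam >= thresh) & (active > 0)
        if not region.any():
            break                        # nothing discriminative remains
        mined[region] = 1.0
        active[region] = 0.0             # erase for the next iteration
    return mined

# toy CAM: responds most strongly to the left-most visible columns
def toy_cam(active):
    h, w = active.shape
    grad = np.repeat(np.linspace(1.0, 0.0, w)[None, :], h, axis=0)
    return grad * active

mask = np.ones((4, 6))
mined = mine_regions(toy_cam, mask, steps=2)
print(int(mined.sum()))
```

Each iteration forces the classifier to look at regions it previously ignored, so the union of mined regions grows towards a dense object mask.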
Universal Semi-Supervised Semantic Segmentation
In recent years, the need for semantic segmentation has arisen across several
different applications and environments. However, the expense and redundancy of
annotation often limits the quantity of labels available for training in any
domain, while deployment is easier if a single model works well across domains.
In this paper, we pose the novel problem of universal semi-supervised semantic
segmentation and propose a solution framework, to meet the dual needs of lower
annotation and deployment costs. In contrast to alternatives such as
fine-tuning, joint training, or unsupervised domain adaptation, universal
semi-supervised segmentation ensures that across all domains: (i) a single
model is deployed, (ii) unlabeled data is used, (iii) performance is improved,
(iv) only a few labels are needed and (v) label spaces may differ. To address
this, we minimize supervised as well as within and cross-domain unsupervised
losses, introducing a novel feature alignment objective based on pixel-aware
entropy regularization for the latter. We demonstrate quantitative advantages
over other approaches on several combinations of segmentation datasets across
different geographies (Germany, England, India) and environments (outdoors,
indoors), as well as qualitative insights on the aligned representations.
Comment: Accepted as poster presentation at ICCV 201