GFF: Gated Fully Fusion for Semantic Segmentation
Semantic segmentation generates comprehensive understanding of scenes through
densely predicting the category for each pixel. High-level features from Deep
Convolutional Neural Networks already demonstrate their effectiveness in
semantic segmentation tasks; however, the coarse resolution of high-level
features often leads to inferior results for small or thin objects, where detailed
information is important. It is natural to consider importing low-level
features to compensate for the detail lost in high-level
features. Unfortunately, simply combining multi-level features suffers from the
semantic gap among them. In this paper, we propose a new architecture, named
Gated Fully Fusion (GFF), to selectively fuse features from multiple levels
using gates in a fully connected way. Specifically, features at each level are
enhanced by higher-level features with stronger semantics and lower-level
features with more details, and gates are used to control the propagation of
useful information, which significantly reduces noise during fusion. We
achieve state-of-the-art results on four challenging scene parsing datasets,
including Cityscapes, Pascal Context, COCO-stuff and ADE20K.
Comment: accepted by AAAI-2020 (oral
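The gated fusion idea described in this abstract can be illustrated with a small sketch. The abstract does not spell out the fusion rule, so the form used here, (1 + G_l) * X_l + (1 - G_l) * sum over i != l of G_i * X_i, is an assumption, as are the names `gated_fully_fuse` and `gate_logits`. Feature maps are flattened to plain lists and assumed to be already resized to a common shape; in a real network the gate logits would come from a learned layer rather than being passed in.

```python
import math


def sigmoid(x):
    """Squash a raw gate logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fully_fuse(features, gate_logits):
    """Fuse every feature level with all other levels under per-position gates.

    features:    list of levels, each a list of floats of the same length
                 (hypothetical flattened feature maps of a common size).
    gate_logits: same shape as features; gate = sigmoid(logit).
    Assumed rule: fused_l = (1 + G_l) * X_l + (1 - G_l) * sum_{i != l} G_i * X_i
    """
    gates = [[sigmoid(z) for z in level] for level in gate_logits]
    num_levels = len(features)
    num_positions = len(features[0])

    fused = []
    for l in range(num_levels):
        level_out = []
        for p in range(num_positions):
            # Each sender level i passes on only what its own gate lets through.
            incoming = sum(
                gates[i][p] * features[i][p]
                for i in range(num_levels)
                if i != l
            )
            g = gates[l][p]
            # A confident receiver (g near 1) keeps its own feature and
            # accepts little from the other levels.
            level_out.append((1.0 + g) * features[l][p] + (1.0 - g) * incoming)
        fused.append(level_out)
    return fused


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    logits = [[0.0, 0.0], [0.0, 0.0]]  # every gate equals 0.5
    print(gated_fully_fuse(feats, logits))
```

Under this assumed rule, a position whose own gate is close to 1 receives almost nothing from other levels, while a position with a low gate is filled in mostly by levels whose gates mark their features as useful.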
Learning to Act Properly: Predicting and Explaining Affordances from Images
We address the problem of affordance reasoning in diverse scenes that appear
in the real world. Affordances relate the agent's actions to their effects when
taken on the surrounding objects. In our work, we take the egocentric view of
the scene, and aim to reason about action-object affordances that respect both
the physical world and the social norms imposed by society. We also
aim to teach artificial agents why some actions should not be taken in certain
situations, and what would likely happen if these actions were taken. We
collect a new dataset that builds upon ADE20k, referred to as ADE-Affordance,
which contains annotations enabling such rich visual reasoning. We propose a
model that exploits Graph Neural Networks to propagate contextual information
from the scene in order to perform detailed affordance reasoning about each
object. Our model is showcased through various ablation studies, pointing to
successes and challenges in this complex task.
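Propagating contextual information over a scene graph can be sketched as generic message passing. This is not the model from the paper: the function name `propagate`, the mean aggregation and the fixed 0.5/0.5 update are placeholders chosen for illustration, whereas a real Graph Neural Network would use learned update functions.

```python
def propagate(node_states, edges, steps=2):
    """Run a few rounds of mean-aggregation message passing.

    node_states: dict mapping a node name to a list of floats
                 (hypothetical per-object feature vectors).
    edges:       list of undirected (u, v) pairs linking related objects.
    Each round, a node averages its neighbours' states and blends that
    average with its own state; isolated nodes keep their state.
    """
    neighbors = {name: [] for name in node_states}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    states = {name: list(vec) for name, vec in node_states.items()}
    for _ in range(steps):
        updated = {}
        for name, vec in states.items():
            nbrs = neighbors[name]
            if not nbrs:
                updated[name] = list(vec)
                continue
            mean_msg = [
                sum(states[m][d] for m in nbrs) / len(nbrs)
                for d in range(len(vec))
            ]
            # Placeholder update: equal blend of own state and context.
            updated[name] = [0.5 * own + 0.5 * ctx for own, ctx in zip(vec, mean_msg)]
        states = updated
    return states


if __name__ == "__main__":
    objects = {"chair": [1.0], "table": [0.0], "lamp": [2.0]}
    print(propagate(objects, [("chair", "table")], steps=1))
```

After enough rounds, each object's state reflects the objects it is connected to, which is the sense in which scene context can inform a per-object prediction.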
Panoptic Segmentation
We propose and study a task we name panoptic segmentation (PS). Panoptic
segmentation unifies the typically distinct tasks of semantic segmentation
(assign a class label to each pixel) and instance segmentation (detect and
segment each object instance). The proposed task requires generating a coherent
scene segmentation that is rich and complete, an important step toward
real-world vision systems. While early work in computer vision addressed
related image/scene parsing tasks, these are not currently popular, possibly
due to lack of appropriate metrics or associated recognition challenges. To
address this, we propose a novel panoptic quality (PQ) metric that captures
performance for all classes (stuff and things) in an interpretable and unified
manner. Using the proposed metric, we perform a rigorous study of both human
and machine performance for PS on three existing datasets, revealing
interesting insights about the task. The aim of our work is to revive the
interest of the community in a more unified view of image segmentation.
Comment: accepted to CVPR 201
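The abstract names the panoptic quality (PQ) metric without stating its formula. As commonly defined, a predicted and a ground-truth segment match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5 |FP| + 0.5 |FN|. The sketch below computes this for a single class; the function names and the representation of a segment as a set of pixel indices are assumptions made for illustration, and segments within each list are assumed not to overlap.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(predicted, ground_truth):
    """Compute PQ for one class.

    predicted, ground_truth: lists of non-overlapping segments, each a set
    of pixel indices. A pair is a true positive when IoU > 0.5; with
    non-overlapping segments such a match is unique.
    """
    matched_pred = set()
    true_positives = 0
    iou_sum = 0.0
    for gt_segment in ground_truth:
        for pred_index, pred_segment in enumerate(predicted):
            if pred_index in matched_pred:
                continue
            score = iou(pred_segment, gt_segment)
            if score > 0.5:
                matched_pred.add(pred_index)
                true_positives += 1
                iou_sum += score
                break

    false_positives = len(predicted) - len(matched_pred)
    false_negatives = len(ground_truth) - true_positives
    denominator = true_positives + 0.5 * false_positives + 0.5 * false_negatives
    return iou_sum / denominator if denominator else 0.0


if __name__ == "__main__":
    gt = [{0, 1, 2, 3}, {10, 11}]
    pred = [{0, 1, 2}, {20, 21}]
    print(panoptic_quality(pred, gt))
```

In the usage example, one pair matches with IoU 0.75, leaving one unmatched prediction and one unmatched ground-truth segment, so PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375. A dataset-level score would average the per-class values over all stuff and thing classes.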