Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Most recent semantic segmentation methods adopt a fully-convolutional network
(FCN) with an encoder-decoder architecture. The encoder progressively reduces
the spatial resolution and learns more abstract/semantic visual concepts with
larger receptive fields. Since context modeling is critical for segmentation,
the latest efforts have been focused on increasing the receptive field, through
either dilated/atrous convolutions or inserting attention modules. However, the
encoder-decoder based FCN architecture remains unchanged. In this paper, we aim
to provide an alternative perspective by treating semantic segmentation as a
sequence-to-sequence prediction task. Specifically, we deploy a pure
transformer (i.e., without convolution and resolution reduction) to encode an
image as a sequence of patches. With the global context modeled in every layer
of the transformer, this encoder can be combined with a simple decoder to
provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
Extensive experiments show that SETR achieves new state of the art on ADE20K
(50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on
Cityscapes. Particularly, we achieve the first position in the highly
competitive ADE20K test server leaderboard on the day of submission.
Comment: CVPR 2021. Project page at https://fudan-zvg.github.io/SETR
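As a rough illustration of the "image as a sequence of patches" idea in the abstract above, the sketch below flattens a small 2D grid into a row-major sequence of non-overlapping patches. The function name `image_to_patch_sequence`, the patch size, and the toy image are assumptions made for illustration; this is not the SETR implementation, and it omits the learned projection and position information that a ViT-style encoder would add to each patch.

```python
from typing import List


def image_to_patch_sequence(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W grid into non-overlapping patch x patch tiles.

    Returns the tiles in row-major order, each flattened to a 1D list,
    so the 2D image becomes a 1D sequence of patch vectors.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            sequence.append(tile)
    return sequence


# Toy 4 x 4 "image" (hypothetical data) whose pixel values are 0..15.
toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = image_to_patch_sequence(toy, patch=2)
print(len(patches), len(patches[0]))
```

The sequence length is (H / patch) * (W / patch), which is why the choice of patch size directly controls how long the transformer's input sequence is.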
Parsing Objects at a Finer Granularity: A Survey
Fine-grained visual parsing, including fine-grained part segmentation and
fine-grained object recognition, has attracted considerable critical attention
due to its importance in many real-world applications, e.g., agriculture,
remote sensing, and space technologies. Predominant research efforts tackle
these fine-grained sub-tasks following different paradigms, while the inherent
relations between these tasks are neglected. Moreover, given most of the
research remains fragmented, we conduct an in-depth study of the advanced work
from a new perspective of learning the part relationship. In this perspective,
we first consolidate recent research and benchmark syntheses with new
taxonomies. Based on this consolidation, we revisit the universal challenges in
fine-grained part segmentation and recognition tasks and propose new solutions
by part relationship learning for these important challenges. Furthermore, we
conclude with several promising lines of future research in fine-grained
visual parsing.
Comment: Survey for fine-grained part segmentation and object recognition;
accepted by Machine Intelligence Research (MIR)
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions for local
mechanisms that may benefit future work are discussed. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Human knowledge provides a formal understanding of the world. Knowledge
graphs that represent structural relations between entities have become an
increasingly popular research direction towards cognition and human-level
intelligence. In this survey, we provide a comprehensive review of knowledge
graphs covering overall research topics about 1) knowledge graph representation
learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph,
and 4) knowledge-aware applications, and summarize recent breakthroughs and
perspective directions to facilitate future research. We propose a full-view
categorization and new taxonomies on these topics. Knowledge graph embedding is
organized from four aspects of representation space, scoring function, encoding
models, and auxiliary information. For knowledge acquisition, especially
knowledge graph completion, embedding methods, path inference, and logical rule
reasoning are reviewed. We further explore several emerging topics, including
meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated
collection of datasets and open-source libraries on different tasks. Finally,
we give a thorough outlook on several promising research directions.
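To make the notion of a "scoring function" for knowledge graph embedding concrete, here is a minimal sketch of a translational-distance score in the style of TransE, where a triple (head, relation, tail) is considered plausible when head + relation lies close to tail. TransE is chosen here only as a familiar example and is not named in the abstract above; the function name `transe_score` and the hand-picked 2D embeddings are hypothetical.

```python
import math
from typing import Sequence


def transe_score(head: Sequence[float],
                 relation: Sequence[float],
                 tail: Sequence[float]) -> float:
    """Negative L2 distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


# Hypothetical 2D embeddings chosen by hand for illustration only.
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
germany = [3.0, 1.0]

good = transe_score(paris, capital_of, france)   # head + relation lands on tail
bad = transe_score(paris, capital_of, germany)   # tail is 2 units away
print(good > bad)
```

In a trained model the embeddings would be learned so that observed triples score higher than corrupted ones; here they are fixed by hand just to show how the score ranks candidate tails.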
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
Image landmark detection aims to automatically identify the locations of
predefined fiducial points. Despite recent success in this field,
higher-ordered structural modeling to capture implicit or explicit
relationships among anatomical landmarks has not been adequately exploited. In
this work, we present a new topology-adapting deep graph learning approach for
accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection.
The proposed method constructs graph signals leveraging both local image
features and global shape features. The adaptive graph topology naturally
explores and lands on task-specific structures which are learned end-to-end
with two Graph Convolutional Networks (GCNs). Extensive experiments are
conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as
well as three real-world X-ray medical datasets (Cephalometric (public), Hand
and Pelvis). Quantitative comparisons with previous state-of-the-art
approaches across all studied datasets indicate superior performance in
both robustness and accuracy. Qualitative visualizations of the learned graph
topologies demonstrate a physically plausible connectivity lying behind the
landmarks.
Comment: Accepted to ECCV-20. Camera-ready with supplementary material