Dynamic Context Correspondence Network for Semantic Alignment
Establishing semantic correspondence is a core problem in computer vision and
remains challenging due to large intra-class variations and lack of annotated
data. In this paper, we aim to incorporate global semantic context in a
flexible manner to overcome the limitations of prior work that relies on local
semantic representations. To this end, we first propose a context-aware
semantic representation that incorporates spatial layout for robust matching
against local ambiguities. We then develop a novel dynamic fusion strategy
based on an attention mechanism to weave together the advantages of both local and context
features by integrating semantic cues from multiple scales. We instantiate our
strategy by designing an end-to-end learnable deep network, named the Dynamic
Context Correspondence Network (DCCNet). To train the network, we adopt a
multi-auxiliary task loss to improve the efficiency of our weakly-supervised
learning procedure. Our approach achieves superior or competitive performance
over previous methods on several challenging datasets, including PF-Pascal,
PF-Willow, and TSS, demonstrating its effectiveness and generality.
Comment: ICCV 2019
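The dynamic fusion idea described above can be pictured with a small sketch. The following is a minimal illustration only, assuming PyTorch; the module and variable names (DynamicFusion, local_feat, context_feat) and the specific gating design are assumptions for exposition, not the authors' implementation.

# Minimal sketch (not the authors' code): dynamically fusing local and
# context-aware features with a learned attention gate, assuming PyTorch.
# All names here are illustrative.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Predicts a per-location gate that weighs local vs. context features."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs over the concatenated features -> one gate value per location
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, context_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, context_feat: (B, C, H, W) feature maps at the same scale
        alpha = self.gate(torch.cat([local_feat, context_feat], dim=1))  # (B, 1, H, W)
        # Attention-weighted blend: locations with ambiguous local evidence
        # can lean more heavily on the global context representation.
        return alpha * context_feat + (1.0 - alpha) * local_feat

# Usage with dummy inputs
fusion = DynamicFusion(channels=256)
local_feat = torch.randn(1, 256, 32, 32)
context_feat = torch.randn(1, 256, 32, 32)
fused = fusion(local_feat, context_feat)  # (1, 256, 32, 32)

The same gating can be applied at several feature scales and the results combined, which is one plausible way to integrate semantic cues from multiple scales as the abstract describes.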
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
With the development of web technology, multi-modal or multi-view data have
surged as a major stream of big data, where each modality/view encodes an
individual property of the data objects. Often, different modalities are
complementary to one another. This fact has motivated considerable research on
fusing multi-modal feature spaces to comprehensively characterize the data objects.
Most existing state-of-the-art methods focus on how to fuse the energy or
information from multi-modal spaces to deliver performance superior to their
single-modal counterparts. Recently, deep neural networks have emerged as a
powerful architecture for capturing the nonlinear distribution of
high-dimensional multimedia data, and they extend naturally to multi-modal data.
Substantial empirical studies have demonstrated the advantages of deep
multi-modal methods, which essentially deepen the fusion of multi-modal deep
feature spaces. In this paper, we provide a substantial overview of the
existing state of the art in the field of multi-modal data analytics, from
shallow to deep spaces. Throughout this survey, we further indicate that the
critical components of this field are collaboration, adversarial competition,
and fusion over multi-modal spaces.
Finally, we share our viewpoints regarding some future directions in this
field.
Comment: Appearing in ACM TOMM, 26 pages
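As a concrete picture of feature-level fusion over multi-modal spaces, the sketch below combines per-modality embeddings with learned attention weights, assuming PyTorch; ModalityAttentionFusion and all variable names are hypothetical and not taken from the survey, which covers a much broader family of shallow and deep fusion schemes.

# Illustrative sketch only: attention-weighted fusion of per-modality
# embeddings into one joint representation, assuming PyTorch.
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Fuses per-modality feature vectors with learned attention weights."""
    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        # One projection per modality, plus a shared scoring head
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_modalities)])
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):
        # feats: list of (B, dim) tensors, one per modality (e.g. image, text)
        projected = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)  # (B, M, dim)
        weights = torch.softmax(self.score(projected), dim=1)                     # (B, M, 1)
        return (weights * projected).sum(dim=1)                                   # (B, dim)

# Usage: fuse hypothetical image and text embeddings of a data object
fusion = ModalityAttentionFusion(dim=128, num_modalities=2)
image_emb = torch.randn(4, 128)
text_emb = torch.randn(4, 128)
joint = fusion([image_emb, text_emb])  # (4, 128) joint representation

The softmax over modalities lets the model lean on whichever modality is most informative for a given object, one simple instance of the collaboration-and-fusion theme the survey highlights.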