5,829 research outputs found
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object lie on an
extreme diversity space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method is
more effective and achieves the state-of-the-art performance.Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18
Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences
This paper introduces sparse coding and dictionary learning for Symmetric
Positive Definite (SPD) matrices, which are often used in machine learning,
computer vision and related areas. Unlike traditional sparse coding schemes
that work in vector spaces, in this paper we discuss how SPD matrices can be
described by sparse combination of dictionary atoms, where the atoms are also
SPD matrices. We propose to seek sparse coding by embedding the space of SPD
matrices into Hilbert spaces through two types of Bregman matrix divergences.
This not only leads to an efficient way of performing sparse coding, but also
an online and iterative scheme for dictionary learning. We apply the proposed
methods to several computer vision tasks where images are represented by region
covariance matrices. Our proposed algorithms outperform state-of-the-art
methods on a wide range of classification tasks, including face recognition,
action recognition, material classification and texture categorization
- …