Learning Combinatorial Embedding Networks for Deep Graph Matching
Graph matching refers to finding node correspondences between graphs such
that the affinity between corresponding nodes and edges is maximized. In addition
to its NP-complete nature, another important challenge is effective
modeling of the node-wise and structure-wise affinity across graphs and the
resulting objective, so as to guide the matching procedure toward the
true matching in the presence of noise. To this end, this paper devises an end-to-end
differentiable deep network pipeline to learn the affinity for graph matching.
It involves a supervised permutation loss with respect to node correspondence to
capture the combinatorial nature of graph matching. Meanwhile, deep graph
embedding models are adopted to parameterize both intra-graph and cross-graph
affinity functions, instead of the traditional shallow and simple parametric
forms, e.g. a Gaussian kernel. The embedding can also effectively capture the
higher-order structure beyond second-order edges. The permutation loss model is
agnostic to the number of nodes, and the embedding model is shared among nodes
such that the network allows for varying numbers of nodes in graphs for
training and inference. Moreover, our network is class-agnostic with some
generalization capability across different categories. All these features are
welcome in real-world applications. Experiments show its superiority over
state-of-the-art graph matching learning methods.
Comment: ICCV2019 oral. Code available at
https://github.com/Thinklab-SJTU/PCA-G
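A minimal sketch of what a supervised permutation loss could look like. The abstract does not give the exact formulation, so this is an assumption: raw matching scores are relaxed to an approximately doubly-stochastic matrix by Sinkhorn normalization, and the loss is a binary cross-entropy against the ground-truth 0/1 permutation matrix. The names `sinkhorn` and `permutation_loss` are hypothetical and are not taken from the authors' code. Because nothing in the loss depends on a fixed matrix size, it works for any number of nodes, which is the property the abstract highlights.

```python
import math


def sinkhorn(scores, n_iters=20):
    """Alternately normalize rows and columns of exp(scores).

    For a square score matrix this approaches a doubly-stochastic matrix.
    """
    m = [[math.exp(v) for v in row] for row in scores]
    n_cols = len(m[0])
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        normalized = []
        for row in m:
            row_sum = sum(row)
            normalized.append([v / row_sum for v in row])
        m = normalized
        # Column normalization: each column sums to 1.
        col_sums = [sum(row[j] for row in m) for j in range(n_cols)]
        m = [[row[j] / col_sums[j] for j in range(n_cols)] for row in m]
    return m


def permutation_loss(soft_assign, gt_perm, eps=1e-9):
    """Binary cross-entropy between a soft assignment and a 0/1 permutation.

    Assumed form of the loss; averaged over the number of rows (nodes).
    """
    total = 0.0
    for s_row, g_row in zip(soft_assign, gt_perm):
        for s, g in zip(s_row, g_row):
            total -= g * math.log(s + eps) + (1 - g) * math.log(1 - s + eps)
    return total / len(gt_perm)


# Toy scores that favor the identity matching of three nodes.
scores = [[4.0, 0.0, 0.0],
          [0.0, 4.0, 0.0],
          [0.0, 0.0, 4.0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shifted = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

soft = sinkhorn(scores)
loss_correct = permutation_loss(soft, identity)
loss_wrong = permutation_loss(soft, shifted)
```

Here `loss_correct` is smaller than `loss_wrong`, since the soft assignment concentrates its mass on the identity matching.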
Predicting Human Interaction via Relative Attention Model
Predicting human interaction is challenging as the on-going activity has to
be inferred based on a partially observed video. Essentially, a good algorithm
should effectively model the mutual influence between the two interacting
subjects. Also, only a small region in the scene is discriminative for
identifying the on-going interaction. In this work, we propose a relative
attention model to explicitly address these difficulties. Built on a
tri-coupled deep recurrent structure representing both interacting subjects and
global interaction status, the proposed network collects spatio-temporal
information from each subject, rectified with global interaction information,
yielding an effective interaction representation. Moreover, the proposed network
also integrates an attention module to assign higher importance to the regions
that are relevant to the on-going action. Extensive experiments have been
conducted on two public datasets, and the results demonstrate that the proposed
relative attention network successfully predicts informative regions between
interacting subjects, which in turn yields superior human interaction
prediction accuracy.
Comment: To appear in IJCAI 201
Skeleton-aided Articulated Motion Generation
This work makes the first attempt to generate an articulated human motion
sequence from a single image. On the one hand, we utilize paired inputs
including human skeleton information as motion embedding and a single human
image as appearance reference, to generate novel motion frames, based on the
conditional GAN infrastructure. On the other hand, a triplet loss is employed
to pursue appearance-smoothness between consecutive frames. As the proposed
framework is capable of jointly exploiting the image appearance space and
articulated/kinematic motion space, it generates realistic articulated motion
sequences, in contrast to most previous video generation methods, which yield
blurred motion effects. We test our model on two human action datasets,
KTH and Human3.6M, and the proposed framework generates very
promising results on both datasets.
Comment: ACM MM 201
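A minimal sketch of a standard margin-based triplet loss, to illustrate how such a loss can encourage appearance smoothness between consecutive frames. The abstract does not say how triplets are formed, so the choice here is an assumption: the anchor is a feature vector of frame t, the positive is the next frame, and the negative is a temporally distant frame. The function names and the margin value are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    It is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (in squared distance).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Assumed setup: features of frame t, frame t+1, and a distant frame.
frame_t = [0.0, 0.0]
frame_next = [0.1, 0.0]
frame_far = [3.0, 0.0]

smooth_loss = triplet_loss(frame_t, frame_next, frame_far)
# Swapped roles: the "consecutive" frame is farther than the negative.
jumpy_loss = triplet_loss(frame_t, [2.0, 0.0], [0.5, 0.0])
```

With these toy vectors, `smooth_loss` is zero because consecutive frames are already close in feature space, while `jumpy_loss` is positive and would penalize the abrupt appearance change.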