PersonRank: Detecting Important People in Images
In images of events such as presentations, basketball games, or speeches, some
individuals are more important or attract more attention than others. However, it is
challenging to find important people among all individuals in images directly
based on their spatial or appearance information due to the existence of
diverse variations of pose, action, appearance of persons and various changes
of occasions. We overcome this difficulty by constructing a multiple
Hyper-Interaction Graph to treat each individual in an image as a node and
inferring the most active node from interactions estimated from various
types of cues. We model pairwise interactions between persons as the edge
message communicated between nodes, resulting in a bidirectional
pairwise-interaction graph. To enrich the person-person interaction estimation,
we further introduce a unidirectional hyper-interaction graph that models the
consensus of interaction between a focal person and any person in a local
region around. Finally, we modify the PageRank algorithm to infer the
activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union
of the pairwise-interaction and hyper-interaction graphs, and we call our
algorithm PersonRank. To provide publicly available datasets for
evaluation, we have contributed a new dataset called Multi-scene Important
People Image Dataset and gathered an NCAA Basketball Image Dataset from sports
game sequences. We have demonstrated that the proposed PersonRank outperforms
related methods clearly and substantially.
Comment: 8 pages, conferenc
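
The abstract describes ranking persons by running a modified PageRank over an interaction graph. As a rough illustration only, the sketch below runs the standard (unmodified) PageRank power iteration on a small weighted directed graph. It is not the authors' PersonRank; the function name `pagerank_scores`, the damping value, and the 3-person interaction weights are all assumptions made for the example.

```python
import numpy as np

def pagerank_scores(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration on a weighted directed graph.

    weights[i, j] is a non-negative interaction strength from person i
    to person j (hypothetical values, not estimated from images).
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    row_sums = w.sum(axis=1, keepdims=True)
    # Normalize each row; rows with no outgoing weight spread mass uniformly.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, w / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition.T @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores

# Hypothetical example: persons 0 and 1 both direct attention to person 2.
interactions = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])
scores = pagerank_scores(interactions)
most_important = int(np.argmax(scores))
```

In this toy graph the person who receives the most incoming interaction weight ends up with the highest score, which is the intuition behind treating the "most active node" as the important person.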
Learning Combinatorial Embedding Networks for Deep Graph Matching
Graph matching refers to finding node correspondence between graphs, such
that the affinity of corresponding nodes and edges is maximized. In addition
to its NP-complete nature, another important challenge is effective
modeling of the node-wise and structure-wise affinity across graphs and the
resulting objective, so as to guide the matching procedure toward the
true matching in the presence of noise. To this end, this paper devises an end-to-end
differentiable deep network pipeline to learn the affinity for graph matching.
It involves a supervised permutation loss with respect to node correspondence to
capture the combinatorial nature of graph matching. Meanwhile, deep graph
embedding models are adopted to parameterize both intra-graph and cross-graph
affinity functions, instead of the traditional shallow and simple parametric
forms, e.g., a Gaussian kernel. The embedding can also effectively capture the
higher-order structure beyond second-order edges. The permutation loss model is
agnostic to the number of nodes, and the embedding model is shared among nodes
such that the network allows for varying numbers of nodes in graphs for
training and inference. Moreover, our network is class-agnostic with some
generalization capability across different categories. All these features are
desirable for real-world applications. Experiments show its superiority over
state-of-the-art graph matching learning methods.
Comment: ICCV2019 oral. Code available at
https://github.com/Thinklab-SJTU/PCA-G
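
The abstract mentions a supervised permutation loss on node correspondence. One plausible reading, sketched below, is a cross-entropy between a soft assignment matrix and the ground-truth permutation matrix. The Sinkhorn normalization used here to obtain the soft assignment is an assumption (the abstract does not name it), and the function names and score values are hypothetical, not taken from the linked repository.

```python
import numpy as np

def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores).

    Assumed here as one way to get an approximately doubly stochastic
    soft assignment; the abstract does not specify this step.
    """
    s = np.exp(scores - scores.max())
    for _ in range(n_iters):
        s = s / s.sum(axis=1, keepdims=True)
        s = s / s.sum(axis=0, keepdims=True)
    return s

def permutation_cross_entropy(soft_assignment, true_permutation, eps=1e-12):
    """Element-wise binary cross-entropy against a 0/1 permutation matrix."""
    p = np.clip(soft_assignment, eps, 1.0 - eps)
    t = np.asarray(true_permutation, dtype=float)
    return float(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)).sum())

# Hypothetical ground truth: node 0 -> 1, node 1 -> 2, node 2 -> 0.
truth = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
good_soft = sinkhorn(5.0 * truth)      # scores that agree with the truth
bad_soft = sinkhorn(5.0 * np.eye(3))   # scores that favor a wrong matching
loss_good = permutation_cross_entropy(good_soft, truth)
loss_bad = permutation_cross_entropy(bad_soft, truth)
```

Because the loss is summed over matrix entries rather than tied to a fixed output size, the same function applies to graphs with any number of nodes, which matches the abstract's remark that the loss is agnostic to node count.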
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground-truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state-of-the-art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural looking sequences over longer
time horizons than previous methods.
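
Two ideas from the abstract can be sketched in a few lines: randomly removing joint information during training, and filtering every predicted pose before it is fed back into the predictor. The code below is a minimal illustration under stated assumptions, not the authors' model. The 17-joint skeleton, the drop probability, and the stand-in `predict_next` and `filter_pose` callables are hypothetical placeholders for a trained LSTM and autoencoder.

```python
import numpy as np

def drop_joints(pose, drop_prob, rng):
    """Zero out whole joints at random; pose has shape (num_joints, 3).

    Returns the corrupted pose and the boolean mask of kept joints.
    """
    keep = rng.random(pose.shape[0]) >= drop_prob
    return pose * keep[:, None], keep

def rollout(seed_pose, predict_next, filter_pose, steps):
    """Autoregressive rollout in which each predicted pose is filtered
    before being used as the next input."""
    poses, current = [], seed_pose
    for _ in range(steps):
        current = filter_pose(predict_next(current))
        poses.append(current)
    return np.stack(poses)

rng = np.random.default_rng(0)
pose = rng.normal(size=(17, 3))  # hypothetical 17-joint skeleton
corrupted, keep = drop_joints(pose, drop_prob=0.3, rng=rng)

# Stand-in callables so the sketch runs; a real system would use trained networks.
sequence = rollout(
    pose,
    predict_next=lambda p: p + 0.01,
    filter_pose=lambda p: np.clip(p, -3.0, 3.0),
    steps=5,
)
```

An autoencoder trained to reconstruct `pose` from `corrupted` has to rely on the remaining joints, which is how the abstract describes the spatial structure being learned implicitly.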