Predicting Human Interaction via Relative Attention Model
Predicting human interaction is challenging as the on-going activity has to
be inferred based on a partially observed video. Essentially, a good algorithm
should effectively model the mutual influence between the two interacting
subjects. Also, only a small region in the scene is discriminative for
identifying the on-going interaction. In this work, we propose a relative
attention model to explicitly address these difficulties. Built on a
tri-coupled deep recurrent structure representing both interacting subjects and
global interaction status, the proposed network collects spatio-temporal
information from each subject, rectified with global interaction information,
yielding an effective interaction representation. Moreover, the proposed network
also integrates an attention module to assign higher importance to the regions
which are relevant to the on-going action. Extensive experiments have been
conducted on two public datasets, and the results demonstrate that the proposed
relative attention network successfully predicts informative regions between
interacting subjects, which in turn yields superior human interaction
prediction accuracy. Comment: To appear in IJCAI 201
Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with arbitrary state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
spaces are endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We show that the regret of
the algorithm is bounded in terms of the number of episodes, a
smoothness parameter, and the doubling dimension of the state-action
space with respect to the given metric. We also establish a near-matching
regret lower bound. The proposed method can be adapted to work for more
structured transition systems, including the finite-state case and the case
where value functions are linear combinations of features, in which the method
also achieves the optimal regret.
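To make the idea of an optimistic Q estimate over a metric space concrete, here is a minimal sketch. It assumes the Q function is Lipschitz with respect to the metric and builds an upper envelope from observed values; this is one simple construction under that assumption, not necessarily the function approximation oracle the abstract refers to. The names `optimistic_q`, `lipschitz`, `q_max`, and `abs_metric` are hypothetical.

```python
def optimistic_q(query, experiences, lipschitz, metric, q_max):
    """Upper bound on Q at `query`, assuming Q is Lipschitz in `metric`.

    experiences: list of (point, value) pairs, where `point` is a
    state-action pair and `value` is an upper estimate of Q there.
    q_max: trivial upper bound used when no experience is close enough.
    """
    best = q_max
    for point, value in experiences:
        # Each experience caps Q(query) at value + L * distance.
        best = min(best, value + lipschitz * metric(query, point))
    return best


def abs_metric(x, y):
    # Toy metric on a one-dimensional state-action space (assumption).
    return abs(x - y)


experiences = [(0.0, 1.0), (1.0, 3.0)]
near = optimistic_q(0.5, experiences, lipschitz=2.0, metric=abs_metric, q_max=10.0)
far = optimistic_q(100.0, experiences, lipschitz=2.0, metric=abs_metric, q_max=10.0)
```

In this toy setting the estimate is tight near observed points and falls back to `q_max` far from them, which is what encourages exploration of unvisited regions.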
Skeleton-aided Articulated Motion Generation
This work makes the first attempt to generate articulated human motion
sequences from a single image. On the one hand, we utilize paired inputs
including human skeleton information as motion embedding and a single human
image as appearance reference, to generate novel motion frames, based on the
conditional GAN infrastructure. On the other hand, a triplet loss is employed
to pursue appearance-smoothness between consecutive frames. As the proposed
framework is capable of jointly exploiting the image appearance space and
articulated/kinematic motion space, it generates realistic articulated motion
sequences, in contrast to most previous video generation methods, which yield
blurred motion effects. We test our model on two human action datasets
including KTH and Human3.6M, and the proposed framework generates very
promising results on both datasets. Comment: ACM MM 201
Triplet-based Deep Similarity Learning for Person Re-Identification
In recent years, person re-identification (re-id) has attracted great attention in
both the computer vision community and industry. In this paper, we propose a new
framework for person re-identification with triplet-based deep similarity
learning using convolutional neural networks (CNNs). The network is trained
with triplet inputs: two images share the same class label and the third has a
different one. It aims to learn a deep feature representation in which the
distance within the same class is decreased, while the distance between
different classes is increased as much as possible. Moreover, we train the
model jointly on six different datasets, which differs from the common practice
of training and testing one model on a single dataset.
However, the enormous number of possible triplets among the large number of
training samples makes exhaustive training infeasible. To address this challenge, a
double-sampling scheme is proposed to generate triplets of images as effectively
as possible. The proposed framework is evaluated on several benchmark datasets.
The experimental results show that our method is effective for the task of
person re-identification and is comparable to or even outperforms the
state-of-the-art methods. Comment: ICCV Workshops 201
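The triplet objective described above can be illustrated with a small sketch. The abstract does not state the exact loss, so this assumes the common margin-based hinge form on Euclidean distances; the function names, the `margin` value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def euclidean(u, v):
    # Euclidean distance between two equal-length embedding vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the same-class pair is closer than the
    different-class pair by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)  # same-class distance
    d_neg = euclidean(anchor, negative)  # different-class distance
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: in a real system these would come from the CNN.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5 from the anchor
negative = [6.0, 8.0]   # distance 10 from the anchor

satisfied = triplet_loss(anchor, positive, negative)  # constraint already met
violated = triplet_loss(anchor, negative, positive)   # roles swapped
```

Only triplets that violate the margin contribute a nonzero loss, which is one reason the choice of which triplets to sample matters when the number of possible triplets is very large.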