CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
This work is about recognizing human activities occurring in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. In comparison with existing architectures of
LSTMs, we make two key contributions, which give our approach its name:
Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches. Comment: Accepted to IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
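As a rough illustration of the contrast this abstract draws between softmax prediction and minimum-energy class assignment, the sketch below treats per-class energies as given numbers and compares the two decision rules. It is not the paper's energy layer and it does not compute p-values; the function names and the example energies are hypothetical.

```python
import math

def softmax_from_energies(energies):
    """Turn energies into probabilities via a softmax over negative energies."""
    shift = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - shift)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def min_energy_class(energies):
    """Return the index of the class with the lowest energy."""
    return min(range(len(energies)), key=lambda k: energies[k])

# Hypothetical energies for three classes; classes 1 and 2 are nearly tied.
energies = [2.0, 0.5, 0.6]
probs = softmax_from_energies(energies)
best = min_energy_class(energies)
```

When two energies are as close as 0.5 and 0.6, a small perturbation can flip the minimum. This may be the kind of instability under uncertainty that the abstract refers to.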
Latent Embeddings for Collective Activity Recognition
Rather than simply recognizing the action of each person individually,
collective activity recognition aims to find out what a group of people is
doing in a collective scene. Previous state-of-the-art methods use
hand-crafted potentials in conventional graphical models, which can only define a
limited range of relations. Thus, the complex structural dependencies among
individuals involved in a collective scenario cannot be fully modeled. In
this paper, we overcome these limitations by embedding latent variables into
feature space and learning the feature mapping functions in a deep learning
framework. The embeddings of latent variables build a global relation
containing person-group interactions and richer contextual information by
jointly modeling a broader range of individuals. In addition, we incorporate an
attention mechanism during embedding to achieve more compact representations. We
evaluate our method on three collective activity datasets and contribute a much
larger dataset in this work. The proposed model achieves
clearly better performance than the state-of-the-art methods in our
experiments. Comment: 6 pages, accepted by IEEE-AVSS201
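This abstract mentions an attention mechanism used during embedding to obtain more compact representations. A generic attention-weighted pooling of per-person feature vectors into a single group vector could be sketched as below. This is a textbook form, not the paper's model, and the features, scores, and function names are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Weighted sum of per-person feature vectors using softmax attention weights."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Hypothetical 2-D features for three people, with equal attention scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
group = attention_pool(features, scores)
```

With equal scores the pooling reduces to a plain average. Unequal scores would let some individuals contribute more to the group representation.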