37,767 research outputs found
Generating Long-term Trajectories Using Deep Hierarchical Networks
We study the problem of modeling spatiotemporal trajectories over long time
horizons using expert demonstrations. For instance, in sports, agents often
choose action sequences with long-term goals in mind, such as achieving a
certain strategic position. Conventional policy learning approaches, such as
those based on Markov decision processes, generally fail at learning cohesive
long-term behavior in such high-dimensional state spaces, and are only
effective when myopic modeling lead to the desired behavior. The key difficulty
is that conventional approaches are "shallow" models that only learn a single
state-action policy. We instead propose a hierarchical policy class that
automatically reasons about both long-term and short-term goals, which we
instantiate as a hierarchical neural network. We showcase our approach in a
case study on learning to imitate demonstrated basketball trajectories, and
show that it generates significantly more realistic trajectories compared to
non-hierarchical baselines as judged by professional sports analysts.Comment: Published in NIPS 201
Applying Deep Bidirectional LSTM and Mixture Density Network for Basketball Trajectory Prediction
Data analytics helps basketball teams to create tactics. However, manual data
collection and analytics are costly and ineffective. Therefore, we applied a
deep bidirectional long short-term memory (BLSTM) and mixture density network
(MDN) approach. This model is not only capable of predicting a basketball
trajectory based on real data, but it also can generate new trajectory samples.
It is an excellent application to help coaches and players decide when and
where to shoot. Its structure is particularly suitable for dealing with time
series problems. BLSTM receives forward and backward information at the same
time, while stacking multiple BLSTMs further increases the learning ability of
the model. Combined with BLSTMs, MDN is used to generate a multi-modal
distribution of outputs. Thus, the proposed model can, in principle, represent
arbitrary conditional probability distributions of output variables. We tested
our model with two experiments on three-pointer datasets from NBA SportVu data.
In the hit-or-miss classification experiment, the proposed model outperformed
other models in terms of the convergence speed and accuracy. In the trajectory
generation experiment, eight model-generated trajectories at a given time
closely matched real trajectories
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
This paper presents a novel framework for human trajectory prediction based
on multimodal data (video and radar). Motivated by recent neuroscience
discoveries, we propose incorporating a structured memory component in the
human trajectory prediction pipeline to capture historical information to
improve performance. We introduce structured LSTM cells for modelling the
memory content hierarchically, preserving the spatiotemporal structure of the
information and enabling us to capture both short-term and long-term context.
We demonstrate how this architecture can be extended to integrate salient
information from multiple modalities to automatically store and retrieve
important information for decision making without any supervision. We evaluate
the effectiveness of the proposed models on a novel multimodal dataset that we
introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from
a radar system and a CCTV camera system installed in a public place. The
performance is also evaluated on the publicly available New York Grand Central
pedestrian database. In both settings, the proposed models demonstrate their
capability to better anticipate future pedestrian motion compared to existing
state of the art.Comment: To appear in ECML-PKDD 201
Tree Memory Networks for Modelling Long-term Temporal Dependencies
In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in depth analysis on
the evolution of the memory module content over time and provide visual
evidence on how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader
Video Time: Properties, Encoders and Evaluation
Time-aware encoding of frame sequences in a video is a fundamental problem in
video understanding. While many attempted to model time in videos, an explicit
study on quantifying video time is missing. To fill this lacuna, we aim to
evaluate video time explicitly. We describe three properties of video time,
namely a) temporal asymmetry, b)temporal continuity and c) temporal causality.
Based on each we formulate a task able to quantify the associated property.
This allows assessing the effectiveness of modern video encoders, like C3D and
LSTM, in their ability to model time. Our analysis provides insights about
existing encoders while also leading us to propose a new video time encoder,
which is better suited for the video time recognition tasks than C3D and LSTM.
We believe the proposed meta-analysis can provide a reasonable baseline to
assess video time encoders on equal grounds on a set of temporal-aware tasks.Comment: 14 pages, BMVC 201
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Sequential data often possesses a hierarchical structure with complex
dependencies between subsequences, such as found between the utterances in a
dialogue. In an effort to model this kind of generative process, we propose a
neural network-based generative architecture, with latent stochastic variables
that span a variable number of time steps. We apply the proposed model to the
task of dialogue response generation and compare it with recent neural network
architectures. We evaluate the model performance through automatic evaluation
metrics and by carrying out a human evaluation. The experiments demonstrate
that our model improves upon recently proposed models and that the latent
variables facilitate the generation of long outputs and maintain the context.Comment: 15 pages, 5 tables, 4 figure
- …