Tree Memory Networks for Modelling Long-term Temporal Dependencies
In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However, this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in-depth analysis of
the evolution of the memory module content over time and provide visual
evidence of how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure.
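The idea of storing history as a recursive tree rather than a flat sequence can be illustrated with a minimal sketch. The abstract does not say how child nodes are combined, so the element-wise average used here is a placeholder assumption, and the names `build_tree_memory` and `combine` are hypothetical, not taken from the paper.

```python
def build_tree_memory(states, combine=None):
    """Build a bottom-up binary tree over a sequence of state vectors.

    Returns a list of levels: level 0 holds the raw historical states
    (short-term detail) and the last level holds a single root node
    (a summary of the whole history).
    """
    if not states:
        raise ValueError("need at least one historical state")
    if combine is None:
        # Placeholder assumption: parent = element-wise mean of its two children.
        combine = lambda a, b: [(x + y) / 2 for x, y in zip(a, b)]

    levels = [list(states)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = [combine(prev[i], prev[i + 1]) for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2 == 1:
            # An unpaired last node is carried up unchanged.
            parents.append(prev[-1])
        levels.append(parents)
    return levels


levels = build_tree_memory([[1.0], [3.0], [5.0], [7.0]])
root = levels[-1][0]
```

In this toy setting, nodes near the leaves cover short spans of the history while the root covers all of it, which is the hierarchical property the abstract describes; a learned combination function would replace the average in practice.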
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
This paper presents a novel framework for human trajectory prediction based
on multimodal data (video and radar). Motivated by recent neuroscience
discoveries, we propose incorporating a structured memory component in the
human trajectory prediction pipeline to capture historical information to
improve performance. We introduce structured LSTM cells for modelling the
memory content hierarchically, preserving the spatiotemporal structure of the
information and enabling us to capture both short-term and long-term context.
We demonstrate how this architecture can be extended to integrate salient
information from multiple modalities to automatically store and retrieve
important information for decision making without any supervision. We evaluate
the effectiveness of the proposed models on a novel multimodal dataset that we
introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from
a radar system and a CCTV camera system installed in a public place. The
performance is also evaluated on the publicly available New York Grand Central
pedestrian database. In both settings, the proposed models demonstrate their
capability to better anticipate future pedestrian motion compared to existing
state of the art.
Comment: To appear in ECML-PKDD 201
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a task of
computer vision uses video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, need to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively.
Comment: DAPI 201
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
With the explosion in the availability of spatio-temporal tracking data in
modern sports, there is an enormous opportunity to better analyse, learn and
predict important events in adversarial group environments. In this paper, we
propose a deep decision tree architecture for discriminative dictionary
learning from adversarial multi-agent trajectories. We first build up a
hierarchy for the tree structure by adding each layer and performing feature
weight based clustering in the forward pass. We then fine tune the player role
weights using back propagation. The hierarchical architecture ensures the
interpretability and the integrity of the group representation. The resulting
architecture is a decision tree, with leaf-nodes capturing a dictionary of
multi-agent group interactions. Due to the ample volume of data available, we
focus on soccer tracking data, although our approach can be used in any
adversarial multi-agent domain. We present applications of the proposed method for
simulating soccer games as well as evaluating and quantifying team strategies.
Comment: To appear in 4th International Workshop on Computer Vision in Sports
(CVsports) at CVPR 201
Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking
Current multi-person localisation and tracking systems have an over-reliance
on the use of appearance models for target re-identification and almost no
approaches employ a complete deep learning solution for both objectives. We
present a novel, complete deep learning framework for multi-person localisation
and tracking. In this context we first introduce a lightweight sequential
Generative Adversarial Network architecture for person localisation, which
overcomes issues related to occlusions and noisy detections, typically found in
a multi-person environment. In the proposed tracking framework we build upon
recent advances in pedestrian trajectory prediction approaches and propose a
novel data association scheme based on predicted trajectories. This removes the
need for computationally expensive person re-identification systems based on
appearance features and generates human-like trajectories with minimal
fragmentation. The proposed method is evaluated on multiple public benchmarks
including both static and dynamic cameras and is capable of generating
outstanding performance, especially among other recently proposed deep neural
network based approaches.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201
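Data association from predicted trajectories, as opposed to appearance features, can be sketched as follows. The abstract does not specify the matching rule, so this example assumes a simple greedy nearest-neighbour match between each track's predicted next position and the new detections; the function name `associate` and the `max_dist` gate are hypothetical.

```python
import math


def associate(predicted, detections, max_dist=2.0):
    """Greedily match tracks to detections by predicted position.

    predicted:  dict mapping track id -> predicted (x, y) for the next frame
    detections: list of detected (x, y) positions in the next frame
    Returns a dict mapping track id -> index of the matched detection.
    """
    candidates = []
    for track_id, pred in predicted.items():
        for det_idx, det in enumerate(detections):
            dist = math.dist(pred, det)
            if dist <= max_dist:  # assumed gating threshold
                candidates.append((dist, track_id, det_idx))

    # Closest pairs are committed first; each track and detection is used once.
    candidates.sort()
    matches, used = {}, set()
    for dist, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used:
            continue
        matches[track_id] = det_idx
        used.add(det_idx)
    return matches


matches = associate(
    {"a": (0.0, 0.0), "b": (5.0, 5.0)},
    [(5.2, 5.1), (0.1, -0.1), (20.0, 20.0)],
)
```

Here the third detection lies outside the gate for both tracks and stays unmatched, so it could seed a new track. A globally optimal assignment (for example the Hungarian method) is a common alternative to the greedy pass used in this sketch.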
An Improved Time Feedforward Connections Recurrent Neural Networks
Recurrent Neural Networks (RNNs) have been widely applied to deal with
temporal problems, such as flood forecasting and financial data processing. On
the one hand, traditional RNN models amplify the gradient issue due to the
strict time serial dependency, making it difficult to realize a long-term
memory function. On the other hand, RNN cells are highly complex, which
significantly increases computational complexity and wastes
computational resources during model training. In this paper, an improved Time
Feedforward Connections Recurrent Neural Networks (TFC-RNNs) model was first
proposed to address the gradient issue. A parallel branch was introduced for
the hidden state at time t-2 to be directly transferred to time t without the
nonlinear transformation at time t-1. This is effective in improving the
long-term dependence of RNNs. Then, a novel cell structure named Single Gate
Recurrent Unit (SGRU) was presented. This cell structure can reduce the number
of parameters in the RNN cell, consequently reducing the computational
complexity. Next, applying SGRU to TFC-RNNs as a new TFC-SGRU model solves the
above two difficulties. Finally, the performance of our proposed TFC-SGRU was
verified through several experiments in terms of long-term memory and
anti-interference capabilities. Experimental results demonstrated that our
proposed TFC-SGRU model can capture helpful information with time step 1500 and
effectively filter out the noise. The TFC-SGRU model accuracy is better than
the LSTM and GRU models regarding language processing ability.
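The parallel branch described above, in which the hidden state at time t-2 reaches time t without passing through the nonlinearity at t-1, can be sketched with a scalar toy cell. The abstract gives no equations, so the additive form and the weights `w_in`, `w_rec` and `w_skip` below are assumptions for illustration only, and the SGRU gating is not modelled.

```python
import math


def tfc_step(x_t, h_prev, h_prev2, w_in=0.5, w_rec=0.5, w_skip=0.5):
    """One step of a toy recurrent cell with a feedforward branch from t-2."""
    # Usual recurrent path: input and h_{t-1} go through the nonlinearity.
    recurrent = math.tanh(w_in * x_t + w_rec * h_prev)
    # Parallel branch (assumed additive): h_{t-2} is passed on linearly,
    # bypassing the tanh applied at step t-1.
    return recurrent + w_skip * h_prev2


def run(inputs, **weights):
    """Unroll the cell over a sequence, starting from zero hidden states."""
    h_prev2, h_prev = 0.0, 0.0
    states = []
    for x_t in inputs:
        h_t = tfc_step(x_t, h_prev, h_prev2, **weights)
        states.append(h_t)
        h_prev2, h_prev = h_prev, h_t
    return states


states = run([1.0, 0.0, 0.0, 0.0])
```

Because the skip term is linear in `h_prev2`, the gradient of `h_t` with respect to `h_prev2` includes the constant `w_skip`, which gives a path that is not squashed by `tanh`; setting `w_skip=0.0` recovers a plain recurrent cell.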
Task Specific Visual Saliency Prediction with Memory Augmented Conditional Generative Adversarial Networks
Visual saliency patterns are the result of a variety of factors aside from
the image being parsed; however, existing approaches have ignored these. To
address this limitation, we propose a novel saliency estimation model which
leverages the semantic modelling power of conditional generative adversarial
networks together with memory architectures which capture the subject's
behavioural patterns and task dependent factors. We make contributions aiming
to bridge the gap between bottom-up feature learning capabilities in modern
deep learning architectures and traditional top-down hand-crafted features
based methods for task specific saliency modelling. The conditional nature of
the proposed framework enables us to learn contextual semantics and
relationships among different tasks together, instead of learning them
separately for each task. Our studies not only shed light on a novel
application area for generative adversarial networks, but also emphasise the
importance of task specific saliency modelling and demonstrate the plausibility
of fully capturing this context via an augmented memory architecture.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201