Task Specific Visual Saliency Prediction with Memory Augmented Conditional Generative Adversarial Networks
Visual saliency patterns are the result of a variety of factors aside from
the image being parsed; however, existing approaches have ignored these factors. To
address this limitation, we propose a novel saliency estimation model which
leverages the semantic modelling power of conditional generative adversarial
networks together with memory architectures which capture the subject's
behavioural patterns and task-dependent factors. We make contributions aiming
to bridge the gap between bottom-up feature learning capabilities in modern
deep learning architectures and traditional top-down methods based on
hand-crafted features for task-specific saliency modelling. The conditional nature of
the proposed framework enables us to learn contextual semantics and
relationships among different tasks together, instead of learning them
separately for each task. Our studies not only shed light on a novel
application area for generative adversarial networks, but also emphasise the
importance of task-specific saliency modelling and demonstrate the plausibility
of fully capturing this context via an augmented memory architecture.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201
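The abstract does not specify the memory mechanism; as a rough illustration only, the sketch below (with hypothetical names such as `TaskMemory`) shows how a soft, similarity-addressed external memory could store per-task behavioural patterns and return a blended read vector for conditioning a generator:

```python
import numpy as np

# Illustrative sketch only: a soft, similarity-addressed external memory.
# `TaskMemory`, `write`, and `read` are hypothetical names, not the
# paper's actual architecture.
class TaskMemory:
    def __init__(self):
        self.keys = []    # task embeddings used for addressing
        self.values = []  # stored behavioural-pattern vectors

    def write(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def read(self, query):
        # Cosine similarity between the query and each stored key,
        # turned into softmax attention weights over the values.
        query = np.asarray(query, dtype=float)
        keys = np.stack(self.keys)
        sims = keys @ query / (np.linalg.norm(keys, axis=1)
                               * np.linalg.norm(query) + 1e-8)
        weights = np.exp(sims) / np.exp(sims).sum()
        return weights @ np.stack(self.values)
```

A generator conditioned on `read(task_embedding)` would then see a context vector dominated by the most similar previously observed tasks.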
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
This paper presents a novel framework for human trajectory prediction based
on multimodal data (video and radar). Motivated by recent neuroscience
discoveries, we propose incorporating a structured memory component in the
human trajectory prediction pipeline to capture historical information to
improve performance. We introduce structured LSTM cells for modelling the
memory content hierarchically, preserving the spatiotemporal structure of the
information and enabling us to capture both short-term and long-term context.
We demonstrate how this architecture can be extended to integrate salient
information from multiple modalities to automatically store and retrieve
important information for decision making without any supervision. We evaluate
the effectiveness of the proposed models on a novel multimodal dataset that we
introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from
a radar system and a CCTV camera system installed in a public place. The
performance is also evaluated on the publicly available New York Grand Central
pedestrian database. In both settings, the proposed models demonstrate their
capability to better anticipate future pedestrian motion compared to existing
state of the art.
Comment: To appear in ECML-PKDD 201
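The short-term/long-term split described above can be caricatured with a simple two-level memory. This is an illustrative sketch under stated assumptions (the class name is hypothetical, and a plain constant-velocity step stands in for the structured LSTM cells):

```python
import numpy as np

# Illustrative sketch only: a two-level memory keeping recent positions
# at full resolution (short-term) and a pooled summary of the whole
# history (long-term). Not the paper's structured LSTM cells.
class HierarchicalTrajectoryMemory:
    def __init__(self, short_len=3):
        self.short_len = short_len
        self.short = []               # recent positions, full detail
        self.long_sum = np.zeros(2)   # running sum over all history
        self.count = 0

    def observe(self, pos):
        pos = np.asarray(pos, dtype=float)
        self.short.append(pos)
        self.short = self.short[-self.short_len:]
        self.long_sum += pos
        self.count += 1

    def long_term_mean(self):
        # Long-term context: pooled summary of everything seen so far.
        return self.long_sum / max(self.count, 1)

    def predict_next(self):
        # Short-term context: extrapolate the last observed velocity.
        v = self.short[-1] - self.short[-2]
        return self.short[-1] + v
```

In a learned model, both levels would feed the prediction; here only the short-term level drives the extrapolation, which keeps the two roles easy to see.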
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
With the explosion in the availability of spatio-temporal tracking data in
modern sports, there is an enormous opportunity to better analyse, learn and
predict important events in adversarial group environments. In this paper, we
propose a deep decision tree architecture for discriminative dictionary
learning from adversarial multi-agent trajectories. We first build up a
hierarchy for the tree structure by adding each layer and performing
feature-weight-based clustering in the forward pass. We then fine-tune the
player role weights using backpropagation. The hierarchical architecture ensures the
interpretability and the integrity of the group representation. The resulting
architecture is a decision tree, with leaf-nodes capturing a dictionary of
multi-agent group interactions. Due to the ample volume of data available, we
focus on soccer tracking data, although our approach can be used in any
adversarial multi-agent domain. We present applications of the proposed method
for simulating soccer games as well as evaluating and quantifying team
strategies.
Comment: To appear in 4th International Workshop on Computer Vision in Sports
(CVsports) at CVPR 201
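The layer-by-layer construction can be sketched as follows. This is a rough illustration with hypothetical helper names, in which a tiny 2-means split stands in for the feature-weight based clustering step:

```python
import numpy as np

# Illustrative sketch only: a tiny 2-means split stands in for the
# feature-weight based clustering performed at each layer.
def kmeans2(X, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def build_tree(X, depth, min_size=2):
    # Each layer splits the data in the forward pass; leaf nodes hold
    # one dictionary atom (here simply the mean of their trajectories).
    if depth == 0 or len(X) < 2 * min_size:
        return {"atom": X.mean(axis=0)}
    labels = kmeans2(X)
    if (labels == 0).all() or (labels == 1).all():
        return {"atom": X.mean(axis=0)}
    return {"left": build_tree(X[labels == 0], depth - 1, min_size),
            "right": build_tree(X[labels == 1], depth - 1, min_size)}

def leaves(node):
    # Collect the dictionary of group-interaction atoms at the leaves.
    if "atom" in node:
        return [node["atom"]]
    return leaves(node["left"]) + leaves(node["right"])
```

Adding depth grows the hierarchy one layer at a time, mirroring the forward-pass construction described above; the learned per-role weighting of features is omitted here.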
Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking
Current multi-person localisation and tracking systems rely heavily on
appearance models for target re-identification, and almost no approaches
employ a complete deep learning solution for both objectives. We
present a novel, complete deep learning framework for multi-person localisation
and tracking. In this context, we first introduce a lightweight sequential
Generative Adversarial Network architecture for person localisation, which
overcomes issues related to occlusions and noisy detections, typically found in
a multi-person environment. In the proposed tracking framework, we build upon
recent advances in pedestrian trajectory prediction approaches and propose a
novel data association scheme based on predicted trajectories. This removes the
need for computationally expensive person re-identification systems based on
appearance features, and generates human-like trajectories with minimal
fragmentation. The proposed method is evaluated on multiple public benchmarks,
including both static and dynamic cameras, and achieves outstanding
performance, especially among other recently proposed deep neural network
based approaches.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201
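The appearance-free association idea can be caricatured as follows; this is an illustrative sketch (hypothetical function names, with a constant-velocity step standing in for the learned trajectory predictor):

```python
import numpy as np

# Illustrative sketch only: detections are matched to tracks purely by
# distance to each track's predicted next position; no appearance
# features are involved. A constant-velocity step stands in for the
# learned trajectory predictor.
def predict_next(track):
    p = np.asarray(track, dtype=float)
    return p[-1] + (p[-1] - p[-2]) if len(p) > 1 else p[-1]

def associate(tracks, detections, gate=2.0):
    preds = [predict_next(t) for t in tracks]
    dets = [np.asarray(d, dtype=float) for d in detections]
    # Greedily match the globally closest (prediction, detection) pairs,
    # ignoring any pair farther apart than the gating distance.
    order = sorted((float(np.linalg.norm(p - d)), i, j)
                   for i, p in enumerate(preds)
                   for j, d in enumerate(dets))
    pairs, used_t, used_d = [], set(), set()
    for dist, i, j in order:
        if dist <= gate and i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return pairs
```

Because matching depends only on predicted positions, no per-person appearance embedding has to be computed or compared, which is the computational saving claimed above.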
Learning Temporal Strategic Relationships using Generative Adversarial Imitation Learning
This paper presents a novel framework for automatic learning of complex
strategies in human decision making. The task that we are interested in is to
better facilitate long-term planning for complex, multi-step events. We observe
temporal relationships at the subtask level of expert demonstrations, and
determine the different strategies employed in order to successfully complete a
task. To capture the relationship between the subtasks and the overall goal, we
utilise two external memory modules, one for capturing dependencies within a
single expert demonstration, such as the sequential relationship among
different subtasks, and a global memory module for modelling task-level
characteristics such as best practices employed by different humans based on
their domain expertise. Furthermore, we demonstrate how the hidden state
representation of the memory can be used as a reward signal to smooth the state
transitions, suppressing erratic changes. We evaluate the effectiveness of the
proposed model for an autonomous highway driving application, where we
demonstrate its capability to learn different expert policies and outperform
state-of-the-art methods. The scope in industrial applications extends to any
robotics and automation application which requires learning from complex
demonstrations containing a series of subtasks.
Comment: International Foundation for Autonomous Agents and Multiagent
Systems, 201
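The memory-based reward shaping can be sketched as an imitation reward penalised by jumps in the memory's hidden state. This is one illustrative reading of the abstract, not the paper's exact formulation, and the function name is hypothetical:

```python
import numpy as np

# Illustrative sketch only: combine an imitation term with a penalty on
# large jumps in the memory's hidden-state representation, so the agent
# is discouraged from erratic switches between subtask strategies.
def smoothed_reward(imitation_reward, h_prev, h_curr, alpha=0.5):
    h_prev = np.asarray(h_prev, dtype=float)
    h_curr = np.asarray(h_curr, dtype=float)
    jump = np.linalg.norm(h_curr - h_prev)
    return imitation_reward - alpha * jump
```

With `alpha = 0`, this reduces to plain imitation reward; increasing it trades imitation fidelity for smoother transitions between subtasks.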