10,140 research outputs found
Tree Memory Networks for Modelling Long-term Temporal Dependencies
In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in depth analysis on
the evolution of the memory module content over time and provide visual
evidence on how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
This paper presents a novel framework for human trajectory prediction based
on multimodal data (video and radar). Motivated by recent neuroscience
discoveries, we propose incorporating a structured memory component in the
human trajectory prediction pipeline to capture historical information to
improve performance. We introduce structured LSTM cells for modelling the
memory content hierarchically, preserving the spatiotemporal structure of the
information and enabling us to capture both short-term and long-term context.
We demonstrate how this architecture can be extended to integrate salient
information from multiple modalities to automatically store and retrieve
important information for decision making without any supervision. We evaluate
the effectiveness of the proposed models on a novel multimodal dataset that we
introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from
a radar system and a CCTV camera system installed in a public place. The
performance is also evaluated on the publicly available New York Grand Central
pedestrian database. In both settings, the proposed models demonstrate their
capability to better anticipate future pedestrian motion compared to existing
state of the art.Comment: To appear in ECML-PKDD 201
Learning to Hash-tag Videos with Tag2Vec
User-given tags or labels are valuable resources for semantic understanding
of visual media such as images and videos. Recently, a new type of labeling
mechanism known as hash-tags have become increasingly popular on social media
sites. In this paper, we study the problem of generating relevant and useful
hash-tags for short video clips. Traditional data-driven approaches for tag
enrichment and recommendation use direct visual similarity for label transfer
and propagation. We attempt to learn a direct low-cost mapping from video to
hash-tags using a two step training process. We first employ a natural language
processing (NLP) technique, skip-gram models with neural network training to
learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a
corpus of 10 million hash-tags. We then train an embedding function to map
video features to the low-dimensional Tag2vec space. We learn this embedding
for 29 categories of short video clips with hash-tags. A query video without
any tag-information can then be directly mapped to the vector space of tags
using the learned embedding and relevant tags can be found by performing a
simple nearest-neighbor retrieval in the Tag2Vec space. We validate the
relevance of the tags suggested by our system qualitatively and quantitatively
with a user study
- …