Search CORE

3,401 research outputs found

CAR-Net: Clairvoyant Attentive Recurrent Network

Author: A Robicquet
B Zhao
BT Morris
CK Williams
D Helbing
D Makris
D Xie
HS Koppula
J Quiñonero-Candela
JM Wang
KM Kitani
L Ballan
N Graham
O Russakovsky
P McCullagh
R Vesel
RE Kalman
S Hochreiter
S Pellegrini
Publication venue
Publication date: 16/07/2018
Field of study

We present an interpretable framework for path prediction that leverages dependencies between agents' behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large image of the scene when solving the path prediction task. Our method can attend to any area, or combination of areas, within the raw image (e.g., road intersections) when predicting the trajectory of the agent. This allows us to visualize fine-grained semantic elements of navigation scenes that influence the prediction of trajectories. To study the impact of space on agents' trajectories, we build a new dataset made of top-view images of hundreds of scenes (Formula One racing tracks) where agents' behaviors are heavily influenced by known areas in the images (e.g., upcoming turns). CAR-Net successfully attends to these salient regions. Additionally, CAR-Net reaches state-of-the-art accuracy on the standard trajectory forecasting benchmark, Stanford Drone Dataset (SDD). Finally, we show CAR-Net's ability to generalize to unseen scenes.Comment: The 2nd and 3rd authors contributed equall

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Crowd Saliency Detection via Global Similarity Structure

Author: Chan Chee Seng
Kok Ven Jyn
Lim Mei Kuan
Loy Chen Change
Publication venue
Publication date: 14/10/2014
Field of study

It is common for CCTV operators to overlook inter- esting events taking place within the crowd due to large number of people in the crowded scene (i.e. marathon, rally). Thus, there is a dire need to automate the detection of salient crowd regions acquiring immediate attention for a more effective and proactive surveillance. This paper proposes a novel framework to identify and localize salient regions in a crowd scene, by transforming low-level features extracted from crowd motion field into a global similarity structure. The global similarity structure representation allows the discovery of the intrinsic manifold of the motion dynamics, which could not be captured by the low-level representation. Ranking is then performed on the global similarity structure to identify a set of extrema. The proposed approach is unsupervised so learning stage is eliminated. Experimental results on public datasets demonstrates the effectiveness of exploiting such extrema in identifying salient regions in various crowd scenarios that exhibit crowding, local irregular motion, and unique motion areas such as sources and sinks.Comment: Accepted in ICPR 2014 (Oral). Mei Kuan Lim and Ven Jyn Kok share equal contribution

arXiv.org e-Print Archive

CiteSeerX

Future Person Localization in First-Person Videos

Author: Mangalam Karttikeya
Sato Yoichi
Yagi Takuma
Yonetani Ryo
Publication venue
Publication date: 27/03/2018
Field of study

We present a new task that predicts future locations of people observed in first-person videos. Consider a first-person video stream continuously recorded by a wearable camera. Given a short clip of a person that is extracted from the complete stream, we aim to predict that person's location in future frames. To facilitate this future person localization ability, we make the following three key observations: a) First-person videos typically involve significant ego-motion which greatly affects the location of the target person in future frames; b) Scales of the target person act as a salient cue to estimate a perspective effect in first-person videos; c) First-person videos often capture people up-close, making it easier to leverage target poses (e.g., where they look) for predicting their future locations. We incorporate these three observations into a prediction framework with a multi-stream convolution-deconvolution architecture. Experimental results reveal our method to be effective on our new dataset as well as on a public social interaction dataset.Comment: Accepted to CVPR 201

arXiv.org e-Print Archive

Crossref

Contextual anomaly detection in crowded surveillance scenes

Author: Leach Michael
Robertson Neil
Sparks Ed
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

AbstractThis work addresses the problem of detecting human behavioural anomalies in crowded surveillance environments. We focus in particular on the problem of detecting subtle anomalies in a behaviourally heterogeneous surveillance scene. To reach this goal we implement a novel unsupervised context-aware process. We propose and evaluate a method of utilising social context and scene context to improve behaviour analysis. We find that in a crowded scene the application of Mutual Information based social context permits the ability to prevent self-justifying groups and propagate anomalies in a social network, granting a greater anomaly detection capability. Scene context uniformly improves the detection of anomalies in both datasets. The strength of our contextual features is demonstrated by the detection of subtly abnormal behaviours, which otherwise remain indistinguishable from normal behaviour

Queen's University Belfast Research Portal

Heriot Watt Pure

Elsevier - Publisher Connector