Understanding Dynamic Scenes using Graph Convolution Networks
We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based
framework to model on-road vehicle behaviors from a sequence of temporally
ordered frames as grabbed by a moving monocular camera. The input to MRGCN is a
multi-relational graph where the graph's nodes represent the active and passive
agents/objects in the scene, and the bidirectional edges that connect every
pair of nodes are encodings of their spatio-temporal relations. We show that
this explicit encoding and use of an intermediate spatio-temporal
interaction graph is better suited to our tasks than learning end-to-end
directly on a set of temporally ordered spatial relations. We also propose an
attention mechanism for MRGCNs that, conditioned on the scene, dynamically scores
the importance of information from different interaction types. The proposed
framework achieves a significant performance gain over prior methods on
vehicle-behavior classification tasks on four datasets. We also show a seamless
transfer of learning to multiple datasets without resorting to fine-tuning.
Such behavior prediction methods find immediate relevance in a variety of
navigation tasks such as behavior planning, state estimation, and applications
relating to the detection of traffic violations over videos.
Comment: To appear at IROS 202
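The multi-relational graph and the relation-level attention described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a generic relational graph convolution in plain Python, and the relation names (`near`, `approaching`), the dot-product scoring in `relation_attention`, and all weights are assumptions chosen only for illustration.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relation_attention(features, edges, query):
    """Hypothetical scene-conditioned scoring (an assumption, not from the paper):
    each relation type has a query vector that is dotted with the mean node
    feature, and the scores are normalised with a softmax."""
    dim = len(features[0])
    scene = [sum(h[k] for h in features) / len(features) for k in range(dim)]
    rels = sorted(edges)
    scores = [sum(q * s for q, s in zip(query[r], scene)) for r in rels]
    return dict(zip(rels, softmax(scores)))

def relational_gcn_layer(features, edges, W_self, W_rel, rel_weights):
    """One relational graph convolution step.
    features: list of node feature vectors
    edges: dict mapping relation name -> list of (src, dst) pairs
    W_rel: dict mapping relation name -> weight matrix
    rel_weights: dict mapping relation name -> scalar importance
    """
    out = [matvec(W_self, h) for h in features]  # self-connection term
    for rel, pairs in edges.items():
        incoming = {}
        for src, dst in pairs:
            incoming.setdefault(dst, []).append(src)
        for dst, srcs in incoming.items():
            for src in srcs:
                msg = matvec(W_rel[rel], features[src])
                for k, m in enumerate(msg):
                    # average over neighbours of this relation, scaled by attention
                    out[dst][k] += rel_weights[rel] * m / len(srcs)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

# Toy scene: three nodes with 2-d features and two assumed relation types.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {"near": [(0, 1), (1, 0)], "approaching": [(2, 0)]}
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"near": identity, "approaching": identity}
query = {"near": [0.0, 0.0], "approaching": [0.0, 0.0]}  # zero queries give uniform attention

alpha = relation_attention(features, edges, query)
out = relational_gcn_layer(features, edges, identity, W_rel, alpha)
```

With zero query vectors the two relation types receive equal weight; non-zero queries would shift weight between interaction types depending on the scene summary.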
Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks
Understanding on-road vehicle behaviour from a temporal sequence of sensor
data is gaining in popularity. In this paper, we propose a pipeline for
understanding vehicle behaviour from a monocular image sequence or video. A
monocular sequence, together with scene semantics, optical flow and object
labels, is used to obtain spatial information about the object (vehicle) of
interest and other objects (semantically contiguous sets of locations) in the scene. This
spatial information is encoded by a Multi-Relational Graph Convolutional
Network (MR-GCN), and a temporal sequence of such encodings is fed to a
recurrent network to label vehicle behaviours. The proposed framework can
classify a variety of vehicle behaviours with high fidelity on datasets that are
diverse and include European, Chinese and Indian on-road scenes. The framework
also provides for seamless transfer of models across datasets without entailing
re-annotation, retraining or even fine-tuning. We show comparative performance
gain over baseline spatio-temporal classifiers and detail a variety of
ablations to showcase the efficacy of the framework.
Comment: To appear in IV (IEEE Intelligent Vehicles Symposium) 202
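The second stage of the pipeline in this abstract, in which a temporal sequence of per-frame graph encodings is fed to a recurrent network that labels the behaviour, can be sketched as follows. This is a minimal illustration under stated assumptions: a plain tanh recurrent cell stands in for whatever recurrent network the authors use, the per-frame encodings are made-up 2-d vectors rather than real MR-GCN outputs, and the label names and weights are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rnn_step(h, x, W_h, W_x):
    """One step of a simple tanh recurrent cell (a stand-in, not the paper's cell)."""
    pre = [a + b for a, b in zip(matvec(W_h, h), matvec(W_x, x))]
    return [math.tanh(p) for p in pre]

def classify_sequence(encodings, W_h, W_x, W_out, labels):
    """Run the recurrent cell over per-frame encodings and label the final state."""
    h = [0.0] * len(W_h)
    for x in encodings:
        h = rnn_step(h, x, W_h, W_x)
    logits = matvec(W_out, h)
    best = max(range(len(labels)), key=lambda i: logits[i])
    return labels[best], h

# Hypothetical per-frame encodings for the vehicle of interest (three frames).
encodings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
labels = ["moving_away", "moving_towards", "parked"]  # assumed label set

label, h = classify_sequence(encodings, W_h, W_x, W_out, labels)
```

In a trained system the weights would be learned jointly with the graph encoder; here they are fixed by hand so the data flow is easy to follow.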