Design of Attention Mechanisms for Robust and Efficient Vehicle Re-Identification from Images and Videos

Abstract

In this work we explore the problem of Vehicle Re-identification using images and videos, with applications in smart transportation systems. Transportation is one of the sectors that can benefit greatly from data captured by sensors on the road. Data-driven algorithms enable transportation systems to realise intelligent applications that improve operations, safety and the experience of road users. Vehicle Re-identification refers to the task of retrieving images of a particular vehicle identity from a large gallery set composed of images taken at different times and locations, in diverse orientations, within a network of traffic cameras. This task is extremely challenging: not only can vehicles with different identities be of the same make, model and color, but a given vehicle can also appear different depending on the viewpoint, occlusion and lighting conditions, which makes it hard both to distinguish and to associate vehicle instances. To tackle these problems, in this dissertation we develop a series of attention mechanisms that account for local discriminative regions and generate more robust visual representations of vehicles. In our first work, we propose the Adaptive Attention Vehicle Re-identification (AAVER) model, which is equipped with an attention mechanism learned in a supervised manner to locate local regions in the form of vehicle key-points and which extracts discriminative features along two parallel paths. The model combines the embeddings of the two paths and outputs a single visual representation of the input image. While AAVER highlights how attention can benefit the discriminative capability of a re-identification system by identifying identity-dependent cues such as key-points or vehicle parts, we note that this requires access to abundant additional annotations that are expensive to collect and are often noisy.
In an effort to re-design the vehicle re-identification pipeline without the need for such expensive annotations, we propose the Self-supervised Vehicle Re-identification (SAVER) model, which automatically highlights salient regions in a vehicle image and mines discriminative representations. SAVER generates robust embeddings; however, at inference it requires a forward pass through a computationally expensive network to generate points of attention, which imposes a bottleneck and limits its potential adoption in real-time and large-scale applications. Therefore, in our next work, we formulate a training strategy inspired by the notion of curriculum learning and design the Excited Vehicle Re-identification (EVER) model, which benefits from a semi-supervised attention mechanism and relies on the attention generated by SAVER only during training. Recent advances in self-supervised representation learning have largely closed the performance gap between self-supervised and fully-supervised methods. This motivated us to explore these findings in the context of vehicle re-identification and to devise a design that preserves the lightweight nature of EVER while matching or exceeding the performance of SAVER. Based on this, in our follow-up work, we propose the Self-supervised Boosted Vehicle Re-identification (SSBVER) model, which is trained in a hybrid manner and learns an implicit attention mechanism. Finally, we propose a real-time, city-scale multi-camera vehicle tracking system that detects, tracks and re-identifies vehicles across traffic cameras. The proposed system has been integrated into the Regional Integrated Transportation Information System (RITIS) platform, a data-driven platform from the University of Maryland for transportation analysis, monitoring, and data visualization.
