Appearance modeling for persistent object tracking in wide-area and full motion video

Abstract

Object tracking is a core element of computer vision and autonomous systems, and single- and multiple-object tracking has been widely investigated, especially for full motion video sequences. The acquisition of wide-area motion imagery (WAMI) from moving airborne platforms is a much more recent sensor innovation with an array of defense and civilian applications, offering a combination of dense spatial and temporal coverage unmatched by other sensor systems. Airborne WAMI presents a host of challenges for object tracking, including large data volume, multi-camera arrays, image stabilization, low-resolution targets, target appearance variability, and high background clutter, especially in urban environments. Large, time-varying imagery acquired at low frame rates makes reliable long-term multi-target tracking difficult. The focus of this thesis is the Likelihood of Features Tracking (LOFT) testbed system, an appearance-based (single-instance) object tracker designed specifically for WAMI that follows the track-before-detect paradigm. The motivation for tracking with dynamics before detecting is to handle large-scale data while keeping computational cost to a minimum. Searching for an object everywhere in a large frame is not practical: the frame contains many similar objects, clutter, and, in urban scenes, high-rise structures, and an exhaustive search greatly increases computational cost. LOFT bypasses this difficulty by using filtering and dynamics to constrain the search area to a more realistic region within the large frame, and it uses multiple features to discern objects of interest. The algorithm expects the objects of interest as input in the form of bounding boxes.
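The idea of constraining the search area with dynamics can be illustrated with a minimal sketch. It is not LOFT's actual filter: it assumes a plain constant-velocity prediction, and the function name, the `margin` factor, and the pixel-coordinate conventions are all hypothetical choices made for illustration.

```python
def predict_search_window(prev_pos, velocity, box_size, frame_size, margin=2.0):
    """Return a clipped search window (x0, y0, x1, y1) in pixels.

    Illustrative only: a constant-velocity prediction stands in for
    whatever filtering the tracker actually uses (an assumption).
    prev_pos   -- (x, y) target centre in the previous frame
    velocity   -- (dx, dy) estimated displacement per frame
    box_size   -- (width, height) of the target bounding box
    frame_size -- (width, height) of the full frame
    margin     -- hypothetical factor scaling the window relative to the box
    """
    # Predict the target centre in the next frame.
    px = prev_pos[0] + velocity[0]
    py = prev_pos[1] + velocity[1]

    # Half-extent of the search window around the predicted centre.
    half_w = 0.5 * margin * box_size[0]
    half_h = 0.5 * margin * box_size[1]

    # Clip the window to the frame boundaries.
    x0 = max(0, int(px - half_w))
    y0 = max(0, int(py - half_h))
    x1 = min(frame_size[0], int(px + half_w))
    y1 = min(frame_size[1], int(py + half_h))
    return x0, y0, x1, y1


window = predict_search_window((100, 100), (10, -5), (20, 10), (8000, 8000))
```

With these made-up numbers the window covers 800 pixels of a 64-megapixel frame, which shows why appearance features need to be evaluated only in a small neighbourhood of the predicted position rather than over the whole image.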
The main goal of this work is to present an appearance update modeling strategy that fits LOFT's track-before-detect paradigm and to showcase the accuracy of the overall system, both against other state-of-the-art tracking algorithms and with and without this strategy. The update strategy, which uses various information cues from the Radon transform, was designed with specific performance goals in mind: a minimal increase in computational cost and a considerable increase in the precision and recall rates of the overall system. This has been demonstrated with supporting performance numbers using standard evaluation techniques from the literature. In the opinion of the author, the extension of the LOFT WAMI tracker to include a more detailed appearance model with an update strategy well suited to persistent target tracking is novel. This work has also made key engineering contributions: the core LOFT has been evaluated as part of several government research and development programs, including the Air Force Research Lab's Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) Enterprise to the Edge (CETE), the Army Research Lab's Advanced Video Activity Analytics (AVAA), and a proposed fine-grained distributed computing architecture on the cloud for processing at the edge. A simplified version of LOFT was developed for tracking objects in standard videos and entered in the Visual Object Tracking (VOT) Challenge competition, which is held in conjunction with leading computer vision conferences. LOFT incorporating the proposed appearance adaptation module produces significantly better tracking results in aerial WAMI of urban scenes.
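For reference, the precision and recall rates mentioned above follow the usual definitions. The sketch below assumes that per-frame tracker outputs have already been classified as true positives, false positives, or false negatives; the matching criterion used in the thesis is not stated here, and the function name and the counts in the usage line are hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute (precision, recall) from detection counts.

    How an output is judged correct (for example, by an overlap
    threshold) is left outside this sketch as an assumption.
    """
    reported = true_pos + false_pos   # everything the tracker reported
    actual = true_pos + false_neg     # everything present in ground truth
    precision = true_pos / reported if reported else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Made-up counts, for illustration only.
p, r = precision_recall(8, 2, 2)
```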
