46 research outputs found

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities, including video surveillance systems, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking system (SPCT). The proposed visual tracker is composed of one master tracker that usually relies on visual object features and two auxiliary trackers based on object temporal motion information that are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise, and to improve target localization accuracy and robustness. We chose a pre-selected set of seven-channel complementary features, including RGB color, intensity and a spatial pyramid of HoG, to encode object color, shape and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and are applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. The experiments on the extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. Comment: PhD Dissertation (162 pages)
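As a rough illustration of the integral histogram this abstract builds on, the sketch below shows the plain (unweighted) version only, not the spatially weighted extension or the GPU implementation the dissertation describes. The function names, the bin mapping, and the `max_value` parameter are assumptions made for this example. Once the table is built, the histogram of any axis-aligned rectangle costs four lookups per bin, regardless of the rectangle's size.

```python
def integral_histogram(image, num_bins, max_value=256):
    """Build H where H[y][x][b] counts bin-b pixels in rows [0, y) and columns [0, x).

    `image` is a list of rows of integer intensities in [0, max_value).
    """
    rows, cols = len(image), len(image[0])
    # One extra row and column of zeros keeps the border cases simple.
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            # Standard 2D prefix-sum recurrence, applied to every bin.
            for k in range(num_bins):
                H[y + 1][x + 1][k] = H[y][x + 1][k] + H[y + 1][x][k] - H[y][x][k]
            # Add the current pixel to its own bin.
            b = image[y][x] * num_bins // max_value
            H[y + 1][x + 1][b] += 1
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows [top, bottom) and columns [left, right): four lookups per bin."""
    num_bins = len(H[0][0])
    return [
        H[bottom][right][k] - H[top][right][k] - H[bottom][left][k] + H[top][left][k]
        for k in range(num_bins)
    ]


image = [
    [0, 64, 128],
    [192, 255, 0],
    [64, 64, 128],
]
H = integral_histogram(image, num_bins=4)
whole = region_histogram(H, 0, 0, 3, 3)
patch = region_histogram(H, 0, 1, 2, 3)
```

Building the table costs time proportional to the number of pixels times the number of bins, but it is paid once per frame. A tracker that evaluates many candidate windows can then query each one at a cost that depends only on the number of bins.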

    Accurate, fast, and robust 3D city-scale reconstruction using wide area motion imagery

    Multi-view stereopsis (MVS) is a core problem in computer vision: it takes a set of scene views together with known camera poses and produces a geometric representation of the underlying 3D model. Using 3D reconstruction, one can determine any object's 3D profile, as well as the 3D coordinates of any point on the profile. The 3D reconstruction of objects is a general scientific problem and a core technology in a wide variety of fields, such as Computer Aided Geometric Design (CAGD), computer graphics, computer animation, computer vision, medical imaging, computational science, virtual reality, and digital media. However, although MVS problems have been studied for decades, many challenges still exist in current state-of-the-art algorithms. For example, many algorithms still lack accuracy and completeness when tested on large city-scale datasets, and most available MVS algorithms require a large amount of execution time and/or specialized hardware and software, which results in high cost. This dissertation work tries to address these challenges and proposes multiple solutions. More specifically, it proposes multiple novel MVS algorithms to automatically and accurately reconstruct the underlying 3D scenes. By using a novel volumetric voxel-based method, one of our algorithms achieves near real-time runtime speed, does not require any special hardware or software, and can be deployed onto power-constrained embedded systems. By developing a new camera clustering module and a novel weighted voting-based surface likelihood estimation module, our algorithm generalizes to different datasets and achieves the best performance in terms of accuracy and completeness when compared with existing algorithms. This dissertation work also performs the very first quantitative evaluation in terms of precision, recall, and F-score using real-world LiDAR ground-truth data.
Last but not least, this dissertation work proposes an automatic workflow that can stitch multiple point cloud models with limited overlapping areas into one larger 3D model for better geographical coverage. All the results presented in this dissertation work have been evaluated on our wide area motion imagery (WAMI) dataset, and improve on state-of-the-art performance by a large margin. The results generated in this dissertation work have been successfully used in many applications, including city digitization, improving detection and tracking performance, real-time dynamic shadow detection, 3D change detection, visibility map generation, VR environments, and visualization combined with other information, such as building footprints and roads. Includes bibliographical references
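The abstract above does not say how precision, recall, and F-score are computed against the LiDAR ground truth. A minimal sketch of one common distance-threshold definition for point clouds follows. The threshold `tau`, the function names, and the brute-force nearest-neighbor search are assumptions for illustration, not the dissertation's protocol.

```python
import math


def _nearest_distance(point, cloud):
    """Distance from `point` to its nearest neighbor in `cloud` (brute force)."""
    return min(math.dist(point, other) for other in cloud)


def point_cloud_scores(reconstructed, ground_truth, tau):
    """Return (precision, recall, f_score) under a distance threshold `tau`.

    precision: fraction of reconstructed points within `tau` of the ground truth.
    recall:    fraction of ground-truth points within `tau` of the reconstruction.
    """
    precision = sum(
        _nearest_distance(p, ground_truth) <= tau for p in reconstructed
    ) / len(reconstructed)
    recall = sum(
        _nearest_distance(q, reconstructed) <= tau for q in ground_truth
    ) / len(ground_truth)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
ground_truth = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
precision, recall, f_score = point_cloud_scores(reconstructed, ground_truth, tau=0.1)
```

Under this definition, precision reflects accuracy (how much of the reconstruction lies near the true surface) and recall reflects completeness (how much of the true surface was recovered). For city-scale clouds, the brute-force search would normally be replaced by a spatial index such as a k-d tree.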

    Appearance modeling for persistent object tracking in wide-area and full motion video

    Object tracking is a core element of computer vision and autonomous systems. As such, single and multiple object tracking has been widely investigated, especially for full motion video sequences. The acquisition of wide-area motion imagery (WAMI) from moving airborne platforms is a much more recent sensor innovation with an array of defense and civilian applications, providing a unique combination of dense spatial and temporal coverage unmatched by other sensor systems. Airborne WAMI presents a host of challenges for object tracking, including large data volume, multi-camera arrays, image stabilization, low-resolution targets, target appearance variability and high background clutter, especially in urban environments. Time-varying, low-frame-rate, large imagery poses a range of difficulties for reliable long-term multi-target tracking. The focus of this thesis is the Likelihood of Features Tracking (LOFT) testbed system, an appearance-based (single instance) object tracker designed specifically for WAMI that follows the track-before-detect paradigm. The motivation for tracking using dynamics before detecting is to handle large-scale data while keeping computational cost to a minimum. Searching for an object everywhere in a large frame is not practical: there are many similar objects, clutter, and, in urban scenes, high-rise structures, and an exhaustive search greatly increases computational cost. LOFT bypasses this difficulty by using filtering and dynamics to constrain the search area to a more realistic region within the large frame, and uses multiple features to discern objects of interest. The objects of interest are expected as input to the algorithm in the form of bounding boxes.
The main goal of this work is to present an appearance update modeling strategy that fits LOFT's track-before-detect paradigm, and to showcase the accuracy of the overall system compared with other state-of-the-art tracking algorithms, both with and without this strategy. The update strategy, which uses various information cues from the Radon Transform, was designed with specific performance goals in mind: a minimal increase in computational cost and a considerable increase in the precision and recall rates of the overall system. This is demonstrated with supporting performance numbers using standard evaluation techniques from the literature. The extension of the LOFT WAMI tracker to include a more detailed appearance model with an update strategy well suited to persistent target tracking is novel in the opinion of the author. Key engineering contributions have been made through this work: the core LOFT has been evaluated as part of several government research and development programs, including the Air Force Research Lab's Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) Enterprise to the Edge (CETE), the Army Research Lab's Advanced Video Activity Analytics (AVAA), and a proposed fine-grained distributed computing architecture on the cloud for processing at the edge. A simplified version of LOFT was developed for tracking objects in standard videos and entered in the Visual Object Tracking (VOT) Challenge competition, which is held in conjunction with leading computer vision conferences. LOFT incorporating the proposed appearance adaptation module produces significantly better tracking results in aerial WAMI of urban scenes.

    Learning representations in the hyperspectral domain in aerial imagery

    We establish two new datasets with baselines and network architectures for the task of hyperspectral image analysis. The first dataset, AeroRIT, is a moving-camera, static-scene dataset captured from a flight and contains per-pixel labeling across five categories for the task of semantic segmentation. The second dataset, RooftopHSI, helps design and interpret learnt features for hyperspectral object detection on scenes captured from a university rooftop. This dataset accounts for static-camera, moving-scene hyperspectral imagery. We further broaden the scope of our understanding of neural networks with the development of two novel algorithms, S4AL and S4AL+. We develop these frameworks on natural (color) imagery by combining semi-supervised learning and active learning, and show promising results for learning with a limited amount of labeled data, which can be extended to hyperspectral imagery. In this dissertation, we curated two new datasets for hyperspectral image analysis, significantly larger than existing datasets and broader in terms of categories for classification. We then adapt existing neural network architectures to operate on the increased channel information so that they leverage all of the hyperspectral information. We also develop novel active learning algorithms on natural (color) imagery and discuss the prospects for extending their functionality to hyperspectral imagery.