
    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities, including video surveillance systems, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware Tracking (SPCT) system for moving object detection and tracking. The proposed visual tracker is composed of one master tracker, which usually relies on visual object features, and two auxiliary trackers based on object temporal motion information, which are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise and to improve target localization accuracy and robustness. We choose a pre-selected set of seven complementary feature channels, including RGB color, intensity, and a spatial pyramid of HoG, to encode object color, shape, and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. Experiments on the extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. Comment: PhD Dissertation (162 pages)
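    The constant-time region histograms mentioned above rest on the integral histogram: a per-bin 2-D prefix sum from which any rectangle's histogram follows from four lookups per bin. A minimal sketch (the function names and the 8-bin grayscale quantization are illustrative assumptions, not the dissertation's implementation):

```python
import numpy as np

def integral_histogram(img, bins=8):
    """Per-bin 2-D prefix sums: H[y, x, b] counts the pixels of bin b
    inside the rectangle img[0:y, 0:x]."""
    h, w = img.shape
    idx = np.minimum(img.astype(np.int64) * bins // 256, bins - 1)
    onehot = np.eye(bins, dtype=np.int64)[idx]        # (h, w, bins)
    hist = np.zeros((h + 1, w + 1, bins), dtype=np.int64)
    hist[1:, 1:] = onehot.cumsum(axis=0).cumsum(axis=1)
    return hist

def region_histogram(hist, y0, x0, y1, x1):
    """Histogram of img[y0:y1, x0:x1] from four lookups per bin."""
    return hist[y1, x1] - hist[y0, x1] - hist[y1, x0] + hist[y0, x0]
```

    Because the query cost is independent of region size, sliding-window histograms over many candidate locations become cheap once the integral histogram is built, which is what makes the constant-time spatially weighted extension attractive for real-time tracking.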

    Deep learning techniques for visual object tracking

    Visual object tracking plays a crucial role in various vision systems, including biometric analysis, medical imaging, smart traffic systems, and video surveillance. Despite notable advancements in visual object tracking over the past few decades, many tracking algorithms still face challenges due to factors like illumination changes, deformation, and scale variations. This thesis is divided into three parts. The first part introduces the visual object tracking problem and discusses the traditional approaches that have been used to study it. We then propose a novel method called Tracking by Iterative Multi-Refinements, which addresses the issue of locating the target by redefining the search for the ideal bounding box. This method utilizes an iterative process to forecast a sequence of bounding box adjustments, enabling the tracking algorithm to handle multiple non-conflicting transformations simultaneously. As a result, it achieves faster tracking and can handle a higher number of composite transformations. In the second part of this thesis, we explore the application of reinforcement learning (RL) to visual tracking, presenting a general RL framework applicable to problems that require a sequence of decisions. We discuss various families of popular RL approaches, including value-based methods, policy gradient approaches, and actor-critic methods. Furthermore, we delve into the application of RL to visual tracking, where an RL agent predicts the target's location and selects hyperparameters, correlation filters, or target appearance. A comprehensive comparison of these approaches is provided, along with a taxonomy of state-of-the-art methods. The third part presents a novel method that addresses the need for online tuning of offline-trained tracking models. Typically, offline-trained models, whether trained through supervised learning or reinforcement learning, require additional tuning during online tracking to achieve optimal performance.
The duration of this tuning process depends on the number of layers that need training for the new target. This thesis proposes an approach that expedites the training of convolutional neural networks (CNNs) while preserving their high performance levels. In summary, this thesis extensively explores the area of visual object tracking and its related domains, covering traditional approaches, novel methodologies like Tracking by Iterative Multi-Refinements, the application of reinforcement learning, and a method for accelerating CNN training. By addressing the challenges faced by existing tracking algorithms, this research aims to advance the field of visual object tracking and contributes to the development of more robust and efficient tracking systems.
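    The idea of applying several non-conflicting bounding-box adjustments per iteration can be illustrated with a greedy sketch. This is a hypothetical simplification, not the thesis's learned refinement method: `score` stands in for whatever appearance model rates a candidate box, and the horizontal-shift, vertical-shift, and scale groups are taken as mutually non-conflicting because they touch disjoint coordinates.

```python
def refine_bbox(bbox, score, step=2, max_iters=50):
    """Iteratively refine (x, y, w, h): each iteration evaluates three
    groups of adjustments (horizontal shift, vertical shift, scale) and
    applies every group's best move if it improves the current score.
    Moves from different groups change disjoint coordinates, so they
    can be applied simultaneously."""
    x, y, w, h = bbox
    groups = [
        [(step, 0, 0, 0), (-step, 0, 0, 0)],          # horizontal shift
        [(0, step, 0, 0), (0, -step, 0, 0)],          # vertical shift
        [(0, 0, step, step), (0, 0, -step, -step)],   # scale up/down
    ]
    for _ in range(max_iters):
        current = score((x, y, w, h))
        chosen = []
        for group in groups:
            best = max(group, key=lambda d: score((x + d[0], y + d[1],
                                                   w + d[2], h + d[3])))
            if score((x + best[0], y + best[1], w + best[2], h + best[3])) > current:
                chosen.append(best)
        if not chosen:
            break  # converged: no single adjustment improves the score
        for dx, dy, dw, dh in chosen:
            x, y, w, h = x + dx, y + dy, w + dw, h + dh
    return (x, y, w, h)
```

    Applying all improving groups at once is what lets a composite transformation (shift and rescale together) be handled in a single iteration instead of several.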

    An Improved Visual Model for Tracking with Segmentation

    As if in response to the field's increased focus on visual object tracking and video object segmentation, this work features several trackers that strengthen the associations between the two disciplines. These trackers build upon the existing D3S tracker, which can produce both highly reliable localization and an accurate segmentation of the target; both outputs are then used in subsequent target state inference to inform the process and achieve excellent tracking performance. In recognition of the benefits reaped by involving segmentation in visual object tracking, this work proposes several trackers that aim to further both the accuracy and robustness of D3S and to improve its inference speed. The novel trackers combine existing components of the D3S implementation with constituents drawn from the latest advancements in the field. Namely, the two backbones of the original implementation are merged into a single backbone, CARAFE modules replace the bilinear upsampling stages, Octave convolution is introduced to improve the speed of feature extraction, and an attention mechanism is implemented to incorporate contextual information into the tracking process. Alongside this, the lack of dataset diversity inspires the construction of a synthetic dataset used in the pre-training stages of representation learning. Finally, the suitability of the proposed tracking architectures is determined through rigorous evaluation. This master's thesis examines the mutual benefits between visual object tracking and video object segmentation. The fruits of this examination are trackers that build on the existing D3S tracking method. Besides highly reliable localization, the D3S tracker is also capable of accurate segmentation of the tracked object, which further contributes to the method's performance. This fact ties the two computer vision disciplines more closely together.
Throughout the work, the benefits that segmentation brings in combination with visual object tracking are shown in several proposed tracker architectures. Striving to improve the accuracy and robustness of the D3S method, these architectures extend the tracking process and enrich it with new information. One of the proposed architectures, for example, merges two identical but originally separate network backbones into a single one in favor of tracking speed. Another introduces CARAFE operators in place of bilinear interpolation, with the aim of including wider contextual information in feature upsampling. For the same reasons, an attention mechanism is added in the third architecture. Besides the new architectures, the work also encompasses the construction of a synthetic dataset, inspired by the shortcomings of existing data collections. The work concludes with an experimental analysis as the measure of the performance and suitability of the proposed methods, and a brief discussion.
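    The attention mechanism mentioned above can be illustrated with a minimal channel-attention sketch in the style of squeeze-and-excitation. This is an illustrative assumption, not the thesis's actual module: the softmax re-weighting and the function name are invented for the example.

```python
import numpy as np

def channel_attention(features):
    """Squeeze-and-excitation-style re-weighting of a (C, H, W) feature
    map: pool each channel to a scalar, turn the scalars into softmax
    weights, and scale the channels by those weights."""
    s = features.mean(axis=(1, 2))          # squeeze: one scalar per channel
    w = np.exp(s - s.max())
    w /= w.sum()                            # excitation: softmax weights
    return features * w[:, None, None]      # re-weighted feature map
```

    In a real tracker, the weights would come from a small learned network rather than a parameter-free softmax, but the effect is the same: channels carrying more context are emphasized before localization.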

    Fast left ventricle tracking using localized anatomical affine optical flow

    In daily clinical cardiology practice, left ventricle (LV) global and regional function assessment is crucial for disease diagnosis, therapy selection, and patient follow-up. Currently, this is still a time-consuming task that consumes valuable human resources. In this work, a novel fast methodology for automatic LV tracking is proposed based on localized anatomically constrained affine optical flow. This novel method can be combined with previously proposed segmentation frameworks or with manually delineated surfaces at an initial frame to obtain fully delineated datasets and, thus, assess both global and regional myocardial function. Its feasibility and accuracy were investigated in 3 distinct public databases, namely in realistically simulated 3D ultrasound, clinical 3D echocardiography, and clinical cine cardiac magnetic resonance images. The method showed accurate tracking results in all databases, proving its applicability and accuracy for myocardial function assessment. Moreover, when combined with previous state-of-the-art segmentation frameworks, it outperformed previous tracking strategies in both 3D ultrasound and cardiac magnetic resonance data, automatically computing relevant cardiac indices with smaller biases and narrower limits of agreement compared to reference indices. Simultaneously, the proposed localized tracking method proved suitable for online processing, even for 3D motion assessment. Importantly, although evaluated here for LV tracking only, this novel methodology is applicable to the tracking of other target structures with minimal adaptations. The authors acknowledge funding support from FCT - Fundação para a Ciência e a Tecnologia, Portugal, and the European Social Fund, European Union, through the Programa Operacional Capital Humano (POCH) in the scope of the PhD grants SFRH/BD/93443/2013 (S. Queiros) and SFRH/BD/95438/2013 (P. Morais), and by the project 'PersonalizedNOS (01-0145-FEDER-000013)' co-funded by Programa Operacional Regional do Norte (Norte2020) through the European Regional Development Fund (ERDF).
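    Affine optical flow fits a 6-parameter motion model to the brightness-constancy constraint by least squares. A minimal global sketch follows; the localized, anatomically constrained weighting of the work above is omitted, and the function name is an assumption for illustration.

```python
import numpy as np

def affine_flow(I0, I1):
    """Least-squares fit of a 6-parameter affine motion field
    u = a0 + a1*x + a2*y, v = a3 + a4*x + a5*y to the brightness
    constancy constraint Ix*u + Iy*v + It = 0 between frames I0, I1."""
    Iy, Ix = np.gradient(I0.astype(float))     # spatial gradients (y, x order)
    It = I1.astype(float) - I0.astype(float)   # temporal gradient
    h, w = I0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # one constraint row per pixel: [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] a = -It
    A = np.stack([Ix, Ix * xs, Ix * ys, Iy, Iy * xs, Iy * ys],
                 axis=-1).reshape(-1, 6)
    b = -It.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                              # (a0, ..., a5)
```

    Restricting the rows of `A` to a window around the structure of interest, and weighting them anatomically, turns this global fit into the localized variant described above.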

    KOLAM: human computer interfaces for visual analytics in big data imagery

    In the present day, we are faced with a deluge of disparate and dynamic information from multiple heterogeneous sources. Among these are the big data imagery datasets that are rapidly being generated via mature acquisition methods in the geospatial, surveillance (specifically, Wide Area Motion Imagery or WAMI) and biomedical domains. The need to interactively visualize these imagery datasets by using multiple types of views (as needed) into the data is common to these domains. Furthermore, researchers in each domain have additional needs: users of WAMI datasets also need to interactively track objects of interest using algorithms of their choice, visualize the resulting object trajectories and interactively edit these results as needed. While software tools that fulfill each of these requirements individually are available and well-used at present, there is still a need for tools that can combine the desired aspects of visualization, human computer interaction (HCI), data analysis, data management, and (geo-)spatial and temporal data processing into a single flexible and extensible system. KOLAM is an open, cross-platform, interoperable, scalable and extensible framework for visualization and analysis that we have developed to fulfil the above needs. 
The novel contributions in this thesis are the following: 1) Spatio-temporal caching for animating both giga-pixel and Full Motion Video (FMV) imagery, 2) Human computer interfaces purposefully designed to accommodate big data visualization, 3) Human-in-the-loop interactive video object tracking - ground-truthing of moving objects in wide area imagery using algorithm-assisted human-in-the-loop coupled tracking, 4) Coordinated visualization using stacked layers, side-by-side layers/video sub-windows and embedded imagery, 5) Efficient one-click manual tracking, editing and data management of trajectories, 6) Efficient labeling of image segmentation regions and passing these results to desired modules, 7) Visualization of image processing results generated by non-interactive operators using layers, 8) Extension of interactive imagery and trajectory visualization to multi-monitor wall display environments, 9) Geospatial applications: providing rapid roam, zoom and hyper-jump spatial operations, interactive blending, colormap and histogram enhancement, spherical projection and terrain maps, 10) Biomedical applications: visualization and target tracking of cell motility in time-lapse cell imagery, collecting ground-truth from experts on whole-slide imagery (WSI) for developing histopathology analytic algorithms and computer-aided diagnosis for cancer grading, and easy-to-use tissue annotation features. Includes bibliographical references.
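    The spatio-temporal caching of contribution 1) can be sketched as an LRU cache keyed by both spatial and temporal coordinates, so that roam/zoom (spatial) and playback (temporal) both reuse already-decoded tiles. This is a minimal illustrative sketch, not KOLAM's actual cache; the class name and key layout are assumptions.

```python
from collections import OrderedDict

class TileCache:
    """Minimal LRU spatio-temporal tile cache. Tiles are keyed by
    (pyramid_level, tile_x, tile_y, frame_index), covering both
    giga-pixel roam/zoom and FMV playback access patterns."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self._tiles = OrderedDict()

    def get(self, key, load):
        if key in self._tiles:
            self._tiles.move_to_end(key)      # mark most-recently-used
            return self._tiles[key]
        tile = load(key)                      # decode on miss
        self._tiles[key] = tile
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)   # evict least-recently-used
        return tile
```

    A real viewer would add prefetching along the predicted pan or playback direction; the LRU policy above is only the eviction half of the scheme.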

    Object Tracking

    Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer-based object tracking is a difficult task: the dynamics of multiple changing parameters representing the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The intention of the editor was to follow up on the rapid progress in the development of methods as well as the extension of their applications.

    LEARNING TO ADAPT FROM FEW EXAMPLES

    Despite huge progress in artificial intelligence, the ability to quickly learn from few examples is still far short of that of a human. With the goal of building machines with this capability, learning-to-learn, or meta-learning, has begun to emerge with promising results. I present techniques that improve existing meta-learning methods, and demonstrate their effectiveness, in the contexts of visual object tracking, few-shot classification, and few-shot reinforcement learning. First, visual object trackers that use online adaptation are improved. The core contribution is an offline meta-learning-based method to adjust the initial deep networks used in online adaptation-based tracking. The meta-learning is driven by the goal of obtaining deep networks that can be quickly adapted to robustly model a particular target in future frames. Ideally, the resulting models focus on features that are useful for future frames and avoid overfitting to background clutter, small parts of the target, or noise. Experimental results on standard benchmarks, OTB2015 and VOT2016, show that the meta-learned trackers improve speed, accuracy, and robustness. Second, it is observed that learning curvature information can achieve better generalization and fast model adaptation. Building on the model-agnostic meta-learner (MAML), the method learns to transform the gradients in the inner optimization such that the transformed gradients achieve better generalization on a new task. For training large-scale neural networks, a decomposition of the curvature matrix into smaller matrices is proposed, which captures the dependencies of the model's parameters with a series of tensor products. Experimental results show significant improvements on classification tasks and promising results on reinforcement learning tasks. Finally, an analysis that explains the better generalization performance obtained with the meta-trained curvature is presented.
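    MAML's inner/outer structure described above can be made concrete on a toy problem. The sketch below meta-learns the initialization of a 1-D linear model y = w*x across tasks; it is an illustrative assumption (analytic gradients for the quadratic loss, an invented function name), not the thesis's tracker or curvature method.

```python
import numpy as np

def maml_step(w0, tasks, inner_lr=0.05, outer_lr=0.1):
    """One MAML meta-update for 1-D linear regression y = w * x.

    Each task is a pair of arrays (x, y). The inner loop adapts w0 with
    one gradient step on the task's squared loss; the outer loop then
    differentiates the post-adaptation loss through that inner step
    (a second-order term, analytic here for the quadratic loss) and
    updates w0 so that one-step adaptation works well across tasks."""
    meta_grad = 0.0
    for x, y in tasks:
        m2, mxy = np.mean(x * x), np.mean(x * y)
        # inner step: w_adapt = w0 - lr * dL/dw, with dL/dw = 2*(w*m2 - mxy)
        w_adapt = w0 - inner_lr * 2.0 * (w0 * m2 - mxy)
        # chain rule: dL(w_adapt)/dw0 = dL/dw_adapt * dw_adapt/dw0
        meta_grad += 2.0 * (w_adapt * m2 - mxy) * (1.0 - 2.0 * inner_lr * m2)
    return w0 - outer_lr * meta_grad / len(tasks)
```

    Iterating `maml_step` on tasks with true slopes 2 and 4 drives the initialization toward 3, the point from which a single inner step best serves both tasks; a meta-learned tracker initialization plays the same role for new targets.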
Generative models given few examples are explored in the context of novel 3D view synthesis: given a single view of an object in an arbitrary pose, the goal is to synthesize an image of the object after a specified transformation of viewpoint. Instead of taking a 'blank slate' approach, the information present in the input image is used. First, the parts of the geometry visible in both the input and novel views are explicitly inferred, and the remaining synthesis problem then becomes an image completion task. In addition to the new network structure, training with a combination of adversarial and perceptual losses reduces common artifacts of novel view synthesis, such as distortions and holes, while successfully generating high-frequency details and preserving visual aspects of the input image. Both qualitative and quantitative results show that the proposed method achieves significantly better results compared to existing methods. Doctor of Philosophy.