    Self-Adapting Parallel Framework for Long-Term Object Tracking

    Object tracking is a crucial field in computer vision with many applications in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations exist in practice, and recent methods emphasize tracking objects adaptively by learning the object's appearance and rediscovering it when it becomes untraceable, so that the object's temporary absence (due to occlusion, clutter, or blur) is handled. Most of these algorithms place a high computational burden on the processing units and need powerful CPUs to attain real-time tracking and high-bitrate video processing. Such units can handle no more than a single video source, making them unsuitable for large-scale deployments with multiple sources or higher-resolution video. In this thesis, we choose one popular algorithm, TLD (Tracking-Learning-Detection), study the core components that impede its performance, and implement these components in a parallel computing environment spanning multi-core CPUs, GPUs, etc., also known as heterogeneous computing. OpenCL is used as the development platform to produce parallel kernels for the algorithm. The goals are to create an effective heterogeneous computing environment using current computer technologies, to give real-time applications an alternative implementation methodology, and to work around hardware limitations in terms of cost, power, and speedup. We bring true parallel speedup to the existing implementations, which greatly improves the frame rate for long-term object tracking and, with some modification of algorithm parameters, yields more accurate tracking. In our experiments, the developed kernels achieve a wide range of performance improvements: reduction-based kernels reach a maximum speedup of 78X, window-based kernels reach speedups from a few hundred X to 2000X, and the optical-flow tracking kernel reaches a maximum of 5.7X. Global speedup depends strongly on the hardware specifications, especially memory-transfer performance. With a medium-sized input, the self-adapting parallel framework obtains a fast learning curve and converges to an average speedup of 1.6X over the original implementation. Lastly, for future programming convenience, an OpenCL-based library is built to facilitate OpenCL programming on parallel hardware devices, hide the complexity of building and compiling OpenCL kernels, and provide a C-based latency-measurement tool compatible with several operating systems.
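
    Since the abstract names reduction-based kernels as one source of the reported speedups, a minimal sketch of such a kernel may help make the idea concrete. The following work-group tree reduction uses pyopencl for the host side, which is an assumption (the thesis does not state its host language), and the kernel name `partial_sum` is ours:

```python
import numpy as np
import pyopencl as cl

# OpenCL C kernel: each work-group reduces its chunk in fast local memory
# and writes one partial sum. Assumes local_size is a power of two.
KERNEL_SRC = """
__kernel void partial_sum(__global const float *in,
                          __global float *out,
                          __local  float *scratch,
                          const unsigned int n)
{
    unsigned int gid = get_global_id(0);
    unsigned int lid = get_local_id(0);
    scratch[lid] = (gid < n) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);
    for (unsigned int s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
"""

n = 1 << 20
local_size = 256
num_groups = (n + local_size - 1) // local_size
x = np.random.rand(n).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL_SRC).build()

mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=4 * num_groups)

prg.partial_sum(queue, (num_groups * local_size,), (local_size,),
                x_buf, out_buf, cl.LocalMemory(4 * local_size), np.uint32(n))

partial = np.empty(num_groups, dtype=np.float32)
cl.enqueue_copy(queue, partial, out_buf)
print(partial.sum(), x.sum())  # should agree up to float rounding
```

    A second kernel pass (or a host-side sum, as here) finishes the reduction; the per-group local-memory pattern is what gives reduction kernels their large speedups over a serial CPU loop.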

    Multi-Template Temporal Siamese Network for Long-Term Object Tracking

    Siamese Networks are among the most popular visual object tracking methods thanks to their high speed and high accuracy as long as the target is well identified. However, most Siamese Network based trackers use the first frame as the ground truth of the object and fail when the target's appearance changes significantly in later frames. They also have difficulty distinguishing the target from other similar objects in the frame. We propose two ideas to solve both problems. The first is a bag of dynamic templates, containing diverse, similar, and recent target features, continuously updated with diverse target appearances. The second is to let a network learn the path history and project a potential future target location in the next frame. This tracker achieves state-of-the-art performance on the long-term tracking dataset UAV20L, improving the success rate by a large margin of 15% (65.4 vs. 56.6) over the previous state-of-the-art method, HiFT. The official Python code of this paper is publicly available.
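
    To make the first idea concrete, here is a minimal sketch of how a bag of dynamic templates might be maintained. All names and parameters (`TemplateBag`, `capacity`, `sim_threshold`) are our own illustration of the mechanism the abstract describes, not the paper's code:

```python
import numpy as np

class TemplateBag:
    """Fixed-size bag of target feature vectors that stays diverse
    while still absorbing recent target appearances."""

    def __init__(self, capacity=8, sim_threshold=0.9):
        self.capacity = capacity
        self.sim_threshold = sim_threshold
        self.templates = []  # list of L2-normalised feature vectors

    def update(self, feat):
        feat = feat / (np.linalg.norm(feat) + 1e-12)
        if not self.templates:
            self.templates.append(feat)
            return
        sims = [float(feat @ t) for t in self.templates]
        if max(sims) > self.sim_threshold:
            return  # near-duplicate: skip so the bag stays diverse
        if len(self.templates) < self.capacity:
            self.templates.append(feat)
        else:
            # Replace the template most similar to the rest (least diverse).
            redundancy = [sum(float(t @ u) for u in self.templates)
                          for t in self.templates]
            self.templates[int(np.argmax(redundancy))] = feat

    def best_match(self, feat):
        feat = feat / (np.linalg.norm(feat) + 1e-12)
        return max(float(feat @ t) for t in self.templates)
```

    Matching a candidate against the whole bag rather than a single first-frame template is what lets the tracker survive large appearance changes.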

    Long-term object tracking using region proposals

    In this thesis we address the problem of tracking an arbitrary object in a sequence of images. We propose a long-term tracker based on Siamese convolutional neural networks. For detection, we use a template and compute its cross-correlation at every point of the search image to find the best-matching region. The template is initialized on the first frame, where we crop the image so that it contains only the tracked object and feed it to the convolutional neural network. After each localization the tracker detects whether tracking has failed. We propose two online methods for updating the visual model: one updates the template and the other fine-tunes the parameters of the network. We carried out two analyses measuring long-term tracking performance of variants of our tracker on the LTB35 dataset. The first analysis determines a good setting for generating region proposals; the second tests the proposed methods for updating the visual model. We find that without updating the visual model our tracker achieves an F-measure of 0.34; with template updating, 0.22; with fine-tuning, 0.38; and with both methods, 0.20. Finally, we compared our tracker with the trackers submitted to the VOT-LT2018 challenge, placing 11th with fine-tuning and 12th without fine-tuning or template updating.
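
    The detection step described above is dense cross-correlation of a template feature map over a search-image feature map, in the style of SiamFC. A minimal numpy sketch, where the feature maps are taken as given arrays and the function names are ours:

```python
import numpy as np

def xcorr_score_map(search_feat, template_feat):
    """Dense cross-correlation of a template over a search feature map.
    search_feat: (C, Hs, Ws); template_feat: (C, Ht, Wt);
    returns an (Hs-Ht+1, Ws-Wt+1) score map."""
    C, Hs, Ws = search_feat.shape
    _, Ht, Wt = template_feat.shape
    out = np.empty((Hs - Ht + 1, Ws - Wt + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[:, i:i + Ht, j:j + Wt]
            out[i, j] = float(np.sum(window * template_feat))
    return out

def locate(search_feat, template_feat):
    """Return the top-left offset of the best match and its score."""
    score = xcorr_score_map(search_feat, template_feat)
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return (int(i), int(j)), float(score[i, j])

def update_template(template_feat, new_feat, alpha=0.1):
    # Exponential moving-average update: a common choice for the
    # template-update method; the thesis' exact rule may differ.
    return (1 - alpha) * template_feat + alpha * new_feat
```

    The peak of the score map gives the localization; its value can serve as the confidence used to decide whether tracking has failed.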

    DART: Distribution Aware Retinal Transform for Event-based Cameras

    We introduce a generic visual descriptor, termed the distribution aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework, and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) to overcome the low-sample problem in one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) to achieve tracker robustness, the scale and rotation equivariance of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker, yielding a high intersection-over-union score with augmented ground-truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.
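
    At its core, a log-polar descriptor bins events around a keypoint into log-spaced rings and angular wedges, giving finer resolution near the center. A toy sketch in that spirit; the paper's actual grid layout, interpolation, and normalisation are more elaborate, and all names here are ours:

```python
import numpy as np

def log_polar_descriptor(events, center, n_rings=8, n_wedges=16,
                         r_min=1.0, r_max=32.0):
    """Histogram of events on a log-polar grid around `center`.
    events: (N, 2) array of (x, y) event coordinates."""
    cx, cy = center
    dx = events[:, 0] - cx
    dy = events[:, 1] - cy
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)                      # in (-pi, pi]
    keep = (r >= r_min) & (r < r_max)
    # Log-spaced radial bins: finer near the center, coarser far out.
    ring = np.floor(np.log(r[keep] / r_min) /
                    np.log(r_max / r_min) * n_rings).astype(int)
    wedge = np.floor((theta[keep] + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge = np.clip(wedge, 0, n_wedges - 1)
    hist = np.zeros((n_rings, n_wedges), dtype=np.float32)
    np.add.at(hist, (ring, wedge), 1.0)
    return hist.ravel() / max(hist.sum(), 1.0)      # normalised descriptor
```

    A rotation of the scene shifts the wedge dimension and a scale change shifts the ring dimension, which is the equivariance property the tracker exploits.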

    Visual motion tracking and sensor fusion for kite power systems

    An estimation approach is presented for kite power systems with ground-based actuation and generation. Line-based estimation of the kite state, including position and heading, limits the achievable cycle efficiency of such airborne wind energy systems due to significant estimation delay and line sag. We propose a filtering scheme to fuse onboard inertial measurements with ground-based line data for ground-based systems in pumping operation. Estimates are computed using an extended Kalman filter with a sensor-driven kinematic process model that propagates the state and corrects for inertial sensor biases. We further propose a visual motion tracking approach to extract estimates of the kite position from ground-based video streams. The approach combines accurate object detection with fast motion tracking to ensure long-term object tracking in real time. We present experimental results of the visual motion tracking and inertial sensor fusion on a ground-based kite power system in pumping operation and compare both methods to an existing estimation scheme based on line measurements.
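
    A minimal sketch of the kind of filter described above: a kinematic process model driven by accelerometer measurements, with an accelerometer-bias state corrected by ground-based position fixes. The state layout (position, velocity, bias) and noise handling are simplified assumptions, not the paper's exact filter:

```python
import numpy as np

def ekf_predict(x, P, acc_meas, Q, dt):
    """Propagate state x = [p(3), v(3), bias(3)] with measured acceleration."""
    p, v, b = x[0:3], x[3:6], x[6:9]
    a = acc_meas - b                                # bias-corrected acceleration
    x_pred = np.concatenate([p + v * dt + 0.5 * a * dt**2,
                             v + a * dt,
                             b])                    # bias as a random walk
    F = np.eye(9)
    F[0:3, 3:6] = np.eye(3) * dt
    F[0:3, 6:9] = -0.5 * np.eye(3) * dt**2
    F[3:6, 6:9] = -np.eye(3) * dt
    return x_pred, F @ P @ F.T + Q

def ekf_update(x, P, z_pos, R):
    """Correct with a ground-based position fix (line data or vision)."""
    H = np.zeros((3, 9)); H[0:3, 0:3] = np.eye(3)   # we observe position only
    y = z_pos - H @ x                               # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    return x + K @ y, (np.eye(9) - K @ H) @ P
```

    Running the prediction at the inertial rate and the update whenever a line or vision fix arrives is what removes the delay of purely line-based estimation.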

    Memory Based Online Learning of Deep Representations from Video Streams

    We present a novel online unsupervised method for face identity learning from video streams. The method exploits deep face descriptors together with a memory-based learning mechanism that takes advantage of the temporal coherence of visual data. Specifically, we introduce a discriminative feature-matching solution based on Reverse Nearest Neighbour and a feature-forgetting strategy that detects redundant features and discards them appropriately as time progresses. We show that the proposed learning procedure is asymptotically stable and can be effectively used in relevant applications such as multiple face identification and tracking from unconstrained video streams. Experimental results show that, compared with offline approaches that exploit future information, the proposed method achieves comparable results in multiple face tracking and better performance in face identification. Code will be publicly available.
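
    As an illustration of the matching idea, the sketch below implements a reciprocal (mutual) nearest-neighbour test, a common simplification in the spirit of Reverse Nearest Neighbour matching; the paper's actual method also involves distance ratios and the forgetting strategy, which we omit:

```python
import numpy as np

def reciprocal_nn_matches(memory, batch):
    """Match new descriptors against a feature memory, keeping a pair only
    when it is the nearest neighbour in BOTH directions, which suppresses
    one-sided, ambiguous matches. memory: (M, D) and batch: (B, D) row-wise
    L2-normalised feature matrices; returns (batch_idx, memory_idx) pairs."""
    sims = batch @ memory.T                 # cosine similarities, (B, M)
    nn_of_batch = sims.argmax(axis=1)       # each new feature -> memory feature
    nn_of_memory = sims.argmax(axis=0)      # each memory feature -> new feature
    return [(i, int(j)) for i, j in enumerate(nn_of_batch)
            if nn_of_memory[j] == i]        # keep only reciprocal pairs
```

    Features in memory that stop receiving reciprocal matches over time are candidates for the forgetting strategy to discard.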

    Surveillance with UAV Videos

    Unmanned aerial vehicles (UAVs) and drones are now accessible to everyone and are widely used in civilian and military fields. In military applications, UAVs can be used in border surveillance to detect and track moving objects/targets. The challenges of processing UAV images are the unpredictable background motion due to camera movement and the small target sizes. In this chapter, a brief review of the literature on moving object detection and long-term object tracking is presented, and publicly available datasets are introduced. General approaches and the success rates of the proposed methods are evaluated, and we discuss how deep learning-based solutions can be used together with classical methods. In addition to the methods in the literature for moving object detection, possible approaches to the remaining challenges are also shared.

    e-TLD: Event-based Framework for Dynamic Object Tracking

    This paper presents a long-term object tracking framework for a moving event camera under general tracking conditions. A first of its kind for these revolutionary cameras, the tracking framework uses a discriminative representation of the object with online learning, and detects and re-tracks the object when it comes back into the field of view. One of the key novelties is an event-based local sliding-window technique that tracks reliably in scenes with cluttered and textured backgrounds. In addition, Bayesian bootstrapping is used to assist real-time processing and boost the discriminative power of the object representation. When the object re-enters the field of view of the camera, a data-driven global sliding-window detector locates the object for subsequent tracking. Extensive experiments demonstrate the ability of the proposed framework to track and detect arbitrary objects of various shapes and sizes, including dynamic objects such as a human. This is a significant improvement over earlier works that simply track objects as long as they are visible, under simpler background settings. Using ground-truth locations for five different objects under three motion settings, namely translation, rotation, and 6-DOF, quantitative measurements are reported for the event-based tracking framework, with critical insights on various performance issues. Finally, a real-time implementation in C++ highlights tracking ability under scale, rotation, viewpoint, and occlusion scenarios in a lab setting.
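
    A toy sketch of the local sliding-window step on an accumulated event image: candidate windows around the previous object location are scored and the best one is kept. The scoring callable stands in for the paper's online-learned discriminative representation, and all names and parameters here are illustrative assumptions:

```python
import numpy as np

def local_sliding_window(event_img, classifier_score, prev_box,
                         search_pad=16, stride=4):
    """Search a small neighbourhood of the previous box.
    event_img: 2D array of accumulated event counts;
    classifier_score: callable patch -> discriminative score;
    prev_box: (x, y, w, h) with integer coordinates."""
    x, y, w, h = prev_box
    H, W = event_img.shape
    best, best_score = prev_box, -np.inf
    for cy in range(max(0, y - search_pad), min(H - h, y + search_pad) + 1, stride):
        for cx in range(max(0, x - search_pad), min(W - w, x + search_pad) + 1, stride):
            s = classifier_score(event_img[cy:cy + h, cx:cx + w])
            if s > best_score:
                best, best_score = (cx, cy, w, h), s
    return best, best_score
```

    When the best local score drops below a threshold, the object is declared lost and the global sliding-window detector takes over to re-acquire it.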

    In Defense of Clip-based Video Relation Detection

    Video Visual Relation Detection (VidVRD) aims to detect visual relationship triplets in videos using spatial bounding boxes and temporal boundaries. Existing VidVRD methods can be broadly categorized into bottom-up and top-down paradigms, depending on their approach to classifying relations. Bottom-up methods follow a clip-based approach where they classify relations of short clip tubelet pairs and then merge them into long video relations. On the other hand, top-down methods directly classify long video tubelet pairs. While recent video-based methods utilizing video tubelets have shown promising results, we argue that effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets. This motivates us to revisit the clip-based paradigm and explore the key success factors in VidVRD. In this paper, we propose a Hierarchical Context Model (HCM) that enriches the object-based spatial context and relation-based temporal context based on clips. We demonstrate that using clip tubelets can achieve superior performance compared to most video-based methods. Additionally, using clip tubelets offers more flexibility in model design and helps alleviate the limitations associated with video tubelets, such as the challenging long-term object tracking problem and the loss of temporal information in long-term tubelet feature compression. Extensive experiments conducted on two challenging VidVRD benchmarks validate that our HCM achieves new state-of-the-art performance, highlighting the effectiveness of incorporating advanced spatial and temporal context modeling within the clip-based paradigm.
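
    To illustrate the bottom-up merging step the abstract refers to, here is a generic greedy sketch that chains clip-level relation instances into video-level ones when the triplet matches and the boxes overlap across the clip boundary. This illustrates the paradigm, not HCM's actual algorithm, and all names are ours:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    xa, ya = max(a[0], b[0]), max(a[1], b[1])
    xb, yb = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_clip_relations(clips, iou_thr=0.5):
    """Greedy bottom-up merging. clips[t] is a list of dicts with keys
    'triplet' (subject, predicate, object), 'subj_box', 'obj_box', and
    'span' initialised to (t, t). Returns video-level relation tracks."""
    tracks = [dict(r) for r in clips[0]]
    for t in range(1, len(clips)):
        extended = [False] * len(clips[t])
        for tr in tracks:
            if tr['span'][1] != t - 1:
                continue                      # track already ended earlier
            for i, r in enumerate(clips[t]):
                if (not extended[i] and r['triplet'] == tr['triplet']
                        and iou(tr['subj_box'], r['subj_box']) >= iou_thr
                        and iou(tr['obj_box'], r['obj_box']) >= iou_thr):
                    tr['span'] = (tr['span'][0], t)          # extend in time
                    tr['subj_box'], tr['obj_box'] = r['subj_box'], r['obj_box']
                    extended[i] = True
                    break
        # Unmatched clip relations start new video-level tracks.
        tracks += [dict(r) for i, r in enumerate(clips[t]) if not extended[i]]
    return tracks
```

    This chaining sidesteps long-term tracking of whole video tubelets, which is the flexibility argument the paper makes for clip tubelets.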