
    Generative Adversarial Networks for Online Visual Object Tracking Systems

    Object tracking is one of the essential tasks in computer vision, with numerous applications in fields such as human-computer interaction, video surveillance, augmented reality, and robotics. Object tracking refers to the process of detecting and locating a target object across a series of video frames. State-of-the-art tracking-by-detection frameworks typically track the target in two steps. The first step draws multiple samples near the target region of the previous frame. The second step classifies each sample as either the target object or the background. Visual object tracking remains one of the most challenging tasks because of variations in visual data, such as target occlusion, background clutter, illumination changes, and scale changes, as well as challenges that stem from the tracking problem itself, including fast motion, out-of-view targets, motion blur, deformation, and in-plane and out-of-plane rotation. Researchers continue to tackle these challenges as they investigate more effective algorithms that can track any object under changing conditions. To keep the research community motivated, several annual tracker benchmarking competitions are organized to consolidate performance measures and evaluation protocols in different tracking subfields, such as the Visual Object Tracking (VOT) challenges and the Multiple Object Tracking (MOT) challenges [1, 2]. Despite the excellent performance achieved with deep learning, modern deep tracking methods are still limited in several respects. The variety of appearance changes over time remains a problem for deep trackers, owing to spatial overlap between positive samples. Furthermore, existing methods impose a high computational load and run slowly.
Recently, Generative Adversarial Networks (GANs) have shown excellent results on a variety of computer vision problems, which makes them attractive candidates for improving other computer vision applications, namely visual object tracking. In this thesis, we explore the impact of using the Residual Network (ResNet) as an alternative feature extractor to the Visual Geometry Group (VGG) network, which is commonly used in the literature. Furthermore, we attempt to address the limitations of object tracking by exploiting ongoing advances in GANs. We describe a generative adversarial network intended to improve the tracker’s classifier during the online training phase. The network generates adaptive masks that augment the positive samples detected by the convolutional layer of the tracker’s model, making the samples more difficult and thereby improving the model’s classifier. We then integrate this network with the Multi-Domain Convolutional Neural Network (MDNet) tracker and present the results. Furthermore, we introduce a novel tracker, MDResNet, by substituting the convolutional layers of MDNet, originally taken from the Visual Geometry Group (VGG-M) network, with layers taken from the Residual Deep Network (ResNet-50), and we compare the results. We also introduce a new tracker, Region of Interest with Adversarial Learning (ROIAL), by integrating the generative adversarial network with the Real-Time Multi-Domain Convolutional Network (RT-MDNet) tracker. Finally, we integrate the GAN with MDResNet and MDNet and compare the results with ROIAL.
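
The two-step tracking-by-detection loop described in the abstract above (draw samples near the previous target region, then classify each one) might be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function names `draw_candidates`, `track_step`, and `iou`, the Gaussian perturbation scales, and the toy scoring function are all assumptions introduced here. In a real tracker the score would come from a learned target/background classifier.

```python
import random


def draw_candidates(prev_box, num_samples=64, shift_scale=0.1,
                    size_scale=0.05, rng=None):
    """Step 1: draw candidate boxes (x, y, w, h) around the previous box.

    Positions and sizes are perturbed with Gaussian noise; the scales are
    illustrative assumptions, not values taken from the source.
    """
    rng = rng or random.Random(0)
    x, y, w, h = prev_box
    candidates = []
    for _ in range(num_samples):
        dx = rng.gauss(0.0, shift_scale * w)
        dy = rng.gauss(0.0, shift_scale * h)
        s = max(1.0 + rng.gauss(0.0, size_scale), 0.1)  # keep sizes positive
        candidates.append((x + dx, y + dy, w * s, h * s))
    return candidates


def track_step(prev_box, score_fn, **kwargs):
    """Step 2: score every candidate and keep the highest-scoring one.

    score_fn stands in for the target-vs-background classifier.
    """
    candidates = draw_candidates(prev_box, **kwargs)
    return max(candidates, key=score_fn)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy stand-in for a classifier: overlap with a hidden "true" box.
    true_box = (12.0, 11.0, 20.0, 20.0)
    best = track_step((10.0, 10.0, 20.0, 20.0), lambda b: iou(b, true_box))
    print(best, iou(best, true_box))
```

The sketch also hints at the limitation the abstract mentions: candidates drawn close to the same box overlap heavily, so the positive samples carry little appearance variety.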

    Online Domain Adaptation for Multi-Object Tracking

    Automatically detecting, labeling, and tracking objects in videos depends first and foremost on accurate category-level object detectors. These might, however, not always be available in practice, as acquiring high-quality, large-scale labeled training datasets is either too costly or impractical for all possible real-world application scenarios. A scalable solution consists of reusing object detectors pre-trained on generic datasets. This work is the first to investigate the problem of on-line domain adaptation of object detectors for causal multi-object tracking (MOT). We propose to alleviate the dataset bias by adapting detectors from category to instances, and back: (i) we jointly learn all target models by adapting them from the pre-trained one, and (ii) we also adapt the pre-trained model on-line. We introduce an on-line multi-task learning algorithm to efficiently share parameters and reduce drift, while gradually improving recall. Our approach is applicable to any linear object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive "off-the-shelf" ConvNet features. We quantitatively measure the benefit of our domain adaptation strategy on the KITTI tracking benchmark and on a new dataset (PASCAL-to-KITTI) that we introduce to study the domain mismatch problem in MOT. Comment: To appear at BMVC 201

    Detect to Track and Track to Detect

    Recent approaches for high accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year. In this paper we propose a ConvNet architecture that jointly performs detection and tracking, solving the task in a simple and effective way. Our contributions are threefold: (i) we set up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression; (ii) we introduce correlation features that represent object co-occurrences across time to aid the ConvNet during tracking; and (iii) we link the frame level detections based on our across-frame tracklets to produce high accuracy detections at the video level. Our ConvNet architecture for spatiotemporal object detection is evaluated on the large-scale ImageNet VID dataset where it achieves state-of-the-art results. Our approach provides better single model performance than the winning method of the last ImageNet challenge while being conceptually much simpler. Finally, we show that by increasing the temporal stride we can dramatically increase the tracker speed. Comment: ICCV 2017. Code and models: https://github.com/feichtenhofer/Detect-Track Results: https://www.robots.ox.ac.uk/~vgg/research/detect-track

    Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

    In this paper, we propose a CNN-based framework for online MOT. This framework exploits the merits of single object trackers in adapting appearance models and searching for the target in the next frame. Simply applying a single object tracker to MOT runs into problems of computational inefficiency and drift caused by occlusion. Our framework achieves computational efficiency by sharing features and using ROI-Pooling to obtain individual features for each target. Online-learned, target-specific CNN layers adapt the appearance model for each target. Within the framework, we introduce a spatial-temporal attention mechanism (STAM) to handle the drift caused by occlusion and interaction among targets. The visibility map of the target is learned and used to infer the spatial attention map, which is then applied to weight the features. In addition, the occlusion status can be estimated from the visibility map; it controls the online updating process via a weighted loss on training samples with different occlusion statuses in different frames, and can be regarded as a temporal attention mechanism. The proposed algorithm achieves 34.3% and 46.0% MOTA on the challenging MOT15 and MOT16 benchmark datasets, respectively. Comment: Accepted at International Conference on Computer Vision (ICCV) 201
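
The abstract above reports results in MOTA without defining it. As a reminder, the standard CLEAR MOT definition is one minus the ratio of accumulated errors (false negatives, false positives, identity switches) to the total number of ground-truth objects. The helper below is a minimal sketch of that formula; the function name `mota` and the per-frame counts in the usage lines are made-up illustrations, not data from the paper.

```python
def mota(per_frame_counts):
    """Compute MOTA from per-frame error counts.

    per_frame_counts: iterable of tuples
        (false_negatives, false_positives, id_switches, num_ground_truth)
    Returns 1 - (FN + FP + IDSW) / GT, summed over all frames.
    The value can be negative when errors outnumber ground-truth objects.
    """
    total_fn = total_fp = total_idsw = total_gt = 0
    for fn, fp, idsw, gt in per_frame_counts:
        total_fn += fn
        total_fp += fp
        total_idsw += idsw
        total_gt += gt
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (total_fn + total_fp + total_idsw) / total_gt


if __name__ == "__main__":
    # Two toy frames with 5 ground-truth objects each (illustrative counts).
    frames = [(1, 0, 0, 5), (0, 1, 1, 5)]
    print(f"MOTA = {mota(frames):.1%}")
```

Because the errors are summed before dividing, frames with many objects weigh more heavily than sparse frames.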