9,907 research outputs found

    Fusion of Head and Full-Body Detectors for Multi-Object Tracking

    Full text link
    In order to track all persons in a scene, the tracking-by-detection paradigm has proven to be a very effective approach. Yet, relying solely on a single detector is also a major limitation, as useful image information might be ignored. Consequently, this work demonstrates how to fuse two detectors into a tracking system. To obtain the trajectories, we propose to formulate tracking as a weighted graph labeling problem, resulting in a binary quadratic program. As such problems are NP-hard, the solution can only be approximated. Based on the Frank-Wolfe algorithm, we present a new solver that is crucial to handle such difficult problems. Evaluation on pedestrian tracking is provided for multiple scenarios, showing superior results over single detector tracking and standard QP-solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and 1st on the new MOT17 benchmark, outperforming over 90 trackers.Comment: 10 pages, 4 figures; Winner of the MOT17 challenge; CVPRW 201

    Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking

    Get PDF
    With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filers. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches

    Autonomous real-time surveillance system with distributed IP cameras

    Get PDF
    An autonomous Internet Protocol (IP) camera based object tracking and behaviour identification system, capable of running in real-time on an embedded system with limited memory and processing power is presented in this paper. The main contribution of this work is the integration of processor intensive image processing algorithms on an embedded platform capable of running at real-time for monitoring the behaviour of pedestrians. The Algorithm Based Object Recognition and Tracking (ABORAT) system architecture presented here was developed on an Intel PXA270-based development board clocked at 520 MHz. The platform was connected to a commercial stationary IP-based camera in a remote monitoring station for intelligent image processing. The system is capable of detecting moving objects and their shadows in a complex environment with varying lighting intensity and moving foliage. Objects moving close to each other are also detected to extract their trajectories which are then fed into an unsupervised neural network for autonomous classification. The novel intelligent video system presented is also capable of performing simple analytic functions such as tracking and generating alerts when objects enter/leave regions or cross tripwires superimposed on live video by the operator
    corecore