
    Novel data association methods for online multiple human tracking

    PhD Thesis. Video-based multiple human tracking has played a crucial role in many applications such as intelligent video surveillance, human behavior analysis, and health-care systems. The detection-based tracking framework has become the dominant paradigm in this research field, and the major task is to accurately perform the data association between detections across frames. However, online multiple human tracking, which relies only on the detections available up to the present time for the data association, becomes more challenging with noisy detections, missed detections, and occlusions. To address these challenging problems, three novel data association methods for online multiple human tracking are presented in this thesis: online group-structured dictionary learning, enhanced detection reliability, and multi-level cooperative fusion. The first proposed method aims to address noisy detections and occlusions. In this method, sequential Monte Carlo probability hypothesis density (SMC-PHD) filtering is the core element for accomplishing the tracking task, where the measurements are produced by the detection-based tracking framework. To enhance the measurement model, a novel adaptive gating strategy is developed to aid the classification of measurements. In addition, online group-structured dictionary learning with a maximum voting method is proposed to robustly estimate the target birth intensity. It enables new-born targets in the tracking process to be accurately initialized from noisy sensor measurements. To improve the adaptability of the group-structured dictionary to target appearance changes, the simultaneous codeword optimization (SimCO) algorithm is employed for the dictionary update. The second proposed method relates to accurate measurement selection of detections, which further refines the noisy detections prior to the tracking pipeline. In order to achieve more reliable measurements in the Gaussian mixture (GM)-PHD filtering process, a global-to-local enhanced confidence rescoring strategy is proposed by exploiting the classification power of a mask region-convolutional neural network (R-CNN). Then, an improved pruning algorithm, namely soft-aggregated non-maximal suppression (Soft-ANMS), is devised to further enhance the selection step. In addition, to avoid the misuse of ambiguous measurements in the tracking process, person re-identification (ReID) features driven by convolutional neural networks (CNNs) are integrated to model the target appearances. The third proposed method focuses on addressing the issues of missed detections and occlusions. This method integrates two human detectors with different characteristics (full-body and body-parts) in the GM-PHD filter, and investigates their complementary benefits for tracking multiple targets. For each detector domain, a novel discriminative correlation matching (DCM) model is proposed for integration in the feature-level fusion, and is used together with spatio-temporal information to reduce ambiguous identity associations in the GM-PHD filter. Moreover, a robust fusion center is proposed within the decision-level fusion to mitigate the sensitivity to missed detections in the fusion process, thereby improving the fusion performance and tracking consistency. The effectiveness of these proposed methods is investigated using the MOTChallenge benchmark, a framework for the standardized evaluation of multiple object tracking methods. Detailed evaluations on challenging video datasets, as well as comparisons with recent state-of-the-art techniques, confirm the improved multiple human tracking performance.
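    The gating of detections against predicted track states, as mentioned in the abstract above, is commonly implemented in PHD-style filters with a Mahalanobis-distance test. The sketch below illustrates that generic idea in Python; it is not the thesis's adaptive gating strategy, and the function name, array shapes, and chi-square threshold are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of distance-based gating of detections against predicted
# Gaussian components, as used in GM-PHD-style trackers. This is NOT the
# thesis's adaptive gating strategy; names and thresholds are illustrative.

CHI2_GATE_2DOF = 9.21  # ~99% gate for a 2-D position measurement


def gate_detections(detections, means, covariances, gate=CHI2_GATE_2DOF):
    """Return, for each Gaussian component, the indices of detections that
    fall inside its Mahalanobis gate.

    detections:  (M, 2) array of measured positions
    means:       (N, 2) array of predicted component means
    covariances: (N, 2, 2) array of predicted innovation covariances
    """
    accepted = []
    for mean, cov in zip(means, covariances):
        cov_inv = np.linalg.inv(cov)
        diff = detections - mean                             # (M, 2)
        d2 = np.einsum("mi,ij,mj->m", diff, cov_inv, diff)   # squared Mahalanobis
        accepted.append(np.flatnonzero(d2 <= gate))
    return accepted


if __name__ == "__main__":
    dets = np.array([[10.0, 12.0], [40.0, 41.0]])
    mus = np.array([[11.0, 11.5]])
    covs = np.array([np.eye(2) * 4.0])
    print(gate_detections(dets, mus, covs))  # -> [array([0])]
```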

    Audio-visual tracking of concurrent speakers

    Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map assisted by the image detection-derived 3D video observations. The 3D multimodal observations are either assigned to existing tracks for discriminative likelihood computation or used to initialize new tracks. The generative likelihoods rely on the color distribution of the target and the de-emphasized acoustic map value. Experiments on the AV16.3 and CAV3D datasets show that the proposed tracker outperforms the uni-modal trackers and the state-of-the-art approaches both in 3D and on the image plane.
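    The particle filtering framework mentioned above weights hypothesized 3D speaker positions by audio-visual likelihoods and resamples them. A minimal, generic particle-filter update step is sketched below, assuming a user-supplied likelihood function as a stand-in for the paper's generative/discriminative models; the random-walk motion model and the resampling threshold are simplifying assumptions, not the paper's formulation.

```python
import numpy as np

# Generic particle-filter predict-update-resample step. The likelihood_fn
# stands in for the paper's audio-visual likelihood models and is an
# assumption of this sketch.


def particle_filter_step(particles, weights, likelihood_fn, motion_std=0.05,
                         rng=np.random.default_rng()):
    """One cycle of a particle filter over 3D target-position particles.

    particles:     (P, 3) array of hypothesized 3D positions
    weights:       (P,) normalized importance weights
    likelihood_fn: callable mapping (P, 3) particles to (P,) likelihoods
    """
    # Predict: random-walk motion model (illustrative choice).
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)

    # Update: re-weight particles by the observation likelihood.
    weights = weights * likelihood_fn(particles)
    weights /= weights.sum() + 1e-12

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))

    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```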

    Single to multiple target, multiple type visual tracking

    Visual tracking is a key task in applications such as intelligent surveillance, human-computer interaction (HCI), human-robot interaction (HRI), augmented reality (AR), driver assistance systems, and medical applications. In this thesis, we make three main novel contributions to target tracking in video sequences. First, we develop a long-term, model-free single-target tracker based on learning discriminative correlation filters and an online classifier, which can track a target of interest in both sparse and crowded scenes. In this case, we learn two different correlation filters, translation and scale correlation filters, using different visual features. We also include a re-detection module that can re-initialize the tracker in case of tracking failures due to long-term occlusions. Second, a multiple target, multiple type filtering algorithm is developed using Random Finite Set (RFS) theory. In particular, we extend the standard Probability Hypothesis Density (PHD) filter to multiple types of targets, each with distinct detection properties, to develop a multiple target, multiple type filter, the N-type PHD filter, where N ≥ 2, for handling confusions that can occur among target types at the measurement level. This method takes into account not only background false positives (clutter), but also confusions between target detections, which are in general different in character from background clutter. Then, under the assumptions of Gaussianity and linearity, we extend the Gaussian mixture (GM) implementation of the standard PHD filter to the proposed N-type PHD filter, termed the N-type GM-PHD filter. Third, we apply this N-type GM-PHD filter to real video sequences by integrating object detectors' information into the filter for two scenarios. In the first scenario, a tri-GM-PHD filter is applied to real video sequences containing three types of multiple targets in the same scene, two football teams and a referee, using separate but confused detections. In the second scenario, we use a dual GM-PHD filter for tracking pedestrians and vehicles in the same scene, handling their detectors' confusions. For both cases, Munkres's variant of the Hungarian assignment algorithm is used to associate tracked target identities between frames. We make extensive evaluations of these developed algorithms and find that our methods outperform their corresponding state-of-the-art approaches by a large margin.
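    The frame-to-frame identity association described above relies on Munkres's variant of the Hungarian algorithm. The sketch below shows how such an assignment step is commonly set up with an IoU-based cost, using scipy's linear_sum_assignment as the solver; the cost definition and the 0.3 match threshold are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of frame-to-frame identity association with a Hungarian-type solver.
# The IoU cost and the 0.3 match threshold are illustrative assumptions.


def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)


def associate(prev_boxes, curr_boxes, min_iou=0.3):
    """Return (prev_idx, curr_idx) matches that maximize total IoU."""
    cost = np.array([[1.0 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]
```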

    MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

    Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data, and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results submitted over the years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends the MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes, but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
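    MOTChallenge evaluations are commonly summarized with CLEAR MOT metrics such as MOTA, which penalizes false negatives, false positives, and identity switches relative to the number of ground-truth objects. A minimal sketch of that aggregate score is given below; it assumes the per-frame error counts have already been produced by a matching step, which is not shown.

```python
# Minimal sketch of the MOTA score commonly reported on MOTChallenge:
# MOTA = 1 - (FN + FP + IDSW) / GT, aggregated over all frames.


def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """Multiple Object Tracking Accuracy from pre-computed error counts."""
    if num_gt_objects == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects


# Example: 1200 GT boxes, 150 misses, 80 false alarms, 12 ID switches.
print(mota(150, 80, 12, 1200))  # ~0.798
```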

    Multiple perspective object tracking via context-aware Correlation Filter
