24,423 research outputs found

    Predicting Multiple Target Tracking Performance for Applications on Video Sequences

    Get PDF
    This dissertation presents a framework to predict the performance of multiple target tracking (MTT) techniques. The framework is based on the mathematical descriptors of point processes, the probability generating functional (p.g.fl). It is shown that conceptually the p.g.fls of MTT techniques can be interpreted as a transform that can be marginalized to an expression that encodes all the information regarding the likelihood model as well as the underlying assumptions present in a given tracking technique. In order to use this approach for tracker performance prediction in video sequences, a framework that combines video quality assessment concepts and the marginalized transform is introduced. The multiple hypothesis tracker (MHT), Joint Probabilistic Data Association (JPDA), Markov Chain Monte Carlo (MCMC) data association, and the Probability Hypothesis Density filter (PHD) are used as a test cases. We introduce their transforms and perform a numerical comparison to predict their performance under identical conditions. We also introduce the concepts that present the base for estimation in general and for applications in computer vision

    Predicting Multiple Target Tracking Performance for Applications on Video Sequences

    Get PDF
    This dissertation presents a framework to predict the performance of multiple target tracking (MTT) techniques. The framework is based on the mathematical descriptors of point processes, the probability generating functional (p.g.fl). It is shown that conceptually the p.g.fls of MTT techniques can be interpreted as a transform that can be marginalized to an expression that encodes all the information regarding the likelihood model as well as the underlying assumptions present in a given tracking technique. In order to use this approach for tracker performance prediction in video sequences, a framework that combines video quality assessment concepts and the marginalized transform is introduced. The multiple hypothesis tracker (MHT), Joint Probabilistic Data Association (JPDA), Markov Chain Monte Carlo (MCMC) data association, and the Probability Hypothesis Density filter (PHD) are used as a test cases. We introduce their transforms and perform a numerical comparison to predict their performance under identical conditions. We also introduce the concepts that present the base for estimation in general and for applications in computer vision

    A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

    Full text link
    In this work, we introduce a deep-structured conditional random field (DS-CRF) model for the purpose of state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity dynamically determined based on inter-frame optical flow. By incorporate both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that can change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experiment results using video surveillance datasets containing different scenarios such as occlusion and multiple targets showed that the proposed DS-CRF approach provides strong object silhouette tracking performance when compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.Comment: 17 page

    Human Motion Trajectory Prediction: A Survey

    Full text link
    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research.Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 page

    Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking

    Full text link
    Current multi-person localisation and tracking systems have an over reliance on the use of appearance models for target re-identification and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a light weight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections, typically found in a multi person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks including both static and dynamic cameras and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches.Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201

    Long-Term Visual Object Tracking Benchmark

    Full text link
    We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20 folds larger in average duration per sequence and more than 8 folds larger in terms of total covered duration, as compared to existing generic datasets for visual tracking. The proposed dataset paves a way to suitably assess long term tracking performance and train better deep learning architectures (avoiding/reducing augmentation, which may not reflect real world behaviour). We benchmark the dataset on 17 state of the art trackers and rank them according to tracking accuracy and run time speeds. We further present thorough qualitative and quantitative evaluation highlighting the importance of long term aspect of tracking. Our most interesting observations are (a) existing short sequence benchmarks fail to bring out the inherent differences in tracking algorithms which widen up while tracking on long sequences and (b) the accuracy of trackers abruptly drops on challenging long sequences, suggesting the potential need of research efforts in the direction of long-term tracking.Comment: ACCV 2018 (Oral

    Survey on Vision-based Path Prediction

    Full text link
    Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a task of computer vision uses video as input, various information used for prediction, such as the environment surrounding the target and the internal state of the target, need to be estimated from the video in addition to predicting paths. Many prediction approaches that include understanding the environment and the internal state have been proposed. In this survey, we systematically summarize methods of path prediction that take video as input and and extract features from the video. Moreover, we introduce datasets used to evaluate path prediction methods quantitatively.Comment: DAPI 201

    Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets

    Full text link
    In this work, we explore the correlation between people trajectories and their head orientations. We argue that people trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches on trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for the visual attention, when modeling social interactions. MX-LSTM predicts future pedestrians location and head pose, increasing the standard capabilities of the current approaches on long-term trajectory forecasting. Compared to the state-of-the-art, our approach shows better performances on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions on a longer time horizon.Comment: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.0065
    corecore