Video Tracking for Visual Degraded Aerial Vehicle with H-PMHT
The work presented in this paper describes a novel approach for automatic video tracking of visually degraded air vehicles in daylight against a sky background. The proposed video object tracking method is based on the Histogram Probabilistic Multi-Hypothesis Tracker (H-PMHT) algorithm. H-PMHT is an expectation-maximization-based algorithm developed for tracking objects in intense clutter by using intensity-modulated data streams. The basic H-PMHT algorithm assumes a linear-Gaussian point spread function; however, recent studies have shown that the algorithm is also applicable to non-linear and non-Gaussian target shapes. H-PMHT is therefore a suitable alternative for tracking applications with sonar, high-resolution radar, IR and UV sensors, and cameras. In this work, the H-PMHT algorithm is used for video tracking of visually degraded air vehicles. For this purpose, RGB video data is processed using a reciprocal pixel intensity measurement to meet the requirements of the tracking process. A simulation study is conducted to demonstrate the video tracking performance of H-PMHT against visually degraded air vehicles, and the results are compared with those of the amplitude-information-aided Interacting Multiple Model Probabilistic Data Association algorithm.
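The reciprocal pixel intensity idea can be sketched as follows: a dark, visually degraded target against a bright sky yields low luminance, so taking the reciprocal turns it into a strong intensity peak that an intensity-based tracker like H-PMHT can lock onto. This is an illustrative transform under assumed parameters (plain channel-mean luminance, a small `eps` to avoid division by zero), not the paper's exact mapping.

```python
import numpy as np

def reciprocal_intensity(rgb, eps=1e-3):
    """Map an RGB frame to a measurement image in which dark targets
    against a bright sky produce HIGH intensity (illustrative sketch)."""
    gray = rgb.astype(np.float64).mean(axis=2) / 255.0  # luminance in [0, 1]
    z = 1.0 / (gray + eps)   # reciprocal: dark pixels dominate
    return z / z.max()       # normalize to [0, 1]

# A bright "sky" frame with one dark 2x2 target
frame = np.full((8, 8, 3), 230, dtype=np.uint8)
frame[3:5, 3:5] = 20
z = reciprocal_intensity(frame)
print(z[3, 3] > 5 * z[0, 0])  # the dim target now stands out strongly
```

The resulting intensity image can then be fed to the tracker in place of raw pixel values.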
Multi-level Map Construction for Dynamic Scenes
In dynamic scenes, both localization and mapping in visual SLAM face
significant challenges. In recent years, numerous outstanding research works
have proposed effective solutions for the localization problem. However, there
has been a scarcity of excellent works focusing on constructing long-term
consistent maps in dynamic scenes, which severely hampers map applications. To
address this issue, we have designed a multi-level map construction system
tailored for dynamic scenes. In this system, we employ multi-object tracking
algorithms, the DBSCAN clustering algorithm, and depth information to rectify the
results of object detection, accurately extract static point clouds, and
construct dense point cloud maps and octree maps. We propose a plane map
construction algorithm specialized for dynamic scenes, involving the
extraction, filtering, data association, and fusion optimization of planes in
dynamic environments, thus creating a plane map. Additionally, we introduce an
object map construction algorithm targeted at dynamic scenes, which includes
object parameterization, data association, and update optimization. Extensive
experiments on public datasets and real-world scenarios validate the accuracy
of the multi-level maps constructed in this study and the robustness of the
proposed algorithms. Furthermore, we demonstrate the practical application
prospects of our algorithms by utilizing the constructed object maps for
dynamic object tracking.
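The role DBSCAN plays in separating static structure from sparse dynamic points can be illustrated with a minimal, self-contained implementation. This is a generic DBSCAN sketch with assumed parameters (`eps`, `min_pts`) on 2D points, not the system's actual pipeline, which operates on depth-derived point clouds.

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN: dense regions become clusters (static structure),
    isolated points become noise (candidate dynamic points)."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise unless later reached as a border point
            continue
        labels[i] = cid
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid       # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:    # core point: keep expanding
                queue.extend(jn)
        cid += 1
    return labels

# Dense "static wall" of points plus two isolated "dynamic" points
static = [(x * 0.3, 0.0) for x in range(10)]
dynamic = [(5.0, 8.0), (-4.0, 6.0)]
labels = dbscan(static + dynamic, eps=0.5, min_pts=3)
print(labels)  # wall points share one cluster id; isolated points are -1
```

Points labeled `-1` would be excluded before building the dense point cloud and octree maps.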
Dual L1-normalized context aware tensor power iteration and its applications to multi-object tracking and multi-graph matching
The multi-dimensional assignment problem is universal in data association analysis, such as data-association-based visual multi-object tracking and multi-graph matching. In this paper, multi-dimensional assignment is formulated as a rank-1 tensor approximation problem, and a dual L1-normalized context/hyper-context aware tensor power iteration optimization method is proposed. The method is applied to multi-object tracking and multi-graph matching. In the optimization method, tensor power iteration with the dual unit norm enables the capture of information across multiple sample sets. Interactions between sample associations are modeled as contexts or hyper-contexts, which are combined with the global affinity into a unified optimization. The optimization is flexible enough to accommodate various types of contextual models. In multi-object tracking, the global affinity is defined according to the appearance similarity between objects detected in different frames. Interactions between objects are modeled as motion contexts, which are encoded into the global association optimization. The tracking method integrates high-order motion information and high-order appearance variation. The multi-graph matching method carries out matching over graph vertices and structure matching over graph edges simultaneously. The matching consistency across multiple graphs is based on the high-order tensor optimization, and various types of vertex affinities and edge/hyper-edge affinities are flexibly integrated. Experiments on several public datasets, such as the MOT16 challenge benchmark, validate the effectiveness of the proposed methods.
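The core rank-1 tensor power iteration can be sketched for a 3-way affinity tensor: each factor vector is updated by contracting the tensor against the other two, then L1-normalized. This is a simplified single-norm sketch of the paper's dual-normalized scheme, with an assumed synthetic tensor in which one joint assignment dominates.

```python
import numpy as np

def rank1_power_iteration(T, iters=50):
    """Rank-1 approximation of a 3-way affinity tensor by power iteration
    with L1 normalization of each factor (illustrative sketch)."""
    I, J, K = T.shape
    x = np.full(I, 1.0 / I)
    y = np.full(J, 1.0 / J)
    z = np.full(K, 1.0 / K)
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', T, y, z); x /= x.sum()
        y = np.einsum('ijk,i,k->j', T, x, z); y /= y.sum()
        z = np.einsum('ijk,i,j->k', T, x, y); z /= z.sum()
    return x, y, z

# Affinity tensor with one dominant joint assignment at (i=1, j=0, k=2)
T = np.random.default_rng(0).uniform(0.0, 0.1, (3, 3, 3))
T[1, 0, 2] = 5.0
x, y, z = rank1_power_iteration(T)
print(x.argmax(), y.argmax(), z.argmax())  # recovers the dominant assignment
```

In the tracking setting, the three tensor modes would index detections in three consecutive frames, and the factor peaks indicate the most consistent association.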
Occlusion reasoning for multiple object visual tracking
Thesis (Ph.D.)--Boston University. Occlusion reasoning for visual object tracking in uncontrolled environments is a challenging problem. It becomes significantly more difficult when dense groups of indistinguishable objects in the scene cause frequent inter-object interactions and occlusions. We present several practical solutions that tackle inter-object occlusions for video surveillance applications.
In particular, this thesis proposes three methods. First, we propose "reconstruction-tracking," an online multi-camera spatial-temporal data association method for tracking large groups of objects imaged with low resolution. As a variant of the well-known Multiple-Hypothesis-Tracker, our approach localizes the positions of objects in 3D space with possibly occluded observations from multiple camera views and performs temporal data association in 3D. Second, we develop "track linking," a class of offline batch processing algorithms for long-term occlusions, where the decision has to be made based on the observations from the entire tracking sequence. We construct a graph representation to characterize occlusion events and propose an efficient graph-based/combinatorial algorithm to resolve occlusions.
Third, we propose a novel Bayesian framework where detection and data association are combined into a single module and solved jointly. Almost all traditional tracking systems address the detection and data association tasks separately in sequential order. Such a design implies that the output of the detector has to be reliable in order to make the data association work. Our framework takes advantage of the often complementary nature of the two subproblems, which not only avoids the error propagation issue from which traditional "detection-tracking approaches" suffer but also eschews common heuristics such as "nonmaximum suppression" of hypotheses by modeling the likelihood of the entire image.
The thesis describes a substantial number of experiments involving challenging and notably distinct simulated and real data, including infrared and visible-light data sets that we recorded ourselves or took from publicly available collections. In these videos, the number of objects ranges from a dozen to a hundred per frame, in both monocular and multiple views. The experiments demonstrate that our approaches achieve results comparable to those of state-of-the-art approaches.
Multi-object Tracking from the Classics to the Modern
Visual object tracking is one of the computer vision problems that has been researched extensively over the past several decades. Many computer vision applications, such as robotics, autonomous driving, and video surveillance, require the capability to track multiple objects in videos. The most popular solution approach to tracking multiple objects follows the tracking-by-detection paradigm in which the problem of tracking is divided into object detection and data association. In data association, track proposals are often generated by extending the object tracks from the previous frame with new detections in the current frame. The association algorithm then utilizes a track scorer or classifier in evaluating track proposals in order to estimate the correspondence between the object detections and object tracks. The goal of this dissertation is to design a track scorer and classifier that accurately evaluates track proposals that are generated during the association step. In this dissertation, I present novel track scorers and track classifiers that make a prediction based on long-term object motion and appearance cues and demonstrate its effectiveness in tracking by utilizing them within existing data association frameworks. First, I present an online learning algorithm that can efficiently train a track scorer based on a long-term appearance model for the classical Multiple Hypothesis Tracking (MHT) framework. I show that the classical MHT framework achieves competitive tracking performance even in modern tracking settings in which strong object detector and strong appearance models are available. Second, I present a novel Bilinear LSTM model as a deep, long-term appearance model which is a basis for an end-to-end learned track classifier. The architectural design of Bilinear LSTM is inspired by insights drawn from the classical recursive least squares framework. 
I incorporate this track classifier into the classical MHT framework in order to demonstrate its effectiveness in object tracking. Third, I present a novel multi-track pooling module that enables the Bilinear LSTM-based track classifier to simultaneously consider all the objects in the scene in order to better handle appearance ambiguities between different objects. I utilize this track classifier in a simple, greedy data association algorithm and achieve real-time, state-of-the-art tracking performance. I evaluate the proposed methods in this dissertation on public multi-object tracking datasets that capture challenging object tracking scenarios in urban areas.
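A simple greedy data association step of the kind paired with a learned track classifier can be sketched as follows: repeatedly commit the highest-scoring (track, detection) pair until no admissible pair remains. The score matrix, threshold, and function name here are assumptions for illustration, not the dissertation's implementation.

```python
def greedy_associate(score, min_score=0.5):
    """Greedy association: score[t][d] is a track-classifier score for
    matching track t to detection d; pairs below min_score stay unmatched."""
    pairs = [(s, t, d) for t, row in enumerate(score)
             for d, s in enumerate(row) if s >= min_score]
    pairs.sort(reverse=True)              # best scores first
    used_t, used_d, matches = set(), set(), []
    for s, t, d in pairs:
        if t not in used_t and d not in used_d:
            matches.append((t, d))        # commit the pair greedily
            used_t.add(t); used_d.add(d)
    return sorted(matches)

# Two tracks, three detections; track 0 prefers det 1, track 1 prefers det 0
score = [[0.2, 0.9, 0.1],
         [0.8, 0.7, 0.3]]
print(greedy_associate(score))  # [(0, 1), (1, 0)]
```

Unmatched detections would typically seed new tracks, and unmatched tracks would be aged or terminated.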
Self-Supervised Multi-Object Tracking From Consistency Across Timescales
Self-supervised multi-object trackers have the potential to leverage the vast
amounts of raw data recorded worldwide. However, they still fall short in
re-identification accuracy compared to their supervised counterparts. We
hypothesize that this deficiency results from restricting self-supervised
objectives to single frames or frame pairs. Such designs lack sufficient visual
appearance variations during training to learn consistent re-identification
features. Therefore, we propose a training objective that learns
re-identification features over a sequence of frames by enforcing consistent
association scores across short and long timescales. Extensive evaluations on
the BDD100K and MOT17 benchmarks demonstrate that our learned ReID features
significantly reduce ID switches compared to other self-supervised methods,
setting the new state of the art for self-supervised multi-object tracking and
even performing on par with supervised methods on the BDD100K benchmark. (Comment: 8 pages, 3 figures, 5 tables)
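The consistency idea above can be sketched with association matrices: chaining short-timescale associations (frame 0 to 1, then 1 to 2) should agree with the direct long-timescale association (frame 0 to 2), and the discrepancy can serve as a self-supervision signal. This is an illustrative cycle-consistency check under assumed hard (permutation) associations, not the paper's exact loss.

```python
import numpy as np

def chain_consistency(A_01, A_12, A_02):
    """Self-supervision signal: associating 0 -> 1 -> 2 should agree with
    associating 0 -> 2 directly (sketch; L1 discrepancy as a loss)."""
    chained = A_01 @ A_12                 # long-range association via short hops
    return np.abs(chained - A_02).mean()

# Identity associations across all timescales: perfectly consistent
P = np.eye(3)
print(chain_consistency(P, P, P))         # 0.0 when timescales agree
# A long-range association that swaps two identities: inconsistent
swap = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]])
print(chain_consistency(P, P, swap) > 0)  # positive loss signals an ID switch
```

In training, the matrices would be soft association scores computed from ReID features, and the loss gradient would push the features toward timescale-consistent associations.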
Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems
Multi-camera systems have the ability to overcome some of the fundamental limitations of single camera based systems. Having multiple view points of a scene goes a long way in limiting the influence of field of view, occlusion, blur and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of object motion and scene in multi-camera and multi-sensor systems.
The first part of the dissertation discusses the role of constraints introduced by projective imaging in robust inference of multi-camera/sensor-based object motion. We discuss the role of the homography and epipolar constraints in fusing object motion perceived by individual cameras. For planar scenes, the homography constraint provides a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space by introducing a "shared" state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fails due to poor SNR or occlusion.
The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in multi-camera and multi-sensor networks deals with complex non-linear models corrupted by non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms and present a method for implementing the particle filter using the Independent Metropolis-Hastings sampler, which is highly amenable to pipelined implementation and parallelization. We analyze implementations of the proposed algorithm, concentrating in particular on those with minimum processing times.
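The key property of the Independent Metropolis-Hastings sampler exploited here is that every candidate is drawn from one fixed proposal, independent of the current state, so candidate generation can be pipelined or parallelized. The following is a generic textbook sketch with an assumed toy target and proposal, not the dissertation's implementation.

```python
import math
import random

def imh_sample(log_target, proposal_sample, log_proposal, n, seed=0):
    """Independent Metropolis-Hastings: candidates come from a fixed
    proposal, so they can be generated ahead of time in parallel."""
    rng = random.Random(seed)
    x = proposal_sample(rng)
    w = log_target(x) - log_proposal(x)   # log importance weight of state
    samples = []
    for _ in range(n):
        y = proposal_sample(rng)          # independent of current state x
        wy = log_target(y) - log_proposal(y)
        if rng.random() < math.exp(min(0.0, wy - w)):
            x, w = y, wy                  # accept with prob min(1, wy/w ratio)
        samples.append(x)
    return samples

# Toy target: unnormalized N(2, 1); proposal: uniform on [-10, 10]
log_target = lambda x: -0.5 * (x - 2.0) ** 2
proposal_sample = lambda rng: rng.uniform(-10, 10)
log_proposal = lambda x: -math.log(20.0)
samples = imh_sample(log_target, proposal_sample, log_proposal, 20000)
mean = sum(samples) / len(samples)
print(round(mean, 1))  # close to the target mean of 2.0
```

In a particle filter, the target would be the posterior over the object state at each time step, and the decoupled candidate generation is what makes the pipelined, parallel implementations discussed above possible.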
The last part of the dissertation deals with the efficient sensing paradigm of compressive sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model based on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals, such as images, reflectance fields, and videos, are better approximated by this hybrid of two models. We derive optimal hybrid linear projections of the signal and show that the theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera and help mitigate the so-called data deluge problem in large multi-camera systems.
Pixel-Level Deep Multi-Dimensional Embeddings for Homogeneous Multiple Object Tracking
The goal of Multiple Object Tracking (MOT) is to locate multiple objects and keep track of their individual identities and trajectories given a sequence of (video) frames. A popular approach to MOT is tracking by detection, consisting of two processing components: detection (identification of objects of interest in individual frames) and data association (connecting data from multiple frames). This work addresses the detection component by introducing a method based on semantic instance segmentation, i.e., assigning labels to all visible pixels such that they are unique among different instances. Modern tracking methods are often built around Convolutional Neural Networks (CNNs) and additional, explicitly defined post-processing steps.
This work introduces two detection methods that incorporate multi-dimensional embeddings. We train deep CNNs to produce easily-clusterable embeddings for semantic instance segmentation and to enable object detection through pose estimation. The use of embeddings allows the method to identify per-pixel instance membership for both tasks.
Our method specifically targets applications that require long-term tracking of homogeneous targets using a stationary camera. Furthermore, this method was developed and evaluated on a livestock tracking application which presents exceptional challenges that generalized tracking methods are not equipped to solve. This is largely because contemporary datasets for multiple object tracking lack properties that are specific to livestock environments. These include a high degree of visual similarity between targets, complex physical interactions, long-term inter-object occlusions, and a fixed-cardinality set of targets.
For the reasons stated above, our method is developed and tested with the livestock application in mind, and group-housed pigs specifically are evaluated in this work. On the publicly available dataset, our method reliably detects pigs in a group-housed environment with 99% precision and 95% recall using pose estimation, and achieves 80% accuracy using semantic instance segmentation at a 50% IoU threshold.
Results demonstrate our method's ability to achieve consistent identification and tracking of group-housed livestock, even in cases where the targets are occluded and despite the fact that they lack uniquely identifying features. The pixel-level embeddings used by the proposed method are thoroughly evaluated in order to demonstrate their properties and behaviors when applied to real data.
Adviser: Lance C. Pére