21 research outputs found
Self-correcting Bayesian target tracking
The copyright of this thesis rests with the author, and no quotation from it or information derived from it may be published without the prior written consent of the author.
Visual tracking, a building block for many applications, faces challenges such as occlusions, illumination changes, background clutter and variable motion dynamics that may degrade
tracking performance and are likely to cause failures. In this thesis, we propose a Track-Evaluate-Correct framework (self-correction) for existing trackers in order to achieve robust tracking.
For a tracker in the framework, we embed an evaluation block to check the status of tracking quality and a correction block to avoid upcoming failures or to recover from failures. We present a generic representation and formulation of self-correcting tracking for Bayesian trackers using a Dynamic Bayesian Network (DBN). The self-correcting tracking operates like a self-aware
system in which model parameters are tuned, or different models are fused or selected in a piece-wise way, in order to deal with tracking challenges and failures. In the DBN model
representation, the parameter tuning, fusion and model selection are done based on evaluation and correction variables that correspond to the evaluation and correction, respectively. The inferences
of variables in the DBN model are used to explain the operation of self-correcting tracking. The specific contributions under the generic self-correcting framework are correlation-based self-correcting
tracking for an extended object with model points, and tracker-level fusion, as described below.
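The Track-Evaluate-Correct cycle can be sketched as a generic loop. In this minimal sketch, `tracker`, `evaluate` and `correct` are hypothetical caller-supplied callables standing in for the thesis' evaluation and correction blocks; none of these names come from the thesis itself.

```python
def track_evaluate_correct(tracker, evaluate, correct, frames, threshold=0.5):
    """Generic Track-Evaluate-Correct loop: after each tracking step an
    evaluation block scores track quality, and a correction block steps in
    (tuning, fusing, or re-initialising) when the score drops below a
    threshold. All callables here are hypothetical placeholders."""
    states = []
    for frame in frames:
        state = tracker.step(frame)        # Track
        quality = evaluate(state, frame)   # Evaluate: on-line quality score
        if quality < threshold:            # Correct on detected degradation
            state = correct(tracker, state, frame)
        states.append(state)
    return states
```

The point of the skeleton is the ordering: evaluation runs on every frame, and correction is invoked only when degradation is detected, so a healthy tracker pays almost no overhead.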
For improving the probabilistic tracking of an extended object with a set of model points, we use the Track-Evaluate-Correct framework in order to achieve self-correcting tracking. The framework
combines the tracker with an on-line performance measure and a correction technique. We correlate model point trajectories to improve the accuracy of a failed or uncertain tracker on-line. A model point tracker gets assistance from neighbouring trackers whenever degradation in its
performance is detected using the on-line performance measure. The correction of the model point state is based on correlation information from the states of other trackers. Partial Least
Squares regression is used to adaptively model the correlation of point tracker states from short windowed trajectories. Experimental results on data obtained from optical motion capture systems show the improvement in tracking performance of the proposed framework compared to the
baseline tracker and other state-of-the-art trackers. The proposed framework allows appropriate re-initialisation of local trackers to recover from failures that are caused by clutter and missed
detections in the motion capture data.
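As a rough illustration of this correction step, the sketch below re-estimates a failed point tracker's state from neighbouring trackers' short windowed trajectories. Plain least squares stands in for the Partial Least Squares regression used in the thesis, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def correct_from_neighbours(neighbour_traj, failed_traj, window=20):
    """Correct the last state of a failed point tracker from correlated
    neighbour trajectories over a short window. Ordinary least squares is
    used here as a stand-in for Partial Least Squares regression.

    neighbour_traj: (T, d_n) stacked neighbour states over time
    failed_traj:    (T, d_f) states of the failed tracker (last row unreliable)
    """
    # Fit the correlation model on the window preceding the failure.
    X = neighbour_traj[-window - 1:-1]
    Y = failed_traj[-window - 1:-1]
    # Augment with a bias column and solve the linear regression.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    # Re-estimate the failed state from the neighbours' current states.
    x_now = np.append(neighbour_traj[-1], 1.0)
    return x_now @ W
```

In practice PLS is preferred over plain least squares precisely because short windowed trajectories give few samples relative to the number of correlated dimensions.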
Finally, we propose a tracker-level fusion framework to obtain self-correcting tracking. The
fusion framework combines trackers addressing different tracking challenges to improve the
overall performance. As a novelty of the proposed framework, we include an online performance measure to identify the track quality level of each tracker to guide the fusion. The trackers
in the framework assist each other based on appropriate mixing of the prior states. Moreover, the track quality level is used to update the target appearance model. We demonstrate the framework
with two Bayesian trackers on video sequences with various challenges and show its robustness
compared to the independent use of each constituent tracker, and also compared to
other state-of-the-art trackers. The appearance model update and the prior mixing across trackers, both guided by the online performance measure, allow the proposed framework to deal with tracking
challenges.
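The quality-guided prior mixing described above can be sketched as a moment-matched combination of per-tracker Gaussian priors. The quality scores and the moment-matching formula below are illustrative assumptions, not the thesis' actual performance measure.

```python
import numpy as np

def mix_priors(means, covs, quality):
    """Fuse per-tracker Gaussian priors into one mixed prior, weighting each
    tracker by its online track-quality score (hypothetical scores). The
    mixture is moment-matched to a single Gaussian."""
    w = np.asarray(quality, dtype=float)
    w = w / w.sum()                               # normalise quality weights
    means = np.asarray(means, dtype=float)        # (n_trackers, d)
    covs = np.asarray(covs, dtype=float)          # (n_trackers, d, d)
    mu = np.einsum('i,ij->j', w, means)           # weighted mean
    diff = means - mu
    # Weighted covariances plus the spread of the means (moment matching).
    cov = (np.einsum('i,ijk->jk', w, covs)
           + np.einsum('i,ij,ik->jk', w, diff, diff))
    return mu, cov
```

A tracker with a low quality score thus contributes little to the mixed prior, which is one way its healthier peers can "assist" it.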
3D Online Multi-Object Tracking for Autonomous Driving
This research work explores a novel 3D multi-object tracking architecture, 'FANTrack: 3D Multi-Object Tracking with Feature Association Network', for autonomous driving, based on tracking-by-detection and online tracking strategies using deep learning architectures for data association. The problem of multi-target tracking is to assign noisy detections to an a priori unknown and time-varying number of tracked objects across a sequence of frames. A majority of the existing solutions focus on either tediously designing cost functions or formulating the task of data association as a complex optimization problem. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects.
The proposed approach consists of a similarity network that predicts the similarity scores of the object pairs and builds a local similarity map. Another network formulates the data association problem as inference in a CNN by using the similarity scores and spatial information. The model learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train.
Experiments on the challenging KITTI dataset show competitive results with the state of the art. The model is finally implemented in ROS and deployed on our autonomous vehicle to show its robustness and online tracking capabilities. The proposed tracker runs alongside the object detector, utilizing resources efficiently.
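A hand-crafted stand-in for the learned similarity function can make the idea concrete: combine an appearance cue with a spatial cue into a local similarity map. The cosine/Gaussian combination below is an illustrative simplification, not FANTrack's trained network.

```python
import numpy as np

def pairwise_similarity(app_a, app_b, pos_a, pos_b, sigma=2.0):
    """Build a similarity map between tracked objects (a) and new detections
    (b) by combining an appearance cue (cosine similarity of feature
    vectors) with a spatial cue (Gaussian kernel on 3D centroid distance).
    A simplified stand-in for a learned similarity network."""
    a = app_a / np.linalg.norm(app_a, axis=1, keepdims=True)
    b = app_b / np.linalg.norm(app_b, axis=1, keepdims=True)
    appearance = a @ b.T                          # (n_a, n_b) cosine scores
    d2 = ((pos_a[:, None, :] - pos_b[None, :, :]) ** 2).sum(-1)
    spatial = np.exp(-d2 / (2.0 * sigma ** 2))    # proximity weighting
    return appearance * spatial
```

Data association then amounts to picking high-scoring entries of this map; the contribution of the learned approach is that both cues and their combination are fitted from data rather than fixed as here.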
Mathematical Models and Monte-Carlo Algorithms for Improved Detection of Targets in the Commercial Maritime Domain
Commercial Vessel Traffic Monitoring Services (VTMSs) are widely used by port authorities and the military to improve the safety and efficiency of navigation, as well as to ensure the security of ports and marine life as a whole. Technology based on the Kalman Filtering framework is in widespread use in modern operational VTMS systems. At a research level, there has also been significant interest in Particle Filters, which are widely researched but far less widely applied to deliver an operational advantage. The Monte-Carlo nature of Particle Filters makes them an ideal candidate for solving the highly non-linear, non-Gaussian problems encountered by modern VTMS systems. However, somewhat counter-intuitively, while Particle Filters are best suited to exploit such non-linear, non-Gaussian problems, they are most frequently used within a context that is mostly linear and Gaussian. The engineering challenge tackled by the PhD project reported in this thesis was to study and experiment with models that are well placed to capitalise on the abilities of Particle Filters, and to develop solutions that make use of such models to deliver a direct operational advantage in real applications within the commercial maritime domain.
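The Monte-Carlo machinery in question is the bootstrap particle filter. The sketch below runs one predict-update-resample step on a deliberately non-linear toy measurement model (z = x²); the motion and measurement models are illustrative only, not the VTMS models studied in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter_step(particles, weights, z, q=0.1, r=0.5):
    """One predict-update-resample step of a bootstrap particle filter for a
    1D random-walk target observed through a non-linear measurement
    z = x**2 (an illustrative model only)."""
    # Predict: propagate particles through the motion model.
    particles = particles + rng.normal(0.0, q, size=particles.shape)
    # Update: weight by the non-linear measurement likelihood.
    weights = weights * np.exp(-0.5 * ((z - particles ** 2) / r) ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

No linearisation is needed anywhere, which is exactly why such filters suit the non-linear, non-Gaussian problems the thesis targets, at the price of the per-particle computation.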
Cooperative multi-sensor tracking of vulnerable road users in the presence of missing detections
This paper presents a vulnerable road user (VRU) tracking algorithm capable of handling noisy and missing detections from heterogeneous sensors. We propose a cooperative fusion algorithm for matching and reinforcing radar and camera detections using their proximity and positional uncertainty. The belief in the existence and position of objects is then maximized by temporal integration of fused detections by a multi-object tracker. By switching between observation models, the tracker adapts to the detection noise characteristics, making it robust to individual sensor failures. The main novelty of this paper is an improved imputation sampling function for updating the state when detections are missing. The proposed function uses a likelihood without association that is conditioned on the sensor information instead of the sensor model. The benefits of the proposed solution are two-fold: firstly, particle updates become computationally tractable; secondly, the problem of imputing samples from a state which is predicted without an associated detection is bypassed. Experimental evaluation shows a significant improvement in both detection and tracking performance over multiple control algorithms. In low-light situations, the cooperative fusion outperforms intermediate fusion by as much as 30%, while increases in tracking performance are most significant in complex traffic scenes.
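Matching by "proximity and positional uncertainty" can be sketched with a Mahalanobis gate followed by inverse-covariance fusion. The per-sensor covariances and the 9.21 gate (the chi-square 99% threshold for 2 degrees of freedom) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_detections(radar, camera, P_r, P_c, gate=9.21):
    """Match radar and camera 2D detections by Mahalanobis distance under
    their combined positional uncertainty, and fuse matched pairs by
    inverse-covariance weighting. Unmatched radar detections are kept
    as-is, modelling a missing camera detection."""
    S_inv = np.linalg.inv(P_r + P_c)              # combined uncertainty
    Pr_inv, Pc_inv = np.linalg.inv(P_r), np.linalg.inv(P_c)
    P_f = np.linalg.inv(Pr_inv + Pc_inv)          # fused covariance
    fused, used = [], set()
    for r in radar:
        d2 = [(r - c) @ S_inv @ (r - c) for c in camera]
        j = int(np.argmin(d2)) if d2 else -1
        if j >= 0 and d2[j] < gate and j not in used:
            used.add(j)                           # reinforce with the camera
            fused.append(P_f @ (Pr_inv @ r + Pc_inv @ camera[j]))
        else:
            fused.append(r)                       # radar-only fallback
    return np.array(fused)
```

The inverse-covariance weighting pulls the fused position toward whichever sensor is more certain, which is the essence of cooperative reinforcement.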
Digital Signal Processor Based Real-Time Phased Array Radar Backend System and Optimization Algorithms
This dissertation presents an implementation of multifunctional large-scale phased array radar based on the scalable DSP platform.
The challenge in building a large-scale phased array radar backend is addressing the compute-intensive operations and high data-throughput requirements of both the front-end and the backend in real time. In most applications, FPGA or VLSI hardware is typically used to overcome these difficulties. However, with the rapid development of the IC industry, a parallel set of high-performance programmable chips can be an alternative. We present a hybrid high-performance backend system using DSPs as the core computing devices and MTCA as the system frame. The mapping techniques for the front-end and backend signal processing algorithms onto DSPs are discussed in depth.
Besides a high-efficiency computing device, the system architecture is a major factor influencing the reliability and performance of the backend system. Reliability requires that the system incorporate redundancy in both hardware and software. In this dissertation, we propose a parallel modular system based on an MTCA chassis that is reliable, scalable, and fault-tolerant.
Finally, we present an example of a high-performance phased array radar backend that uses 220 DSPs to achieve 7000 GFLOPS across 768 channels. This example shows the potential of combining DSP and MTCA as the computing platform for future multi-functional large-scale phased array radar.
An Interactive Likelihood for the Multi-Bernoulli Filter
In this thesis, a simple yet effective technique is presented for increasing the accuracy of multi-target tracking algorithms, with a focus on sequential Monte-Carlo implementations of random finite set-based approaches. This technique, referred to throughout this work as an interactive likelihood, exploits the spatial information that exists in any given measurement, reducing the need for data association and allowing for more target interaction, thereby increasing overall tracking accuracy. The interactive likelihood is constructed entirely within the random finite set framework and is integrated with a multi-Bernoulli filter. In addition, a state-of-the-art deep neural network for pedestrian detection is combined in a novel way with the multi-Bernoulli filter and interactive likelihood in order to obtain a very general and flexible random finite set-based multi-target tracking algorithm. The performance of the algorithm is evaluated on a number of publicly available datasets (2003 PETS INMOVE, AFL, and TUD-Stadtmitte) using standard, well-known multi-target tracking metrics (OSPA and CLEAR MOT).
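One way to picture "exploiting spatial information to reduce the need for data association" is a likelihood that is damped when competing targets sit near the measurement. The 1D toy below is my own illustrative rendition of that intuition, not the thesis' random-finite-set formulation.

```python
import numpy as np

def interactive_likelihood(x_i, others, z, r=1.0, alpha=2.0):
    """Toy 1D measurement likelihood for target state x_i: the standard
    Gaussian likelihood is down-weighted when other targets are close to
    the measurement z, so nearby targets implicitly compete for the
    measurement instead of requiring explicit data association.
    All parameters here are illustrative assumptions."""
    base = np.exp(-0.5 * ((z - x_i) / r) ** 2)
    # Competing targets near z claim part of the measurement's support.
    for x_j in others:
        base = base * (1.0 - np.exp(-0.5 * ((z - x_j) / (alpha * r)) ** 2))
    return float(base)
```

With no competitors nearby the likelihood reduces to the usual Gaussian; a competitor sitting exactly on the measurement drives it toward zero.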
Multiple cue integration for robust tracking in dynamic environments: application to video relighting
Motion analysis and object tracking has been one of the principal focuses of attention within the computer vision community over the past two decades. The interest in this research area lies in its wide range of applicability, extending from autonomous vehicle and robot navigation tasks to entertainment and virtual reality applications. Even though impressive results have been obtained on specific problems, object tracking is still an open problem, since available methods are prone to be sensitive to several artifacts and non-stationary environment conditions, such as unpredictable target movements, gradual or abrupt changes of illumination, proximity of similar objects or cluttered backgrounds. Multiple cue integration has been proved to enhance the robustness of tracking algorithms against such disturbances. In recent years, due to the increasing power of computers, there has been significant interest in building complex tracking systems which simultaneously consider multiple cues. However, most of these algorithms are based on heuristics and ad-hoc rules formulated for specific applications, making it impossible to extrapolate them to new environment conditions. In this dissertation we propose a general probabilistic framework to integrate as many object features as necessary, permitting them to mutually interact in order to obtain a precise estimation of the target's state, and thus a precise estimate of the target position. This framework is then used to design a tracking algorithm, which is validated on several video sequences involving abrupt position and illumination changes, target camouflaging and non-rigid deformations.
Among the features used to represent the target, it is important to point out the use of a robust parameterization of the target color in an object-dependent colorspace, which allows the object to be distinguished from the background more clearly than with other colorspaces commonly used in the literature. In the last part of the dissertation, we design an approach for relighting static and moving scenes with unknown geometry. The relighting is performed through an 'image-based' methodology, where the rendering under new lighting conditions is achieved by linear combinations of a set of pre-acquired reference images of the scene illuminated by known light patterns. Since the placement and brightness of the light sources composing such light patterns can be controlled, it is natural to ask: what is the optimal way to illuminate the scene so as to reduce the number of reference images that are needed? We show that the best way to light the scene (i.e., the way that minimizes the number of reference images) is not to use a sequence of single, compact light sources, as is most commonly done, but rather a sequence of lighting patterns given by an object-dependent lighting basis. It is important to note that when relighting video sequences, consecutive images need to be aligned with respect to a common coordinate frame. However, since each frame is generated by a different light pattern illuminating the scene, abrupt illumination changes between consecutive reference images are produced. Under these circumstances, the tracking framework designed in this dissertation plays a central role. Finally, we present several relighting results on real video sequences of moving objects, moving faces, and scenes containing both. In each case, although a single video clip was captured, we are able to relight again and again, controlling the lighting direction, extent, and color.
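The image-based relighting step itself is a linear operation, which a few lines make concrete. This is a minimal sketch of rendering by linear combination of reference images; the weight vector expressing the new lighting in the reference-pattern basis is assumed given.

```python
import numpy as np

def relight(reference_images, pattern_weights):
    """Render the scene under a new lighting condition as a linear
    combination of pre-acquired reference images, each captured under a
    known light pattern. `pattern_weights` expresses the new lighting in
    the reference-pattern basis (assumed given here)."""
    refs = np.asarray(reference_images, dtype=float)  # (n_patterns, H, W)
    w = np.asarray(pattern_weights, dtype=float)      # (n_patterns,)
    return np.tensordot(w, refs, axes=1)              # rendered (H, W) image
```

Because the output is linear in the reference images, the number of reference images needed is exactly the dimension of the lighting basis, which is why choosing an object-dependent basis, rather than one point light per image, reduces the capture cost.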
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effects of scene complexity, human-to-human interactions, scale changes, and scene background-human interactions. A two-stage processing solution, namely human detection and human tracking, with two novel pattern classifiers is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage, two detectors are combined to improve the detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm. A shape mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes. The similarity measure is used to discriminate between the human and the non-human class. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce hypothesis enumeration complexity.
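Comparing a candidate patch's feature histogram against class model histograms can be sketched with a standard histogram similarity. The Bhattacharyya coefficient below is an illustrative choice; the abstract does not name the two similarity measures actually used.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Similarity between a candidate patch's feature histogram and a class
    model histogram: the Bhattacharyya coefficient of the normalised
    histograms, 1.0 for identical distributions and 0.0 for disjoint ones.
    Used here only as an illustrative similarity measure."""
    p = np.asarray(h1, dtype=float)
    q = np.asarray(h2, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

Classification then compares the candidate's similarity to the human-class histogram against its similarity to the non-human-class histogram and picks the larger.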
To improve response time as frame dimensions, scene complexity, and the number of channels increase, a scalable algorithmic architecture and an operating-accuracy prediction technique are presented. A scheduling strategy for improving response time and throughput through parallel processing is also presented.