54 research outputs found
Segmenting and tracking objects in video sequences based on graphical probabilistic models
Ph.D., Doctor of Philosophy
Object Tracking
Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating object tracking by computer is a difficult task: the dynamics of the many parameters that represent the objects' features and motion, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art in object tracking methods and new trends in research are described in this book. Its fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The intention of the editor was to follow up the very quick progress in the development of methods as well as the extension of the application
Activity Analysis; Finding Explanations for Sets of Events
Automatic activity recognition is the computational process of analysing visual input and reasoning about detections to understand the performed events. In all but the simplest scenarios, an activity involves multiple interleaved events, some related and others independent. The activity in a car park or at a playground would typically include many events. This research assumes the possible events and any constraints between the events can be defined for the given scene. Analysing the activity should thus recognise a complete and consistent set of events; this is referred to as a global explanation of the activity. By seeking a global explanation that satisfies the activity’s constraints, infeasible interpretations can be avoided, and ambiguous observations may be resolved.
An activity’s events and any natural constraints are defined using a grammar formalism. Attribute Multiset Grammars (AMG) are chosen because they allow defining hierarchies, as well as attribute rules and constraints. When used for recognition, detectors are employed to gather a set of detections. Parsing the set of detections by the AMG provides a global explanation. To find the best parse tree given a set of detections, a Bayesian network models the probability distribution over the space of possible parse trees. Heuristic and exhaustive search techniques are proposed to find the maximum a posteriori global explanation.
The framework is tested on two activities: the activity at a bicycle rack and the activity around a building entrance. The first case study involves people locking bicycles onto a bicycle rack and picking them up later. The best global explanation for all detections gathered during the day resolves local ambiguities from occlusion or clutter. Intensive testing on 5 full days showed that global analysis achieves higher recognition rates. The second case study tracks people and any objects they are carrying as they enter and exit a building entrance. A complete sequence of the person entering and exiting multiple times is recovered by the global explanation.
Indexing Techniques for Image and Video Databases: an approach based on Animate Vision Paradigm
In this dissertation, some novel indexing techniques for video and image databases based on the “Animate Vision” paradigm are presented and discussed.
On the one hand, it will be shown how, by embedding active mechanisms of biological vision, such as saccadic eye movements and fixations, within image inspection algorithms, more effective and efficient query processing in image databases can be achieved. In particular, the dissertation discusses how to generate two fixation sequences, from a query image I_q and a test image I_t of the data set respectively, and how to compare the two sequences in order to compute a possible similarity (consistency) measure between the two images. It will also be shown how the approach can be combined with classical clustering techniques to discover and represent the hidden semantic associations among images, in terms of categories, which in turn allow an automatic pre-classification (indexing) and can be used to drive and improve query processing. Finally, preliminary results will be presented and the proposed approach compared with the most recent techniques for image retrieval described in the literature.
On the other hand, it will be discussed how, by taking advantage of this foveated representation of an image, it is possible to partition a video into shots. More precisely, the shot-change detection method is based on the computation, at each time instant, of the consistency measure of the fixation sequences generated by an ideal observer looking at the video. The proposed scheme aims at detecting both abrupt and gradual transitions between shots using a single technique, rather than a set of dedicated methods. Results on videos of various content types are reported and validate the proposed approach.
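As a rough illustration of how a per-frame consistency signal might be turned into shot boundaries, the sketch below labels runs of low consistency as abrupt or gradual transitions. It is not the dissertation's method: the function names, the [0, 1] score range, the threshold and the minimum run length are all assumptions made for the example.

```python
def detect_shot_changes(consistency, threshold=0.5, min_gradual_len=3):
    """Return (start, end, kind) for each run of low consistency.

    consistency[t] is assumed to be a score in [0, 1] comparing the fixation
    sequences around time t; threshold and min_gradual_len are illustrative.
    """
    changes = []
    run_start = None
    for t, value in enumerate(consistency):
        if value < threshold:
            if run_start is None:
                run_start = t          # a low-consistency run begins
        elif run_start is not None:
            changes.append(_label(run_start, t - 1, min_gradual_len))
            run_start = None
    if run_start is not None:          # run reaches the end of the video
        changes.append(_label(run_start, len(consistency) - 1, min_gradual_len))
    return changes


def _label(start, end, min_gradual_len):
    # Short dips are treated as abrupt cuts, longer ones as gradual transitions.
    kind = "gradual" if end - start + 1 >= min_gradual_len else "abrupt"
    return (start, end, kind)
```

A single pass with one threshold mirrors the idea of handling both transition types with one technique, although a real detector would likely need an adaptive threshold.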
Energy Minimization for Multiple Object Tracking
Multiple target tracking aims at reconstructing trajectories of several
moving targets in a dynamic scene, and is of significant relevance for a
large number of applications. For example, predicting a pedestrian’s
action may be employed to warn an inattentive driver and reduce road
accidents; understanding a dynamic environment will facilitate
autonomous robot navigation; and analyzing crowded scenes can prevent
fatalities in mass panics.
The task of multiple target tracking is challenging for various reasons:
First of all, visual data is often ambiguous. For example, the objects
to be tracked can remain undetected due to low contrast and occlusion.
At the same time, background clutter can cause spurious measurements
that distract the tracking algorithm. A second challenge arises when
multiple measurements appear close to one another. Resolving
correspondence ambiguities leads to a combinatorial problem that quickly
becomes more complex with every time step. Moreover, a realistic model
of multi-target tracking should take physical constraints into account.
This is not only important at the level of individual targets but also
regarding interactions between them, which adds to the complexity of the
problem.
In this work the challenges described above are addressed by means of
energy minimization. Given a set of object detections, an energy
function describing the problem at hand is minimized with the goal of
finding a plausible solution for a batch of consecutive frames. Such
offline tracking-by-detection approaches have substantially advanced the
performance of multi-target tracking. Building on these ideas, this
dissertation introduces three novel techniques for multi-target tracking
that extend the state of the art as follows: The first approach
formulates the energy in discrete space, building on the work of Berclaz
et al. (2009). All possible target locations are reduced to a regular
lattice and tracking is posed as an integer linear program (ILP),
enabling (near) global optimality. Unlike prior work, however, the
proposed formulation includes a dynamic model and additional constraints
that enable performing non-maxima suppression (NMS) at the level of
trajectories. These contributions improve the performance both
qualitatively and quantitatively with respect to annotated ground truth.
The second technical contribution is a continuous energy function for
multiple target tracking that overcomes the limitations imposed by
spatial discretization. The continuous formulation is able to capture
important aspects of the problem, such as target localization or motion
estimation, more accurately. More precisely, the data term as well as
all phenomena including mutual exclusion and occlusion, appearance,
dynamics and target persistence are modeled by continuous differentiable
functions. The resulting non-convex optimization problem is minimized
locally by standard conjugate gradient descent in combination with
custom discontinuous jumps. The more accurate representation of the
problem leads to a powerful and robust multi-target tracking approach,
which shows encouraging results on particularly challenging video
sequences.
Both previous methods concentrate on reconstructing trajectories, while
disregarding the target-to-measurement assignment problem. To unify both
data association and trajectory estimation into a single optimization
framework, a discrete-continuous energy is presented in Part III of this
dissertation. Leveraging recent advances in discrete optimization
(Delong et al., 2012), it is possible to formulate multi-target tracking
as a model-fitting approach, where discrete assignments and continuous
trajectory representations are combined into a single objective
function. To enable efficient optimization, the energy is minimized
locally by alternating between the discrete and the continuous set of
variables.
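
The alternation between discrete and continuous variables can be sketched on a toy problem. The example below is only an illustration under strong assumptions: targets move along straight lines x = a*t + b in one dimension, the energy is the plain sum of squared residuals, and the label costs, outlier handling and physical constraints of the dissertation are omitted. All function names are hypothetical.

```python
def fit_line(points):
    """Least-squares fit x = a*t + b to a list of (t, x) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0.0:
        return 0.0, mean_x             # degenerate case: constant position
    a = sum((t - mean_t) * (x - mean_x) for t, x in points) / var_t
    return a, mean_x - a * mean_t


def residual(detection, line):
    t, x = detection
    a, b = line
    return (x - (a * t + b)) ** 2


def energy(detections, labels, lines):
    """Toy energy: squared distance of each detection to its trajectory."""
    return sum(residual(d, lines[k]) for d, k in zip(detections, labels))


def alternate(detections, lines, max_iters=20):
    """Alternate label assignment (discrete) and line fitting (continuous)."""
    lines = list(lines)
    labels = None
    for _ in range(max_iters):
        # Discrete step: assign each detection to its closest trajectory.
        new_labels = [min(range(len(lines)), key=lambda k: residual(d, lines[k]))
                      for d in detections]
        if new_labels == labels:       # no label changed: local minimum reached
            break
        labels = new_labels
        # Continuous step: refit each trajectory to its assigned detections.
        for k in range(len(lines)):
            members = [d for d, l in zip(detections, labels) if l == k]
            if members:
                lines[k] = fit_line(members)
    return labels, lines
```

Neither step can increase this toy energy, so the loop converges, but only to a local minimum that depends on the initial trajectories.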
The final contribution of this dissertation is an extensive discussion
on performance evaluation and comparison of tracking algorithms, which
points out important practical issues that ought not to be ignored.
Person re-Identification over distributed spaces and time
PhD
Replicating the human visual system and cognitive abilities that the brain uses to process the
information it receives is an area of substantial scientific interest. With the prevalence of video
surveillance cameras a portion of this scientific drive has been into providing useful automated
counterparts to human operators. A prominent task in visual surveillance is that of matching
people between disjoint camera views, or re-identification. This allows operators to locate people
of interest, to track people across cameras and can be used as a precursory step to multi-camera
activity analysis. However, due to the contrasting conditions between camera views and their
effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes
solutions for reducing the visual ambiguity in observations of people between camera views.
This thesis first looks at a method for mitigating the effects on the appearance of people under
differing lighting conditions between camera views. This thesis builds on work modelling
inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer
Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited
training samples. Unlike previous methods that use a mean-based representation for a set of
training samples, the cumulative nature of the CBTF retains colour information from underrepresented
samples in the training set. Additionally, the bi-directionality of the mapping function
is explored to try and maximise re-identification accuracy by ensuring samples are accurately
mapped between cameras.
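
One plausible reading of the cumulative idea can be sketched as follows: brightness histograms are pooled over all training samples of each camera before the transfer function is computed by matching the two cumulative histograms. This is a minimal sketch under that assumption, not the thesis implementation; the function names and the `levels` parameter are hypothetical, and a single brightness channel with integer values is assumed.

```python
def cumulative_histogram(samples, levels=256):
    """Normalised cumulative histogram pooled over all training samples.

    samples: iterable of pixel-value lists, one list per training image.
    """
    counts = [0] * levels
    for pixels in samples:
        for value in pixels:
            counts[value] += 1
    total = sum(counts)
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running / total)
    return cumulative


def cbtf(samples_a, samples_b, levels=256):
    """Lookup table mapping a brightness level in camera A to camera B."""
    h_a = cumulative_histogram(samples_a, levels)
    h_b = cumulative_histogram(samples_b, levels)
    table, j = [], 0
    for i in range(levels):
        # Smallest level j in B whose cumulative mass reaches H_a(i).
        while j < levels - 1 and h_b[j] < h_a[i]:
            j += 1
        table.append(j)
    return table
```

Because the histograms are pooled before matching, brightness values that occur in only a few training images still contribute to the mapping instead of being averaged away.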
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing
lighting conditions within a single camera. As the CBTF requires manually labelled training
samples it is limited to static lighting conditions and is less effective if the lighting changes. This
Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting
change over time, or rely on camera transition time information to update. By utilising contextual
information drawn from the background in each camera view, an estimation of the lighting
change within a single camera can be made. This background lighting model allows the mapping
of colour information back to the original training conditions and thus removes the need for
retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous
methods use a score based on a direct distance measure of set features to form a correct/incorrect
match result. Rather than offering an operator a single outcome, the ranking paradigm is to give
the operator a ranked list of possible matches and allow them to make the final decision. By utilising
a Support Vector Machine (SVM) ranking method, a weighting on the appearance features
can be learned that capitalises on the fact that not all image features are equally important to
re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues
by separating the training samples into smaller subsets and boosting the trained models.
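
The ranking view can be illustrated with a small sketch that orders gallery candidates by a weighted feature distance. The weights are assumed to have been learned already (for example by a RankSVM); no learning is shown, and the function name, the feature vectors and the use of a negated weighted absolute difference as the score are assumptions made for the example.

```python
def rank_gallery(probe, gallery, weights):
    """Return gallery ids sorted from best to worst match for the probe.

    probe:   feature vector of the query person
    gallery: dict mapping a candidate id to its feature vector
    weights: per-feature importance, assumed to be learned elsewhere
    """
    def score(features):
        # Higher score means a better match: negative weighted L1 distance.
        return -sum(w * abs(p - g) for w, p, g in zip(weights, probe, features))

    return sorted(gallery, key=lambda pid: score(gallery[pid]), reverse=True)
```

With `probe = [0.2, 0.9]` and candidates `a = [0.2, 0.1]`, `b = [0.9, 0.9]`, `c = [0.25, 0.85]`, uniform weights rank `b` above `a`, whereas weights `[1.0, 0.1]` that emphasise the first feature rank `a` above `b`. The operator receives the whole ordered list rather than a single match decision.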
Finally, the thesis looks at a practical application of the ranking paradigm in a real-world system.
The system encompasses both the re-identification stage and the precursory extraction
and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined
to extract relevant information from the video, while several combinations of matching
techniques are used together with temporal priors to form a more comprehensive overall matching
criterion.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety
of challenging environments including offices, apartment buildings, airports and outdoor public
spaces.
Pattern Recognition
A wealth of advanced pattern recognition algorithms is emerging at the intersection of technologies for effective visual features and the study of the human-brain cognition process. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.
- …