Object Tracking
Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer-based object tracking is a difficult task: changes in the many parameters representing the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and the new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The intention of the editor was to follow the very rapid progress in the development of methods as well as the extension of their applications.
Tracking interacting targets in multi-modal sensors
Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of view of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities, such as audio and video, for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition.
To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency-band analysis. The pre-processed data from multiple modalities is then fused within a Particle Filter (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target using Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to data with a low signal-to-noise ratio, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle Filter (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking on the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view.
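For illustration only, the sketch below shows a minimal bootstrap particle filter of the kind that underlies PF-based fusion, tracking a single target's 2D position from noisy measurements. The constant-velocity motion model, Gaussian likelihood and all parameter values are assumptions for illustration and do not reproduce the thesis's multi-modal, multi-target formulation.

```python
import numpy as np

def bootstrap_particle_filter(measurements, n_particles=500,
                              process_std=0.5, meas_std=2.0, seed=0):
    """Minimal bootstrap particle filter for a single 2D target.

    State: [x, y, vx, vy]; constant-velocity motion model.
    `measurements` is a (T, 2) array of noisy target positions.
    """
    rng = np.random.default_rng(seed)
    measurements = np.asarray(measurements, dtype=float)
    T = len(measurements)

    # Initialise particles around the first measurement with zero velocity.
    particles = np.zeros((n_particles, 4))
    particles[:, :2] = measurements[0] + rng.normal(0, meas_std, (n_particles, 2))
    weights = np.full(n_particles, 1.0 / n_particles)

    estimates = np.zeros((T, 2))
    for t in range(T):
        # Predict: propagate with constant velocity plus process noise.
        particles[:, :2] += particles[:, 2:]
        particles += rng.normal(0, process_std, particles.shape)

        # Update: weight particles by a Gaussian measurement likelihood.
        d2 = np.sum((particles[:, :2] - measurements[t]) ** 2, axis=1)
        weights *= np.exp(-0.5 * d2 / meas_std ** 2)
        weights += 1e-300                      # avoid numerical collapse
        weights /= weights.sum()

        # Estimate: weighted mean of the particle positions.
        estimates[t] = weights @ particles[:, :2]

        # Systematic resampling when the effective sample size drops.
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:
            positions = (rng.random() + np.arange(n_particles)) / n_particles
            idx = np.searchsorted(np.cumsum(weights), positions)
            idx = np.minimum(idx, n_particles - 1)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)

    return estimates
```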
The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets, and through participation in evaluation campaigns.
Person re-Identification over distributed spaces and time
Replicating the human visual system and the cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras, a portion of this scientific drive has gone into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest and to track people across cameras, and can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views.
This thesis first looks at a method for mitigating the effects of differing lighting conditions between camera views on the appearance of people. It builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from under-represented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to maximise re-identification accuracy by ensuring samples are accurately mapped between cameras.
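As a rough sketch of the cumulative brightness-transfer idea, the code below builds a lookup table by matching cumulative histograms pooled over paired training images. Single-channel brightness, 256 levels and the pooling scheme are illustrative assumptions, not the exact CBTF formulation from the thesis.

```python
import numpy as np

def cumulative_btf(pixels_cam_a, pixels_cam_b, n_levels=256):
    """Estimate a brightness transfer function from camera A to camera B.

    `pixels_cam_a` / `pixels_cam_b` are 1-D arrays of brightness values
    (0..n_levels-1) pooled over paired training images of the same people.
    Returns a lookup table `btf` mapping a brightness level in A to B.
    """
    # Cumulative (not mean-based) histograms retain information from
    # under-represented brightness values in the training set.
    hist_a, _ = np.histogram(pixels_cam_a, bins=n_levels, range=(0, n_levels))
    hist_b, _ = np.histogram(pixels_cam_b, bins=n_levels, range=(0, n_levels))
    cum_a = np.cumsum(hist_a) / hist_a.sum()
    cum_b = np.cumsum(hist_b) / hist_b.sum()

    # For each level in A, find the level in B with the same cumulative
    # proportion (inverse-CDF matching).
    btf = np.searchsorted(cum_b, cum_a).clip(0, n_levels - 1)
    return btf.astype(np.uint8)

# Usage: remap an image observed in camera A into camera B's brightness space.
# remapped = btf[image_a]   # per-pixel lookup via fancy indexing
```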
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples, it is limited to static lighting conditions and is less effective if the lighting changes. This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time, or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimate of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus removes the need for retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure over a set of features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm gives the operator a ranked list of possible matches and allows them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models.
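One common way to realise SVM-based ranking is to reduce the ranking constraints to pairwise classification on feature-difference vectors. The sketch below illustrates that reduction with scikit-learn's LinearSVC; the absolute-difference pair representation and the function names are assumptions for illustration, not the thesis's exact RankSVM or ensemble formulation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_rank_svm(probe, positives, negatives, C=1.0):
    """Learn feature weights w so that w.|probe-pos| < w.|probe-neg|.

    Ranking is reduced to binary classification of difference vectors:
    for every (relevant, irrelevant) gallery pair, the relevant sample
    must receive the smaller weighted distance.
    """
    X, y = [], []
    for pos in positives:
        for neg in negatives:
            # Absolute-difference representation of probe-gallery pairs
            # (an assumption; other pairwise representations also work).
            d_pos = np.abs(probe - pos)
            d_neg = np.abs(probe - neg)
            X.append(d_neg - d_pos); y.append(+1)
            X.append(d_pos - d_neg); y.append(-1)
    clf = LinearSVC(C=C, fit_intercept=False)
    clf.fit(np.asarray(X), np.asarray(y))
    return clf.coef_.ravel()          # learned feature weights

def rank_gallery(weights, probe, gallery):
    """Return gallery indices ordered from best to worst match."""
    scores = np.abs(np.asarray(gallery) - probe) @ weights  # smaller = closer
    return np.argsort(scores)
```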
Finally, the thesis looks at a practical, real-world application of the ranking paradigm. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several matching techniques are combined with temporal priors to form a more comprehensive overall matching criterion.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments, including offices, apartment buildings, airports and outdoor public spaces.
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages.
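As a point of reference for the simplest physics-based model family covered by such surveys, the sketch below extrapolates a trajectory at constant velocity. It is a generic baseline under stated assumptions, not a method from the survey.

```python
import numpy as np

def constant_velocity_predict(track, horizon, dt=1.0):
    """Extrapolate future positions from an observed trajectory.

    `track` is an (N, 2) array of observed positions at a fixed time step.
    The velocity is estimated from the last two observations; more robust
    variants average over a longer window or use a Kalman filter.
    """
    track = np.asarray(track, dtype=float)
    velocity = (track[-1] - track[-2]) / dt
    steps = np.arange(1, horizon + 1)[:, None]
    return track[-1] + steps * velocity * dt

# Example: predict 5 future steps of an agent moving diagonally.
# constant_velocity_predict([[0, 0], [1, 1], [2, 2]], horizon=5)
```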
Robust Modular Feature-Based Terrain-Aided Visual Navigation and Mapping
The visual feature-based Terrain-Aided Navigation (TAN) system presented in this thesis addresses the problem of constraining the inertial drift introduced into the location estimate of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments. The presented TAN system utilises salient visual features representing semantic or human-interpretable objects (roads, forest and water boundaries) from onboard aerial imagery and associates them with a database of reference features created a priori by applying the same feature detection algorithms to satellite imagery. Correlation of the detected features with the reference features, via a series of robust data association steps, allows a localisation solution to be achieved with a finite absolute precision bound defined by the certainty of the reference dataset. The feature-based Visual Navigation System (VNS) presented in this thesis was originally developed for a navigation application using simulated multi-year satellite image datasets. The extension of the system into the mapping domain, in turn, has been based on real (not simulated) flight data and imagery. The mapping study demonstrated the full potential of the system as a versatile tool for enhancing the accuracy of the information derived from aerial imagery. Not only have visual features such as road networks, shorelines and water bodies been used to obtain a position 'fix'; they have also been used in reverse for accurate mapping of vehicles detected on the roads into an inertial space with improved precision. The combined correction of geo-coding errors and improved aircraft localisation formed a robust solution for the defense mapping application. A system of the proposed design will provide a complete, independent navigation solution to an autonomous UAV and additionally give it object tracking capability.
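Purely as an illustration of the map-matching idea, the sketch below associates detected features with a geo-referenced database by nearest neighbour and averages the residuals into a translational position fix. The gating threshold, the mean-offset correction and the function name are assumptions; they stand in for, and simplify, the thesis's series of robust data association steps.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_position_correction(detected_xy, reference_xy, gate=50.0):
    """Associate detected features with reference features and estimate a
    translational correction to the vehicle's position estimate.

    detected_xy  : (N, 2) feature positions projected into map coordinates
                   using the current (drifting) navigation solution.
    reference_xy : (M, 2) a-priori feature positions from satellite imagery.
    gate         : maximum association distance in metres (outlier rejection).
    """
    detected_xy = np.asarray(detected_xy, dtype=float)
    reference_xy = np.asarray(reference_xy, dtype=float)

    tree = cKDTree(reference_xy)
    dist, idx = tree.query(detected_xy)      # nearest reference feature
    matched = dist < gate                    # simple gating step
    if not np.any(matched):
        return np.zeros(2)                   # no fix available this frame
    residuals = reference_xy[idx[matched]] - detected_xy[matched]
    return residuals.mean(axis=0)            # average offset = position fix
```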
Motion prediction and interaction localisation of people in crowds
The ability to analyse and predict the movement of people in crowded scenarios can be of fundamental importance for tracking across multiple cameras and for interaction localisation. In this thesis, we propose a person re-identification method that takes into account the spatial location of cameras, using a plan of the locale and the potential paths people can follow in the unobserved areas. These potential paths are generated using two models. In the first, people's trajectories are constrained to pass through a set of areas of interest (landmarks) in the site. In the second, we integrate a goal-driven approach into the Social Force Model (SFM), initially introduced for crowd simulation. The SFM models the desire of people to reach specific interest points (goals) in a site, such as exits, shops, seats and meeting points, while avoiding walls and barriers. Trajectory propagation creates the possible re-identification candidates, on which association of people across cameras is performed using the spatial location of the candidates and appearance features extracted around a person's head. We validate the proposed method in a challenging scenario from London Gatwick airport and compare it to state-of-the-art person re-identification methods.
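A minimal sketch of one goal-driven Social Force Model step is given below, assuming the classic attraction-towards-goal plus exponential-repulsion form with illustrative parameter values; it is not the exact formulation used in the thesis.

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.1,
                      desired_speed=1.3, tau=0.5, A=2.0, B=0.3):
    """One Euler step of a goal-driven Social Force Model for a pedestrian.

    pos, vel, goal : (2,) arrays for the pedestrian being propagated.
    others         : (K, 2) positions of nearby pedestrians / obstacle points.
    """
    pos, vel, goal = (np.asarray(v, dtype=float) for v in (pos, vel, goal))

    # Driving force: relax towards the desired velocity pointing at the goal.
    direction = goal - pos
    direction /= (np.linalg.norm(direction) + 1e-9)
    f_goal = (desired_speed * direction - vel) / tau

    # Repulsive forces: exponentially decaying with distance to others.
    f_rep = np.zeros(2)
    for other in np.asarray(others, dtype=float):
        diff = pos - other
        dist = np.linalg.norm(diff) + 1e-9
        f_rep += A * np.exp(-dist / B) * diff / dist

    vel = vel + (f_goal + f_rep) * dt
    pos = pos + vel * dt
    return pos, vel
```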
Moreover, we perform detection and tracking of interacting people in a framework based on the SFM that analyses people's trajectories. The method embeds plausible human behaviours to predict interactions in a crowd by iteratively minimising the error between predictions and measurements. We model people approaching a group and restrict group formation based on the relative velocity of candidate group members. The detected groups are then tracked by linking their centres of interaction over time using a buffered graph-based tracker. We show how the proposed framework outperforms existing group localisation techniques on three publicly available datasets.
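A minimal sketch of velocity-constrained group detection is shown below: pedestrians are linked when they are close and move coherently, and connected components of that graph form candidate groups. The distance and relative-speed thresholds are illustrative assumptions, not the thesis's SFM-based criterion.

```python
import numpy as np

def detect_groups(positions, velocities, max_dist=2.0, max_rel_speed=0.5):
    """Group pedestrians that are close together and move with similar velocity.

    Returns a list of groups, each a sorted list of pedestrian indices.
    Thresholds (metres, metres per second) are illustrative assumptions.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    n = len(positions)

    # Pairwise compatibility: spatial proximity and coherent motion.
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(positions[i] - positions[j]) < max_dist
            coherent = np.linalg.norm(velocities[i] - velocities[j]) < max_rel_speed
            adj[i, j] = adj[j, i] = close and coherent

    # Connected components over the compatibility graph form the groups.
    groups, unvisited = [], set(range(n))
    while unvisited:
        stack = [unvisited.pop()]
        component = []
        while stack:
            k = stack.pop()
            component.append(k)
            neighbours = {m for m in unvisited if adj[k, m]}
            unvisited -= neighbours
            stack.extend(neighbours)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups
```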
Proceedings of the 2009 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)) has been organized annually since 2005 with the aim of reporting on the latest research and development findings of the doctoral students of both institutions. This book provides a collection of 16 technical reports on the research results presented at the 2009 workshop.
Automated camera ranking and selection using video content and scene context
When observing a scene with multiple cameras, an important problem to solve is to automatically identify “what camera feed should be shown and when?” The answer to this question is of interest for a number of applications and scenarios ranging from sports to surveillance. In this thesis we present a framework for ranking each video frame and camera across time and the camera network, respectively. This ranking is then used for automated video production. In the first stage, information from each camera view and from the objects in it is extracted and represented in a way that allows for object- and frame-ranking. First, objects are detected and ranked within and across camera views. This ranking takes into account both visible and contextual information related to the object. Then content ranking is performed based on the objects in the view and camera-network-level information. We propose two novel techniques for content ranking, namely Routing Based Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). In RBR we use a rule-based framework where weighted fusion of object- and frame-level information takes place, while in MVG the rank is estimated as a multivariate Gaussian distribution. Through experimental and subjective validation we demonstrate that the proposed content ranking strategies allow the identification of the best camera at each time instant.
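As a hedged sketch of how a multivariate-Gaussian ranking score could be computed, the code below fits a Gaussian to feature vectors of exemplar frames and ranks candidate frames by log-likelihood. The feature design, regularisation and function names are assumptions for illustration, not the thesis's MVG definition.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_frame_model(exemplar_features):
    """Fit a multivariate Gaussian to feature vectors of well-composed frames.

    `exemplar_features`: (N, D) matrix, e.g. per-frame descriptors combining
    object-level and camera-level cues (an assumption for illustration).
    """
    exemplar_features = np.asarray(exemplar_features, dtype=float)
    mean = exemplar_features.mean(axis=0)
    # Small diagonal term keeps the covariance well conditioned.
    cov = np.cov(exemplar_features, rowvar=False) \
        + 1e-6 * np.eye(exemplar_features.shape[1])
    return multivariate_normal(mean=mean, cov=cov)

def rank_cameras(model, frame_features):
    """Rank the simultaneous frames from all cameras; best camera first."""
    scores = model.logpdf(np.asarray(frame_features, dtype=float))
    order = np.argsort(scores)[::-1]       # highest likelihood first
    return order, scores
```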
The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the ranked content. We demonstrate that in such production settings it is undesirable to have frequent inter-camera switching, which motivates the need for a compromise between selecting the best camera most of the time and minimising frequent inter-camera switching. We show that state-of-the-art techniques for this task are inadequate and fail in dynamic scenes, and propose three novel methods for automated camera selection. The first method (gof) performs a joint optimization of a cost function that depends on both the view quality and the inter-camera switching, so that a pleasing best-view video sequence can be composed. The other two methods (dbn and util) include the selection decision in the ranking strategy. In dbn we model best-camera selection as a state sequence via Directed Acyclic Graphs (DAG) designed as a Dynamic Bayesian Network (DBN), which encodes contextual knowledge about the camera network and employs past information to minimize inter-camera switches. In comparison, util utilizes past as well as future information in a Partially Observable Markov Decision Process (POMDP), where the camera selection at a certain time is influenced by past information and its repercussions in the future. The performance of the proposed approaches is demonstrated on multiple real and synthetic multi-camera setups. We compare the proposed architectures with various baseline methods, with encouraging results. The performance of the proposed approaches is also validated through extensive subjective testing.
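To make the quality-versus-switching trade-off concrete, the sketch below solves a deterministic version by dynamic programming over per-frame camera scores with a fixed switching penalty. This is only an illustrative analogue, not the DAG/DBN or POMDP formulations proposed in the thesis.

```python
import numpy as np

def select_cameras(quality, switch_cost=1.0):
    """Choose one camera per frame, trading view quality against switching.

    `quality`: (T, C) matrix of per-frame, per-camera ranking scores.
    Solved with Viterbi-style dynamic programming, so the returned sequence
    maximises total quality minus a fixed penalty for every camera switch.
    """
    quality = np.asarray(quality, dtype=float)
    T, C = quality.shape
    penalty = switch_cost * (1 - np.eye(C))   # cost of moving from p to c
    score = quality[0].copy()                 # best total ending in camera c
    backptr = np.zeros((T, C), dtype=int)

    for t in range(1, T):
        candidate = score[:, None] - penalty        # (from p, to c)
        backptr[t] = np.argmax(candidate, axis=0)   # best predecessor per c
        score = candidate[backptr[t], np.arange(C)] + quality[t]

    # Backtrack the best camera sequence.
    selection = np.zeros(T, dtype=int)
    selection[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        selection[t - 1] = backptr[t, selection[t]]
    return selection
```

Increasing `switch_cost` yields longer shots with fewer cuts, while setting it to zero reduces the sketch to always picking the top-ranked camera per frame.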