6,344 research outputs found
Single camera pose estimation using Bayesian filtering and Kinect motion priors
Traditional approaches to upper body pose estimation using monocular vision
rely on complex body models and a large variety of geometric constraints. We
argue that this is not ideal and somewhat inelegant as it results in large
processing burdens, and instead attempt to incorporate these constraints
through priors obtained directly from training data. A prior distribution
covering the probability of a human pose occurring is used to incorporate
likely human poses. This distribution is obtained offline, by fitting a
Gaussian mixture model to a large dataset of recorded human body poses, tracked
using a Kinect sensor. We combine this prior information with a random walk
transition model to obtain an upper body model, suitable for use within a
recursive Bayesian filtering framework. Our model can be viewed as a mixture of
discrete Ornstein-Uhlenbeck processes, in that states behave as random walks,
but drift towards a set of typically observed poses. This model is combined
with measurements of the human head and hand positions, using recursive
Bayesian estimation to incorporate temporal information. Measurements are
obtained using face detection and a simple skin colour hand detector, trained
using the detected face. The suggested model is designed with analytical
tractability in mind and we show that the pose tracking can be
Rao-Blackwellised using the mixture Kalman filter, allowing for computational
efficiency while still incorporating bio-mechanical properties of the upper
body. In addition, the use of the proposed upper body model allows reliable
three-dimensional pose estimates to be obtained indirectly for a number of
joints that are often difficult to detect using traditional object recognition
strategies. Comparisons with Kinect sensor results and the state of the art in
2D pose estimation highlight the efficacy of the proposed approach.Comment: 25 pages, Technical report, related to Burke and Lasenby, AMDO 2014
conference paper. Code sample: https://github.com/mgb45/SignerBodyPose Video:
https://www.youtube.com/watch?v=dJMTSo7-uF
Data association and occlusion handling for vision-based people tracking by mobile robots
This paper presents an approach for tracking multiple persons on a mobile robot with a combination of colour and thermal vision sensors, using several new techniques. First, an adaptive colour model is incorporated into the measurement model of the tracker. Second, a new approach for detecting occlusions is introduced, using a machine learning classifier for pairwise comparison of persons (classifying which one is in front of the other). Third, explicit occlusion handling is incorporated into the tracker. The paper presents a comprehensive, quantitative evaluation of the whole system and its different components using several real world data sets
Staple: Complementary Learners for Real-Time Tracking
Correlation Filter-based trackers have recently achieved excellent
performance, showing great robustness to challenging situations exhibiting
motion blur and illumination changes. However, since the model that they learn
depends strongly on the spatial layout of the tracked object, they are
notoriously sensitive to deformation. Models based on colour statistics have
complementary traits: they cope well with variation in shape, but suffer when
illumination is not consistent throughout a sequence. Moreover, colour
distributions alone can be insufficiently discriminative. In this paper, we
show that a simple tracker combining complementary cues in a ridge regression
framework can operate faster than 80 FPS and outperform not only all entries in
the popular VOT14 competition, but also recent and far more sophisticated
trackers according to multiple benchmarks.Comment: To appear in CVPR 201
Long-term experiments with an adaptive spherical view representation for navigation in changing environments
Real-world environments such as houses and offices change over time, meaning that a mobile robot’s map will become out of date. In this work, we introduce a method to update the reference views in a hybrid metric-topological map so that a mobile robot can continue to localize itself in a changing environment. The updating mechanism, based on the multi-store model of human memory, incorporates a spherical metric representation of the observed visual features for each node in the map, which enables the robot to estimate its heading and navigate using multi-view geometry, as well as representing the local 3D geometry of the environment. A series of experiments demonstrate the persistence performance of the proposed system in real changing environments, including analysis of the long-term stability
Video analytics for security systems
This study has been conducted to develop robust event detection and object tracking algorithms that can be implemented in real time video surveillance applications. The aim of the research has been to produce an automated video surveillance system that is able to detect and report potential security risks with minimum human intervention. Since the algorithms are designed to be implemented in real-life scenarios, they must be able to cope with strong illumination changes and occlusions.
The thesis is divided into two major sections. The first section deals with event detection and edge based tracking while the second section describes colour measurement methods developed to track objects in crowded environments.
The event detection methods presented in the thesis mainly focus on detection and tracking of objects that become stationary in the scene. Objects such as baggage left in public places or vehicles parked illegally can cause a serious security threat. A new pixel based classification technique has been developed to detect objects of this type in cluttered scenes. Once detected, edge based object descriptors are obtained and stored as templates for tracking purposes. The consistency of these descriptors is examined using an adaptive edge orientation based technique. Objects are tracked and alarm events are generated if the objects are found to be stationary in the scene after a certain period of time. To evaluate the full capabilities of the pixel based classification and adaptive edge orientation based tracking methods, the model is tested using several hours of real-life video surveillance scenarios recorded at different locations and time of day from our own and publically available databases (i-LIDS, PETS, MIT, ViSOR). The performance results demonstrate that the combination of pixel based classification and adaptive edge orientation based tracking gave over 95% success rate. The results obtained also yield better detection and tracking results when compared with the other available state of the art methods.
In the second part of the thesis, colour based techniques are used to track objects in crowded video sequences in circumstances of severe occlusion. A novel Adaptive Sample Count Particle Filter (ASCPF) technique is presented that improves the performance of the standard Sample Importance Resampling Particle Filter by up to 80% in terms of computational cost. An appropriate particle range is obtained for each object and the concept of adaptive samples is introduced to keep the computational cost down. The objective is to keep the number of particles to a minimum and only to increase them up to the maximum, as and when required. Variable standard deviation values for state vector elements have been exploited to cope with heavy occlusion. The technique has been tested on different video surveillance scenarios with variable object motion, strong occlusion and change in object scale. Experimental results show that the proposed method not only tracks the object with comparable accuracy to existing particle filter techniques but is up to five times faster. Tracking objects in a multi camera environment is discussed in the final part of the thesis. The ASCPF technique is deployed within a multi-camera environment to track objects across different camera views. Such environments can pose difficult challenges such as changes in object scale and colour features as the objects move from one camera view to another. Variable standard deviation values of the ASCPF have been utilized in order to cope with sudden colour and scale changes. As the object moves from one scene to another, the number of particles, together with the spread value, is increased to a maximum to reduce any effects of scale and colour change. Promising results are obtained when the ASCPF technique is tested on live feeds from four different camera views. It was found that not only did the ASCPF method result in the successful tracking of the moving object across different views but also maintained the real time frame rate due to its reduced computational cost thus indicating that the method is a potential practical solution for multi camera tracking applications
Real-time people tracking in a camera network
Visual tracking is a fundamental key to the recognition and analysis of human behaviour.
In this thesis we present an approach to track several subjects using multiple
cameras in real time. The tracking framework employs a numerical Bayesian estimator,
also known as a particle lter, which has been developed for parallel implementation on
a Graphics Processing Unit (GPU). In order to integrate multiple cameras into a single
tracking unit we represent the human body by a parametric ellipsoid in a 3D world.
The elliptical boundary can be projected rapidly, several hundred times per subject per
frame, onto any image for comparison with the image data within a likelihood model.
Adding variables to encode visibility and persistence into the state vector, we tackle the
problems of distraction and short-period occlusion. However, subjects may also disappear
for longer periods due to blind spots between cameras elds of view. To recognise
a desired subject after such a long-period, we add coloured texture to the ellipsoid surface,
which is learnt and retained during the tracking process. This texture signature
improves the recall rate from 60% to 70-80% when compared to state only data association.
Compared to a standard Central Processing Unit (CPU) implementation, there
is a signi cant speed-up ratio
Audio‐Visual Speaker Tracking
Target motion tracking found its application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation system, driving assistance, monitoring prohibited area, medical science, robotics, action and expression recognition, individual speaker discrimination in multi‐speaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in real‐time tracking and localization of speakers. However, speaker tracking is a challenging task in real‐life scenarios as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multi‐modal information, as it conveys complementary information about the state of the speakers compared to single‐modal tracking. To use multi‐modal information, several approaches have been proposed which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a state‐of‐the‐art overview of tracking methods, which are used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field
- …