6 research outputs found
On the Mahalanobis Distance Classification Criterion for Multidimensional Normal Distributions
Many existing engineering works model the statistical characteristics of the entities under study as normal distributions. These models are eventually used for decision making, which in practice requires defining the classification region corresponding to the desired confidence level. Surprisingly, however, a great number of computer vision works using multidimensional normal models leave the confidence regions unspecified or fail to establish correct ones, due to misconceptions about the properties of Gaussian functions or to wrong analogies with the unidimensional case. The resulting regions incur deviations that can be unacceptable in high-dimensional models. Here we provide a comprehensive derivation of the optimal confidence regions for multivariate normal distributions of arbitrary dimensionality. To this end, we first derive the condition for region optimality of general continuous multidimensional distributions, and then apply it to the widespread case of the normal probability density function. The results are used to analyze the confidence error incurred by previous works in vision research, showing that deviations caused by wrong regions may become unacceptable as dimensionality increases. To support the theoretical analysis, a quantitative example in the context of moving object detection by means of background modeling is given.
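The dimensionality effect described in this abstract can be illustrated with a short sketch. It is not taken from the paper; it relies only on the standard result that the squared Mahalanobis distance of a d-dimensional normal vector follows a chi-square distribution with d degrees of freedom, so the probability mass inside Mahalanobis radius r is the regularized lower incomplete gamma function P(d/2, r²/2). The function names `regularized_lower_gamma` and `mass_inside` are hypothetical.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply by x^a * exp(-x) / Gamma(a), evaluated in log space for stability.
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def mass_inside(radius, dim):
    """Probability that a dim-dimensional normal sample lies within the given
    Mahalanobis radius (chi-square CDF with dim degrees of freedom at radius^2)."""
    return regularized_lower_gamma(dim / 2.0, radius * radius / 2.0)


if __name__ == "__main__":
    # A fixed "3-sigma" radius covers less and less probability as dim grows.
    for dim in (1, 2, 3, 5, 10):
        print(dim, round(mass_inside(3.0, dim), 4))
```

For one dimension the radius 3 reproduces the familiar 99.73% figure, while for two dimensions the mass is 1 - exp(-4.5), about 98.89%, and it keeps dropping as the dimension increases. This is the kind of deviation that arises when the unidimensional threshold is reused unchanged.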
Probabilistic three-dimensional object tracking based on adaptive depth segmentation
Object tracking is one of the fundamental topics of computer vision, with diverse applications. The challenges that arise in tracking, such as cluttered scenes, occlusion, complex motion, and illumination variations, have motivated the use of depth information from 3D sensors. However, current 3D trackers are not applicable to unconstrained environments without a priori knowledge. As an important object detection module in tracking, segmentation subdivides an image into its constituent regions. Nevertheless, the existing range segmentation methods in the literature are too slow to run in real time. In this thesis, a 3D object tracking method based on adaptive depth segmentation and particle filtering is presented. In this approach, the segmentation method, as the bottom-up process, is combined with the particle filter, as the top-down process, to achieve efficient tracking results under challenging circumstances. The experimental results on real-world range data demonstrate the efficiency and robustness of the tracking algorithm.
Particle Filters for Colour-Based Face Tracking Under Varying Illumination
Automatic human face tracking is the basis of robotic and active vision systems used for facial feature analysis, automatic surveillance, video conferencing, intelligent transportation, human-computer interaction and many other applications. Superior human face tracking will enable future safety surveillance systems that monitor drowsy drivers, or patients and elderly people at risk of seizure or sudden falls, and that perform with a lower risk of failure in unexpected situations. This area has been actively researched in the current literature in an attempt to make automatic face trackers more stable in challenging real-world environments. To detect faces in video sequences, features like colour, texture, intensity, shape or motion are used. Among these features, colour has been the most popular because of its insensitivity to orientation and size changes and its fast processing. The challenge for colour-based face trackers, however, has been their instability when colours change due to drastic variation in environmental illumination. Probabilistic tracking and particle filters, which are powerful Bayesian stochastic estimators, are on the other hand increasingly used in the visual tracking field thanks to their ability to handle multi-modal distributions in cluttered scenes. Traditional particle filters use the transition prior as the importance sampling function, but this can result in poor posterior sampling. The objective of this research is to investigate and propose a stable face tracker capable of dealing with challenges like rapid and random motion of the head, scale changes when people move closer to or further from the camera, motion of multiple people with similar skin tones in the vicinity of the model person, presence of clutter and occlusion of the face. The main focus has been on investigating an efficient method to address the sensitivity of colour-based trackers to gradual or drastic illumination variations.
The particle filter is used to overcome the instability of face trackers due to nonlinear and random head motions. To increase the traditional particle filter's sampling efficiency, an improved version of the particle filter is introduced that considers the latest measurements. This improved particle filter employs a new colour-based bottom-up approach that leads particles to generate an effective proposal distribution. The colour-based bottom-up approach is a classification technique for fast skin colour segmentation. This method is independent of distribution shape and does not require excessive memory storage or exhaustive prior training. Finally, to address the adaptability of the colour-based face tracker to illumination changes, an original likelihood model is proposed based on spatial rank information that considers both the illumination-invariant colour ordering of a face's pixels in an image or video frame and the spatial interaction between them. The original contribution of this work lies in the unique mixture of existing and proposed components to improve colour-based recognition and tracking of faces in complex scenes, especially where drastic illumination changes occur. Experimental results of the final version of the proposed face tracker, which combines the methods developed, are provided in the last chapter of this manuscript.
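The remark in this abstract that traditional particle filters sample from the transition prior can be made concrete with a minimal bootstrap step. This is only a sketch under stated assumptions, not the thesis's tracker: the one-dimensional state, the random-walk motion model, the Gaussian measurement likelihood, and the names `bootstrap_step`, `motion_std` and `meas_std` are all hypothetical.

```python
import math
import random


def bootstrap_step(particles, measurement, rng, motion_std=1.0, meas_std=1.0):
    """One predict/weight/resample cycle of a bootstrap particle filter (1D)."""
    # Predict: draw each particle from the transition prior (assumed random walk).
    # The latest measurement is NOT used here, which is the weakness noted above.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Weight: Gaussian likelihood of the measurement given each predicted state.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Posterior mean estimate from the weighted particle set.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
    particles, weights, estimate = bootstrap_step(particles, 2.0, rng)
    print(estimate)
```

Because the prediction ignores the current measurement, many particles can land where the likelihood is negligible. A proposal informed by the latest observation, such as the colour-based bottom-up approach the abstract describes, is intended to reduce that waste.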
Human robot interaction in a crowded environment
Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3].
Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot's interaction, it may approach the person.
Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
Autonomous navigation for guide following in crowded indoor environments
The requirements for assisted living are rapidly changing as the number of elderly
patients over the age of 60 continues to increase. This rise places a high level of stress on
nurse practitioners who must care for more patients than they can manage. As this trend is
expected to continue, new technology will be required to help care for patients. Mobile
robots present an opportunity to help alleviate the stress on nurse practitioners by
monitoring and performing remedial tasks for elderly patients. In order to produce
mobile robots with the ability to perform these tasks, however, many challenges must be
overcome.
The hospital environment requires a high level of safety to prevent patient injury. Any
facility that uses mobile robots, therefore, must be able to ensure that no harm will come
to patients whilst in a care environment. This requires the robot to build a high level of
understanding about the environment and the people in close proximity to the robot.
Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders.
3D time-of-flight sensors have recently been introduced and provide dense 3D point
clouds of the environment at real-time frame rates. This provides mobile robots with
previously unavailable dense information in real-time. In this thesis, I investigate the use of
time-of-flight cameras for mobile robot navigation in crowded environments. A
unified framework to allow the robot to follow a guide through an indoor environment
safely and efficiently is presented. Each component of the framework is analyzed in
detail, with real-world scenarios illustrating its practical use.
Time-of-flight cameras are relatively new sensors and, therefore, have inherent problems
that must be overcome to obtain consistent and accurate data. In this thesis, I propose a
novel and practical probabilistic framework to overcome many of these inherent problems.
The framework fuses multiple depth maps with color information, forming a
reliable and consistent view of the world. In order for the robot to interact with the
environment, contextual information is required. To this end, I propose a region-growing
segmentation algorithm to group points based on surface characteristics, namely surface normal
and surface curvature. The segmentation process creates a distinct set of surfaces;
however, only a limited amount of contextual information is available to allow for
interaction. Therefore, a novel classifier is proposed using spherical harmonics to
differentiate people from all other objects.
The added ability to identify people allows the robot to find potential candidates to
follow. However, for safe navigation, the robot must continuously track all visible
objects to obtain positional and velocity information. A multi-object tracking system is
investigated to track visible objects reliably using multiple cues, shape and color. The
tracking system allows the robot to react to the dynamic nature of people by building an
estimate of the motion flow. This flow provides the robot with the necessary information
to determine where and at what speeds it is safe to drive. In addition, a novel search
strategy is proposed to allow the robot to recover a guide who has left the field-of-view.
To achieve this, a search map is constructed with areas of the environment ranked
according to how likely they are to reveal the guide’s true location. Then, the robot can
approach the most likely search area to recover the guide. Finally, all components
presented are joined to follow a guide through an indoor environment. The results
achieved demonstrate the efficacy of the proposed components.
Feature-based detection and tracking of individuals in dense crowds
Ph.D. (Doctor of Philosophy)