
    Statistical facial feature extraction and lip segmentation

    Facial features such as lip corners, eye corners and the nose tip are critical points in a human face. Robust extraction of such facial feature locations is an important problem with applications in audio-visual speech recognition, human-computer interaction, emotion recognition, fatigue detection and gesture recognition. In this thesis, we develop a probabilistic method for facial feature extraction. The technique automatically learns the location and texture information of facial features from a training set. Facial feature locations are extracted from face regions using joint distributions of locations and textures represented with mixtures of Gaussians. This formulation results in a maximum likelihood (ML) optimization problem which can be solved using either a gradient ascent or a Newton-type algorithm. Extracted lip corner locations are then used to initialize a lip segmentation algorithm that extracts the lip contours. We develop a level-set based method that utilizes adaptive color distributions and shape priors for lip segmentation. More precisely, an implicit curve representation which learns the color information of lip and non-lip points from a training set is employed. The model adapts itself to the image of interest using a coarse elliptical region. The extracted lip contour provides detailed information about the lip shape. Both methods are tested on different databases for facial feature extraction and lip segmentation, and are shown to achieve better results than conventional methods: our facial feature extraction method outperforms active appearance models in terms of pixel error, while our lip segmentation method outperforms region-based level-set curve evolutions in terms of precision and recall.
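    The joint location-texture ML formulation lends itself to a compact sketch. The following Python fragment is a minimal illustration rather than the thesis implementation: it fits a Gaussian mixture over concatenated (x, y, texture) vectors and replaces the gradient-ascent/Newton search with a coarse grid search for clarity. The training-array layout and the texture_at callback are assumptions.

```python
# Minimal sketch of the joint location/texture ML formulation, assuming
# training rows of the form [x, y, texture...] per annotated feature point.
# GaussianMixture and the grid-search stand-in for gradient ascent are
# illustrative choices, not the thesis implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_feature_model(train_samples, n_components=3):
    """train_samples: (N, 2 + T) array of [x, y, texture descriptor] rows."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(train_samples)
    return gmm

def locate_feature(gmm, texture_at, h, w):
    """Pick the (x, y) maximizing the joint log-likelihood over a pixel grid.

    texture_at(x, y) -> texture descriptor at that pixel (assumed given);
    a coarse grid search stands in for the gradient-ascent/Newton step."""
    best, best_ll = None, -np.inf
    for y in range(0, h, 2):          # stride 2 keeps the sketch fast
        for x in range(0, w, 2):
            z = np.concatenate(([x, y], texture_at(x, y)))
            ll = gmm.score_samples(z[None, :])[0]
            if ll > best_ll:
                best, best_ll = (x, y), ll
    return best
```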

    Video-based situation assessment for road safety

    In recent decades, situational awareness (SA) has been a major research subject in connection with autonomous vehicles and intelligent transportation systems. Situational awareness concerns the safety of road users, including drivers, passengers, pedestrians and animals, and it holds key information regarding the nature of upcoming situations. To build robust automatic SA systems that sense the environment, a variety of sensors, such as global positioning systems, radars and cameras, have been used. However, due to their high cost, complex installation procedures and high computational load, automatic situational awareness systems are unlikely to become standard for vehicles in the near future. In this thesis, a novel video-based framework for the automatic assessment of the risk of collision in a road scene is proposed. The framework uses as input the video from a monocular video camera only, avoiding the need for additional, and frequently expensive, sensors. The framework has two main parts: a novel ontology tool for the assessment of risk of collision, and semantic feature extraction based on computer vision methods. The ontology tool is designed to represent the various relations between the most important risk factors, such as risk from objects and road environmental risk. The semantic features related to these factors are based on computer vision methods, such as pedestrian detection and tracking, road-region detection and road-type classification. The quality of these methods is important for achieving accurate results, especially with respect to video segmentation. This thesis, therefore, proposes a new criterion for high-quality video segmentation: the inclusion of temporal-region consistency. On the basis of this criterion, an online method for the evaluation of video segmentation quality is proposed. This method is more consistent than the state-of-the-art method in terms of perceptual segmentation quality, for both synthetic and real video datasets. Furthermore, using the Gaussian mixture model for video segmentation, one of the successful video segmentation methods in this area, new online methods for both road-type classification and road-region detection are proposed. The proposed vision-based road-type classification method achieves higher classification accuracy than the state-of-the-art method for each road type individually and, consequently, higher overall classification accuracy. Likewise, the proposed vision-based road-region detection method achieves high accuracy compared to the state-of-the-art methods according to two measures: pixel-wise percentage accuracy and the area under the receiver operating characteristic (ROC) curve (AUC). Finally, the performance of the automatic risk-assessment framework is evaluated. At this stage, the framework includes only the assessment of pedestrian risk in the road scene. Using the semantic information obtained via computer vision methods, the framework's performance is assessed on two datasets: first, a new dataset proposed in Chapter 7, which comprises six videos, and second, a dataset comprising five examples selected from an established, publicly available dataset. Both datasets consist of real-world videos illustrating pedestrian movement. The experimental results show that the proposed framework achieves high accuracy in the assessment of risk resulting from pedestrian behaviour in road scenes.
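    The two road-region measures named above are standard and easy to make concrete. A hedged sketch follows; the array names and the 0.5 threshold are illustrative assumptions, not details from the thesis.

```python
# Sketch of the two road-region evaluation measures: pixel-wise percentage
# accuracy and ROC AUC. Inputs are a per-pixel score map and a binary mask.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_road_region(score_map, gt_mask, threshold=0.5):
    """score_map: (H, W) road scores in [0, 1]; gt_mask: (H, W) binary truth."""
    pred = score_map >= threshold
    pixel_accuracy = 100.0 * np.mean(pred == gt_mask.astype(bool))
    # AUC is threshold-free: it ranks every pixel's score against the truth.
    auc = roc_auc_score(gt_mask.ravel().astype(int), score_map.ravel())
    return pixel_accuracy, auc
```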

    Ontology-based framework for risk assessment in road scenes using videos

    Recent advances in autonomous vehicle technology pose the important problem of automatic risk assessment in road scenes. This article addresses the problem by proposing a novel ontology tool for the assessment of risk in an unpredictable road traffic environment: it does not assume that road users always obey the traffic rules. A framework for video-based assessment of the risk in a road scene encompassing the above ontology is also presented in the paper. The framework uses as input the video from a monocular video camera only, avoiding the need for additional, sometimes expensive, sensors. The key entities in the road scene (vehicles, pedestrians, environment objects, etc.) are organised into an ontology which encodes their hierarchy, relations and interactions. The ontology tool infers the degree of risk in a given scene using video-based features related to the key entities as its knowledge. The evaluation of the proposed framework focuses on scenarios in which risk results from pedestrian behaviour. A dataset consisting of real-world videos illustrating pedestrian movement is built. Features related to the key entities in the road scene are extracted and fed to the ontology, which evaluates the degree of risk in the scene. The experimental results indicate that the proposed framework can accurately assess risk resulting from pedestrian behaviour in various road scenes.
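    To make the idea of entities feeding a risk inference concrete, here is an illustrative-only sketch; the paper's ontology encodes far richer relations and interactions, and the RoadUser fields, thresholds and risk levels below are invented for the example.

```python
# Toy stand-in for an ontology-style risk query over road-scene entities.
# All field names, thresholds and risk levels are hypothetical.
from dataclasses import dataclass

@dataclass
class RoadUser:
    kind: str            # "pedestrian", "vehicle", ...
    distance_m: float    # distance to the ego vehicle, from video features
    on_road: bool        # whether the entity lies in the detected road region

def pedestrian_risk(user: RoadUser) -> str:
    """Toy inference rule: closeness and road occupancy raise the risk level."""
    if user.kind != "pedestrian":
        return "n/a"
    if user.on_road and user.distance_m < 10:
        return "high"
    if user.on_road or user.distance_m < 20:
        return "medium"
    return "low"

print(pedestrian_risk(RoadUser("pedestrian", 8.0, True)))  # -> high
```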

    Neighborhood-level learning techniques for nonparametric scene models

    Scene-model-based segmentation of video into foreground and background structure has long been an important and ongoing research topic in image processing and computer vision. Segmentation of complex video scenes into binary foreground/background label images is often the first step in a wide range of video processing applications; common examples include surveillance, traffic monitoring, people tracking, activity recognition, and event detection. A wide range of scene modeling techniques have been proposed for identifying foreground pixels or regions in surveillance video. Broadly speaking, the purpose of a scene model is to characterize the distribution of features in an image block or pixel over time. In the majority of cases, the scene model is used to represent the distribution of background features (background modeling) and the distribution of foreground features is assumed to be uniform or Gaussian. In other cases, the model characterizes the distributions of both foreground and background values and the segmentation is performed by maximum likelihood. Pixel-level scene models characterize the distributions of spatiotemporally localized image features centered about each pixel location in video over time. Individual video frames are segmented into foreground and background regions based on a comparison between pixel-level features from within the frame under segmentation and the appropriate elements of the scene model at the corresponding pixel location. Prominent pixel-level scene models include the single Gaussian, the Gaussian mixture model and kernel density estimation. Recently reported advancements in scene modeling techniques have largely exploited the local coherency of natural imagery by integrating neighborhood information among nonparametric pixel-level scene models. The earliest scene models inadvertently made use of neighborhood information because they modeled images at the block level. As the resolution of scene models progressed, textural image features such as the spatial derivative, the local binary pattern (LBP) or wavelet coefficients were employed to provide neighborhood-level structural information in the pixel-level models. Most recently, Barnich and Van Droogenbroeck proposed the Visual Background Extractor (ViBe), where neighborhood-level information is incorporated into the scene model in the learning step: the learning function is distributed over a small region such that new background information is absorbed at both the pixel and neighborhood level. In this dissertation, I present a nonparametric pixel-level scene model based on several recently reported stochastic video segmentation algorithms. I propose new stochastic techniques for updating scene models over time that focus on the incorporation of neighborhood-level features into the model learning process, and I demonstrate the effectiveness of the system on a wide range of challenging visual tasks. Specifically, I propose a model maintenance policy based on the replacement of outliers within each nonparametric pixel-level model through kernel density estimation (KDE), and a neighborhood diffusion procedure in which information sharing between adjacent models having significantly different shapes is discouraged. Quantitative results are compared using the well-known percentage correct classification (PCC) metric and a new probability correct classification (PrCC) metric, in which the underlying models are scrutinized prior to the application of a final segmentation threshold. In all cases considered, the superiority of the proposed model with respect to existing state-of-the-art techniques is well established.
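    For orientation, the following sketch shows the ViBe-style neighborhood-level learning step this dissertation builds on: each background pixel occasionally absorbs the current value into its own sample set and into a randomly chosen neighbor's. The parameter values and the plain sample-replacement policy are standard ViBe choices, not the proposed KDE-based maintenance or diffusion rules.

```python
# ViBe-style segmentation and neighborhood-level update (grayscale sketch).
# model holds K background samples per pixel; values are standard ViBe defaults.
import numpy as np

K, RADIUS, MATCHES, SUBSAMPLE = 20, 20, 2, 16

def segment_and_update(model, frame, rng):
    """model: (H, W, K) uint8 background samples; frame: (H, W) uint8 image."""
    dist = np.abs(model.astype(int) - frame[..., None].astype(int))
    background = (dist < RADIUS).sum(axis=2) >= MATCHES
    h, w = frame.shape
    for y, x in zip(*np.nonzero(background)):
        if rng.integers(SUBSAMPLE) == 0:        # pixel-level absorption
            model[y, x, rng.integers(K)] = frame[y, x]
        if rng.integers(SUBSAMPLE) == 0:        # neighborhood-level absorption
            ny = np.clip(y + rng.integers(-1, 2), 0, h - 1)
            nx = np.clip(x + rng.integers(-1, 2), 0, w - 1)
            model[ny, nx, rng.integers(K)] = frame[y, x]
    return ~background                           # foreground mask

# Usage: rng = np.random.default_rng(0); mask = segment_and_update(model, f, rng)
```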

    Multi-view object segmentation

    There has been growing interest in multi-camera systems, and many works have tried to tackle computer vision problems in this particular configuration. The general objective is to propose new multi-view oriented methods instead of applying limited monocular approaches independently to each viewpoint. The work in this thesis is an attempt to gain a better understanding of the multi-view object segmentation problem and to propose an alternative approach making maximum use of the information available from different viewpoints. Multiple-view segmentation consists in segmenting objects simultaneously in several views. Classic monocular segmentation approaches reason on a single image and do not benefit from the presence of several viewpoints. A key issue in that respect is to ensure the propagation of segmentation information between views while minimizing complexity and computational cost. In this work, we first investigate the idea that examining measurements at the projections of a sparse set of 3D points is sufficient to achieve this goal. The proposed algorithm softly assigns each of these 3D samples to the scene background if it projects on the background region in at least one view, or to the foreground if it projects on the foreground region in all views. A complete probabilistic framework is proposed to estimate the foreground/background color models, and the method is tested on various state-of-the-art datasets. Two extensions of the sparse 3D sampling segmentation framework are proposed for two scenarios. In the first, we show the flexibility of the sparse sampling framework by using variational inference to integrate Gaussian mixture models as appearance models. In the second, we study how to incorporate depth measurements into multi-view segmentation. We present a quantitative evaluation showing that the typical robustness issues of color-based segmentation, due to color-space ambiguity between foreground and background, can be at least partially mitigated by using depth, and that multi-view color-depth segmentation also improves over monocular color-depth segmentation strategies. The various tests also revealed the limitations of the proposed sparse 3D sampling approach, which motivated a new method based on a richer description of image regions using superpixels. This model, which expresses more subtle relationships of the problem through a graph construction linking superpixels and 3D samples, is one of the contributions of this work. In this new framework, temporal information is also integrated. With static views, the results compete with state-of-the-art methods but are achieved with significantly fewer viewpoints. Results on videos demonstrate the benefit of propagating segmentation through geometric and temporal cues. Finally, the last part of the thesis explores the possibilities of tracking in uncalibrated multi-view scenarios. A summary of existing methods in this field is presented, for both mono-camera and multi-camera scenarios, and we investigate the potential of self-similarity matrices to describe and compare motion in the context of multi-view tracking.
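    The sparse 3D sampling rule described above (background if any view sees background, foreground only if all views agree) reduces, in its soft form, to a product of per-view foreground probabilities. A minimal sketch follows, assuming known projection matrices and per-view soft foreground maps; in the thesis these would come from calibration and the estimated color models.

```python
# Soft foreground assignment for one sparse 3D sample across calibrated views.
# Projections and per-view foreground probability maps are assumed given.
import numpy as np

def sample_foreground_prob(X, projections, fg_prob_maps):
    """X: (4,) homogeneous 3D point; projections: list of (3, 4) matrices;
    fg_prob_maps: list of (H, W) per-view foreground probabilities in [0, 1]."""
    p = 1.0
    for P, fg in zip(projections, fg_prob_maps):
        u = P @ X
        x, y = int(u[0] / u[2]), int(u[1] / u[2])
        if not (0 <= y < fg.shape[0] and 0 <= x < fg.shape[1]):
            return 0.0                  # outside a view: treat as background
        p *= fg[y, x]                   # soft AND: all views must agree
    return p
```

    The product acts as a soft logical AND over views, so a single confident background observation in any view drives the sample's foreground probability toward zero, mirroring the hard rule in the text.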