9 research outputs found

    Multimodal person recognition for human-vehicle interaction

    Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for person recognition successfully combines different biometric modalities, as borne out in two case studies.

    Extraction des traits caractéristiques du visage à l'aide de modèles paramétriques adaptés

    In this paper, we address the automatic extraction of the contours of the permanent facial features, namely the eyes, the eyebrows and the lips. For each feature considered, a specific parametric model able to account for all possible deformations is defined. During the initialization phase, characteristic points of the face are extracted (for example, the corners of the eyes and mouth) and serve as initial anchor points for each model. During the evolution phase, each model is deformed to fit as closely as possible the contours of the features present in the analysed face. This deformation is performed by maximizing a gradient flux (of luminance and/or chrominance) along the contours defined by each curve of the model. Defining models naturally introduces a regularization constraint on the contours being sought, yet the chosen models remain flexible enough to allow a realistic extraction of the contours of the eyes, eyebrows and mouth. The accurate extraction of the contours of the main facial features is the first step of a system for recognizing emotional dynamics.

    Face Detection And Lip Localization

    Integration of audio and video signals for automatic speech recognition has become an important field of study. Audio-Visual Speech Recognition (AVSR) systems are known to achieve higher accuracy than audio-only or visual-only systems. This research focuses on the visual front end and centers on lip segmentation. Previous experiments on lip feature extraction were mainly performed in constrained environments with controlled backgrounds. In this thesis we turn our attention to a database collected in the environment of a moving car, which hampers the quality of the imagery. We first introduce the concept of illumination compensation, which reduces the dependence on lighting in over- or under-exposed images. As a precursor to lip segmentation, we develop a robust face detection technique that reaches an accuracy of 95%. We detail and compare three different face detection techniques and find a successful way of combining them to increase the overall accuracy. One of the detection techniques used is the object detection algorithm proposed by Viola and Jones; we experiment with different color spaces for this algorithm and reach interesting conclusions. Following face detection, we implement a lip localization algorithm based on the vertical gradients of hybrid equations of color. Despite the challenging background and image quality, a success rate of 88% is achieved for lip segmentation.
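A minimal sketch of the illumination-compensation idea described in the abstract (the thesis does not spell out its exact method; this assumes a simple gamma correction chosen to pull an over-exposed image's mean toward mid-grey):

```python
import numpy as np

def compensate_illumination(gray, target_mean=0.5):
    """Crude illumination compensation: apply a gamma chosen so that
    the image mean moves toward a mid-grey target (values in [0, 1])."""
    mean = float(gray.mean())
    # Solve mean ** gamma ~= target_mean for gamma (mean in (0, 1)).
    gamma = np.log(target_mean) / np.log(max(mean, 1e-6))
    return np.clip(gray, 0.0, 1.0) ** gamma

# Synthetic over-exposed frame: pixel values clustered near 1.0.
rng = np.random.default_rng(0)
frame = np.clip(0.85 + 0.05 * rng.standard_normal((120, 160)), 0.0, 1.0)
corrected = compensate_illumination(frame)
```

After correction the mean intensity sits close to the mid-grey target, which makes downstream gradient-based detection less sensitive to exposure.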

    Visual Speech Recognition

    Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities, which otherwise would be difficult. Recent advances in the fields of computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task of lip reading. Indeed, automating the human ability to lip read, a process referred to as visual speech recognition (VSR) (or sometimes speech reading), could open the door for other novel related applications. VSR has received a great deal of attention in the last decade for its potential use in applications such as human-computer interaction (HCI), audio-visual speech recognition (AVSR), speaker recognition, talking heads, sign language recognition and video surveillance. Its main aim is to recognise spoken word(s) by using only the visual signal that is produced during speech. Hence, VSR deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc.

    Comment: Speech and Language Technologies (Book), Prof. Ivo Ipsic (Ed.), ISBN: 978-953-307-322-4, InTech (2011).

    Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System

    Audio-visual automatic speech recognition (AVASR) is a speech recognition technique integrating audio and video signals as input. A traditional audio-only speech recognition system uses only acoustic information from an audio source. However, its recognition performance degrades significantly in acoustically noisy environments. It has been shown that visual information can also be used to identify speech, so audio-visual automatic speech recognition has been studied as a way to improve recognition performance. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database, which was recorded in moving vehicles; diverse lighting conditions and poor imagery quality are therefore the problems we must overcome. We first propose using the Viola-Jones face detection algorithm, which can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach a face detection rate of 89%. By separately detecting in all the different color channels and integrating the detection results, we further improve the detection accuracy to 95%. To reliably localize the lips, three algorithms are studied and compared: the Gabor filter algorithm, the lip enhancement algorithm, and a modified Viola-Jones algorithm for lip features. Finally, to increase the detection rate, the modified Viola-Jones and lip enhancement algorithms are cascaded based on the results of the three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization.
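The channel-integration step can be sketched as a voting scheme over per-channel detections: a box survives only if detectors run on enough different channels agree on it. The box coordinates, IoU threshold and vote count below are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def merge_channel_detections(per_channel, min_votes=2, iou_thr=0.5):
    """Keep a detection only if boxes from at least `min_votes` distinct
    channels overlap (IoU above `iou_thr`); return the mean box of each group."""
    boxes = [(ch, box) for ch, dets in enumerate(per_channel) for box in dets]
    used, merged = set(), []
    for i, (ci, bi) in enumerate(boxes):
        if i in used:
            continue
        group = [(i, ci, bi)]
        for j, (cj, bj) in enumerate(boxes[i + 1:], start=i + 1):
            if j not in used and cj != ci and iou(bi, bj) >= iou_thr:
                group.append((j, cj, bj))
        if len({c for _, c, _ in group}) >= min_votes:
            used.update(k for k, _, _ in group)
            merged.append(tuple(np.mean([b for _, _, b in group], axis=0)))
    return merged

# Hypothetical detections from three color channels: two channels agree
# on the true face, one channel also produces a spurious box.
y_dets  = [(50, 40, 80, 80)]
cb_dets = [(52, 43, 78, 79), (200, 10, 30, 30)]
cr_dets = [(49, 41, 81, 82)]
faces = merge_channel_detections([y_dets, cb_dets, cr_dets])
print(len(faces))  # -> 1: the spurious box is voted out
```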

    Parametric models for facial features segmentation

    In this paper, we deal with the problem of facial feature segmentation (mouth, eyes and eyebrows). A specific parametric model is defined for each feature, each model being able to take into account all the possible deformations. To initialize each model, characteristic points are extracted from each image to be processed (for example, the corners of the eyes, mouth and eyebrows). To fit a model to the contours to be extracted, a gradient flux (of luminance or chrominance) through the estimated contour is maximized, because at each point of the searched contour the gradient (of luminance or chrominance) is normal to the contour. The advantage of defining a model associated with each feature is the ability to introduce a regularization constraint. However, the chosen models are flexible enough to produce realistic contours for the mouth, eyes and eyebrows. This facial feature segmentation is the first step of a set of multimedia applications.
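The gradient-flux maximization described above can be sketched numerically: sample the candidate curve, rotate its tangent to get the normal, and sum the image gradient projected on that normal. The synthetic image, the family of candidate curves (horizontal lines standing in for a parametric eyelid model), and the discrete approximation are illustrative assumptions:

```python
import numpy as np

def normal_flux(img, xs, ys):
    """Approximate the flux of the image gradient through the sampled
    curve (x(t), y(t)): sum of grad(I) . n, with n the unit curve normal."""
    gy, gx = np.gradient(img.astype(float))      # derivatives along y then x
    dx, dy = np.gradient(xs), np.gradient(ys)    # curve tangent components
    norm = np.hypot(dx, dy) + 1e-9
    nx, ny = -dy / norm, dx / norm               # tangent rotated by 90 degrees
    xi = np.clip(np.round(xs).astype(int), 0, img.shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, img.shape[0] - 1)
    return float(np.sum(gx[yi, xi] * nx + gy[yi, xi] * ny))

# Synthetic feature: a dark horizontal band whose upper edge sits at y = 30.
img = np.ones((60, 100))
img[30:40, :] = 0.0

# Candidate "eyelid" curves: horizontal lines y = c near the upper edge.
t = np.linspace(10, 90, 81)
scores = {c: abs(normal_flux(img, t, np.full_like(t, float(c))))
          for c in range(20, 36)}
best = max(scores, key=scores.get)
```

The flux peaks where the curve rides the luminance edge, which is exactly the criterion the deformation step optimizes.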

    Statistical facial feature extraction and lip segmentation

    Facial features such as lip corners, eye corners and the nose tip are critical points in a human face. Robust extraction of such facial feature locations is an important problem with a wide range of applications, including audio-visual speech recognition, human-computer interaction, emotion recognition, fatigue detection and gesture recognition. In this thesis, we develop a probabilistic method for facial feature extraction. This technique is able to automatically learn the location and texture information of facial features from a training set. Facial feature locations are extracted from face regions using joint distributions of locations and textures represented with mixtures of Gaussians. This formulation results in a maximum likelihood (ML) optimization problem which can be solved using either gradient ascent or a Newton-type algorithm. Extracted lip corner locations are then used to initialize a lip segmentation algorithm to extract the lip contours. We develop a level-set-based method that utilizes adaptive color distributions and shape priors for lip segmentation. More precisely, an implicit curve representation which learns the color information of lip and non-lip points from a training set is employed. The model can adapt itself to the image of interest using a coarse elliptical region. The extracted lip contour provides detailed information about the lip shape. Both methods are tested on different databases for facial feature extraction and lip segmentation, and it is shown that they achieve better results than conventional methods. Our facial feature extraction method outperforms active appearance models in terms of pixel errors, while our lip segmentation method outperforms region-based level-set curve evolutions in terms of precision and recall.
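The ML location estimation can be sketched with a single Gaussian and plain gradient ascent (the thesis uses mixtures of Gaussians over joint location and texture; the training data below are synthetic, and the step size and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: lip-corner locations in normalized face
# coordinates, modelled here with a single Gaussian for brevity.
train = rng.normal(loc=[0.3, 0.7], scale=[0.02, 0.03], size=(200, 2))
mu = train.mean(axis=0)               # learned mean location
prec = np.linalg.inv(np.cov(train.T)) # learned precision matrix

def log_lik_grad(x):
    """Gradient of the Gaussian log-likelihood with respect to location x."""
    return -prec @ (x - mu)

# Gradient ascent from a rough initial guess toward the ML location.
x = np.array([0.5, 0.5])
for _ in range(200):
    x = x + 1e-4 * log_lik_grad(x)
```

For a single Gaussian the ascent simply converges to the learned mean; with a mixture, the same update (on the mixture log-likelihood) climbs to the nearest mode instead.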