
    Lip segmentation using adaptive color space training

    In audio-visual speech recognition (AVSR), it is beneficial to use lip boundary information in addition to texture-dependent features. In this paper, we propose an automatic lip segmentation method that can be used in AVSR systems. The algorithm consists of the following steps: face detection, lip corner extraction, adaptive color space training for lip and non-lip regions using Gaussian mixture models (GMMs), and curve evolution using a level-set formulation based on region and image-gradient fields. The region-based fields are obtained from the adapted GMM likelihoods. We have tested the proposed algorithm on a database (SU-TAV) of 100 facial images and obtained objective performance results by comparing automatic lip segmentations with hand-marked ground-truth segmentations. Experimental results are promising, although much work remains to be done to improve the robustness of the proposed method.
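    Below is a minimal sketch of the color-modelling step described above: one Gaussian mixture per class (lip / non-lip) fitted on pixel color samples, and a per-pixel log-likelihood ratio that could serve as the region-based field driving the level-set evolution. The component counts, the scikit-learn API choice, and the function names are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_color_models(lip_pixels, nonlip_pixels, n_components=3):
        """Fit one GMM per class on (N, 3) arrays of color samples (assumed setup)."""
        gmm_lip = GaussianMixture(n_components=n_components).fit(lip_pixels)
        gmm_bg = GaussianMixture(n_components=n_components).fit(nonlip_pixels)
        return gmm_lip, gmm_bg

    def region_field(image, gmm_lip, gmm_bg):
        """Per-pixel log-likelihood ratio; positive values favour 'lip'."""
        h, w, c = image.shape
        pixels = image.reshape(-1, c).astype(float)
        ratio = gmm_lip.score_samples(pixels) - gmm_bg.score_samples(pixels)
        return ratio.reshape(h, w)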

    Reproducibility of the dynamics of facial expressions in unilateral facial palsy

    The aim of this study was to assess the reproducibility of non-verbal facial expressions in unilateral facial paralysis using dynamic four-dimensional (4D) imaging. The Di4D system was used to record five facial expressions of 20 adult patients. The system captured 60 three-dimensional (3D) images per second; each facial expression took 3–4 seconds and was recorded in real time, so a set of 180 3D facial images was generated for each expression. The procedure was repeated after 30 min to assess the reproducibility of the expressions. A mathematical facial mesh consisting of thousands of quasi-point ‘vertices’ was conformed to the face in order to determine the morphological characteristics in a comprehensive manner. The vertices were tracked throughout the sequence of 180 images. Five key 3D facial frames from each sequence of images were analyzed. Comparisons were made between the first and second capture of each facial expression to assess the reproducibility of facial movements. Corresponding images were aligned using partial Procrustes analysis, and the root mean square distance between them was calculated and analyzed statistically (paired Student t-test, P < 0.05). Facial expressions of lip purse, cheek puff, and raising of eyebrows were reproducible, whereas maximum smile and forceful eye closure were not. The limited coordination of the various groups of facial muscles contributed to the lack of reproducibility of these facial expressions. 4D imaging is a useful clinical tool for the assessment of facial expressions.
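    As a rough sketch of the comparison protocol, the snippet below aligns two corresponding vertex sets, computes the root mean square (RMS) distance, and applies a paired t-test across patients. Note that scipy's procrustes also normalises scale (full rather than partial Procrustes), and the exact pairing in the statistical design is simplified; both are assumptions for illustration.

    import numpy as np
    from scipy.spatial import procrustes
    from scipy.stats import ttest_rel

    def rms_after_alignment(mesh_a, mesh_b):
        """mesh_a, mesh_b: (n_vertices, 3) arrays of corresponding vertices."""
        m1, m2, _ = procrustes(mesh_a, mesh_b)
        return np.sqrt(np.mean(np.sum((m1 - m2) ** 2, axis=1)))

    def paired_test(rms_values_a, rms_values_b):
        """Paired Student t-test across patients (significance at P < 0.05)."""
        return ttest_rel(rms_values_a, rms_values_b)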

    Dictionary-based lip reading classification

    Visual lip reading recognition is an essential stage in many multimedia systems, such as audio-visual speech recognition [6], mobile phone visual systems for deaf people, and sign language recognition systems. Using visual lip features to support audio or hand recognition is appropriate because this information is robust to acoustic noise. In this paper, we describe our work towards developing a robust technique for lip reading classification that extracts the lips from a colour image using EMPCA feature extraction and k-nearest-neighbour classification. In order to reduce the dimensionality of the feature space, the lip motion is characterized by three templates modelled on different mouth shapes: a closed template, a semi-closed template, and a wide-open template. Our goal is to classify each image sequence based on the distribution of the three templates and to group the words into different clusters. The words in the database were grouped into three clusters as follows: group 1 (‘I’, ‘high’, ‘lie’, ‘hard’, ‘card’, ‘bye’), group 2 (‘you’, ‘owe’, ‘word’), group 3 (‘bird’).
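    A minimal sketch of this classification stage follows, with ordinary PCA standing in for EMPCA and the three mouth-shape templates serving as class labels for a k-nearest-neighbour classifier; the dimensions, the value of k, and the function names are assumptions.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    TEMPLATES = ["closed", "semi-closed", "wide-open"]

    def fit_template_classifier(lip_images, labels, n_components=20, k=5):
        """lip_images: (n_samples, n_pixels) flattened lip regions."""
        pca = PCA(n_components=n_components).fit(lip_images)
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(pca.transform(lip_images), labels)
        return pca, knn

    def template_distribution(pca, knn, sequence):
        """Fraction of frames assigned to each template over one word's
        image sequence, the feature used to group words into clusters."""
        preds = knn.predict(pca.transform(sequence))
        return np.array([np.mean(preds == t) for t in TEMPLATES])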

    Automatic lip tracking: Bayesian segmentation and active contours in a cooperative scheme

    An algorithm for speaker's lip contour extraction is presented in this paper. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space. A Bayesian approach then segments the mouth area using Markov random field modelling, combining motion with red-hue lip information in a spatiotemporal neighbourhood. Simultaneously, a region of interest and the relevant boundary points are automatically extracted. Next, an active contour with spatially varying coefficients is initialised with the results of the preprocessing stage. Finally, an accurate lip shape with inner and outer borders is obtained, with good quality results in this challenging situation.
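    The abstract does not give the exact transform, but a log-ratio red hue together with a mean intensity is one plausible reading of the logarithmic RGB-to-HI step; the sketch below is an assumed stand-in that keeps the same spirit (lips are redder than the surrounding skin, so a red-dominance hue helps separate the mouth area).

    import numpy as np

    def rgb_to_hi(image):
        """image: (h, w, 3) float array in [0, 1]; returns (hue, intensity)."""
        eps = 1e-6  # avoid log(0)
        r, g, b = image[..., 0], image[..., 1], image[..., 2]
        intensity = (r + g + b) / 3.0
        # Log-ratio hue: emphasises red over green and is roughly
        # invariant to multiplicative lighting changes.
        hue = np.log(r + eps) - np.log(g + eps)
        return hue, intensity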

    Automatic detection of ADHD and ASD from expressive behaviour in RGBD data

    Attention Deficit Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) are neurodevelopmental conditions which affect a significant number of children and adults. Currently, such disorders are diagnosed by experts who administer standard questionnaires and look for certain behavioural markers through manual observation. These diagnostic methods are not only subjective, difficult to repeat, and costly, but also extremely time consuming. In this work, we present a novel methodology to aid diagnostic predictions about the presence/absence of ADHD and ASD by automatic visual analysis of a person's behaviour. To do so, we conduct the questionnaires in a computer-mediated way while recording participants with modern RGBD (Colour+Depth) sensors. In contrast to previous automatic approaches, which have focussed only on detecting certain behavioural markers, our approach provides a fully automatic end-to-end system to directly predict ADHD and ASD in adults. Using state-of-the-art facial expression analysis based on Dynamic Deep Learning and 3D analysis of behaviour, we attain classification rates of 96% for the Controls vs Condition (ADHD/ASD) groups and 94% for the Comorbid (ADHD+ASD) vs ASD-only groups. We show that our system is a potentially useful, time-saving contribution to the clinical diagnosis of ADHD and ASD.
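    The paper's pipeline is deep-learning based; purely to illustrate the shape of the final decision stage, here is a hedged sketch in which per-session behavioural feature vectors (however they are extracted upstream) feed a generic binary classifier with cross-validated accuracy. The SVM choice and feature format are assumptions, not the authors' method.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def classification_rate(features, labels, folds=5):
        """features: (n_sessions, n_dims); labels: 0 = control, 1 = condition."""
        clf = SVC(kernel="rbf")
        return cross_val_score(clf, features, labels, cv=folds).mean()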

    Automatic Video Self Modeling for Voice Disorder

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In the field of speech-language pathology, VSM has been used successfully to treat language in children with autism and in individuals with the fluency disorder of stuttering. Technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that bears resemblance to the patient’s original voice. The replacement speech is synthesized either with a text-to-speech engine or by selecting from a database of clean speech based on a voice-similarity metric. To realign the replacement speech with the original video, a novel audiovisual algorithm that combines audio segmentation with lip-state detection is proposed to identify corresponding time markers in the audio and video tracks. Lip synchronization is then accomplished using an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Results of both objective measurements and subjective evaluations on a dataset with 31 subjects demonstrate the effectiveness of the proposed techniques.
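    The realignment idea can be sketched as follows: once corresponding time markers have been identified in the audio and video tracks, each output timestamp of the replacement speech is mapped to a (fractional) source video frame by piecewise-linear interpolation. The paper's adaptive, jitter-minimising resampling is simplified here to np.interp, and the marker format and frame rate are assumptions.

    import numpy as np

    def resample_frame_indices(audio_markers, video_markers, new_audio_times, fps=30.0):
        """audio_markers, video_markers: matched, increasing marker times (seconds).
        Returns a fractional source-frame index for each output timestamp."""
        video_times = np.interp(new_audio_times, audio_markers, video_markers)
        return video_times * fps  # caller rounds or blends neighbouring frames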

    A PCA based manifold representation for visual speech recognition

    In this paper, we discuss a new Principal Component Analysis (PCA)-based manifold representation for visual speech recognition. The input video data is compressed in real time using PCA, and the low-dimensional points calculated for each frame define the manifold. Since the number of frames that form the video sequence depends on the word's complexity, using these manifolds for visual speech classification requires re-sampling them into a fixed, pre-defined number of key-points. These key-points are used as input for a Hidden Markov Model (HMM) classification scheme. We have applied the developed visual speech recognition system to a database containing a group of English words, and the experimental data indicate that the proposed approach produces accurate classification results.
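    A minimal sketch of that pipeline, assuming scikit-learn's PCA and hmmlearn's Gaussian HMM as stand-ins for the paper's implementations: each frame is projected to a low-dimensional point, the resulting trajectory is re-sampled by arc length to a fixed number of key-points, and one HMM per word is trained on the key-point sequences.

    import numpy as np
    from sklearn.decomposition import PCA
    from hmmlearn.hmm import GaussianHMM

    def resample_manifold(points, n_keypoints=16):
        """Arc-length re-sampling of an (n_frames, d) trajectory to a fixed length."""
        seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
        s = np.concatenate([[0.0], np.cumsum(seg)])
        target = np.linspace(0.0, s[-1], n_keypoints)
        return np.stack([np.interp(target, s, points[:, j])
                         for j in range(points.shape[1])], axis=1)

    def train_word_model(sequences, pca_dim=10, n_states=5):
        """sequences: list of (n_frames, n_pixels) image sequences of one word."""
        pca = PCA(n_components=pca_dim).fit(np.vstack(sequences))
        keyed = [resample_manifold(pca.transform(s)) for s in sequences]
        model = GaussianHMM(n_components=n_states)
        model.fit(np.vstack(keyed), lengths=[len(k) for k in keyed])
        return pca, model  # classify with the word model giving the highest score()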

    A study of lip movements during spontaneous dialog and its application to voice activity detection

    This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this aim, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although being in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech and nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of noise environment, including intricate and highly nonstationary noises (e.g., multiple and/or moving noise sources or competing speech signals).
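    In the spirit of the single visual parameter described above, a visual VAD can be sketched as smoothing a per-frame lip-activity measure and thresholding it; the parameter definition, window length, and threshold below are illustrative assumptions rather than the paper's values.

    import numpy as np

    def visual_vad(lip_param, window=15, threshold=0.1):
        """lip_param: per-frame lip activity (e.g., frame-to-frame aperture change).
        Returns a boolean array, True where the speaker is judged to be speaking."""
        kernel = np.ones(window) / window
        smoothed = np.convolve(np.abs(lip_param), kernel, mode="same")
        return smoothed > threshold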
