1,032 research outputs found
Lip contour extraction from color images using a deformable model
The use of visual information from lip movements can improve the accuracy and robustness of a speech recognition system. In this paper, a region-based lip contour extraction algorithm based on a deformable model is proposed. The algorithm employs a stochastic cost function to partition a color lip image into lip and non-lip regions such that the joint probability of the two regions is maximized. Given a discrete probability map generated by spatial fuzzy clustering, we show how the optimization of the cost function can be done in the continuous setting. The region-based approach makes the algorithm more tolerant to noise and artifacts in the image. It also allows a larger region of attraction, thus making the algorithm less sensitive to initial parameter settings. The algorithm works on unadorned lips, and accurate extraction of the lip contour is possible.
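The partition criterion described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes pixels are independent, uses a plain ellipse as a stand-in for the deformable model, and builds a toy probability map instead of one produced by spatial fuzzy clustering. The names region_log_likelihood and ellipse_mask are hypothetical.

```python
import numpy as np

def region_log_likelihood(prob_map, lip_mask, eps=1e-9):
    """Joint log-probability of a lip/non-lip partition (pixels assumed independent).

    prob_map: per-pixel probability of belonging to the lip, values in [0, 1].
    lip_mask: boolean array, True inside the candidate lip region.
    """
    p = np.clip(prob_map, eps, 1.0 - eps)  # avoid log(0)
    inside = np.log(p[lip_mask]).sum()           # lip pixels should have high p
    outside = np.log(1.0 - p[~lip_mask]).sum()   # non-lip pixels should have low p
    return float(inside + outside)

def ellipse_mask(shape, cy, cx, ry, rx):
    """Boolean mask of an axis-aligned ellipse (stand-in for a deformable lip model)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0

# Toy probability map: 0.9 inside a "true" lip ellipse, 0.1 elsewhere.
true_mask = ellipse_mask((40, 60), 20, 30, 8, 20)
prob_map = np.where(true_mask, 0.9, 0.1)

# A candidate that matches the true region scores higher than one that is too small.
good = region_log_likelihood(prob_map, true_mask)
bad = region_log_likelihood(prob_map, ellipse_mask((40, 60), 20, 30, 4, 10))
print(good > bad)
```

Because the score sums over every pixel in both regions rather than only along an edge, a few noisy pixels shift it only slightly, which is consistent with the noise tolerance the abstract claims for region-based approaches.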
Improved facial feature fitting for model based coding and animation
Lip Image Feature Extraction Utilizing Snake's Control Points for Lip Reading Applications
Snake is an active contour model that catches and locks onto image edges, then localizes them accurately. The simplest Snake consists of a set of control points that are connected by straight lines to form a closed loop. This paper discusses the application of Snake to find the visual features of lip shapes. In most previous papers, the visual feature of lip shapes is represented by the Snake's contour. In this paper, the feature of lip shapes is represented by six control points on the lip Snake's contour. By using only six control points to represent one lip Snake's contour, the method is expected to reduce the burden on the pattern recognition stage. To demonstrate the performance of this method, the effect of lip conditions and illumination has been analyzed. The results show that the overall lip feature extraction using the proposed method is better for lips that have more contrast with the surrounding skin, and that the room illumination giving the best result is in the range of 330-340 lux.
Dictionary-based lip reading classification
Visual lip reading recognition is an essential stage in many multimedia systems such as "Audio Visual Speech
Recognition" [6], "Mobile Phone Visual System for deaf people", "Sign Language Recognition System", etc.
The use of lip visual features to help audio or hand recognition is appropriate because this information is robust
to acoustic noise. In this paper, we describe our work towards developing a robust technique for lip reading
classification that extracts the lips in a colour image by using EMPCA feature extraction and k-nearest-neighbor
classification. In order to reduce the dimensionality of the feature space, the lip motion is characterized by three
templates that are modelled on different mouth shapes: closed template, semi-closed template, and wide-open
template. Our goal is to classify each image sequence based on the distribution of the three templates and
group the words into different clusters. The words that form the database were grouped into three different
clusters as follows: group1 ("I", "high", "lie", "hard", "card", "bye"), group2 ("you", "owe", "word"), group3 ("bird").
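The idea of classifying a sequence by its distribution over the three mouth-shape templates can be sketched as follows. The abstract does not spell out how EMPCA, the templates, and k-nearest-neighbor classification are combined, so this sketch assumes that each frame has already been labelled with a template and applies k-NN to the resulting distributions. All frame sequences below are invented for illustration, and template_distribution and knn_predict are hypothetical names.

```python
from collections import Counter
import numpy as np

TEMPLATES = ("closed", "semi-closed", "wide-open")

def template_distribution(frame_labels):
    """Fraction of frames assigned to each of the three templates."""
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return np.array([counts[t] / total for t in TEMPLATES])

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query (Euclidean)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented per-frame template labels for four training sequences.
train_sequences = [
    (["closed", "wide-open", "wide-open", "closed"], "group1"),
    (["closed", "wide-open", "wide-open", "wide-open"], "group1"),
    (["closed", "semi-closed", "semi-closed", "closed"], "group2"),
    (["semi-closed", "semi-closed", "semi-closed", "closed"], "group2"),
]
train_x = np.array([template_distribution(seq) for seq, _ in train_sequences])
train_y = [label for _, label in train_sequences]

# An unseen sequence dominated by the wide-open template.
query = template_distribution(
    ["closed", "wide-open", "wide-open", "wide-open", "closed"]
)
predicted = knn_predict(train_x, train_y, query, k=3)
print(predicted)
```

Reducing each sequence to a three-element distribution is what keeps the feature space small; the trade-off is that the order in which the mouth shapes occur is discarded.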
Visual Speech Recognition
Lip reading is used to understand or interpret speech without hearing it, a
technique especially mastered by people with hearing difficulties. The ability
to lip read enables a person with a hearing impairment to communicate with
others and to engage in social activities, which otherwise would be difficult.
Recent advances in the fields of computer vision, pattern recognition, and
signal processing have led to a growing interest in automating this challenging
task of lip reading. Indeed, automating the human ability to lip read, a
process referred to as visual speech recognition (VSR) (or sometimes speech
reading), could open the door for other novel related applications. VSR has
received a great deal of attention in the last decade for its potential use in
applications such as human-computer interaction (HCI), audio-visual speech
recognition (AVSR), speaker recognition, talking heads, sign language
recognition and video surveillance. Its main aim is to recognise spoken word(s)
by using only the visual signal that is produced during speech. Hence, VSR
deals with the visual domain of speech and involves image processing,
artificial intelligence, object detection, pattern recognition, statistical
modelling, etc.
Comment: Speech and Language Technologies (Book), Prof. Ivo Ipsic (Ed.), ISBN:
978-953-307-322-4, InTech (2011)
Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for
the degree of Doctor of Philosophy.
Johannesburg, September 2016
Having survived the ordeal of a laryngectomy, the patient must come to terms with
the resulting loss of speech. With recent advances in portable computing power,
automatic lip-reading (ALR) may become a viable approach to voice restoration. This
thesis addresses the image processing aspect of ALR, and focuses on three contributions
to colour-based lip segmentation.
The first contribution concerns the colour transform to enhance the contrast
between the lips and skin. This thesis presents the most comprehensive study to
date by measuring the overlap between lip and skin histograms for 33 different
colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%,
and results show that selecting the correct transform can increase the segmentation
accuracy by up to three times.
The second contribution is the development of a new lip segmentation algorithm
that utilises the best colour transforms from the comparative study. The algorithm
is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation
error (SE) of 7.39%.
The third contribution focuses on the impact of the histogram threshold on the
segmentation accuracy, and introduces a novel technique called Adaptive Threshold
Optimisation (ATO) to select a better threshold value. The first stage of ATO
incorporates -SVR to train the lip shape model. ATO then uses feedback of shape
information to validate and optimise the threshold. After applying ATO, the SE
decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp
or relative improvement of 15.1%. While this thesis concerns lip segmentation in
particular, ATO is a threshold selection technique that can be used in various
segmentation applications.
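The histogram-overlap measurement behind the first contribution can be sketched as below. The thesis abstract does not define its overlap measure, so this sketch assumes a common one: the sum of bin-wise minima of two normalised histograms. The pixel samples are synthetic, the bin count is arbitrary, and hue_values and histogram_overlap are hypothetical names.

```python
import colorsys
import numpy as np

def hue_values(rgb_pixels):
    """Hue component of HSV for each RGB pixel (all channels in [0, 1])."""
    return np.array([colorsys.rgb_to_hsv(*px)[0] for px in rgb_pixels])

def histogram_overlap(a, b, bins=32, value_range=(0.0, 1.0)):
    """Overlap of two normalised histograms: 0.0 means disjoint, 1.0 means identical.

    Assumed definition: sum over bins of the smaller of the two bin masses.
    """
    ha, _ = np.histogram(a, bins=bins, range=value_range)
    hb, _ = np.histogram(b, bins=bins, range=value_range)
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())

# Synthetic lip-like and skin-like RGB samples (invented values, not thesis data).
rng = np.random.default_rng(0)
lip_rgb = np.clip(rng.normal([0.7, 0.3, 0.4], 0.05, size=(500, 3)), 0.0, 1.0)
skin_rgb = np.clip(rng.normal([0.8, 0.6, 0.5], 0.05, size=(500, 3)), 0.0, 1.0)

# Note: hue is circular (0.0 and 1.0 are the same colour); this sketch ignores that.
overlap = histogram_overlap(hue_values(lip_rgb), hue_values(skin_rgb))
print(f"hue overlap: {overlap:.2%}")
```

Running the same function on other colour components (for example a single RGB channel) gives a like-for-like comparison of how well each transform separates lip pixels from skin pixels, with lower overlap indicating better separation.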
Final Report to NSF of the Standards for Facial Animation Workshop
The human face is an important and complex communication channel. It is a very familiar and sensitive object of human perception. The facial animation field has grown greatly in the past few years as fast computer graphics workstations have made the modeling and real-time animation of hundreds of thousands of polygons affordable and almost commonplace. Many applications have been developed, such as teleconferencing, surgery, information assistance systems, games, and entertainment. To solve these different problems, different approaches for both animation control and modeling have been developed.