15 research outputs found
Lip Motion Pattern Recognition for Indonesian Syllable Pronunciation Utilizing Hidden Markov Model Method
A speech-therapy tool was developed to help Indonesian deaf children learn to pronounce words correctly. The technique captures lip-movement frames with a camera and feeds them into a pattern recognition module that can differentiate between the pronunciations of different vowel phonemes in the Indonesian language. In this paper, we used a one-dimensional Hidden Markov Model (HMM) for the pattern recognition module. The features used for the training and test data consisted of six key points over 20 sequential frames representing a given phoneme. Seventeen Indonesian phonemes were chosen from the words commonly used for speech therapy by teachers at special schools for deaf children. The results showed that recognition rates varied with phoneme articulation, i.e., 78% for bilabial/palatal phonemes and 63% for palatal-only phonemes. The condition of the lips also affected the results: a female speaker with red lips yielded a correlation coefficient of 0.77, compared to 0.68 for pale lips and 0.38 for a male speaker with a mustache.
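The per-phoneme HMM classification described above can be sketched in a few lines: score an observed sequence of quantized lip shapes under each phoneme's model with the forward algorithm and pick the best. The model parameters, symbol alphabet, and phoneme labels below are illustrative assumptions, not the paper's trained values.

```python
# Minimal discrete-HMM scorer: choose the phoneme model whose forward
# likelihood is highest for a sequence of quantized lip-shape symbols.

def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)

def classify(obs, models):
    """Return the phoneme label whose HMM assigns the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Two toy 2-state models over a 3-symbol lip-shape alphabet
# (0 = closed, 1 = rounded, 2 = wide). Each model is (start, trans, emit).
models = {
    "/a/": ([0.8, 0.2],
            [[0.7, 0.3], [0.2, 0.8]],
            [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]),
    "/u/": ([0.8, 0.2],
            [[0.7, 0.3], [0.2, 0.8]],
            [[0.1, 0.8, 0.1], [0.2, 0.6, 0.2]]),
}

print(classify([2, 2, 1, 2], models))  # wide-lip sequence → "/a/"
```

In practice each of the seventeen phonemes would get its own model trained on the six-key-point features, with the quantization step mapping key-point geometry to symbols.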
EXPERIMENTAL STUDY ON LIP AND SMILE DETECTION
This paper presents a lip and smile detection method based on the normalized RGB chromaticity diagram. The method employs the popular Viola-Jones detector to find the face. To avoid false positives, an eye detector is introduced in the detection stage: only face candidates with detected eyes are accepted as faces. Once the face is detected, the lip region is localized using a simple geometric rule. Red-color thresholding based on the normalized RGB chromaticity diagram is then proposed to extract the lips, and a projection technique is employed to detect the smile state. In the experiments, the proposed method achieves a lip detection rate of 97% and a smile detection rate of 94%.
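The chromaticity-based lip extraction step can be illustrated directly: each pixel (R, G, B) maps to normalized coordinates (r, g) = (R/S, G/S) with S = R+G+B, and lip pixels are kept by thresholding the red component. The threshold values below are illustrative assumptions, not the paper's tuned parameters.

```python
# Sketch of lip-pixel extraction on the normalized RGB chromaticity diagram.

def lip_mask(pixels, r_min=0.45, g_max=0.33):
    """Return a boolean mask marking pixels whose chromaticity looks lip-red."""
    mask = []
    for (R, G, B) in pixels:
        s = R + G + B
        if s == 0:                     # avoid division by zero on black pixels
            mask.append(False)
            continue
        r, g = R / s, G / s            # normalized chromaticity coordinates
        mask.append(r >= r_min and g <= g_max)
    return mask

pixels = [(200, 60, 70),   # lip-like red
          (180, 140, 120), # skin tone
          (40, 40, 40)]    # shadow
print(lip_mask(pixels))    # → [True, False, False]
```

Normalizing by intensity S makes the test robust to brightness changes, which is the usual motivation for working in the chromaticity diagram rather than raw RGB.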
Structural-Viseme Analysis of Ukrainian Speech Articulation
This paper proposes an approach to the structural viseme analysis of the visual component of the speech process in a video stream. The approach computes quantitative information about the presence of visemes from a given base set in an animation frame by calculating the optimal state parameters of a three-dimensional model of a human head. Experimental studies showed that the proposed model can identify the basic lip states during articulation on a test set of video fragments covering 185 words of the Ukrainian language.
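One simplified way to read "optimal state parameters" here is a least-squares fit: express the observed lip landmarks as a blend of base viseme shapes and take the fitted weights as each viseme's quantitative presence. The 4-D landmark vectors and two-viseme base set below are toy assumptions standing in for the paper's full 3-D head model.

```python
# Hedged sketch: quantify viseme "presence" in a frame by fitting observed
# lip landmarks as a linear blend of base viseme shapes.
import numpy as np

# Base set: each column is one viseme's lip-landmark displacement (toy 4-D data).
neutral = np.array([0.0, 0.0, 0.0, 0.0])
visemes = np.column_stack([
    [1.0, 0.0, 0.5, 0.0],   # e.g. open-mouth viseme
    [0.0, 1.0, 0.0, 0.5],   # e.g. rounded-lips viseme
])

# Observed frame: mostly the first viseme with a little of the second.
frame = neutral + 0.8 * visemes[:, 0] + 0.2 * visemes[:, 1]

# Optimal state parameters = least-squares blend weights.
weights, *_ = np.linalg.lstsq(visemes, frame - neutral, rcond=None)
print(np.round(weights, 3))  # → [0.8 0.2]
```

A real system would fit pose and shape parameters of the head model jointly, but the per-viseme weight readout is the same idea.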
A new visual speech modelling approach for visual speech recognition
In this paper we propose a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The VSU concept extends the standard viseme model currently applied in VSR by including in the representation not only the data associated with the visemes, but also the transitory information between consecutive visemes. The developed speech recognition system consists of several computational stages: (a) lip segmentation, (b) construction of Expectation-Maximization Principal Component Analysis (EM-PCA) manifolds from the input video, (c) registration between the VSU models and the EM-PCA data constructed from the input image sequence, and (d) recognition of the VSUs using a standard Hidden Markov Model (HMM) classification scheme. We were particularly interested in evaluating the classification accuracy obtained with the new VSU models compared with that attained by standard (MPEG-4) viseme models. The experimental results indicate a 90% recognition rate when the system is applied to the identification of 60 classes of VSUs, whereas the recognition rate for the standard set of MPEG-4 visemes was only 52%.
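Stage (b), building a low-dimensional manifold from frame data, can be sketched with plain PCA via SVD (a simpler stand-in for the paper's EM-PCA; the frame count, pixel count, and 3-D manifold size below are made-up illustration values).

```python
# Illustrative PCA projecting flattened mouth-region frames onto a
# low-dimensional manifold, the representation on which VSU/HMM
# matching would then operate.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 64))        # 20 toy frames, 64 pixels each

mean = frames.mean(axis=0)
centered = frames - mean                  # PCA works on mean-centered data
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:3]                        # keep a 3-D manifold

manifold = centered @ components.T         # each frame → a 3-D point
print(manifold.shape)                      # → (20, 3)
```

The appeal of EM-PCA over this batch SVD is that it scales to large frame sets and tolerates missing data, while producing the same kind of low-dimensional trajectory per utterance.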
Robust Visual Lips Feature Extraction Method for Improved Visual Speech Recognition System
Recently, automatic lip reading (ALR) has attracted significant interest among researchers due to its adoption in many applications. One such application is speech recognition in noisy environments, where visual cues carry complementary information that can be added to the audio signal, much as a listener merges audio-visual stimuli to identify an utterance. The unsolved part of this problem is utterance classification using only visual cues, without the acoustic signal of the talker's speech. Given a set of frames from a recorded video of a person uttering a word, a robust image processing technique is used to isolate the lip region, and suitable features are then extracted to represent the variation in mouth shape during speech. These features are used by the classification stage to identify the uttered word. This paper addresses the problem by introducing a new segmentation technique to isolate the lip region, together with a set of visual features based on the extracted lip boundary, which enable lip reading with significant results. A dedicated laboratory was designed to collect utterances of the twenty-six English letters from multiple speakers, forming the UOTEletters corpus adopted in this paper. Moreover, two types of classifiers, Numeral Virtual Generalization (NVG) RAM and K-nearest neighbor (KNN), were adopted to identify the talker's utterance. The recognition performance on the input visual utterances is 94.679% with NVG RAM, which is utilized for the first time in this work, and 92.628% with KNN.
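Of the two classifiers compared, KNN is the simpler to sketch: label a query feature vector by majority vote among its k nearest training vectors. The 2-D feature vectors and letter labels below are invented for illustration, not drawn from the UOTEletters corpus.

```python
# Minimal k-nearest-neighbour classifier over lip-boundary feature vectors.

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); return majority label of k nearest."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

train = [((0.9, 0.1), "A"), ((0.8, 0.2), "A"), ((0.85, 0.15), "A"),
         ((0.1, 0.9), "B"), ((0.2, 0.8), "B"), ((0.15, 0.85), "B")]
print(knn_predict(train, (0.82, 0.18)))  # → A
```

Since each utterance spans many frames, a real system would first pool the per-frame features (or compare whole sequences) before this vote.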
Visual Speech Recognition
Lip reading is used to understand or interpret speech without hearing it, a
technique especially mastered by people with hearing difficulties. The ability
to lip read enables a person with a hearing impairment to communicate with
others and to engage in social activities, which otherwise would be difficult.
Recent advances in the fields of computer vision, pattern recognition, and
signal processing have led to a growing interest in automating this challenging
task of lip reading. Indeed, automating the human ability to lip read, a
process referred to as visual speech recognition (VSR) (or sometimes speech
reading), could open the door for other novel related applications. VSR has
received a great deal of attention in the last decade for its potential use in
applications such as human-computer interaction (HCI), audio-visual speech
recognition (AVSR), speaker recognition, talking heads, sign language
recognition and video surveillance. Its main aim is to recognise spoken word(s)
by using only the visual signal that is produced during speech. Hence, VSR
deals with the visual domain of speech and involves image processing,
artificial intelligence, object detection, pattern recognition, statistical
modelling, etc.
Comment: Speech and Language Technologies (Book), Prof. Ivo Ipsic (Ed.), ISBN: 978-953-307-322-4, InTech (2011).