15 research outputs found

    Lip Motion Pattern Recognition for Indonesian Syllable Pronunciation Utilizing Hidden Markov Model Method

    A speech-therapy tool was developed to help Indonesian deaf children learn to pronounce words correctly. The technique captured lip-movement frames with a camera and fed them into a pattern recognition module that can differentiate between the pronunciations of different vowel phonemes in Indonesian. In this paper, we used a one-dimensional Hidden Markov Model (HMM) method for the pattern recognition module. The features used for the training and test data were composed of six key points over 20 sequential frames representing a given phoneme. Seventeen Indonesian phonemes were chosen from the words commonly used for speech therapy by teachers at special schools for deaf children. The results showed that the recognition rate varied with phoneme articulation, i.e., 78% for bilabial/palatal phonemes and 63% for palatal-only phonemes. The condition of the lips also affected the result: a female with red lips had a correlation coefficient of 0.77, compared to 0.68 for pale lips and 0.38 for a male with a mustache.
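The per-phoneme HMM scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the lip key-point features have been vector-quantized into discrete symbols, and the two phoneme labels and all model parameters are hypothetical toy values.

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space."""
    alpha = np.log(start) + np.log(emit[:, obs[0]])
    for o in obs[1:]:
        # log-sum-exp over previous states for each next state
        alpha = np.logaddexp.reduce(
            alpha[:, None] + np.log(trans), axis=0) + np.log(emit[:, o])
    return np.logaddexp.reduce(alpha)

# Toy 2-state models for two hypothetical phoneme classes.
rng = np.random.default_rng(0)
models = {}
for name in ("pa", "ka"):
    trans = rng.dirichlet(np.ones(2), size=2)   # state transition matrix
    emit = rng.dirichlet(np.ones(4), size=2)    # discrete emission matrix
    models[name] = (np.array([0.5, 0.5]), trans, emit)

seq = [0, 1, 3, 2, 1]  # e.g. vector-quantized lip key-point frames
best = max(models, key=lambda m: forward_log_likelihood(seq, *models[m]))
print(best)
```

At recognition time, each candidate phoneme's HMM scores the observed frame sequence and the highest-likelihood model wins, which is the standard maximum-likelihood decoding scheme for isolated-unit HMM recognition.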

    EXPERIMENTAL STUDY ON LIP AND SMILE DETECTION

    This paper presents a lip and smile detection method based on the normalized RGB chromaticity diagram. The method employs the popular Viola-Jones detection method to detect the face. To avoid false positives, an eye detector is introduced in the detection stage: only face candidates with detected eyes are considered faces. Once the face is detected, the lip region is localized using a simple geometric rule. Then, red-color thresholding based on the normalized RGB chromaticity diagram is proposed to extract the lip. A projection technique is employed to detect the smile state. In the experimental results, the proposed method achieves a lip detection rate of 97% and a smile detection rate of 94%.
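The red-color thresholding step can be sketched in the normalized RGB (rg-chromaticity) plane, where each pixel's red and green channels are divided by the channel sum so the test becomes illumination-invariant. The threshold values below are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def red_chromaticity_mask(img, r_min=0.45, g_max=0.30):
    """Mark pixels whose normalized-RGB chromaticity looks lip-like:
    strongly red and weakly green. Thresholds are illustrative only."""
    img = img.astype(np.float64)
    s = img.sum(axis=2) + 1e-9       # channel sum; epsilon avoids div by zero
    r = img[..., 0] / s              # normalized red
    g = img[..., 1] / s              # normalized green
    return (r > r_min) & (g < g_max)

# Toy 1x2 RGB image: one reddish "lip" pixel, one neutral gray pixel.
img = np.array([[[200, 60, 60], [120, 120, 120]]], dtype=np.uint8)
mask = red_chromaticity_mask(img)
print(mask)  # → [[ True False]]
```

In a full pipeline this mask would be applied only inside the geometrically localized lip region, and its row/column projections would then feed the smile-state decision.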

    Structural-Viseme Analysis of Ukrainian Speech Articulation

    An approach to the structural viseme analysis of the visual component of the speech process in a video stream is proposed in this paper. The approach makes it possible to compute numeric information about the presence of visemes from a given base set in an animation frame by calculating the optimal state parameters of a three-dimensional model of a human head. Experimental studies have shown the feasibility of using the proposed model to identify the basic lip states during articulation on a test set of video fragments of 185 Ukrainian words.

    A new visual speech modelling approach for visual speech recognition

    In this paper we propose a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The Visual Speech Unit concept extends the standard viseme model currently applied in VSR by including in the representation not only the data associated with the visemes but also the transitory information between consecutive visemes. The developed speech recognition system consists of several computational stages: (a) lip segmentation, (b) construction of Expectation-Maximization Principal Component Analysis (EM-PCA) manifolds from the input video images, (c) registration between the VSU models and the EM-PCA data constructed from the input image sequence, and (d) recognition of the VSUs using a standard Hidden Markov Model (HMM) classification scheme. In this paper we were particularly interested in evaluating the classification accuracy obtained for our new VSU models compared with that attained for standard (MPEG-4) viseme models. The experimental results indicate that we achieved a 90% recognition rate when the system was applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 52%.
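The manifold-construction stage (b) can be sketched as a projection of flattened lip-region frames onto their top principal components, producing one low-dimensional point per frame so that a word traces a trajectory through the manifold. Plain SVD-based PCA is used here for brevity; the paper uses an EM variant of PCA, and the frame data below is synthetic.

```python
import numpy as np

def pca_manifold(frames, k=2):
    """Project flattened lip-region frames onto their top-k principal
    components. Plain SVD here; the cited system uses EM-PCA."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ vt[:k].T   # one k-dimensional point per frame

rng = np.random.default_rng(1)
frames = rng.random((20, 8, 8))    # 20 synthetic 8x8 lip crops
traj = pca_manifold(frames, k=2)
print(traj.shape)  # → (20, 2)
```

Registering stored VSU trajectories against such a trajectory, then scoring with HMMs, is what lets the representation capture the inter-viseme transitions the standard viseme model discards.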

    Robust Visual Lips Feature Extraction Method for Improved Visual Speech Recognition System

    Recently, automatic lip reading (ALR) has attracted significant interest among researchers due to its adoption in many applications. One such application is speech recognition in noisy environments, where visual cues carry complementary information that can be added to the audio signal, much as a listener merges audio-visual stimuli to identify an utterance. The unsolved part of this problem is utterance classification using only visual cues, without the acoustic signal of the talker's speech. Taking a set of frames from a recorded video of a person uttering a word, a robust image processing technique is used to isolate the lip region; suitable features are then extracted that represent the variation in mouth shape during speech. These features are used by the classification stage to identify the uttered word. This paper addresses the problem by introducing a new segmentation technique to isolate the lip region, together with a set of visual features based on the extracted lip boundary, which together perform lip reading with significant results. A dedicated laboratory was designed to collect utterances of the twenty-six English letters from multiple speakers, forming the UOTEletters corpus adopted in this paper. Moreover, two types of classifier, Numeral Virtual Generalization RAM (NVG RAM) and K-nearest neighbor (KNN), were adopted to identify the talker's utterance. The recognition performance for the input visual utterance is 94.679% with NVG RAM, which is utilized for the first time in this work, and 92.628% with KNN.
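The KNN classification stage can be sketched as below: each uttered letter is represented by a feature vector derived from the lip boundary, and a test vector is assigned the majority label among its k nearest training samples. The two-dimensional features and letter labels are hypothetical stand-ins for the paper's boundary-based feature set.

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Classify a visual-feature vector by majority vote among its
    k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(train_X - x, axis=1)          # distance to each sample
    nearest = np.asarray(train_y)[np.argsort(d)[:k]] # labels of k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]                 # majority label

# Toy lip-shape features (e.g. mouth width/height ratios) for two letters.
train_X = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.75]])
train_y = ["A", "A", "O", "O"]
print(knn_predict(train_X, train_y, np.array([0.12, 0.22])))  # → A
```

KNN needs no training phase beyond storing the corpus, which makes it a common baseline against which the NVG RAM classifier's higher rate can be compared.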

    Visual Speech Recognition

    Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities, which otherwise would be difficult. Recent advances in the fields of computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task of lip reading. Indeed, automating the human ability to lip read, a process referred to as visual speech recognition (VSR) (or sometimes speech reading), could open the door to other novel related applications. VSR has received a great deal of attention in the last decade for its potential use in applications such as human-computer interaction (HCI), audio-visual speech recognition (AVSR), speaker recognition, talking heads, sign language recognition, and video surveillance. Its main aim is to recognize spoken words using only the visual signal produced during speech. Hence, VSR deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc. Published in Speech and Language Technologies, Prof. Ivo Ipsic (Ed.), ISBN: 978-953-307-322-4, InTech (2011).