
    Lip Motion Pattern Recognition for Indonesian Syllable Pronunciation Utilizing Hidden Markov Model Method

    A speech-therapy tool has been developed to help Indonesian deaf children learn how to pronounce words correctly. The technique captures lip-movement frames with a camera and feeds them into a pattern recognition module that can differentiate between the pronunciations of different vowel phonemes in the Indonesian language. In this paper, we used a one-dimensional Hidden Markov Model (HMM) for the pattern recognition module. The features used for the training and test data consisted of six key-points across 20 sequential frames representing a given phoneme. Seventeen Indonesian phonemes were chosen from the words commonly used for speech therapy by teachers at special schools for deaf children. The results showed that the recognition rate varied with phoneme articulation: 78% for bilabial/palatal phonemes and 63% for palatal-only phonemes. The condition of the lips also affected the result: a female speaker with red lips yielded a correlation coefficient of 0.77, compared to 0.68 for pale lips and 0.38 for a male speaker with a mustache.
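The abstract does not give implementation details, but a minimal sketch of the per-phoneme HMM classification it describes might look like the following. The feature layout (20 frames of six key-points per sample) and the one-per-phoneme model idea come from the abstract; the hmmlearn library, the state count, and the helper names are assumptions.

```python
# Sketch of per-phoneme HMM classification, assuming the hmmlearn library.
# Each sample is a 20-frame sequence of six (x, y) lip key-points,
# flattened to a 12-dimensional observation per frame (per the abstract).
import numpy as np
from hmmlearn import hmm

N_FRAMES = 20      # sequence length per phoneme sample (from the abstract)
N_FEATURES = 12    # six key-points x (x, y) coordinates

def train_phoneme_models(sequences_by_phoneme, n_states=5):
    """Train one GaussianHMM per phoneme.

    sequences_by_phoneme: dict mapping a phoneme label to a list of
    (N_FRAMES, N_FEATURES) arrays.  n_states is a guess; the paper
    only states that a one-dimensional HMM was used.
    """
    models = {}
    for phoneme, seqs in sequences_by_phoneme.items():
        X = np.vstack(seqs)               # concatenate all sequences
        lengths = [len(s) for s in seqs]  # per-sequence lengths for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
        m.fit(X, lengths)
        models[phoneme] = m
    return models

def classify(models, sequence):
    """Return the phoneme whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda p: models[p].score(sequence))
```

Training a separate HMM per phoneme and picking the best-scoring model at test time is the standard way to use HMMs as classifiers of fixed-vocabulary sequences, which matches the recognition-rate-per-phoneme evaluation reported above.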

    ViLiDEx- A Lip Extraction Algorithm for Lip Reading

    Technology is evolving at an immense speed, and with it computer vision and machine learning are growing fast; many real-time applications now run without human interaction because of them. In this paper, we use computer vision and machine learning for lip feature extraction for the Gujarati language. For this task we created the GVLetters dataset for the Gujarati alphabet, recording videos of 24 speakers for the 33 letters of the Gujarati script. The face-landmark algorithm from dlib is used as the basis for ViLiDEx (Vibhavari's algorithm for Lip Detection and Extraction). ViLiDEx is applied to 24 speakers and 5 letters from each class (Guttural, Palatal, Retroflex, Dental and Labial). The algorithm counts the total number of frames for each speaker, keeps 20/25 frames as the dataset, and removes the extra frames; depending on the number of frames, frame indices divisible by prime numbers are chosen for removal.
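The abstract outlines two mechanical steps: cropping the lip region from dlib's facial landmarks, and pruning each clip to a fixed frame budget by dropping frames whose indices are divisible by primes. A rough Python sketch of both steps is below. The mouth landmarks really are indices 48 to 67 in dlib's standard 68-point predictor, but the crop margin, frame budget, and prime-selection order are assumptions, since the paper does not spell them out.

```python
# Sketch of the two steps described in the abstract, assuming dlib's
# standard 68-point shape predictor (mouth landmarks are indices 48-67).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_lip_region(frame, margin=10):
    """Crop the lip bounding box from one video frame, or None if no face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    xs = [shape.part(i).x for i in range(48, 68)]  # mouth landmarks
    ys = [shape.part(i).y for i in range(48, 68)]
    y0, y1 = max(0, min(ys) - margin), max(ys) + margin
    x0, x1 = max(0, min(xs) - margin), max(xs) + margin
    return frame[y0:y1, x0:x1]

def prune_frames(n_frames, target=20):
    """Drop frame indices divisible by successive primes until `target` remain.

    The abstract only says frames "divisible by prime numbers" are removed;
    the prime order and stopping rule here are guesses.
    """
    kept = list(range(1, n_frames + 1))
    for p in (2, 3, 5, 7, 11, 13):
        if len(kept) <= target:
            break
        for idx in [i for i in kept if i % p == 0]:
            if len(kept) <= target:
                break
            kept.remove(idx)
    return kept
```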