
    A Systematic Study and Empirical Analysis of Lip Reading Models using Traditional and Deep Learning Algorithms

    Although there are many applications for analyzing and recreating audio through existing lip-movement recognition, researchers have shown growing interest in developing automatic lip-reading systems with increased performance. Modelling of the framework plays a major role in the yield of such sequential frameworks. In recent years there has been considerable interest in Deep Neural Networks (DNNs), with breakthrough results in domains including image classification, speech recognition, and natural language processing. DNNs are used to represent complex functions and play a vital role in Automatic Lip Reading (ALR) systems. This paper focuses on traditional pixel, shape, and mixed feature extraction and the improved techniques built on them for lip-reading recognition. It highlights the most important techniques and the progression of end-to-end deep learning architectures that evolved during the past decade. The investigation also surveys the audio-visual databases used to analyze and train the systems, covering the most common words, the number of speakers, the database size, the language, and the recording duration. In addition, the ALR systems developed are compared with their traditional counterparts. A statistical analysis is performed on the recognition of characters or numerals and of words or sentences in English, and their performances are compared.

    Lip Motion Pattern Recognition for Indonesian Syllable Pronunciation Utilizing Hidden Markov Model Method

    A speech-therapy tool was developed to help Indonesian deaf children learn how to pronounce words correctly. The technique captures lip-movement frames with a camera and feeds them into a pattern-recognition module that can differentiate between the pronunciations of different vowel phonemes in Indonesian. In this paper, we used a one-dimensional Hidden Markov Model (HMM) for the pattern-recognition module. The features used for the training and test data were composed of six key points across 20 sequential frames representing a given phoneme. Seventeen Indonesian phonemes were chosen from the words commonly used for speech therapy by teachers at special schools for the deaf. The results showed that the recognition rate varied with phoneme articulation, i.e. 78% for bilabial/palatal phonemes and 63% for palatal-only phonemes. The condition of the lips also affected the results: females with red lips yielded a correlation coefficient of 0.77, compared with 0.68 for pale lips and 0.38 for males with mustaches.
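    The classification scheme the abstract describes — one HMM per phoneme, with a lip-movement sequence assigned to the model of highest likelihood — can be sketched with the standard forward algorithm. The sketch below is illustrative only: the two-state models, the quantization of lip shapes into "closed"/"open" symbols, and all probability values are invented for the example and are not taken from the paper (which uses six key-point features over 20 frames and seventeen phoneme models).

    ```python
    import numpy as np

    def forward_log_likelihood(obs, pi, A, B):
        """Log-likelihood of a discrete observation sequence under an HMM,
        computed with the forward algorithm in the log domain for stability.
        pi: initial state probs (S,), A: transitions (S, S), B: emissions (S, V)."""
        log_alpha = np.log(pi) + np.log(B[:, obs[0]])
        for o in obs[1:]:
            # Sum over previous states, then emit the current symbol.
            log_alpha = np.logaddexp.reduce(
                log_alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
        return np.logaddexp.reduce(log_alpha)

    # Two toy phoneme models over quantized lip shapes (symbol 0 = closed, 1 = open).
    # The "bilabial-like" model favours emitting closed lips in its initial state;
    # the contrasting model favours open lips. All numbers are hypothetical.
    pi = np.array([0.9, 0.1])
    A = np.array([[0.7, 0.3],
                  [0.3, 0.7]])
    B_bilabial = np.array([[0.9, 0.1],
                           [0.2, 0.8]])
    B_open = np.array([[0.1, 0.9],
                       [0.8, 0.2]])

    seq = [0, 0, 1, 1, 1]  # lips closed at onset, then opening
    ll_b = forward_log_likelihood(seq, pi, A, B_bilabial)
    ll_o = forward_log_likelihood(seq, pi, A, B_open)
    label = "bilabial" if ll_b > ll_o else "open"
    ```

    In practice one would train one model per phoneme on real key-point feature vectors (e.g. with Baum-Welch, or a library such as hmmlearn using Gaussian emissions) and classify a test sequence by the same arg-max over per-model likelihoods.
    
    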

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. The applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gesture with facial expression, or lip shape with hand position, are the sensory combinations mainly used in developing multimodal systems for the hearing impaired. This paper encapsulates an overview of the multimodal systems available in the literature on hearing-impaired studies, and also discusses studies related to hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing. The study of audio-visual speech recognition systems for the hearing impaired is therefore in high demand, particularly for people trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.

    Analysis and Construction of Engaging Facial Forms and Expressions: Interdisciplinary Approaches from Art, Anatomy, Engineering, Cultural Studies, and Psychology

    The topic of this dissertation is the anatomical, psychological, and cultural examination of the human face in order to construct an effective anatomy-driven 3D virtual face customization and action model. To gain a broad perspective on all aspects of the face, theories and methodology from the fields of art, engineering, anatomy, psychology, and cultural studies have been analyzed and implemented. The computer-generated facial customization and action model was designed based on the collected data. Using this customization system, a culturally specific attractive face in Korean popular culture, "kot-mi-nam (flower-like beautiful guy)," was modeled and analyzed as a case study. The "kot-mi-nam" phenomenon is reviewed in its textual, visual, and contextual aspects, which reveals the gender- and sexuality-fluidity of its masculinity. The analysis and the actual development of the model organically co-construct each other, requiring an interwoven process. Chapter 1 introduces anatomical studies of the human face, psychological theories of face recognition and facial attractiveness, and state-of-the-art face construction projects across the various fields. Chapters 2 and 3 present the Bezier curve-based 3D facial customization (BCFC) and the Multi-layered Facial Action Model (MFAM), based on the analysis of human anatomy, to achieve a cost-effective yet realistic quality of facial animation without using 3D scanned data. In the experiments, results for facial customization by gender, race, fat, and age showed that BCFC achieved enhanced performance of 25.20% compared to the existing program Facegen, and 44.12% compared to Facial Studio. The experimental results also demonstrated the realistic quality and effectiveness of MFAM compared with the blend-shape technique, with enhancements of 2.87% and 0.03% of facial area per second for happiness and anger expressions, respectively.
In Chapter 4, according to the analysis based on BCFC, the 3D face of an average kot-mi-nam is close to gender-neutral (male: 50.38%, female: 49.62%) and Caucasian (66.42-66.40%). Culturally specific images can be misinterpreted in different cultures, due to their different languages, histories, and contexts. This research demonstrates that facial images can be affected by the cultural tastes of their makers and can also be interpreted differently by viewers in different cultures.

    Sign Language Recognition

    This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and its impact on the field. The types of data available and their relative merits are explored, allowing examination of the features that can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from tracking and non-tracking viewpoints, before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and recent research presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of current linguistic research, and adapting to larger, noisier data sets.

    Augmented Reality Talking Heads as a Support for Speech Perception and Production


    Improving Letter Recognition and Reading in Peripheral Vision: Sensory and Cognitive Constraints

    University of Minnesota Ph.D. dissertation. May 2017. Major: Psychology. Advisors: Gordon Legge, Sheng He. 1 computer file (PDF); x, 134 pages + 1 mp4 video file. Reading is an important daily task, but it is very difficult for people who have lost their central vision, because they must use peripheral vision to read. One hypothesis for slow reading speed in peripheral vision is the shrinkage of the visual span, the number of identifiable letters within a glimpse. Previous studies have shown that perceptual training on letter-recognition tasks can enlarge the peripheral visual span and improve peripheral reading speed by 40% or more. This thesis focuses on the sensory and cognitive factors that facilitate or limit these training-related improvements, with the ultimate goal of developing rehabilitation protocols for people with central-field loss. Chapter 1 gives an overview of the thesis. Chapter 2 demonstrates that common constraints limit the size of the visual span across languages (Korean and English), and that extensive training in reading Korean characters with peripheral vision enlarges the Korean visual span as well as the English visual span. This transfer of training suggests a pre-symbolic nature of the visual span and a strong potential for training benefits to generalize to untrained scripts. Chapter 3 discusses visual crowding, the inability to recognize objects in clutter, which is proposed to be the major sensory factor limiting the size of the visual span and reading. The results lead to the conclusion that reducing the impact of crowding can enlarge the visual span and can potentially facilitate reading, but not when an adverse attentional bias is introduced, for example by directing attention to one specific, small area of the visual field. By dissociating the influence of sensory and attentional factors, the link between crowding, visual span, and reading was clarified.
Finally, Chapter 4 reports on a study in which the training was embedded in a word-guessing video game. The game training successfully enlarged the visual span and improved reading speed. Embedding the training in a game enhanced the enjoyment of the training and could temporarily boost letter-recognition performance during the game, but the quality of the training was not altered compared with similar training without the game. Together, the studies presented in this thesis not only speak to the theoretical basis for the training-related changes, but also provide practical guidance for designing potential reading rehabilitation protocols for people with central-field loss.