6 research outputs found

    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016. Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform used to enhance the contrast between the lips and the skin. This thesis presents the most comprehensive study to date, measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and the results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO uses support vector regression (SVR) to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or a relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.
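
    For orientation on the measures quoted above, the sketch below shows one common way to compute the overlap between lip and skin histograms for a single colour channel, together with widely used percentage overlap (OL) and segmentation error (SE) formulas for binary lip masks. It is a minimal NumPy illustration under assumed definitions, not the thesis's implementation; the bin count and the exact OL/SE formulas are assumptions.

        import numpy as np

        def histogram_overlap(lip_values, skin_values, bins=64, value_range=(0.0, 1.0)):
            """Percentage overlap between normalised lip and skin histograms of one channel.

            A lower overlap (the abstract reports 6.15% for HSV hue) suggests the
            channel separates lip pixels from skin pixels better.
            """
            lip_hist, _ = np.histogram(lip_values, bins=bins, range=value_range)
            skin_hist, _ = np.histogram(skin_values, bins=bins, range=value_range)
            lip_hist = lip_hist / lip_hist.sum()
            skin_hist = skin_hist / skin_hist.sum()
            return 100.0 * np.minimum(lip_hist, skin_hist).sum()

        def overlap_and_error(segmented, ground_truth):
            """Percentage overlap (OL) and segmentation error (SE) for binary lip masks.

            These are commonly used evaluation formulas for lip segmentation; the
            thesis may define OL and SE differently.
            """
            seg = segmented.astype(bool)
            gt = ground_truth.astype(bool)
            intersection = np.logical_and(seg, gt).sum()
            ol = 100.0 * 2.0 * intersection / (seg.sum() + gt.sum())
            outer_error = np.logical_and(seg, ~gt).sum()   # background labelled as lip
            inner_error = np.logical_and(~seg, gt).sum()   # lip labelled as background
            se = 100.0 * (outer_error + inner_error) / (2.0 * gt.sum())
            return ol, se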

    The application of manifold based visual speech units for visual speech recognition

    This dissertation presents a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio-visual speech recognition (AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. The inclusion of lip visual information is advantageous since it can improve the overall accuracy of audio or hand recognition algorithms, especially when such systems are operated in environments characterized by a high level of acoustic noise. The main contribution of the work presented in this thesis lies in the development of this new learning-based representation. The main components of the developed VSR system are applied to: (a) segment the mouth region of interest, (b) extract the visual features from the real-time input video and (c) identify the visual speech units. The major difficulty associated with VSR systems resides in identifying the smallest elements contained in the image sequences that represent the lip movements in the visual domain. The Visual Speech Unit concept as proposed represents an extension of the standard viseme model that is currently applied for VSR. The VSU model augments the standard viseme approach by including in this new representation not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. A large section of this thesis is dedicated to analysing the performance of the new Visual Speech Unit model compared with that attained for standard (MPEG-4) viseme models. Two experimental results indicate that: (1) the developed VSR system achieved 80-90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 62-72%; (2) when 15 words are identified using VSUs and visemes as the visual speech elements, the accuracy rate for word recognition based on VSUs is 7%-12% higher than that based on visemes.
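
    As a rough illustration of the idea that a Visual Speech Unit extends a viseme with the transitory information into its neighbour, the sketch below builds hypothetical VSU labels from a viseme sequence. The "current->next" labelling scheme and the silence padding are assumptions for illustration only, not the dissertation's actual representation.

        def visemes_to_vsus(viseme_sequence, boundary="sil"):
            """Pair each viseme with the transition into the following viseme.

            Illustrative only: a VSU label is written here as 'current->next',
            padding the final viseme with an assumed silence marker.
            """
            padded = list(viseme_sequence) + [boundary]
            return [f"{cur}->{nxt}" for cur, nxt in zip(padded[:-1], padded[1:])]

        # Example: three visemes yield three VSU labels, each carrying a transition.
        print(visemes_to_vsus(["p", "a", "f"]))  # ['p->a', 'a->f', 'f->sil']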

    Auditory-Visual Integration during the Perception of Spoken Arabic

    This thesis aimed to investigate the effect of visual speech cues on auditory-visual integration during speech perception in Arabic. Four experiments were conducted, two of which were cross-linguistic studies using Arabic and English listeners. To compare the influence of visual speech in Arabic and English listeners, Chapter 3 investigated the use of visual components of auditory-visual stimuli in native versus non-native speech using the McGurk effect. The experiment suggested that Arabic listeners’ speech perception was influenced by the visual components of speech to a lesser degree than English listeners’. Furthermore, auditory and visual assimilation was observed for non-native speech cues. Additionally, when the visual cue was an emphatic phoneme, the Arabic listeners incorporated the emphatic visual cue in their McGurk response. Chapter 4 investigated whether the lower McGurk effect response in Arabic listeners found in Chapter 3 was due to a bottom-up mechanism of visual processing speed. Using auditory-visual temporally asynchronous conditions, Chapter 4 concluded that the difference in McGurk response percentage was not due to a bottom-up mechanism of visual processing speed. This led to the question of whether the difference in auditory-visual integration of speech could be due to more ambiguous visual cues in Arabic compared to English. To explore this question, it was first necessary to identify visemes in Arabic. Chapter 5 identified 13 viseme categories in Arabic; some emphatic visemes were visually distinct from their non-emphatic counterparts, and a greater number of phonemes were found within the guttural viseme category compared to English. Chapter 6 evaluated the visual speech influence across the 13 Arabic viseme categories as measured by the McGurk effect. It was concluded that the predictive power of visual cues and the contrast between visual and auditory speech components lead to an increase in the McGurk response percentage in Arabic.

    Psychological Engagement in Choice and Judgment Under Risk and Uncertainty

    Theories of choice and judgment assume that agents behave rationally, choose the higher expected value option, and evaluate the choice consistently (Expected Utility Theory; von Neumann & Morgenstern, 1947). However, researchers in decision-making have shown that human behaviour differs across choice and judgment tasks (Slovic & Lichtenstein, 1968, 1971, 1973). In this research, we propose that psychological engagement and control deprivation predict behavioural inconsistencies and utilitarian performance in judgment and choice. Moreover, we explore the influences of engagement and control deprivation on agents’ behaviour while manipulating the content of utility (Kusev et al., 2011; Hertwig & Gigerenzer, 1999; Tversky & Kahneman, 1996) and decision reward (Kusev et al., 2013; Shafir et al., 2002).