
    Robust visual speech recognition using optical flow analysis and rotation invariant features

    The focus of this thesis is the development of computer vision algorithms for a visual speech recognition system that identifies visemes. Most existing speech recognition systems are based on audio-visual signals, have been developed for speech enhancement, and are prone to acoustic noise. To address this problem, the aim of this research is to investigate and develop a visual-only speech recognition system suitable for noisy environments. Potential applications of such a system include lip-reading mobile phones, human-computer interfaces (HCI) for mobility-impaired users, robotics, surveillance, improved speech-based computer control in noisy environments, and the rehabilitation of people who have undergone a laryngectomy. The literature offers several models and algorithms for visual feature extraction. These features are extracted from static mouth images and characterized as appearance-based and shape-based features. However, these methods rarely incorporate the time-dependent information of mouth dynamics. This dissertation presents two optical-flow-based approaches to visual feature extraction, which capture the mouth motions in an image sequence. Motion features are used because human lip-reading relies on the temporal dynamics of mouth motion. The first approach extracts features from the vertical component of the optical flow. This component is decomposed into multiple non-overlapping fixed-scale blocks, and statistical features of each block are computed for successive video frames of an utterance. To overcome the large variation in speaking speed, each utterance is normalized using a simple linear interpolation method.
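The first approach can be illustrated with a minimal sketch. It assumes the vertical flow component is already available as a 2D grid of numbers, and it uses mean and standard deviation as the per-block statistics; the abstract does not say which statistics the thesis uses, so that choice, the function names `block_stats` and `resample_linear`, and the block size are all hypothetical.

```python
from statistics import fmean, pstdev


def block_stats(flow_v, block):
    """Split a 2D grid into non-overlapping block x block tiles and
    return [mean, std] per tile, row by row (assumed statistics)."""
    rows, cols = len(flow_v), len(flow_v[0])
    feats = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            vals = [flow_v[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            feats += [fmean(vals), pstdev(vals)]
    return feats


def resample_linear(frames, target_len):
    """Linearly interpolate a sequence of per-frame feature vectors
    so that every utterance ends up with target_len frames."""
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo                           # weight of the later frame
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out
```

For example, `resample_linear([[0.0], [2.0], [4.0]], 5)` stretches a three-frame sequence to five frames, so a slowly and a quickly spoken utterance can be compared frame by frame.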
    In the second approach, four directional motion templates based on optical flow (directional motion history images, DMHIs) are developed, each representing the consolidated motion information of an utterance in one of four directions (i.e., up, down, left and right). This approach is an evolution of a view-based approach known as the motion history image (MHI). One of the main issues with the MHI method is motion overwriting caused by self-occlusion; DMHIs appear to resolve this issue. Two types of image descriptors, Zernike moments and Hu moments, are used to represent each DMHI image. A support vector machine (SVM) classifier was used to classify, separately, the features obtained from the optical flow vertical component, the Zernike moments and the Hu moments. For identification of visemes, a multiclass SVM approach was employed. A video speech corpus of seven subjects was used to evaluate the effectiveness of the proposed methods for lip-reading. The experimental results demonstrate the promising performance of the optical-flow-based mouth movement representations. A performance comparison based on Zernike moments shows that the DMHI technique outperforms the MHI technique. The thesis also proposes a video-based ad hoc temporal segmentation method for isolated utterances, which detects the start and end frames of an utterance in an image sequence. The technique is based on pair-wise pixel comparison. Its effectiveness was tested on the available data set, which has short pauses between utterances.
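The directional templates can be sketched as follows. This is an illustration only: it applies the standard MHI update rule (set a pixel to `tau` where motion is detected, otherwise decay it by one toward zero) separately per direction, and the thesis may formulate its templates differently. The names `new_dmhi` and `update_dmhi`, the threshold value, and the convention that negative vertical flow means upward motion (image coordinates) are assumptions.

```python
DIRECTIONS = ("up", "down", "left", "right")


def new_dmhi(rows, cols):
    """Create four zero-initialised templates, one per direction."""
    return {d: [[0] * cols for _ in range(rows)] for d in DIRECTIONS}


def update_dmhi(dmhi, flow_u, flow_v, tau, thresh=0.5):
    """Update the four templates in place with one frame of optical flow.

    flow_u / flow_v hold horizontal / vertical flow per pixel.
    A pixel moving in a direction is stamped with tau in that
    direction's template; otherwise its value decays by 1 (floor 0).
    """
    rows, cols = len(flow_u), len(flow_u[0])
    for i in range(rows):
        for j in range(cols):
            u, v = flow_u[i][j], flow_v[i][j]
            moving = {
                "up": v < -thresh,     # assumed: y axis points down
                "down": v > thresh,
                "left": u < -thresh,
                "right": u > thresh,
            }
            for d in DIRECTIONS:
                h = dmhi[d]
                h[i][j] = tau if moving[d] else max(0, h[i][j] - 1)
    return dmhi
```

Because each direction keeps its own template, a downward motion at a pixel does not erase the record of an earlier upward motion at the same pixel, which is the overwriting problem a single MHI has.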

    On the design of visual feedback for the rehabilitation of hearing-impaired speech


    Broadcast speech and the effect of voice quality on the listener : a study of the various components which categorise listener perception by vocal characteristics.

    Voice quality is crucial to the art of the broadcast speaker. Acceptable voice quality is a necessity for an acceptable microphone voice and is therefore essential for employment as a broadcaster. This thesis investigates the characteristics of the voice that provide that acceptability, and categorises the features that lead listeners to make judgements about their vocal likes and dislikes. These subjective judgements are explored by investigating the psychological, medical, and innate features contributing to the vocal perceptions of the listener. Voice quality is related to the efficiency of the larynx and its importance to voice production, and to the various vocal disorders that can affect the broadcaster. It becomes evident throughout the thesis that each listener receives a clear impression of the personality of the speaker through the features present in the voice. Many of these impressions, however, are based on stereotypes. The thesis relates these stereotypical judgements to accents, investigating their relationship to the 'BBC' voice, the 'World Service' voice, the 'ILR' voice and the 'reporter's voice'. It is shown that the listener's subjective impression of the voice and of the broadcaster's personality is formed by the presentational and physical aspects of voice quality. Listener perceptions of voice acceptability are tested and discussed. The data is analysed to provide a set of dominant characteristics, from which voice histograms and frequency polygons are drawn. The result is a set of preferred voice characteristics which apply specifically to the broadcast speaker and which can be sought during the selection process.

    Blind Speech Extraction for Non-Audible Murmur Speech with Speaker's Movement Noise

    ISSPIT 2012: The 12th IEEE International Symposium on Signal Processing and Information Technology, December 12-15, 2012, Ho Chi Minh City, Vietnam. In this paper, we present an improved method of noise reduction for multichannel Non-Audible Murmur (NAM) signals based on blind source separation. Recently, speech processing with NAM has been proposed as a way to bring a versatile speech interface to quiet environments where speakers hesitate to talk aloud. NAM is a very soft whispered voice signal detected with a NAM microphone, which is a type of body-conductive microphone. The detected NAM signal always suffers from nonstationary noise caused by the speaker's movement, because movement changes the setting condition of the NAM microphone. To reduce this noise, some of the authors have previously proposed blind noise reduction using stereo NAM signals detected with two NAM microphones. In this paper, we aim to further improve the noise reduction ability by changing the noise estimation and postprocessing algorithms to enhance the target NAM signal. In addition, we evaluate the application of recording the NAM signals with various types of microphones.

    Predicting room acoustical behavior with the ODEON computer model


    The Noise of the Oppressed


    Regional Variation in Panjabi-English

    The research presented in this thesis details the linguistic patterns of two contact varieties of English spoken in the UK. Based on an analysis of recordings made in two British cities, the research assesses the influence of Panjabi on the English spoken in Bradford and Leicester. In addition, it considers the role and influence that the respective regional ‘Anglo English’ variety is having on the development of the contact variety in each location. The research focusses on variation in voice quality, the vowels FACE, GOAT and GOOSE, and the realisation of /r/. For voice quality, a vocal profile analysis (e.g. Laver 1980) was completed for each of the speakers included in the corpus, with characteristic vocal settings observed among Panjabi and Anglo English groups. The results from a dynamic vowel analysis of F1 and F2 variation across the trajectory for FACE, GOAT and GOOSE illustrated that, despite the cross-regional similarities observable in Panjabi English, local interpretations are crucial. A combined auditory and acoustic analysis of /r/ in word-initial and word-medial position revealed divergent regional patterns in Panjabi English. A number of arguments are put forward to account for the linguistic parallels reported here and, more widely, in contact varieties of English in the UK. The findings of the thesis contribute to a growing body of work that explores the development of contact varieties spoken in the UK, with this thesis concentrating on the development of ‘Panjabi English’ in two locations simultaneously. The patterns observed are accounted for by drawing on both language contact and dialect contact research, bringing together ideas from these two separate fields. It is argued that the similar patterns observed can be considered independent innovations, with contact processes accounting for the linguistic correspondences.