Speech Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Listening in a second language: a pupillometric investigation of the effect of semantic and acoustic cues on listening effort

Author: Borghini Giulia
Publication venue: UCL (University College London)
Publication date: 28/11/2019

Non-native listeners live a great part of their day immersed in a second language environment. Challenges arise because many linguistic interactions happen in noisy environments, and because their linguistic knowledge is imperfect. Pupillometry has been shown to provide a reliable measure of cognitive effort during listening. This research aims to investigate by means of pupillometry how listening effort is modulated by the intelligibility level of the listening task, the availability of contextual and acoustic cues, and the language background of listeners (native vs non-native). In Study 1, listening effort in native and non-native listeners was evaluated during a sentence perception task in noise across different intelligibility levels. Results indicated that listening effort was increased for non-native compared to native listeners when the intelligibility levels were equated across the two groups. In Study 2, using a similar method, materials included predictable and semantically anomalous sentences, presented in a plain and a clear speaking style. Results confirmed an increased listening effort for non-native compared to native listeners. Listening effort was overall reduced when participants attended to clear speech. Moreover, effort reduction after the sentence ended was delayed for less proficient non-native listeners. In Study 3, the contribution of semantic content spanning several sentences was evaluated using lists of semantically related and unrelated stimuli. The presence of semantic cues across sentences led to a reduction in listening effort for native listeners, as reflected by the peak pupil dilation, while non-native listeners did not show the same benefit. In summary, this research consistently showed an increased listening effort for non-native compared to native listeners at equated levels of intelligibility. Additionally, the use of a clear speaking style proved to be an effective strategy to enhance comprehension and to reduce cognitive effort in native and non-native listeners.

UCL Discovery

Models and analysis of vocal emissions for biomedical applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

Directory of Open Access Books (DOAB)

A novel lip geometry approach for audio-visual speech recognition

Author: Zamri Ibrahim (7201733)
Publication date: 01/01/2014

By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world to incorporate lip movements into speech recognition in recent years; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information, specifically using lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using the combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt to dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual and speech modality is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion. Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of different signal-to-noise ratio conditions using the NOISEX-92 dataset.
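
The abstract names the lip geometry features (height, width, ratio, area, perimeter) and a convex hull step but gives no implementation details. The following is only a minimal sketch of how such features might be computed, assuming the lip contour has already been extracted as a list of (x, y) pixel coordinates; the skin colour filter and border following steps are not shown. The function names `convex_hull` and `lip_geometry_features`, and the definition of ratio as height divided by width, are assumptions made for illustration and are not taken from the thesis.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half because it repeats the start of the other
    return lower[:-1] + upper[:-1]


def lip_geometry_features(contour):
    """Hypothetical helper: geometry features of the convex hull of a lip contour."""
    hull = convex_hull(contour)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    # Pair each hull vertex with the next one, closing the polygon
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # Shoelace formula for the polygon area
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    perimeter = sum(math.dist(a, b) for a, b in edges)

    return {
        "width": width,
        "height": height,
        "ratio": height / width if width else 0.0,  # assumed definition
        "area": area,
        "perimeter": perimeter,
    }


# Toy contour: a 4 x 2 rectangle with one interior point that the hull discards
features = lip_geometry_features([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
print(features)
```

Taking the hull before measuring means small concavities or noisy interior points in the extracted contour do not affect the features, which is one plausible reason for including a convex hull step.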

Loughborough University Institutional Repository

UMP Institutional Repository

Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application

Author: Breaux Robert, Curran P. Mike, Huff Edward M.

Speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts are discussed.

NASA Technical Reports Server

Advances in Robotics, Automation and Control

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The book presents an excellent overview of the recent developments in the different areas of Robotics, Automation and Control. Through its 24 chapters, this book presents topics related to control and robot design; it also introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. The book also covers navigation and vision algorithms, automatic handwritten comprehension and speech recognition systems that will be included in the next generation of productive systems developed by man.

Directory of Open Access Books (DOAB)