38 research outputs found

    Effect of Visual Input on Vowel Production in English Speakers

    Get PDF
    This study investigates whether a model of speech perception and production should include a visual component by comparing the jaw opening, advancement, and rounding of American English and non-English vowels in the presence and absence of a visual stimulus. Surprisingly, the visual stimulus did not affect jaw opening, but it was found to be a significant factor in participants' vowel advancement for non-English vowels. This may be explained by lip rounding, but further research is required to develop a full understanding of the impact of visual input on vowel production for use in language teaching and learning.
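
    The abstract reports a significant condition effect on vowel advancement. As a rough illustration of how such a comparison might be run (the study's actual statistics are not given in the abstract), the sketch below applies a paired t-test to invented F2 values, using F2 as a common acoustic proxy for vowel advancement; all numbers here are hypothetical.

```python
# Hypothetical sketch: comparing vowel advancement (approximated by F2)
# between audio-only and audio-visual conditions with a paired t-test.
# The data values are invented for illustration only.
from scipy import stats

# F2 (Hz) for one non-English vowel, per participant, in each condition
f2_audio_only = [1450, 1480, 1390, 1510, 1420, 1475]
f2_audio_visual = [1520, 1540, 1460, 1580, 1495, 1550]

t_stat, p_value = stats.ttest_rel(f2_audio_only, f2_audio_visual)
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")
```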

    Acoustics and Perception of Clear Fricatives

    Get PDF
    Everyday observation indicates that speakers can naturally and spontaneously adopt a speaking style that allows them to be understood more easily when confronted with difficult communicative situations. Previous studies have demonstrated that the resulting speaking style, known as clear speech, is more intelligible than casual, conversational speech for a variety of listener populations. However, few studies have examined the acoustic properties of clearly produced fricatives in detail. In addition, it is unknown whether clear speech improves the intelligibility of fricative consonants, or how its effects on fricative perception might differ depending on listener population. Since fricatives cause a large number of recognition errors both for normal-hearing listeners in adverse conditions and for hearing-impaired listeners, it is of interest to explore these issues with a focus on fricatives. The current study attempts to characterize the type and magnitude of adaptations in the clear production of English fricatives and to determine whether clear speech enhances fricative intelligibility for normal-hearing listeners and listeners with simulated impairment. In an acoustic experiment (Experiment I), ten female and ten male talkers produced nonsense syllables containing the fricatives /f, θ, s, ʃ, v, ð, z, ʒ/ in VCV contexts, in both a conversational style and a clear style that was elicited by means of simulated recognition errors in feedback received from an interactive computer program. Acoustic measurements were taken for spectral, amplitude, and temporal properties known to influence fricative recognition. Results illustrate that (1) there were consistent overall clear speech effects, several of which (consonant duration, spectral peak location, spectral moments) were consistent with previous findings and a few of which (notably consonant-to-vowel intensity ratio) were not, (2) 'contrastive' differences related to acoustic inventory and eliciting prompts were observed in key comparisons, and (3) talkers differed widely in the types and magnitude of acoustic modifications. Two perception experiments using these same productions as stimuli (Experiments II and III) were conducted to address three major questions: (1) whether clearly produced fricatives are more intelligible than conversational fricatives, (2) which specific acoustic modifications are related to clear speech intelligibility advantages, and (3) how sloping, recruiting hearing impairment interacts with clear speech strategies. Both perception experiments used an adaptive procedure to estimate the signal-to-noise ratio (SNR) threshold, in multi-talker babble, at which minimal-pair fricative categorizations could be made with 75% accuracy. Data from fourteen normal-hearing listeners (Experiment II) and fourteen listeners with simulated sloping elevated thresholds and loudness recruitment (Experiment III) indicate that clear fricatives were more intelligible overall for both listener groups. However, for listeners with simulated hearing impairment, a reliable clear speech intelligibility advantage was not found for non-sibilant pairs.
    Correlation analyses comparing acoustic and perceptual style-related differences across the 20 talkers indicated that a shift of energy concentration toward higher frequency regions and greater source strength were primary contributors to the "clear fricative effect" for normal-hearing listeners but not for listeners with simulated loss, for whom information in higher frequency regions was less audible.
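
    Among the acoustic measures named above, spectral moments are straightforward to compute. The following minimal sketch, not taken from the study, derives the first four spectral moments (centroid, variance, skewness, excess kurtosis) from a power spectrum; the synthetic noise input merely stands in for a recorded fricative, and windowing and noise-floor details are simplified.

```python
# Minimal sketch of spectral-moment analysis on a (synthetic) signal.
import numpy as np

def spectral_moments(signal, sr):
    """First four spectral moments of the power spectrum:
    centroid (Hz), variance, skewness, excess kurtosis."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = spectrum / spectrum.sum()          # normalize to a probability mass
    centroid = np.sum(freqs * p)
    var = np.sum(((freqs - centroid) ** 2) * p)
    sd = np.sqrt(var)
    skew = np.sum(((freqs - centroid) ** 3) * p) / sd ** 3
    kurt = np.sum(((freqs - centroid) ** 4) * p) / sd ** 4 - 3
    return centroid, var, skew, kurt

sr = 16000
n = int(0.05 * sr)                         # 50 ms analysis window
noise = np.random.randn(n)                 # stand-in for a fricative noise source
print(spectral_moments(noise, sr))
```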

    Automatic User-Adaptive Speaking Rate Selection

    Full text link

    Automation of the Spoken Poetry Rhyming Game in Persian

    Get PDF
    This paper investigates how a Persian spoken poetry game, called Mosha'ere, can be computerized using a Persian automatic speech recognition system trained on read speech. To this end, the texts and recited speech of poems by the great poets Hafez and Sa'di were gathered, and a spoken poetry rhyming game called Chakame was developed. It uses context-dependent tri-phone HMM acoustic models, trained on Persian read speech at a normal rate, to recognize beyts, i.e., lines of verse, spoken by a human user. Chakame was evaluated on two kinds of recitation speech: 100 beyts recited formally at a normal rate and another 100 beyts recited emotionally, hyperarticulated and at a slow rate. A difference of about 23% in WER shows the impact of the intrinsic features of emotional verse recitation on the recognition rate. Nevertheless, an overall beyt recognition rate of 98.5% was obtained for Chakame.
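
    The roughly 23% gap reported above is measured in word error rate. As an illustration of the metric itself (not of Chakame's internals), the sketch below computes WER as length-normalized Levenshtein distance between a reference and a recognized word sequence; the transliterated example words are invented.

```python
# Illustrative sketch: word error rate via Levenshtein distance,
# the metric behind the ~23% WER gap reported in the abstract.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# Toy example with transliterated words standing in for a Persian beyt
print(wer("del miravad ze dastam", "del miravad ze dast"))  # 0.25
```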

    SPA: Web-based platform for easy access to speech processing modules

    Get PDF
    This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes it possible to use them through the web. It was developed with the aim of facilitating the use of the modules without requiring knowledge of software dependencies and specific configurations. Apart from being accessible through a web browser, the platform also provides a REST API for easy integration with other applications. The platform is flexible and scalable, provides authentication for access restrictions, and was developed with the time and effort of adding new services in mind. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including: automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyperarticulation detection, and two external modules for feature extraction and DTMF detection. This paper describes the SPA architecture, presents the already integrated modules, and provides a detailed description of the most recently integrated ones.
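
    The abstract mentions a REST API but gives no endpoint details, so the following client sketch is entirely hypothetical: the server URL, route, and JSON field names are invented to show the kind of integration such an API enables.

```python
# Hypothetical client sketch for a platform like SPA's REST API.
# The endpoint URL, route, and field names below are invented for
# illustration and are not taken from the paper.
import requests

def transcribe(audio_path: str, server: str = "http://spa.example.org/api") -> str:
    """Upload an audio file to a (hypothetical) transcription route
    and return the recognized text."""
    with open(audio_path, "rb") as f:
        response = requests.post(f"{server}/transcribe", files={"audio": f})
    response.raise_for_status()
    return response.json()["transcript"]  # assumed response schema

if __name__ == "__main__":
    print(transcribe("example.wav"))
```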

    Efficient error correction for speech systems using constrained re-recognition

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 71-75).
    Efficient error correction of recognition output is a major barrier to the adoption of speech interfaces. This thesis addresses this problem through a novel correction framework and user interface. The system uses constraints provided by the user to enhance re-recognition, correcting errors with minimal user effort and time. In our web interface, users listen to the recognized utterance, marking incorrect words as they hear them. After they have finished marking errors, they submit the edits back to the speech recognizer, where they are merged with previous edits and converted into a finite state transducer (FST). This FST, which models the regions of correct and incorrect words in the recognition output, is then composed with the recognizer's language model, and the utterance is re-recognized. We explored the use of our error correction technique in both the lecture and restaurant domains, evaluating the types of errors and the correction performance in each. With our system, we found significant improvements over other error correction techniques such as n-best lists, re-speaking or verbal corrections, and retyping, in terms of actions per correction step, corrected output rate, and ease of use.
    by Gregory T. Yu. M.Eng.
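
    The correction pipeline above pins user-confirmed words in place and frees marked-error regions for re-recognition. The toy sketch below mimics that constraint with a regular expression that filters candidate hypotheses; a real system, like the one described, would instead compile the constraint into an FST and compose it with the language model. All example data are invented.

```python
# Toy sketch of the constraint idea (not the thesis's FST code):
# user-confirmed words must match exactly, while words the user marked
# wrong become wildcard slots that any single word can fill.
import re

def build_constraint(words, wrong_flags):
    """Build a pattern: marked-wrong words match any word, others match exactly."""
    parts = [r"\S+" if wrong else re.escape(w)
             for w, wrong in zip(words, wrong_flags)]
    return re.compile(r"^" + r" ".join(parts) + r"$")

recognized = ["show", "me", "cheap", "restaurants", "in", "Austin"]
wrong = [False, False, False, False, False, True]  # user marked "Austin"
pattern = build_constraint(recognized, wrong)

candidates = ["show me cheap restaurants in Boston",
              "show me the restaurants in Boston"]
print([c for c in candidates if pattern.match(c)])
# keeps only the hypothesis consistent with the confirmed words
```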

    The Science and Art of Voice Interfaces

    Get PDF

    Compensating hyperarticulation for automatic speech recognition

    Get PDF

    Statistical distributions of consonant variants in infant-directed speech: evidence that /t/ may be exceptional

    Get PDF
    Statistical distributions of phonetic variants in spoken language influence speech perception for both language learners and mature users. We theorized that patterns of phonetic variant processing of consonants demonstrated by adults might stem in part from patterns of early exposure to statistics of phonetic variants in infant-directed (ID) speech. In particular, we hypothesized that ID speech might involve greater proportions of canonical /t/ pronunciations compared to adult-directed (AD) speech in at least some phonological contexts. This possibility was tested using a corpus of spontaneous speech of mothers speaking to other adults, or to their typically-developing infant. Tokens of word-final alveolar stops – including /t/, /d/, and the nasal stop /n/ – were examined in assimilable contexts (i.e., those followed by a word-initial labial and/or velar); these were classified as canonical, assimilated, deleted, or glottalized. Results confirmed that there were significantly more canonical pronunciations in assimilable contexts in ID compared with AD speech, an effect which was driven by the phoneme /t/. These findings suggest that at least in phonological contexts involving possible assimilation, children are exposed to more canonical /t/ variant pronunciations than adults are. This raises the possibility that perceptual processing of canonical /t/ may be partly attributable to exposure to canonical /t/ variants in ID speech. Results support the need for further research into how statistics of variant pronunciations in early language input may shape speech processing across the lifespan.
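
    The core comparison is a proportion of canonical versus non-canonical variants across registers. As a schematic illustration (the study's actual counts and statistical model are not given in the abstract), the sketch below runs a chi-square test on invented token counts.

```python
# Invented-numbers sketch of the comparison the study reports: the rate
# of canonical /t/ in assimilable contexts in infant-directed (ID)
# versus adult-directed (AD) speech, tested with a chi-square test.
from scipy.stats import chi2_contingency

#        canonical  non-canonical (assimilated/deleted/glottalized)
table = [[60, 40],   # ID speech  (hypothetical token counts)
         [35, 65]]   # AD speech  (hypothetical token counts)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```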