
1,121 research outputs found

Jaw Rotation in Dysarthria Measured With a Single Electromagnetic Articulography Sensor

Author: Berry Jeffrey
Johnson Michael T.
Kolb Andrew
Schroeder James
Publication venue: e-Publications@Marquette
Publication date: 01/06/2017

Purpose: This study evaluated a novel method for characterizing jaw rotation using orientation data from a single electromagnetic articulography sensor. This method was optimized for clinical application, and a preliminary examination of clinical feasibility and value was undertaken. Method: The computational adequacy of the single-sensor orientation method was evaluated through comparisons of jaw-rotation histories calculated from dual-sensor positional data for 16 typical talkers. The clinical feasibility and potential value of single-sensor jaw rotation were assessed through comparisons of 7 talkers with dysarthria and 19 typical talkers in connected speech. Results: The single-sensor orientation method allowed faster and safer participant preparation, required lower data-acquisition costs, and generated less high-frequency artifact than the dual-sensor positional approach. All talkers with dysarthria, regardless of severity, demonstrated jaw-rotation histories with more numerous changes in movement direction and reduced smoothness compared with typical talkers. Conclusions: Results suggest that the single-sensor orientation method for calculating jaw rotation during speech is clinically feasible. Given the preliminary nature of this study and the small participant pool, the clinical value of such measures remains an open question. Further work must address the potential confound of reduced speaking rate on movement smoothness.

epublications@Marquette

Plain-to-clear speech video conversion for enhanced intelligibility

Author: Behne Dawn M.
Hamarneh Ghassan
Jongman Allard
Ruan Haoyao
Sachdeva Shubam
Sereno Joan A.
Wang Yue
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/01/2023

Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video only can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame, image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high definition of the generated videos makes them ideal candidates for human-centric intelligibility and perceptual training studies.
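One way to picture a "displacement factor" is as a scalar that scales how far each plain-speech landmark is moved toward its clear-speech position. The sketch below is only an illustration under that assumption: the abstract does not give the authors' formula, and the function name, the landmark coordinates, and the linear scaling are all hypothetical.

```python
def apply_displacement(plain_points, clear_points, factor):
    """Move each plain-speech landmark toward its clear-speech counterpart.

    factor = 0 leaves the plain landmarks unchanged, factor = 1 reproduces the
    clear-speech landmarks, and values above 1 exaggerate the displacement.
    Linear scaling is an assumption made for illustration only.
    """
    return [
        (px + factor * (cx - px), py + factor * (cy - py))
        for (px, py), (cx, cy) in zip(plain_points, clear_points)
    ]


# Hypothetical lip landmarks (x, y) in pixels for one video frame.
plain = [(10, 20), (30, 20)]
clear = [(8, 26), (32, 26)]

halfway = apply_displacement(plain, clear, 0.5)
```

In an image-warping pipeline, the scaled landmarks would then drive the warp of each frame; that step is omitted here.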

KU ScholarWorks

Recognizing prosody from the lips: is it possible to extract prosodic focus from lip features?

Author: Dohen Marion
Hill Harold
Loevenbruck Hélène
Publication venue: Medical Information Science Reference, Hershey, New York
Publication date: 01/01/2009

The aim of this chapter is to examine the possibility of extracting prosodic information from lip features. We used two measurement techniques enabling automatic lip feature extraction to evaluate the "lip pattern" of prosodic focus in French. Two corpora with Subject-Verb-Object (SVO) sentences were designed. Four focus conditions (S, V, O or neutral) were elicited in a natural dialogue situation. In a first set of experiments, we recorded two speakers of French with front and profile video cameras. The speakers wore blue make-up and facial markers. In a second set we recorded five speakers with a 3D optical tracker. An analysis of the lip features showed that visible articulatory lip correlates of focus exist for all speakers. Two types of patterns were observed: absolute and differential. A potential outcome of this study is to provide criteria for automatic visual detection of prosodic focus from lip data.

Crossref

Hal - Université Grenoble Alpes

Research Online

The Application of Clear Speech in Electrolaryngeal Speakers

Author: Cox Steven R
Publication venue: Scholarship@Western
Publication date: 10/03/2016

The present work comprised a series of experiments that investigated the application of clear speech (CS) in a group of electrolaryngeal (EL) speakers. Three experiments were conducted to assess the impact of CS on three important aspects of EL speech. More specifically, Experiment 1 sought to identify the impact of CS on EL speakers’ word and consonant intelligibility; Experiment 2 examined the influence of CS on the acoustic characteristics of words and vowels in EL speech; and finally, Experiment 3 sought to identify the influence of CS produced by EL speakers on auditory-perceptual ratings by naïve listeners. Results revealed that overall word and consonant intelligibility were minimally different when EL speakers used CS compared to their everyday, ‘habitual’ speech (HS) (Experiment 1). Secondly, EL speakers’ use of CS significantly increased word durations, but did not have a substantial impact on fundamental and formant frequency characteristics of vowels (Experiment 2). Finally, due to the productive changes associated with CS involving a slower rate of speech, over-articulation, and increased mouth-opening, listeners judged EL speech produced with CS to be significantly less acceptable to listen to than HS. However, no significant effect of speaking condition was noted on listeners’ comfort levels (Experiment 3). Overall, findings suggest that the acoustic deficits in EL speech might be too complex to derive further benefit from CS in the areas of speech intelligibility, the acoustic structure of EL speech and/or auditory-perceptual ratings of EL speakers. Clinical implications and future directions for research are discussed.

Scholarship@Western

Reduktion in natürlicher Sprache

Author: Zimmerer Frank
Publication venue
Publication date: 12/01/2010

Natural (conversational) speech, compared to canonical speech, is marked by a tremendous amount of variation that often leads to massive changes in pronunciation. Despite many attempts to explain and theorize the variability in conversational speech, its unique characteristics have not played a significant role in linguistic modeling. One of the reasons for variation in natural speech lies in a tendency of speakers to reduce speech, which may drastically alter the phonetic shape of words. Despite the massive loss of information due to reduction, listeners are often able to understand conversational speech even in the presence of background noise. This dissertation investigates two reduction processes, namely regressive place assimilation across word boundaries and massive reduction, and provides novel data from the analyses of speech corpora combined with experimental results from perception studies to reach a better understanding of how humans handle natural speech. The successes and failures of two models dealing with data from natural speech are presented: the FUL model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002) and X-MOD (an episodic model, Johnson, 1997). Based on different assumptions, the two models make different predictions for the two types of reduction processes under investigation. This dissertation explores the nature and dynamics of these processes in speech production and discusses their consequences for speech perception. More specifically, data from analyses of running speech are presented investigating the amount of reduction that occurs in naturally spoken German. Concerning production, the corpus analysis of regressive place assimilation reveals that it is not an obligatory process. At the same time, a clear asymmetry emerges: with only very few exceptions, only [coronal] segments undergo assimilation; [labial] and [dorsal] segments usually do not.
Furthermore, there seem to be cases of complete neutralization where the underlying Place of Articulation feature has undergone complete assimilation to the Place of Articulation feature of the upcoming segment. Phonetic analyses further underpin these findings. Concerning deletions and massive reductions, the results clearly indicate that phonological rules in the classical generative tradition are not able to explain the reduction patterns attested in conversational speech. Overall, the analyses of deletion and massive reduction in natural speech did not exhibit clear-cut patterns. For a more in-depth examination of reduction factors, the case of final /t/ deletion is examined by means of a new corpus constructed for this purpose. The analysis of this corpus indicates that although phonological context plays an important role in the deletion of segments (i.e. /t/), this arises in the form of tendencies, not absolute conditions. This is true for other deletion processes, too. Concerning speech perception, a crucial part for both models under investigation (X-MOD and FUL) is how listeners handle reduced speech. Five experiments investigate the way reduced speech is perceived by human listeners. Results from two experiments show that regressive place assimilations can be treated as instances of complete neutralizations by German listeners. Concerning massively reduced words, the outcome of transcription and priming experiments suggests that such words are not acceptable candidates of the intended lexical items for listeners in the absence of their proper phrasal context. Overall, the abstractionist FUL model is found to be superior in explaining the data. While at first sight X-MOD deals with the production data more readily (mainly because variation is a built-in assumption of the model), FUL provides a better fit for the perception results. Another important finding concerns the role of phonology and phonetics in general.
The results presented in this dissertation make a strong case for models, such as FUL, where phonology and phonetics operate at different levels of the mental lexicon, rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.

Hochschulschriftenserver - Universität Frankfurt am Main

Acoustic Characteristics of Tense and Lax Vowels Across Sentence Position in Clear Speech

Author: Roesler Lindsay Kayne
Publication venue: UWM Digital Commons
Publication date: 01/08/2013

The purpose of this study was to examine the acoustic characteristics of tense and lax vowels across sentence positions in clear speech. Recordings were made of 12 participants reading monosyllabic target words at varying positions within semantically meaningful sentences. Acoustic analysis was completed to determine the effects of Style (clear vs. conversational), Tenseness (tense vs. lax), and Position (sentence-medial vs. sentence-final) on vowel duration, vowel space area, vowel space dispersion, and vowel peripheralization. The results showed speakers had longer durations and expanded vowel spaces in clear speech for both tense and lax vowels. Importantly, the amount of increase was similar for tense and lax vowels suggesting the defining properties of lax vowels (i.e., short duration and centralization) were manipulated in clear speech. A significant main effect of position for lax vowel space expansion showed greater vowel spaces for lax vowels in sentence-medial position in clear speech. Clear speech vowel adaptations appear to be dynamic with both vowel-specific and general transformations
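Vowel space area is commonly computed as the area of the polygon formed by vowel means in the F1-F2 plane. The sketch below shows that calculation with the shoelace formula. It is a minimal illustration, not the study's procedure: the abstract does not say how area was computed, and the formant values and variable names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` must list the vertices in order around the polygon.
    """
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F2, F1) means in Hz for four vowels, listed around the polygon.
conversational = [(2000, 350), (1700, 650), (1200, 650), (1000, 400)]
clear = [(2300, 300), (1800, 750), (1100, 750), (850, 350)]

area_conversational = polygon_area(conversational)  # in Hz^2
area_clear = polygon_area(clear)                    # in Hz^2
```

With these made-up values the clear-speech polygon is larger, which is the kind of vowel space expansion the abstract reports.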

University of Wisconsin-Milwaukee

The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review

Author: Kalashnikova Marina
Martin Clara D.
Piazza Giorgio
Publication venue: 'American Speech Language Hearing Association'
Publication date: 01/01/2022

Published online: Aug 1, 2022. Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech. Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research. This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova. This research was supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S. This research was also supported by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00 awarded to Clara D. Martin) and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin).

Archivo Digital para la Docencia y la Investigación

The early phase of /ɹ/ production development in adult Japanese learners of English

Author: Munro M.J.
Saito Kazuya
Publication venue: 'SAGE Publications'
Publication date: 03/11/2014

Although previous research indicates that Japanese speakers’ second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions—word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2

Birkbeck Institutional Research Online


Expansion of prosodic abilities at the transition from babble to words: a comparison between children with cochlear implants and normally hearing children

Author: Bates
Beckman
Carter
Chapman
Curtin
Daelemans
De Clerck
DePaolis
Dinnsen
Faes
Flipsen
Friederici
Gerken
Goffman
Green
Heisler
Holt
Hopyan-Misakyan
Houston
Jusczyk
Knudsen
Kochanski
Koopmans-van Beinum
Lee
Lenden
Lieberman
Maye
McKean
Meister
Molemans
Montag
Moore
Most
Nakata
Payne
Peng
Peters
Pierrehumbert
Redford
Schauwers
Segal
Sharma
Snow
Stoel-Gammon
Titterington
Torppa
Vanormelingen
Verhoeven
Vihman
Vihman
Werker
White
Yeung
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/01/2017

Objectives: This longitudinal study examined the impact of emerging vocabulary production on the ability to produce the phonetic cues to prosodic prominence in babbled and lexical disyllables of infants with cochlear implants (CI) and normally hearing (NH) infants. Current research on typical language acquisition emphasizes the importance of vocabulary development for phonological and phonetic acquisition. Children with CI experience significant difficulties with the perception and production of prosody, and the role of possible top-down effects is therefore particularly relevant for this population. Design: Isolated disyllabic babble and first words were identified and segmented in longitudinal audio-video recordings and transcriptions for 9 NH infants and 9 infants with CI interacting with their parents. Monthly recordings were included from the onset of babbling until children had reached a cumulative vocabulary of 200 words. Three cues to prosodic prominence, F0, intensity and duration, were measured in the vocalic portions of stand-alone disyllables. In order to represent the degree of prosodic differentiation between two syllables in an utterance, the raw values for intensity and duration were transformed to ratios, and for F0 a measure of the perceptual distance in semitones was derived. The degree of prosodic differentiation for disyllabic babble and words for each cue was compared between groups. In addition, group and individual tendencies on the types of stress patterns for babble and words were also examined. Results: The CI group had overall smaller pitch and intensity distances than the NH group. For the NH group, words had greater pitch and intensity distances than babbled disyllables. Especially for pitch distance, this was accompanied by a shift towards a more clearly expressed stress pattern that reflected the influence of the ambient language. For the CI group, the same expansion in words did not take place for pitch.
For intensity, the CI group gave evidence of some increase of prosodic differentiation. The results for the duration measure showed evidence of utterance-final lengthening in both groups. In words, the CI group significantly reduced durational differences between syllables so that a more even-timed, less differentiated pattern emerged. Conclusions: The onset of vocabulary production did not have the same facilitatory effect for the CI infants on the production of phonetic cues for prosody, especially for pitch. It was argued that the results for duration may reflect greater articulatory difficulties in words for the CI group than the NH group. It was suggested that the lack of clear top-down effects of the vocabulary in the CI group may be due to a lag in development caused by an initial lack of auditory stimulation, possibly compounded by the absence of auditory feedback during the babble phase.
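The transformation described in the Design section can be sketched as follows. The semitone distance between two frequencies is the standard 12 · log2(f_a / f_b); the duration ratio is a plain quotient. Which syllable the study used as numerator, and any further normalization, are not stated in the abstract, so the argument order, function names, and sample values here are assumptions.

```python
import math


def semitone_distance(f0_a_hz, f0_b_hz):
    """Perceptual pitch distance in semitones between two F0 values (Hz).

    Positive when the first value is higher; one octave equals 12 semitones.
    """
    return 12.0 * math.log2(f0_a_hz / f0_b_hz)


def cue_ratio(value_a, value_b):
    """Ratio of a cue (e.g. duration) in one syllable to the other syllable."""
    return value_a / value_b


# Hypothetical measurements for one disyllable (first vs. second syllable).
pitch_distance = semitone_distance(440.0, 220.0)  # F0 in Hz
duration_ratio = cue_ratio(0.30, 0.20)            # vowel durations in seconds
```

Expressing pitch on a logarithmic semitone scale makes the distance independent of a talker's overall pitch level, which matters when comparing infants with different baseline F0.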

City Research Online

Crossref

Institutional Repository Universiteit Antwerpen