1,121 research outputs found

    Jaw Rotation in Dysarthria Measured With a Single Electromagnetic Articulography Sensor

    Purpose: This study evaluated a novel method for characterizing jaw rotation using orientation data from a single electromagnetic articulography sensor. This method was optimized for clinical application, and a preliminary examination of clinical feasibility and value was undertaken. Method: The computational adequacy of the single-sensor orientation method was evaluated through comparisons of jaw-rotation histories calculated from dual-sensor positional data for 16 typical talkers. The clinical feasibility and potential value of single-sensor jaw rotation were assessed through comparisons of 7 talkers with dysarthria and 19 typical talkers in connected speech. Results: The single-sensor orientation method allowed faster and safer participant preparation, required lower data-acquisition costs, and generated less high-frequency artifact than the dual-sensor positional approach. All talkers with dysarthria, regardless of severity, demonstrated jaw-rotation histories with more numerous changes in movement direction and reduced smoothness compared with typical talkers. Conclusions: Results suggest that the single-sensor orientation method for calculating jaw rotation during speech is clinically feasible. Given the preliminary nature of this study and the small participant pool, the clinical value of such measures remains an open question. Further work must address the potential confound of reduced speaking rate on movement smoothness.
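One way to make "changes in movement direction" concrete is to count sign reversals in the first difference of a sampled jaw-angle trace. The sketch below is only an illustration under that assumption; the function name `count_direction_changes` and the `min_step` dead band are hypothetical and are not taken from the study, which may define its measure differently.

```python
def count_direction_changes(angles, min_step=0.0):
    """Count opening/closing reversals in a sampled jaw-angle trace.

    Steps whose magnitude is <= min_step are treated as "no movement":
    they neither count as a reversal nor reset the current direction.
    """
    changes = 0
    last_sign = 0  # 0 means no direction has been established yet
    for prev, curr in zip(angles, angles[1:]):
        step = curr - prev
        if abs(step) <= min_step:
            continue
        sign = 1 if step > 0 else -1
        if last_sign != 0 and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes
```

A larger `min_step` suppresses small jitter, which matters if the trace contains high-frequency artifact; the appropriate value would depend on the sensor and sampling rate.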

    Plain-to-clear speech video conversion for enhanced intelligibility

    Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame, image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high definition of the generated videos makes them ideal candidates for human-centric intelligibility and perceptual training studies.
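The idea of a displacement factor that scales the plain-to-clear modification can be illustrated on landmark coordinates. This is a minimal sketch under the assumption that each frame is summarized by matched 2D landmark points; the function name `scale_displacement` and the linear scaling rule are hypothetical, and the study's actual image-warping pipeline is not reproduced here.

```python
def scale_displacement(plain_points, clear_points, factor):
    """Move each plain-speech landmark toward its clear-speech counterpart.

    factor = 0.0 returns the plain landmarks, factor = 1.0 returns the
    clear landmarks, and values above 1.0 exaggerate the displacement.
    """
    if len(plain_points) != len(clear_points):
        raise ValueError("landmark lists must have the same length")
    scaled = []
    for (px, py), (cx, cy) in zip(plain_points, clear_points):
        scaled.append((px + factor * (cx - px), py + factor * (cy - py)))
    return scaled
```

The scaled landmarks would then serve as warp targets for the plain-speech frame; that warping step is outside the scope of this sketch.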

    Recognizing prosody from the lips: is it possible to extract prosodic focus from lip features?

    The aim of this chapter is to examine the possibility of extracting prosodic information from lip features. We used two measurement techniques enabling automatic lip feature extraction to evaluate the "lip pattern" of prosodic focus in French. Two corpora with Subject-Verb-Object (SVO) sentences were designed. Four focus conditions (S, V, O or neutral) were elicited in a natural dialogue situation. In a first set of experiments, we recorded two speakers of French with front and profile video cameras. The speakers wore blue make-up and facial markers. In a second set we recorded five speakers with a 3D optical tracker. An analysis of the lip features showed that visible articulatory lip correlates of focus exist for all speakers. Two types of patterns were observed: absolute and differential. A potential outcome of this study is to provide criteria for automatic visual detection of prosodic focus from lip data.

    The Application of Clear Speech in Electrolaryngeal Speakers

    The present work comprised a series of experiments that investigated the application of clear speech (CS) in a group of electrolaryngeal (EL) speakers. Three experiments were conducted to assess the impact of CS on three important aspects of EL speech. More specifically, Experiment 1 sought to identify the impact of CS on EL speakers’ word and consonant intelligibility; Experiment 2 examined the influence of CS on the acoustic characteristics of words and vowels in EL speech; and finally, Experiment 3 sought to identify the influence of CS produced by EL speakers on auditory-perceptual ratings by naïve listeners. Results revealed that overall word and consonant intelligibility were minimally different when EL speakers used CS compared to their everyday, ‘habitual’ speech (HS) (Experiment 1). Secondly, EL speakers’ use of CS significantly increased word durations, but did not have a substantial impact on fundamental and formant frequency characteristics of vowels (Experiment 2). Finally, due to the productive changes associated with CS involving a slower rate of speech, over-articulation, and increased mouth-opening, listeners judged EL speech produced with CS to be significantly less acceptable to listen to when compared to HS. However, no significant effect of speaking condition was noted on listeners’ comfort levels (Experiment 3). Overall, findings suggest that the acoustic deficits in EL speech might be too complex to derive further benefit from CS in the areas of speech intelligibility, the acoustic structure of EL speech and/or auditory-perceptual ratings of EL speakers. Clinical implications and future directions for research are discussed.

    Reduktion in natürlicher Sprache

    Natural (conversational) speech, compared to canonical speech, is marked by a tremendous amount of variation that often leads to massive changes in pronunciation. Despite many attempts to explain and theorize the variability in conversational speech, its unique characteristics have not played a significant role in linguistic modeling. One of the reasons for variation in natural speech lies in a tendency of speakers to reduce speech, which may drastically alter the phonetic shape of words. Despite the massive loss of information due to reduction, listeners are often able to understand conversational speech even in the presence of background noise. This dissertation investigates two reduction processes, namely regressive place assimilation across word boundaries and massive reduction, and provides novel data from the analyses of speech corpora combined with experimental results from perception studies to reach a better understanding of how humans handle natural speech. The successes and failures of two models dealing with data from natural speech are presented: the FUL model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002) and X-MOD (an episodic model, Johnson, 1997). Based on different assumptions, the two models make different predictions for the two types of reduction processes under investigation. This dissertation explores the nature and dynamics of these processes in speech production and discusses their consequences for speech perception. More specifically, data from analyses of running speech are presented investigating the amount of reduction that occurs in naturally spoken German. Concerning production, the corpus analysis of regressive place assimilation reveals that it is not an obligatory process. At the same time, a clear asymmetry emerges: with only very few exceptions, only [coronal] segments undergo assimilation; [labial] and [dorsal] segments usually do not. 
Furthermore, there seem to be cases of complete neutralization where the underlying Place of Articulation feature has undergone complete assimilation to the Place of Articulation feature of the upcoming segment. Phonetic analyses further underpin these findings. Concerning deletions and massive reductions, the results clearly indicate that phonological rules in the classical generative tradition are not able to explain the reduction patterns attested in conversational speech. Overall, the analyses of deletion and massive reduction in natural speech did not exhibit clear-cut patterns. For a more in-depth examination of reduction factors, the case of final /t/ deletion is examined by means of a new corpus constructed for this purpose. The analysis of this corpus indicates that although phonological context plays an important role in the deletion of segments (i.e., /t/), this arises in the form of tendencies, not absolute conditions. This is true for other deletion processes, too. Concerning speech perception, a crucial issue for both models under investigation (X-MOD and FUL) is how listeners handle reduced speech. Five experiments investigate the way reduced speech is perceived by human listeners. Results from two experiments show that regressive place assimilations can be treated as instances of complete neutralization by German listeners. Concerning massively reduced words, the outcomes of transcription and priming experiments suggest that such words are not acceptable candidates for the intended lexical items for listeners in the absence of their proper phrasal context. Overall, the abstractionist FUL model is found to be superior in explaining the data. While at first sight X-MOD deals with the production data more readily, FUL provides a better fit for the perception results. Another important finding concerns the role of phonology and phonetics in general. 
The results presented in this dissertation make a strong case for models, such as FUL, where phonology and phonetics operate at different levels of the mental lexicon, rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.
Natural (spontaneous) speech in dialogue is characterized, in comparison with canonical speech, above all by an enormous amount of variation. This can often lead to massive changes in the pronunciation of words. Despite some efforts to explain and theorize variability in natural speech, the unique characteristics of natural speech have hardly found their way into linguistic models. One of the reasons for variation in natural speech lies in speakers' tendency to reduce speech. This can drastically affect the phonetic shape of words. Although a great deal of information is lost through reduction, listeners are often able to understand spontaneous speech, even when background noise makes this harder. This dissertation investigates two reduction processes: regressive place assimilation across word boundaries, and massive reduction. New data obtained from analyses of speech corpora are presented. In addition, the focus is on experimental results from perception studies, which are intended to help us better understand how humans deal with natural speech. The dissertation presents the successes and problems of two models in dealing with natural-speech data: the FUL model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002) and X-MOD (an episodic model, Johnson, 1997). Because of their different assumptions, the two models make different predictions for the two reduction processes investigated in this dissertation. 
The nature and effects of both processes in speech production are examined, and the consequences for speech comprehension are discussed. As for speech production, a corpus analysis of naturally spoken German shows that regressive place assimilation does not apply obligatorily. At the same time, a striking asymmetry emerges: apart from a few exceptions, only [coronal] segments are assimilated; [labial] and [dorsal] segments usually are not. Moreover, the production data suggest that there are cases in which assimilation to the place of articulation of the following segment is complete, that is, the speaker has fully neutralized the feature contrasts. Phonetic analyses confirm this result. In the case of deletions and massive reductions, the results clearly demonstrate that phonological rules, in the classical generative sense, cannot describe the reduction patterns that occur in spontaneous speech. All in all, the analyses of massive reductions and deletions show no clear-cut patterns. To examine individual factors influencing reduction more closely, the deletion of (word-)final /t/ was investigated using a new corpus created for this purpose. The analysis of this corpus underlines that, although phonological context has a substantial influence on whether segments (i.e., /t/) are deleted, this influence must be understood as a tendency rather than an absolute condition. This result also holds for other deletion processes. Both models examined in this dissertation (X-MOD and FUL) address, at their core, the question of how listeners understand speech. Five experiments investigate how reduced speech is perceived by human listeners. 
Results from two studies show that assimilations are indeed perceived as completely neutralized by German listeners. As for the perception of massively reduced words, the results of transcription studies and priming experiments show that such words are not accepted as candidates for the correct lexical entries when they are presented without their sentence context. Overall, the abstractionist FUL model is better able to explain the data presented in this dissertation. At first glance, X-MOD appears somewhat better suited to explaining the production data, but mainly because variation is built into the model as a basic assumption. FUL is clearly superior on the perception side. Another important result of this dissertation concerns the role that can be assigned to phonology and phonetics in general. The results presented here provide strong arguments for models, such as FUL, in which phonology and phonetics are active at different levels of the mental lexicon rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.

    Acoustic Characteristics of Tense and Lax Vowels Across Sentence Position in Clear Speech

    The purpose of this study was to examine the acoustic characteristics of tense and lax vowels across sentence positions in clear speech. Recordings were made of 12 participants reading monosyllabic target words at varying positions within semantically meaningful sentences. Acoustic analysis was completed to determine the effects of Style (clear vs. conversational), Tenseness (tense vs. lax), and Position (sentence-medial vs. sentence-final) on vowel duration, vowel space area, vowel space dispersion, and vowel peripheralization. The results showed that speakers had longer durations and expanded vowel spaces in clear speech for both tense and lax vowels. Importantly, the amount of increase was similar for tense and lax vowels, suggesting that the defining properties of lax vowels (i.e., short duration and centralization) were manipulated in clear speech. A significant main effect of position for lax vowel space expansion showed greater vowel spaces for lax vowels in sentence-medial position in clear speech. Clear speech vowel adaptations appear to be dynamic, with both vowel-specific and general transformations.
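Vowel space area is commonly computed as the area of the polygon formed by corner-vowel formant means in the F1/F2 plane. The sketch below uses the shoelace formula and assumes the vertices are supplied in order around the polygon; the function name `vowel_space_area` is hypothetical, and the abstract does not state which vowels or which area method the study used.

```python
def vowel_space_area(vertices):
    """Area of a simple polygon given ordered (F1, F2) vertices.

    Uses the shoelace formula; the result is in squared formant units
    (for example Hz^2 when both formants are given in Hz).
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("at least three vowels are needed to span an area")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Comparing the value computed from clear-speech means with the value from conversational-speech means would give one simple index of vowel space expansion.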

    The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review

    Published online: Aug 1, 2022. Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech. Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research. This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova. This research was supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S. This research was also supported by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00 awarded to Clara D. Martin) and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin).

    The early phase of /ɹ/ production development in adult Japanese learners of English

    Although previous research indicates that Japanese speakers’ second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions: word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2.
