1,121 research outputs found
Jaw Rotation in Dysarthria Measured With a Single Electromagnetic Articulography Sensor
Purpose: This study evaluated a novel method for characterizing jaw rotation using orientation data from a single electromagnetic articulography sensor. This method was optimized for clinical application, and a preliminary examination of clinical feasibility and value was undertaken.
Method: The computational adequacy of the single-sensor orientation method was evaluated through comparisons of jaw-rotation histories calculated from dual-sensor positional data for 16 typical talkers. The clinical feasibility and potential value of single-sensor jaw rotation were assessed through comparisons of 7 talkers with dysarthria and 19 typical talkers in connected speech.
Results: The single-sensor orientation method allowed faster and safer participant preparation, required lower data-acquisition costs, and generated less high-frequency artifact than the dual-sensor positional approach. All talkers with dysarthria, regardless of severity, demonstrated jaw-rotation histories with more numerous changes in movement direction and reduced smoothness compared with typical talkers.
Conclusions: Results suggest that the single-sensor orientation method for calculating jaw rotation during speech is clinically feasible. Given the preliminary nature of this study and the small participant pool, the clinical value of such measures remains an open question. Further work must address the potential confound of reduced speaking rate on movement smoothness.
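One of the reported measures, the number of changes in movement direction in a jaw-rotation history, can be illustrated with a minimal sketch. The abstract does not say how the authors computed it; the function name, the `tolerance` parameter, and the sign-reversal rule below are assumptions made for illustration only.

```python
def count_direction_changes(angles, tolerance=0.0):
    """Count sign reversals between successive steps of a sampled angle series.

    `angles` is a sequence of jaw-rotation samples (any consistent unit).
    Steps whose magnitude is at most `tolerance` are ignored, a hypothetical
    way of keeping measurement noise from being counted as movement.
    """
    changes = 0
    previous_sign = 0  # 0 means no movement direction has been seen yet
    for earlier, later in zip(angles, angles[1:]):
        delta = later - earlier
        if abs(delta) <= tolerance:
            continue  # too small to count as movement
        sign = 1 if delta > 0 else -1
        if previous_sign != 0 and sign != previous_sign:
            changes += 1
        previous_sign = sign
    return changes
```

For example, `count_direction_changes([0, 1, 2, 1, 0, 1])` counts one reversal at the peak and one at the trough. A real analysis would also need a filtering and normalization scheme, which the abstract does not describe.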
Plain-to-clear speech video conversion for enhanced intelligibility
Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine if visible speech cues in video only can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame image-warping based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high definition generated videos make them ideal candidates for human-centric intelligibility and perceptual training studies.
Recognizing prosody from the lips: is it possible to extract prosodic focus from lip features?
The aim of this chapter is to examine the possibility of extracting prosodic information from lip features. We used two measurement techniques enabling automatic lip feature extraction to evaluate the "lip pattern" of prosodic focus in French. Two corpora with Subject-Verb-Object (SVO) sentences were designed. Four focus conditions (S, V, O or neutral) were elicited in a natural dialogue situation. In a first set of experiments, we recorded two speakers of French with front and profile video cameras. The speakers wore blue make-up and facial markers. In a second set we recorded five speakers with a 3D optical tracker. An analysis of the lip features showed that visible articulatory lip correlates of focus exist for all speakers. Two types of patterns were observed: absolute and differential. A potential outcome of this study is to provide criteria for automatic visual detection of prosodic focus from lip data.
The Application of Clear Speech in Electrolaryngeal Speakers
The present work comprised a series of experiments that investigated the application of clear speech (CS) in a group of electrolaryngeal (EL) speakers. Three experiments were conducted to assess the impact of CS on three important aspects of EL speech. More specifically, Experiment 1 sought to identify the impact of CS on EL speakers’ word and consonant intelligibility; Experiment 2 examined the influence of CS on the acoustic characteristics of words and vowels in EL speech; and finally, Experiment 3 sought to identify the influence of CS produced by EL speakers on auditory-perceptual ratings by naïve listeners. Results revealed that overall word and consonant intelligibility were minimally different when EL speakers used CS compared to their everyday, “habitual” speech (HS) (Experiment 1). Secondly, EL speakers’ use of CS significantly increased word durations, but did not have a substantial impact on fundamental and formant frequency characteristics of vowels (Experiment 2). Finally, due to the productive changes associated with CS involving a slower rate of speech, over-articulation, and increased mouth-opening, listeners judged EL speech to be significantly less acceptable to listen to when compared to HS. However, no significant effect of speaking condition was noted on listeners’ comfort levels (Experiment 3). Overall, findings suggest that the acoustic deficits in EL speech might be too complex to derive further benefit from CS in the areas of speech intelligibility, the acoustic structure of EL speech and/or auditory-perceptual ratings of EL speakers. Clinical implications and future directions for research are discussed.
Reduktion in natürlicher Sprache (Reduction in Natural Speech)
Natural (conversational) speech, compared to canonical speech, is marked by a tremendous amount of variation that often leads to a massive change in pronunciation. Despite many attempts to explain and theorize the variability in conversational speech, its unique characteristics have not played a significant role in linguistic modeling. One of the reasons for variation in natural speech lies in a tendency of speakers to reduce speech, which may drastically alter the phonetic shape of words. Despite the massive loss of information due to reduction, listeners are often able to understand conversational speech even in the presence of background noise. This dissertation investigates two reduction processes, namely regressive place assimilation across word boundaries and massive reduction, and provides novel data from the analyses of speech corpora combined with experimental results from perception studies to reach a better understanding of how humans handle natural speech. The successes and failures of two models dealing with data from natural speech are presented: the FUL-model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002), and X-MOD (an episodic model, Johnson, 1997). Based on different assumptions, the two models make different predictions for the two types of reduction processes under investigation. This dissertation explores the nature and dynamics of these processes in speech production and discusses their consequences for speech perception. More specifically, data from analyses of running speech are presented investigating the amount of reduction that occurs in naturally spoken German. Concerning production, the corpus analysis of regressive place assimilation reveals that it is not an obligatory process. At the same time, there emerges a clear asymmetry: with only very few exceptions, only [coronal] segments undergo assimilation; [labial] and [dorsal] segments usually do not.
Furthermore, there seem to be cases of complete neutralization where the underlying Place of Articulation feature has undergone complete assimilation to the Place of Articulation feature of the upcoming segment. Phonetic analyses further underpin these findings. Concerning deletions and massive reductions, the results clearly indicate that phonological rules in the classical generative tradition are not able to explain the reduction patterns attested in conversational speech. Overall, the analyses of deletion and massive reduction in natural speech did not exhibit clear-cut patterns. For a more in-depth examination of reduction factors, the case of final /t/ deletion is examined by means of a new corpus constructed for this purpose. The analysis of this corpus indicates that although phonological context plays an important role in the deletion of segments (i.e., /t/), this arises in the form of tendencies, not absolute conditions. This is true for other deletion processes, too. Concerning speech perception, a crucial part for both models under investigation (X-MOD and FUL) is how listeners handle reduced speech. Five experiments investigate the way reduced speech is perceived by human listeners. Results from two experiments show that regressive place assimilations can be treated as instances of complete neutralizations by German listeners. Concerning massively reduced words, the outcomes of transcription and priming experiments suggest that such words are not acceptable candidates of the intended lexical items for listeners in the absence of their proper phrasal context. Overall, the abstractionist FUL-model is found to be superior in explaining the data. While at first sight, X-MOD deals with the production data more readily, FUL provides a better fit for the perception results. Another important finding concerns the role of phonology and phonetics in general.
The results presented in this dissertation make a strong case for models, such as FUL, where phonology and phonetics operate at different levels of the mental lexicon, rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.
Natural (spontaneous) speech in dialogues is characterized, in comparison with canonical speech, above all by an enormous degree of variation. This can often lead to words being massively altered in pronunciation. Despite some efforts to explain variability in natural speech and to capture it theoretically, the unique characteristics of natural speech have hardly found their way into linguistic models. One reason why variation is observed in natural speech lies in speakers' tendency to reduce speech. This can drastically affect the phonetic shape of words. Although a great deal of information is lost through reduction, listeners are often able to understand spontaneous speech, even when background noise makes this harder. This dissertation investigates two reduction processes: regressive assimilation of place of articulation across word boundaries, and massive reduction. New data obtained from analyses of speech corpora are presented. In addition, the focus is on experimental results from perception studies that should help to better understand how humans deal with natural speech. The dissertation lays out the successes and problems of two models in handling data from natural speech: the FUL model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002) and X-MOD (an episodic model, Johnson, 1997). Because of differing assumptions, the two models make different predictions for the two reduction processes investigated in this dissertation. The nature and effects of the two processes for speech production are examined, and the consequences for speech comprehension are considered. Regarding speech production, a corpus analysis of naturally spoken German shows that the reduction process of regressive place assimilation does not occur obligatorily. At the same time, a striking asymmetry becomes apparent: apart from a few exceptions, only [coronal] segments are assimilated; [labial] and [dorsal] segments normally are not. Moreover, the production data suggest that there are cases in which the assimilation of place of articulation to that of the following segment is complete, that is, the speaker has fully neutralized the feature contrasts. Phonetic analyses confirm this result. In the case of deletions and massive reductions, the results clearly demonstrate that phonological rules, in the classical generative sense, are not able to describe the reduction patterns that occur in spontaneous speech. All in all, the analyses of massive reductions and deletions show no clear-cut patterns. To examine more closely the individual factors that influence reduction, the deletion of (word-)final /t/ was investigated using a new corpus created for this purpose. The analysis of this corpus underlines that, although phonological context has a weighty influence on whether segments (i.e., /t/) are deleted, this influence must be understood as a tendency rather than as an absolute condition. This result also holds for other deletion processes. Both models (X-MOD and FUL) examined in this dissertation essentially address the question of how listeners understand speech. Five experiments investigate how reduced speech is perceived by human listeners.
Results from two studies show that assimilations are indeed perceived as completely neutralized by German listeners. Regarding the perception of massively reduced words, the results of transcription studies and priming experiments show that such words are not accepted as candidates for the correct lexical entries when they are presented without their sentence context. Overall, the abstractionist FUL model is better able to explain the data presented in this dissertation. At first sight, X-MOD seems somewhat better suited to explaining the production data, but mainly because variation is anchored in the model as a basic assumption. FUL is clearly superior as far as perception is concerned. A further important result of this dissertation concerns the role that can be assigned to phonology and phonetics in general. The results presented here provide strong arguments for models, such as FUL, in which phonology and phonetics are active at different levels of the mental lexicon rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.
Acoustic Characteristics of Tense and Lax Vowels Across Sentence Position in Clear Speech
The purpose of this study was to examine the acoustic characteristics of tense and lax vowels across sentence positions in clear speech. Recordings were made of 12 participants reading monosyllabic target words at varying positions within semantically meaningful sentences. Acoustic analysis was completed to determine the effects of Style (clear vs. conversational), Tenseness (tense vs. lax), and Position (sentence-medial vs. sentence-final) on vowel duration, vowel space area, vowel space dispersion, and vowel peripheralization. The results showed speakers had longer durations and expanded vowel spaces in clear speech for both tense and lax vowels. Importantly, the amount of increase was similar for tense and lax vowels suggesting the defining properties of lax vowels (i.e., short duration and centralization) were manipulated in clear speech. A significant main effect of position for lax vowel space expansion showed greater vowel spaces for lax vowels in sentence-medial position in clear speech. Clear speech vowel adaptations appear to be dynamic with both vowel-specific and general transformations
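Vowel space area, one of the measures named above, is commonly computed as the area of the polygon spanned by a talker's corner vowels in the F1/F2 plane. The sketch below uses the shoelace formula for that polygon. The abstract does not state which vowels or which area method the study used, so the function name, the corner ordering, and the formant values in the usage note are illustrative assumptions, not data from the study.

```python
def vowel_space_area(corners):
    """Area of a polygon given as (F2, F1) points listed in order around the perimeter.

    Uses the shoelace formula. The unit is Hz squared when formants are in Hz.
    Points must trace the outline in order (clockwise or counterclockwise).
    """
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

A call such as `vowel_space_area([(2300, 300), (1700, 750), (1100, 700), (900, 320)])` would return the area for four made-up corner-vowel means; comparing that value between clear and conversational productions gives the kind of expansion measure the abstract refers to.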
The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review
Published online: Aug 1, 2022
Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech.
Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research.
This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova. This research was supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S. This research was also supported by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00 awarded to Clara D. Martin) and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin).
The early phase of /ɹ/ production development in adult Japanese learners of English
Although previous research indicates that Japanese speakers’ second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions: word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2.
Expansion of prosodic abilities at the transition from babble to words: a comparison between children with cochlear implants and normally hearing children
Objectives: This longitudinal study examined the impact of emerging vocabulary production on the ability to produce the phonetic cues to prosodic prominence in babbled and lexical disyllables of infants with Cochlear Implants (CI) and normally hearing infants (NH). Current research on typical language acquisition emphasizes the importance of vocabulary development for phonological and phonetic acquisition. Children with cochlear implants (CI) experience significant difficulties with the perception and production of prosody, and the role of possible top-down effects is therefore particularly relevant for this population.
Design: Isolated disyllabic babble and first words were identified and segmented in longitudinal audio-video recordings and transcriptions for 9 NH infants and 9 infants with CI interacting with their parents. Monthly recordings were included from the onset of babbling until children had reached a cumulative vocabulary of 200 words. Three cues to prosodic prominence, F0, intensity and duration, were measured in the vocalic portions of stand-alone disyllables. In order to represent the degree of prosodic differentiation between two syllables in an utterance, the raw values for intensity and duration were transformed to ratios, and for F0 a measure of the perceptual distance in semitones was derived. The degree of prosodic differentiation for disyllabic babble and words for each cue was compared between groups. In addition, group and individual tendencies on the types of stress patterns for babble and words were also examined.
Results: The CI group had overall smaller pitch and intensity distances than the NH group. For the NH group, words had greater pitch and intensity distances than babbled disyllables. Especially for pitch distance, this was accompanied by a shift towards a more clearly expressed stress pattern that reflected the influence of the ambient language. For the CI group, the same expansion in words did not take place for pitch. For intensity, the CI group gave evidence of some increase of prosodic differentiation. The results for the duration measure showed evidence of utterance-final lengthening in both groups. In words, the CI group significantly reduced durational differences between syllables so that a more even-timed, less differentiated pattern emerged.
Conclusions: The onset of vocabulary production did not have the same facilitatory effect for the CI infants on the production of phonetic cues for prosody, especially for pitch. It was argued that the results for duration may reflect greater articulatory difficulties in words for the CI group than the NH group. It was suggested that the lack of clear top-down effects of the vocabulary in the CI group may be due to a lag in development caused by an initial lack of auditory stimulation, possibly compounded by the absence of auditory feedback during the babble phase
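The Design paragraph above describes turning raw cue values for the two syllables into ratios (duration, intensity) and into a semitone distance (F0). A minimal sketch of both transformations follows, assuming the standard definition of a semitone interval, 12 times the base-2 logarithm of the frequency ratio. The function names and the choice of second-over-first ordering are assumptions for illustration; the abstract does not specify the direction of the comparison.

```python
import math


def semitone_distance(f0_first_hz, f0_second_hz):
    """Signed pitch distance in semitones from the first syllable to the second.

    Positive values mean the second syllable is higher. Uses the standard
    interval formula 12 * log2(f_second / f_first).
    """
    return 12.0 * math.log2(f0_second_hz / f0_first_hz)


def cue_ratio(first_value, second_value):
    """Ratio of the second syllable's cue value to the first (hypothetical ordering).

    Intended for positive quantities such as vocalic duration.
    """
    return second_value / first_value
```

For instance, `semitone_distance(220.0, 440.0)` gives 12 semitones (one octave), and `cue_ratio(0.2, 0.3)` describes a second syllable about 1.5 times as long as the first.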
- …