
    Production of English Vowel Contrasts in Spanish L1 Learners: A Longitudinal Study

    The present study undertakes a longitudinal examination of forty postgraduate students, all native Spanish speakers, during their first year at a UK university. The research tracks both individual and group progress in mastering the distinctions within the English vowel pairs /iː/-/ɪ/, /ɪ/-/e/, and /uː/-/ʊ/, with particular attention to adaptations towards native-like English vowel quality. Prior research indicates that adult Spanish learners encounter persistent difficulties with such fine phonetic distinctions in English. The methodology involved recording the Spanish-speaking participants reading a word list (CVC context) at three time points over one year. The analysis was based on formant frequencies measured in Praat, and Euclidean distances were calculated to represent the degree of separation between each pair of vowels. Information about external factors potentially influencing the development of the speakers’ vowel productions was gathered through a language background questionnaire. The outcomes suggested varying rates of advancement within the group, which could be attributed to the diverse levels of exposure to and interaction with native English speakers during the year of study in the UK. These results affirm the learning processes in adult L2 production, emphasizing the critical role played by both the quantity and quality of time in the assimilation of pronunciation to novel L2 segments.
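    The vowel-separation measure described above can be illustrated with a short sketch: mean F1 and F2 are computed for each vowel category, and the Euclidean distance between the two category means is taken as the degree of separation. The abstract does not state whether distances were computed in raw Hertz or in a normalised space, so the sketch assumes raw F1/F2 values in Hz; the token values are invented for illustration.

```python
# Minimal sketch: Euclidean distance between two vowel categories in F1-F2 space.
# Assumes formant measurements (in Hz) have already been extracted, e.g. with Praat;
# the values below are purely illustrative, not data from the study.
import math

def vowel_distance(tokens_a, tokens_b):
    """Euclidean distance between the mean (F1, F2) of two sets of vowel tokens."""
    mean_a = [sum(t[i] for t in tokens_a) / len(tokens_a) for i in (0, 1)]
    mean_b = [sum(t[i] for t in tokens_b) / len(tokens_b) for i in (0, 1)]
    return math.dist(mean_a, mean_b)

# Hypothetical (F1, F2) tokens for /i:/ and /I/ produced by one learner at one time point
fleece = [(300, 2250), (320, 2300), (310, 2280)]
kit    = [(390, 2000), (410, 1950), (400, 1980)]

print(f"/i:/-/I/ separation: {vowel_distance(fleece, kit):.1f} Hz")
```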

    Factors that affect generalization of adaptation

    As there is a growing population of non-native speakers worldwide, facilitating communication between native and non-native speakers has become increasingly important. While one way to help such communication is to help non-native speakers improve proficiency in their target language, another is to help native listeners better understand non-native speech. Specifically, while it may initially be difficult for native listeners to understand non-native speech, listeners may become better at this skill after short training sessions (i.e., adaptation) and may then better understand novel non-native speakers (i.e., generalization). However, it is not well understood how native listeners adapt and generalize to a novel speaker. This dissertation investigates how speaker and listener characteristics affect generalization to a novel speaker. Specifically, we examine how acoustic characteristics and talker information interact in generalization of adaptation, how the accentedness of non-native speech affects generalization to a novel speaker, and how listeners’ linguistic experience affects generalization of adaptation. The results suggest that acoustic similarity between speakers may help generalization and that listeners’ reliance on talker information is down-weighted, as long as the speakers that listeners are trained and tested with have similar acoustic characteristics. Furthermore, the results show that exposure to more heavily accented non-native speech disrupts generalization of adaptation compared to exposure to less accented non-native speech, suggesting that exposure to non-native speakers does not always help generalization. The results also show that extended linguistic experience with non-native speakers may disrupt generalization to a novel non-native speaker. These results have implications for how speaker- and listener-related factors affect generalization of adaptation. Specifically, we suggest that, at least in the early stages of learning, generalization of adaptation is constrained by acoustic similarity and that generalization to a non-native speaker uses mechanisms that are general to speech perception rather than specific to this type of adaptation. We suggest that exposure to non-native accented speech that is too different from the speech listeners are familiar with may disrupt generalization. Further, we suggest that the representation of non-native accents becomes less malleable with extended linguistic experience.

    Comparing the production of a formula with the development of L2 competence

    This pilot study investigates the production of a formula in relation to the development of L2 competence across the proficiency levels of a spoken learner corpus. The results show that the formula in beginner production data is likely being recalled holistically from learners’ phonological memory rather than generated online, identifiable by virtue of its fluent production in the absence of any other surface-structure evidence of the formula’s syntactic properties. As learners’ L2 competence increases, the formula becomes sensitive to modifications which show structural conformity at each proficiency level. The transparency between the formula’s modification and learners’ corresponding L2 surface-structure realisations suggests that it is the independent development of L2 competence which integrates the formula into compositional language, and ultimately drives the SLA process forward.

    Acoustic Speech Markers for Tracking Changes in Hypokinetic Dysarthria Associated with Parkinson’s Disease

    Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson’s Disease and found that it manifests differently across individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive enough to track speech changes. Previous longitudinal studies have had limited sample sizes or variable intervals between data collection. This study focused on using acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson’s Disease (PwPD) over six months. The thesis presents how this study has addressed limitations of previous studies to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast software at two time points six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists at T1 and T2, and acoustic parameters of voice, articulation, and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time, and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional attempt was made to identify whether these parameters showed group differences for differential diagnosis between PwPD and control speakers at T1 and T2. Results showed that specific acoustic parameters in voice quality, articulation, and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific acoustic parameters within articulation could detect significant group and speech-change differences across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.
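    As a rough illustration of the kind of acoustic analysis described above, the sketch below extracts a few voice and prosody measures (mean F0, F0 variability, and harmonics-to-noise ratio) from a single recording using the praat-parselmouth package. These particular measures and the file name are assumptions made for illustration; the thesis’s actual parameter set for voice, articulation, and prosody is not specified in this abstract.

```python
# Minimal sketch: candidate acoustic markers from one recording via praat-parselmouth.
# The file name is a placeholder; the measures shown are illustrative, not the thesis's
# actual parameter set.
import parselmouth

snd = parselmouth.Sound("reading_passage_T1.wav")   # hypothetical recording

pitch = snd.to_pitch()                               # default Praat pitch settings
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                                      # drop unvoiced frames

harmonicity = snd.to_harmonicity()                   # harmonics-to-noise ratio over time
hnr = harmonicity.values[harmonicity.values != -200].mean()  # -200 marks unvoiced frames

print(f"mean F0: {f0.mean():.1f} Hz, F0 SD: {f0.std():.1f} Hz, mean HNR: {hnr:.1f} dB")
```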

    An Investigation of Intelligibility and Lingua Franca Core Features in Indonesian Accented English

    Recent approaches to teaching the pronunciation of English in second or foreign language contexts have favoured the role of students’ L1 accents in the teaching and learning process, with an emphasis on intelligibility and the use of English as a Lingua Franca rather than on achieving native-like pronunciation. As far as English teaching in Indonesia is concerned, there is limited information on the intelligibility of Indonesian Accented English, as well as insufficient guidance on key pronunciation features for effective teaching. This research investigates features of Indonesian Accented English and critically assesses the intelligibility of different levels of Indonesian Accented English. Speech data were elicited from 50 Indonesian speakers using reading texts. Key phonological features of Indonesian Accented English were investigated through acoustic analysis involving spectrographic observation using the Praat speech analysis software. The intelligibility of different levels of Indonesian Accented English was measured using a transcription task performed by 24 native and non-native English listeners. The overall intelligibility of each accent was measured by examining the correctness of the transcriptions, and the key pronunciation features which caused intelligibility failure were identified by analysing the incorrect transcriptions. The analysis of the key phonological features of Indonesian Accented English showed that while there was some degree of regularity in the production of vowel duration and consonant clusters, more individual variation was observed in segmental features, particularly in the production of the consonants /v, z, ʃ/, which are absent in the Indonesian phonemic inventory. The results of the intelligibility analysis revealed that although lightly and moderately accented speech was significantly more intelligible than more heavily accented speech, the native and non-native listeners did not have major problems with the intelligibility of Indonesian Accented English across the different accent levels. The analysis of incorrect transcriptions suggested that intelligibility failures were associated more with combined phonological miscues than with a single factor. These results indicate that while Indonesian Accented English can be used effectively in international communication, the findings can also inform English language teaching in Indonesia.
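    The transcription-based intelligibility measure can be sketched as the proportion of target words a listener reproduces correctly. The scoring rules below (case- and punctuation-insensitive, words aligned by position) are assumptions made for illustration rather than the procedure actually used in the thesis.

```python
# Minimal sketch: intelligibility as the proportion of correctly transcribed words.
# Scoring rules (lowercasing, punctuation stripping, positional alignment) are
# illustrative assumptions, not the thesis's actual protocol.
import string

def word_accuracy(target: str, transcription: str) -> float:
    strip = str.maketrans("", "", string.punctuation)
    t_words = target.lower().translate(strip).split()
    h_words = transcription.lower().translate(strip).split()
    correct = sum(1 for t, h in zip(t_words, h_words) if t == h)
    return correct / len(t_words) if t_words else 0.0

# Hypothetical stimulus sentence and listener response
print(word_accuracy("the ship will leave the harbour at noon",
                    "the sheep will leave the harbour at noon"))  # -> 0.875
```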

    Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE)

    The Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE) model is a new model for controlling the articulatory speech synthesizer VocalTractLab (VTL) [15]. PAULE can synthesize German words. Word synthesis can be started either from a semantic vector encoding the word’s meaning together with the desired duration of the synthesis, or as a resynthesis of an audio file. The audio file may contain recordings of arbitrary speakers, while the resynthesis is always rendered with the VTL’s default speaker. The synthesis quality varies depending on the word meaning and the audio file. What is new about PAULE is its predictive approach: from the planned articulation it predicts the corresponding perceptual acoustics and derives the word meaning from them. Both the acoustics and the word meaning are implemented as metric vector spaces, so an error with respect to a desired target acoustics and target meaning can be computed and minimized. The minimized error is not the actual error resulting from synthesis with the VTL, but the error generated from the predictions of a predictive model. Although it is not the actual error, it can be used to improve the actual articulation. To keep the predictive model in line with the actual acoustics, PAULE listens to itself. A central one-to-many problem in speech synthesis is that one acoustic signal can be produced by many different articulations. This one-to-many problem is resolved in PAULE through prediction-error minimization, together with the constraint that the articulation is executed as steadily and with as constant a force as possible. PAULE works without any symbolic representation of the acoustics (phonemes) or of the articulation (motor gestures or targets). PAULE thereby shows that spoken words can be modelled without a symbolic level of description; spoken language may therefore rest on a fundamentally different level of processing than written language. PAULE integrates experience incrementally, and consequently it does not find the globally best articulation but locally good articulations. Internally, PAULE relies on artificial neural networks and the associated gradients, which are used for error correction. PAULE can neither synthesize whole sentences nor take somatosensory feedback into account; preliminary work exists for both and is to be integrated into future versions.

The Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE) model is a new control model for the VocalTractLab (VTL) [15] speech synthesizer, a simulator of the human speech system. It is capable of synthesizing single words in the German language. The speech synthesis can be based on a target semantic vector or on target acoustics, i.e., a recorded word token. VTL is controlled by 30 parameters. These parameters have to be estimated for each time point during the production of a word, which is roughly every 2.5 milliseconds. The time series of these 30 control parameters (cps) of the VTL are the control parameter trajectories (cp-trajectories).
The high dimensionality of the cp-trajectories in combination with non-linear interactions leads to a many-to-one mapping problem, where many sets of cp-trajectories produce highly similar synthesized audio. PAULE solves this many-to-one mapping problem by anticipating the effects of cp-trajectories and minimizing a semantic and acoustic error between this anticipation and a targeted meaning and acoustics. The quality of the anticipation is improved by an outer loop, in which PAULE listens to itself. PAULE has three central design features that distinguish it from other control models: First, PAULE does not use any symbolic units, neither motor primitives, articulatory targets, nor gestural scores on the movement side, nor any phone or syllable representation on the acoustic side. Second, PAULE is a learning model that accumulates experience with articulated words. As a consequence, PAULE will not find a global optimum for the inverse kinematic optimization task it has to solve. Instead, it finds a local optimum that is conditioned on its past experience. Third, PAULE uses gradient-based internal prediction errors of a predictive forward model to plan cp-trajectories for a given semantic or acoustic target. Thus, PAULE is an error-driven model that takes its previous experiences into account. Pilot study results indicate that PAULE is able to minimize a semantic and acoustic error in the resynthesized audio. This allows PAULE to find cp-trajectories that are classified as the correct word by a classification model with an accuracy of 60%, which is close to the 63% accuracy for human recordings. Furthermore, PAULE seems to model vowel-to-vowel anticipatory coarticulation in terms of formant shifts correctly and can be compared to human electromagnetic articulography (EMA) recordings in a straightforward way. In addition, PAULE can condition on already executed past cp-trajectories and smoothly continue the cp-trajectories from the current state. As a side effect of developing PAULE, it is possible to create large amounts of training data for the VTL through an automated segment-based approach. Next steps in the development of PAULE include adding a somatosensory feedback channel, extending PAULE from producing single words to the articulation of short utterances, and adding a thorough evaluation.
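    The third design feature, gradient-based planning through a predictive forward model, can be sketched schematically as follows. The network, dimensions, and loss terms below are illustrative assumptions rather than PAULE’s actual architecture: a frozen model stands in for the learned forward model, and the cp-trajectory itself is optimised by gradient descent towards a target, with a smoothness term echoing the stated preference for near-stationary movement.

```python
# Schematic sketch of the planning idea only (not PAULE's actual architecture):
# a frozen predictive forward model maps control-parameter trajectories to predicted
# acoustics, and the trajectory is optimised by gradient descent towards a target.
import torch

T, N_CPS, N_ACOUSTIC = 200, 30, 60          # ~200 time steps, 30 VTL control parameters

forward_model = torch.nn.Sequential(         # stand-in for a learned forward model
    torch.nn.Linear(N_CPS, 128), torch.nn.Tanh(), torch.nn.Linear(128, N_ACOUSTIC)
)
for p in forward_model.parameters():          # planning does not update the model itself
    p.requires_grad_(False)

target_acoustics = torch.randn(T, N_ACOUSTIC)             # placeholder target
cp_traj = torch.zeros(T, N_CPS, requires_grad=True)       # trajectory to be planned
opt = torch.optim.Adam([cp_traj], lr=0.01)

for step in range(500):
    opt.zero_grad()
    predicted = forward_model(cp_traj)                     # anticipated acoustics
    acoustic_loss = torch.nn.functional.mse_loss(predicted, target_acoustics)
    smoothness = (cp_traj[1:] - cp_traj[:-1]).pow(2).mean()  # prefer near-stationary movement
    (acoustic_loss + 0.1 * smoothness).backward()
    opt.step()
```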

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advances in automatic speech recognition, text-to-speech synthesis, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC features and HMMs, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field’s evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
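    As a small illustration of the “early approaches” end of this evolution, the sketch below extracts MFCC features with librosa. The file name and parameter values (13 coefficients, 25 ms window, 10 ms hop) are common choices assumed for illustration, not values taken from the review.

```python
# Minimal sketch: classic MFCC front end with librosa; the file name is a placeholder.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr),        # 25 ms analysis window
    hop_length=int(0.010 * sr),   # 10 ms frame shift
)
print(mfcc.shape)                  # (13, n_frames)
```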

    Tungusic languages

    Tungusic is a small family of approximately twenty languages spoken in Siberia and northern China, many of which are endangered. These languages are distributed over an enormous area that ranges from the Yenisey River and Xinjiang in the west to the Kamchatka Peninsula and Sakhalin in the east. They extend as far north as the Taimyr Peninsula and, for a brief period, could even be found in parts of Central and Southern China. This book brings together researchers from different backgrounds to provide an open-access publication in English that is available to all scholars in the field. The contributions cover all branches of Tungusic and a wide range of linguistic features. Topics include synchronic descriptions, typological comparisons, dialectology, language contact, and diachronic reconstruction. Some of the contributions are based on first-hand data collected during fieldwork, in some cases from the last speakers of a given language.