563 research outputs found

    Production of English Vowel Contrasts in Spanish L1 Learners: A Longitudinal Study

    Get PDF
    The present study is a longitudinal examination of forty postgraduate students, all native Spanish speakers, during their first year at a UK university. The research tracks both individual and group progress in mastering the English vowel contrasts /iː/-/ɪ/, /ɪ/-/e/, and /uː/-/ʊ/, with particular attention to shifts towards native-like vowel quality. Prior research indicates that adult Spanish learners encounter difficulties in mastering English vowel contrasts that have no counterpart in Spanish. The methodology involved recording the participants reading a word list (CVC context) at three time points over one year. Formant frequencies were measured in Praat, and Euclidean distances were calculated to represent the degree of separation between each pair of vowels. Information about external factors potentially influencing the development of the speakers' vowel productions was gathered through a language background questionnaire. The outcomes suggested varying rates of progress within the group, attributable to differing levels of exposure to and interaction with native English speakers during the year of study in the UK. These results corroborate accounts of learning in adult L2 production, emphasizing the critical role of both the quantity and quality of L2 exposure in the assimilation of novel L2 segments.
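    As an illustration of the separation measure described above, the following minimal Python sketch computes the Euclidean distance between two vowels in F1-F2 space. The formant values are hypothetical and raw Hz is used for simplicity; in practice formants are typically normalized (e.g., Lobanov) before distances are compared across speakers, and the study's own normalization and measurement points are not specified here.

        import math

        def euclidean_distance(f1_a, f2_a, f1_b, f2_b):
            """Euclidean distance between two vowels in F1-F2 (Hz) space."""
            return math.sqrt((f1_a - f1_b) ** 2 + (f2_a - f2_b) ** 2)

        # Hypothetical mean formant values (Hz) for one speaker
        fleece_f1, fleece_f2 = 300.0, 2300.0   # /iː/
        kit_f1, kit_f2 = 430.0, 2000.0         # /ɪ/

        separation = euclidean_distance(fleece_f1, fleece_f2, kit_f1, kit_f2)
        print(f"/iː/-/ɪ/ separation: {separation:.1f} Hz")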

    Factors that affect generalization of adaptation

    Get PDF
    As the population of non-native speakers grows worldwide, facilitating communication between native and non-native speakers has become increasingly important. One way to support such communication is to help non-native speakers improve proficiency in their target language; another is to help native listeners better understand non-native speech. Although non-native speech may initially be difficult for native listeners to understand, listeners can become better at this skill after short training sessions (i.e., adaptation) and may then better understand novel non-native speakers (i.e., generalization). However, it is not well understood how native listeners adapt and generalize to a novel speaker. This dissertation investigates how speaker and listener characteristics affect generalization to a novel speaker. Specifically, we examine how acoustic characteristics and talker information interact in generalization of adaptation, how the accentedness of non-native speech affects generalization to a novel speaker, and how listeners’ linguistic experience affects generalization of adaptation. The results suggest that acoustic similarity between speakers may aid generalization and that listeners’ reliance on talker information is down-weighted as long as the speakers listeners are trained and tested with share similar acoustic characteristics. Furthermore, the results show that exposure to more heavily accented non-native speech disrupts generalization of adaptation compared with exposure to less accented non-native speech, suggesting that exposure to non-native speakers does not always help generalization. The results also show that extended linguistic experience with non-native speakers may disrupt generalization to a novel non-native speaker. These findings have implications for how speaker- and listener-related factors affect generalization of adaptation. Specifically, we suggest that, at least in the early stages of learning, generalization of adaptation is constrained by acoustic similarity and that generalization to a non-native speaker relies on mechanisms that are general to speech perception rather than specific to this type of adaptation. We suggest that exposure to non-native accented speech that is too different from the speech listeners are familiar with may disrupt generalization. Further, we suggest that the representation of non-native accents becomes less malleable with extended linguistic experience.

    Acoustic Speech Markers for Tracking Changes in Hypokinetic Dysarthria Associated with Parkinson’s Disease

    Get PDF
    Previous research has identified certain overarching features of the hypokinetic dysarthria associated with Parkinson’s Disease and found that it manifests differently across individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive enough to track speech changes. Previous longitudinal studies have had limited sample sizes or variable intervals between data collection points. This study used acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson’s Disease (PwPD) over six months. The thesis presents how this study has addressed the limitations of previous studies to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast software at two time points six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists at T1 and T2, and acoustic parameters of voice, articulation, and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional analysis examined whether these parameters showed group differences for differential diagnosis between PwPD and control speakers at T1 and T2. Results showed that specific acoustic parameters of voice quality, articulation, and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific articulation parameters could detect both significant group differences and speech changes across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.
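    The voice measures referred to above (for example jitter, shimmer, and harmonics-to-noise ratio, standard acoustic correlates of perceptual voice quality) are commonly extracted from sustained phonation with Praat. The sketch below shows one way to do this in Python via the parselmouth library; the file name and analysis settings are illustrative and not necessarily those used in the study.

        import parselmouth
        from parselmouth.praat import call

        # Load a sustained phonation recording (illustrative file name)
        snd = parselmouth.Sound("sustained_phonation.wav")

        # Glottal pulse point process for perturbation measures
        point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

        # Local jitter and shimmer with standard Praat default settings
        jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
        shimmer = call([snd, point_process], "Get shimmer (local)",
                       0, 0, 0.0001, 0.02, 1.3, 1.6)

        # Mean harmonics-to-noise ratio (dB)
        harmonicity = snd.to_harmonicity_cc()
        hnr = call(harmonicity, "Get mean", 0, 0)

        print(f"jitter={jitter:.4f}  shimmer={shimmer:.4f}  HNR={hnr:.1f} dB")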

    Individual differences in L2 listening proficiency revisited: Roles of form, meaning, and use aspects of phonological vocabulary knowledge

    Get PDF
    The present study revisits the differential roles of the form, meaning, and use aspects of phonological vocabulary knowledge in L2 listening proficiency. A total of 126 Japanese English-as-a-foreign-language listeners completed the TOEIC Listening test, working memory and auditory processing tests, the Metacognitive Awareness Listening Questionnaire, and several tasks designed to tap into three broad aspects of phonological vocabulary knowledge: (1) the ability to access phonological forms without any orthographic cues (phonologization), (2) the ability to recognize words regardless of the talker (generalization), and (3) the ability to determine the semantic and collocational appropriateness of words in global contexts in a fast and stable manner (automatization). Whereas the perceptual, cognitive, and metacognitive variables made relatively small contributions to L2 listening proficiency (0.4%–21.3%), the vocabulary factors explained a large share of the variance (77.6%) in the full regression model (R² = .507). These large lexical effects derived uniquely from the three different aspects of phonological vocabulary knowledge: automatization (55.3%), phonologization (20.8%), and generalization (1.5%). The findings suggest that successful L2 listening skill acquisition draws not only on various levels of phonological form-meaning mapping (phonologization, generalization) but also on the spontaneous and robust retrieval of such vocabulary knowledge in relation to surrounding words (automatization).
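    One common way to express how much a block of predictors adds to a regression model is the increment in R² when that block enters a hierarchical regression. The Python sketch below illustrates that idea with simulated data and statsmodels; the variable names mirror the constructs above, but this is not a reconstruction of the study's actual analysis or data.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 126  # same sample size as the study, but the data here are simulated
        df = pd.DataFrame({
            "working_memory": rng.normal(size=n),
            "phonologization": rng.normal(size=n),
            "generalization": rng.normal(size=n),
            "automatization": rng.normal(size=n),
        })
        df["listening"] = (0.2 * df["working_memory"] + 0.4 * df["phonologization"]
                           + 0.1 * df["generalization"] + 0.7 * df["automatization"]
                           + rng.normal(size=n))

        def r_squared(predictors):
            # OLS fit of listening proficiency on the given predictor block
            X = sm.add_constant(df[predictors])
            return sm.OLS(df["listening"], X).fit().rsquared

        base = r_squared(["working_memory"])
        full = r_squared(["working_memory", "phonologization",
                          "generalization", "automatization"])
        print(f"Delta R^2 for the vocabulary block: {full - base:.3f}")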

    An Investigation of Intelligibility and Lingua Franca Core Features in Indonesian Accented English

    Get PDF
    Recent approaches to teaching English pronunciation in second or foreign language contexts have favoured acknowledging students’ L1 accents in the teaching and learning process, emphasizing intelligibility and the use of English as a Lingua Franca rather than the achievement of native-like pronunciation. As far as English teaching in Indonesia is concerned, there is limited information on the intelligibility of Indonesian Accented English, as well as insufficient guidance on key pronunciation features for effective teaching. This research investigates features of Indonesian Accented English and critically assesses the intelligibility of different levels of Indonesian Accented English. English speech data were elicited from 50 Indonesian speakers using reading texts. Key phonological features of Indonesian Accented English were investigated through acoustic analysis involving spectrographic observation in the Praat speech analysis software. The intelligibility of different levels of Indonesian Accented English was measured using a transcription task performed by 24 native and non-native English listeners. The overall intelligibility of each accent was measured by examining the correctness of the transcriptions. The key pronunciation features that caused intelligibility failure were identified by analysing the incorrect transcriptions. The analysis of the key phonological features of Indonesian Accented English showed that while there was some regularity in the production of vowel duration and consonant clusters, more individual variation was observed in segmental features, particularly in the production of the consonants /v, z, ʃ/, which are absent from the Indonesian phonemic inventory. The results of the intelligibility analysis revealed that although the lightly and moderately accented speech data were significantly more intelligible than the more heavily accented speech data, the native and non-native listeners did not have major problems with the intelligibility of Indonesian Accented English at any accent level. The analysis of incorrect transcriptions suggested that intelligibility failures were associated more with combined phonological miscues than with any single factor. These results indicate that Indonesian Accented English can be used effectively in international communication, and they can also inform English language teaching in Indonesia.
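    Overall intelligibility in transcription tasks of this kind is often quantified as the proportion of target words correctly transcribed. The Python sketch below shows that scoring idea with made-up example strings; it is an illustration, not the scoring protocol used in the study.

        from collections import Counter

        def word_correct_rate(target: str, transcription: str) -> float:
            """Proportion of target words also present in the transcription (order-free)."""
            target_words = Counter(target.lower().split())
            heard = Counter(transcription.lower().split())
            correct = sum(min(count, heard[word]) for word, count in target_words.items())
            total = sum(target_words.values())
            return correct / total if total else 0.0

        # Hypothetical stimulus sentence and one listener's transcription
        print(word_correct_rate("she sells sea shells", "she sell sea shells"))  # 0.75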

    A Review of Deep Learning Techniques for Speech Processing

    Full text link
    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advances in automatic speech recognition, text-to-speech synthesis, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches such as MFCCs and HMMs to more recent advances in deep learning architectures such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover the speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been used to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field’s evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
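    As a concrete example of the classical feature extraction the review starts from, the Python sketch below computes MFCCs with librosa; the file name and parameter choices are illustrative.

        import librosa

        # Load an utterance (illustrative path), resampled to 16 kHz
        y, sr = librosa.load("utterance.wav", sr=16000)

        # 13 MFCCs per 25 ms analysis frame with a 10 ms hop
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=400, hop_length=160)

        print(mfcc.shape)  # (13, number_of_frames)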

    Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors

    Full text link
    Speech disorders in children can affect their fluency and intelligibility. Delays in diagnosis and treatment increase the risk of social impairment and learning disabilities. Given the significant shortage of Speech and Language Pathologists (SLPs), there is increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability. However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variability of child speech, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. This thesis therefore investigates two types of detection systems that can be built with minimal dependence on annotated mispronounced speech data. First, a novel approach using paralinguistic features, which represent the prosodic, spectral, and voice quality characteristics of the speech, was proposed to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech with a binary Support Vector Machine (SVM) classifier. Because paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal. Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors caused by SSDs and provide low-level diagnostic information that can be used to construct formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method performs MDD at the speech attribute level, namely the manner and place of articulation. Speech attribute features describe the articulators involved in producing a speech sound and their interactions, allowing a low-level description of the pronunciation error to be provided. Two novel methods for modelling speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free, jointly learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion. The proposed techniques have been evaluated on standard, publicly accessible adult and child speech corpora, and the MDD method has been validated on L2 speech corpora.
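    The first detection system described above reduces to a standard binary classification step once the paralinguistic features have been extracted. The Python sketch below shows such a step with scikit-learn; the feature matrix and labels are simulated stand-ins, not data from the corpora used in the thesis.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)

        # Simulated stand-in for per-segment paralinguistic feature vectors
        X = rng.normal(size=(200, 88))    # e.g. an 88-dimensional feature set
        y = rng.integers(0, 2, size=200)  # 0 = typically developing, 1 = SSD

        # Standardize the features, then fit a binary SVM with an RBF kernel
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        print(f"cross-validated accuracy: {scores.mean():.2f}")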

    Using Mixed Focus of Attention Principle to Explore a Singing Teacher’s Perceptions of Belting Teaching Guide for Novice Singers

    Get PDF
    Increasing demand for voice lessons in the contemporary commercial music (CCM) genre has been observed in recent years, with belting emerging as a prevalent technique amongst voice students and performers. The rising prominence of belting and CCM singing can be attributed to the increasing popularity of televised singing talent shows, musical series and films, and cover versions of popular songs on video sharing platforms. Additionally, classically trained singers have begun seeking guidance on belting and other CCM techniques to remain relevant in professional voice performance in a changing employment landscape. This study seeks to address the scarcity of educational resources on CCM belting and thereby help voice instructors teach the techniques required for belting safely and effectively. The study focuses on the needs of singing teachers who have little knowledge of the belting technique and those trained in Western classical music. It also debunks the common misconception that belting is harmful to the voice, a misconception that leads some singing teachers to hesitate to explore both CCM and belting as legitimate vocal styles. The study posits that the teaching guide will make vocal teachers better able to give proper guidance to students interested in belting, thus increasing the likelihood that vocal teachers will meet the demand for CCM instruction. Because most belting resources available to date focus primarily on physiological, perceptual, and acoustic aspects, this study provides the practical guidance for vocal teachers regarding vocal exercises and strategies used in belt voice production that is currently lacking. Accordingly, this study developed a comprehensive teaching guide for belting, including practical, evidence-based techniques and exercises. The teaching guide developed in this study is based on the Mixed Focus of Attention (MFA) principle and is presented in straightforward language, with examples and illustrations to aid comprehension. The teaching guide presents belting as a credible and distinct style of performance that requires non-classical vocal techniques. This study uses a single case study approach to explore, through semi-structured interviews and lesson observations, the perceptions of a singing teacher and her three students regarding the developed belting teaching guide and the application of the MFA principle in belting pedagogy. From the data collected in the study, it is concluded that the belting teaching guide was perceived as valuable by the teacher and the students. These conclusions make a new contribution to knowledge in voice pedagogy, showing that the developed belting teaching guide and the MFA principle are highly beneficial to voice instructors seeking practical guidance for teaching the belting style.

    Exploring the effects of accent on cognitive processes: behavioral and electrophysiological insights

    Get PDF
    Previous research has found that speaker accent can have an impact on a range of offline and online cognitive processes (Baus, Bas, Calabria, & Costa, 2017; McAleer, Todorov, & Belin, 2014; Stevenage, Clarke, & McNeill, 2012; Sporer, 2001). Indeed, previous studies show that there are differences in native and non-native speech processing (Lev-Ari, 2018). Processing foreign-accented speech requires the listener to adapt to an extra range of variability, suggesting that more attentional and cognitive resources may be needed to successfully interpret the speech signal of a foreign-accented speaker. However, less is known about the differences between processing native and dialectal accents. Is dialectal processing more similar to foreign or to native speech processing? To address this, two theories have been proposed (Clarke & Garrett, 2004; Floccia et al., 2009). Previous studies have lent plausibility to both hypotheses and, importantly for the purposes of this project, previous electroencephalography experiments exploring the question have mainly used sentences as material. More studies are needed to elucidate whether foreign accent is processed differently from all types of native speech (both native and dialectal accents) or whether dialectal accent is treated differently from native accent despite both being native speech variations. Accordingly, the central aim of this dissertation is to further investigate the processing mechanisms of speech accent across different levels of linguistic analysis using evidence from both behavioral and electrophysiological experiments. An additional aim of this project was to examine the effects of accent on information retention. In addition to fluctuations in attentional demands, it seems that non-native accent can lead to differences in the depth of listeners’ memory encoding (Atkinson et al., 2005). This project therefore also studied how changing the accent in which information is delivered may affect how well people remember the information received. Three experiments were carried out to investigate accent processing; the results and future directions are discussed.