268 research outputs found
The Effect of Speech Elicitation Method on Second Language Phonemic Accuracy
The present study, a one-group posttest-only repeated-measures design, examined the effect of speech elicitation method on second language (L2) phonemic accuracy of high functional load initial phonemes found in frequently occurring nouns in American English. This effect was further analyzed by including the variable of first language (L1) to determine whether L1 moderated any effects found. The data consisted of audio recordings of 61 adult English learners (ELs) enrolled in English for Academic Purposes (EAP) courses at a large, public, post-secondary institution in the United States. Phonemic accuracy was judged by two independent raters on a dichotomous scale: each production either approximated a standard American English (SAE) pronunciation of the intended phoneme or did not. Scores were assigned to each participant for each of the three speech elicitation methods: word reading, word repetition, and picture naming. A repeated-measures ANOVA revealed a statistically significant difference in phonemic accuracy (F(1.47, 87.93) = 25.94, p < .001) based on speech elicitation method, while a two-factor mixed-design ANOVA indicated no statistically significant differences for the moderator variable of native language. Post-hoc analyses revealed that mean scores on picture naming tasks differed significantly from those on the other two elicitation methods, word reading and word repetition. These results should heighten attention to the role that various speech elicitation methods, or input modalities, might play in L2 productive accuracy. For practical application, they suggest caution when using pictures to elicit specific vocabulary words, even high-frequency words, as pictures might result in erroneous productions or no utterance at all. These findings could inform pronunciation instructors about best teaching practices when pronunciation accuracy is the objective.
Finally, the impact of L1 on L2 pronunciation accuracy might not be as important as once thought.
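Since the accuracy judgements above come from two independent raters on a dichotomous scale, agreement between the raters is the natural reliability check. A minimal pure-Python sketch of Cohen's kappa for such ratings (the rating vectors below are invented for illustration, not the study's data):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving dichotomous (0/1) judgements."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of tokens both raters judged identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal proportions.
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

# Hypothetical accurate(1)/inaccurate(0) judgements for ten tokens.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(round(cohens_kappa(a, b), 3))  # prints 0.524
```

Kappa corrects raw percent agreement for the agreement two raters would reach by chance, which matters when one category (here, "accurate") dominates.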
Voice onset time of Mankiyali language: an acoustic analysis
The endangered Indo-Aryan language Mankiyali, spoken in northern Pakistan, lacks linguistic
documentation and necessitates research. This study explores the Voice Onset Time (VOT) values
of Mankiyali's stop consonants to determine the duration of sound release, characterized as
negative, positive, and zero VOTs. The investigation aims to identify the laryngeal categories
present in the language. Using a mixed-methods approach, data were collected from five native
male speakers with a Zoom H6 recorder. The study employed Fant's (1970) source-filter model
as its theoretical framework and analyzed each phoneme using the Praat software. Twenty-five
tokens of each phoneme were recorded across the five speakers. The results reveal that
Mankiyali encompasses three laryngeal categories: voiceless unaspirated (VLUA) stops, voiceless
aspirated (VLA) stops, and voiced unaspirated (VDUA) stops. The study highlights significant
differences in VOTs based on place of articulation and phonation. In terms of phonation, the
VLUA bilabial stop /p/, alveolar stop /t/, and velar stop /k/ exhibit shorter voicing lag compared
to their VLA counterparts /pʰ, tʰ, kʰ/. All VLUA and VLA stops display +VOT values, while all
VDUA stops exhibit -VOT values. Regarding place of articulation, the bilabial /p/ demonstrates a
longer voicing lag than the alveolar /t/ but a shorter lag than the velar /k/. Additionally, the results
indicate similarities in voicing lag among the VDUA stops /b, d, ɡ/. This study offers valuable
insights into the phonetic and phonological aspects of Mankiyali and holds potential significance
for the language's preservation.
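The three laryngeal categories reported above map onto the sign and size of the VOT measurement: voicing lead gives negative VOT, a short voicing lag gives voiceless unaspirated, and a long lag gives voiceless aspirated. A small illustrative classifier (the 40 ms threshold and the example values are generic assumptions, not the study's measurements):

```python
def laryngeal_category(vot_ms, aspiration_threshold=40.0):
    """Classify a stop from its voice onset time in milliseconds.

    Negative VOT: voicing precedes the release burst (voiced unaspirated, VDUA).
    Short positive VOT: voiceless unaspirated (VLUA).
    Long positive VOT: voiceless aspirated (VLA).
    The 40 ms threshold is a common heuristic, not Mankiyali-specific.
    """
    if vot_ms < 0:
        return "VDUA"
    return "VLUA" if vot_ms < aspiration_threshold else "VLA"

# Illustrative (invented) mean VOTs for a bilabial stop series.
for phone, vot in [("b", -85.0), ("p", 15.0), ("p\u02b0", 70.0)]:
    print(phone, laryngeal_category(vot))
```

In practice the VOT values themselves would be measured in Praat as the interval between the release burst and the onset of periodic voicing in the waveform.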
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important but demanding application of deep learning (DL): it requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. There are also situations where gathering real data is challenging, expensive, or the phenomenon of interest occurs rarely, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models from real datasets that are small or slightly different from, but related to, the original training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and to help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to organize the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework, and a comparative study highlights the current challenges before deriving opportunities for future research.
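The core DTL idea the survey examines, reusing representations learned on a large source dataset and adapting only a small part of the model to scarce target data, can be sketched in miniature. Everything below is a toy illustration of the frozen-encoder, fine-tuned-head pattern, not an ASR system:

```python
# Toy transfer-learning sketch: the "pretrained" feature extractor stays
# frozen, and only a small linear head is fitted on a few target examples.

def pretrained_features(x):
    """Stand-in for a frozen encoder learned on a large source corpus."""
    return [x, x * x, 1.0]  # fixed representation, never updated

def train_head(data, lr=0.01, epochs=500):
    """Fit only the head weights on the (small) target dataset via SGD."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            feats = pretrained_features(x)
            err = sum(wi * fi for wi, fi in zip(w, feats)) - y
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
    return w

# Tiny target dataset: y = x^2 + 1, only four labelled examples.
target = [(-1.0, 2.0), (0.0, 1.0), (1.0, 2.0), (2.0, 5.0)]
w = train_head(target)
pred = sum(wi * fi for wi, fi in zip(w, pretrained_features(1.5)))
print(round(pred, 2))  # close to 3.25
```

Because only three head weights are trained, four labelled examples suffice; this is the same economy that makes DTL attractive when target-domain speech data are scarce.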
Arabic Fluency Assessment: Procedures for Assessing Stuttering in Arabic Preschool Children
The primary aim of this thesis was to screen school-aged (4+) children for two separate types of fluency issues and to distinguish both groups from fluent children. The two fluency issues are Word-Finding Difficulty (WFD) and other speech disfluencies (primarily stuttering). The cohort examined consisted of children who spoke Arabic and English. We first designed a phonological assessment procedure that can equitably test Arabic and English children, called the Arabic English non-word repetition task (AEN_NWR). Riley's Stuttering Severity Instrument (SSI) is the standard way of assessing fluency for speakers of English. There is no standardized version of SSI for Arabic speakers. Hence, we designed a scheme to measure disfluency symptoms in Arabic speech (Arabic fluency assessment). The scheme recognizes that Arabic and English differ at all language levels (lexically, phonologically and syntactically).
After the children with WFD had been separated from those with stuttering, our second aim was to develop and deliver appropriate interventions for the different cohorts. Specifically, we aimed to develop treatments for the children with WFD using short procedures suitable for conducting in schools; children who stutter are referred to speech and language therapists (SLTs) to receive the appropriate type of intervention. To treat WFD, another set of non-word materials was designed to include phonemic patterns that are not used in the speaker's native language but are required in a targeted additional language (e.g. phonemic patterns that occur in English, but not Arabic). The goal was to use these materials in an intervention to train those phonemic sequences. The hypothesis is that a native Arabic speaker learning English would be expected to struggle with the phonotactic patterns not used in Arabic that are required for English.
In addition to the screening and intervention protocols designed, self-report procedures are desirable for assessing speech fluency when time for testing is limited. To that end, the last chapter discussed the importance of designing a fluency questionnaire that can assess fluency in the entire population of speakers. Together with the AEN_NWR, the brief self-report instrument forms a package of assessment procedures that facilitates screening of speech disfluencies in Arabic children (aged 4+) when they first enter school. The seven chapters, described in more detail below, together constitute a package that achieves the aims of identifying speech problems in children using Arabic and/or English and offering intervention to treat WFD.
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
Voice Into Text: Case Studies in the History of Linguistic Transcription
As a contribution to the field of linguistic historiography (Swiggers, 2010), this thesis offers a detailed narrative of the "mental worlds" of writers tackling the task of transcribing languages, both before the appearance of the International Phonetic Alphabet in 1888 and at a time when the IPA was emerging as the agreed standard for phonetic transcription. The narrative includes an account of how the cultural, historical and political background in which these writers operated ultimately shaped their linguistic transcriptions. I argued that this approach, which also included observations drawn from fields other than linguistics, helped to provide a far richer illustration of their mental worlds, and that its omission would have rendered my analysis seriously deficient. This work has also demonstrated that the writers' own linguistic training could hinder, rather than aid, the transcription process. It has therefore also focused on how the authors mediated the tension between their pre-existing linguistic knowledge and the reality of the data they had to analyse. It has been argued that successfully mediating this tension was a prerequisite for a successful transcription.
The two corpora presented in this thesis are the Mohawk religious corpus held at the British Library and the phonetic transcriptions of the British recordings included in the Berliner Lautarchiv, also at the British Library. Their distinctive characteristics, the challenges they posed to the transcribers, and the factors that led to their creation are discussed at length. With regard to the Mohawk corpus, the analysis has focused on the comparison of the notations of Mohawk by writers belonging to the French tradition with those by English-, German-, or Dutch-speaking authors. The analysis of the Berliner Lautarchiv corpus has instead focused on the phonetic transcriptions created by Alois Brandl, an Austrian Anglicist who was also a student of Henry Sweet.
Investigating the perception and production of the Arabic pharyngealised sounds by L2 learners of Arabic
Pronunciation has received relatively little attention within the field of Arabic
second language teaching and learning, particularly in comparison with the more prominent
areas of morphology, syntax, psycholinguistics and sociolinguistics. In the field of
phonetics and phonology, it has been argued that Arabic pharyngealised sounds are
distinctive and unique to Arabic, and they are considered the most difficult sounds for
L2 learners of Arabic to acquire. This research included two experiments that focused
on examining the ability of a group of Arabic L2 learners from different L1
backgrounds to perceive and produce the fricative sounds /z/, /θ/, /f/, /ʃ/, /ħ/, /h/, /χ/, /ɣ/,
/ʕ/, /sˤ/, /ðˤ/, /s/, /ð/, and the emphatic sounds /sˤ/, /ðˤ/, /dˤ/, and /tˤ/ in contrast with the non-pharyngealised variants /s/, /ð/, /d/ and /t/. The aims were to investigate which aspects
of acquisition were difficult and to examine the effects of technology-based instruction
and traditional-based instruction to find an appropriate pronunciation teaching method
to facilitate the perception and production of fricatives and emphatics.
The technology-based method used in this study was adapted from Olson (2014)
and Offerman and Olson (2016) to investigate the extent to which using speech analysis
technology (Praat) can help in visualising the difference between pharyngealised and
non-pharyngealised sounds in order to aid production and perception learning. The
traditional-based method used in this study included repetition, practicing minimal
pairs, and reading aloud techniques. Data were collected from forced-choice
identification tasks and recordings taken during pre- and post-test conditions.
The results revealed that some of the fricatives and all of the emphatic sounds
posed perception and production difficulties for some L2 learners of Arabic, which is
likely due to the absence of these sounds from the learners' L1s. The results also showed significant improvements among all participants after the traditional and
technology training courses. However, no significant difference was observed between
L2 learners who received the traditional-based method and those who received the
technology-based method. Both methods increased students' awareness and
understanding of the features of the sounds under investigation.
The contribution of the current study is to show how Arabic fricative and
emphatic sounds can be effectively taught using form-focused instruction involving
different traditional and technological techniques. This research has implications for the
implementation of both techniques for language teachers and researchers as it shows
how both approaches can be used to enhance students' perceptive and productive skills.
Distinguishing a phonological encoding disorder from Apraxia of Speech in individuals with aphasia by using EEG
As we speak, various processes take place in our brains: we find the word, find and organize the speech sounds, and program the movements for speech. A stroke may cause impairment to any of these processes, and usually multiple processes are affected. Existing methods to distinguish a disorder in finding and organizing speech sounds (phonological encoding) from an impairment in programming the articulation (Apraxia of Speech) are not optimal. This thesis studied whether EEG, which measures small changes in electric brain activity with electrodes placed on the scalp, can be used for this purpose. A protocol was developed to trace the processes of speech production, and it was successfully tested in one group of younger and one group of older neurologically healthy adults. In the younger and older adults, the processes were registered at the same electrodes on the scalp, but the time window and the waveform of the processes differed. In individuals with a phonological encoding disorder and in those with Apraxia of Speech, the disordered processes could not be identified at the group level, because the severity of the impairment varied within the groups; their impaired processes nevertheless differed from those of neurologically healthy individuals. Also, because of the disorder at the preceding stage, the programming of articulation was different in individuals with a phonological encoding disorder. The protocol can distinguish a phonological encoding disorder from Apraxia of Speech through differences in the EEG data (relative to neurologically healthy participants) that were observed only during the programming of movements for speech.
- …