6 research outputs found
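Foneettinen sujuvuus suomessa toisena kielenä: Lukiolaisten spontaanin puheen akustinen analyysi (Phonetic Fluency in Finnish as a Second Language: An Acoustic Analysis of High School Students' Spontaneous Speech)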
Speaking fluently is an important goal for second language (L2) learners. In L2 research, fluency is often studied by measuring temporal features in speech. These features include speed (rate of speech), breakdown (use of silent and filled pauses), and repair (self-corrections and repetitions) phenomena. Fluent speakers generally have a higher rate of speech and fewer hesitations and interruptions than beginner language learners. In this thesis, phonetic fluency of high school students' L2 Finnish speech is studied in relation to human ratings of fluency and overall proficiency. The topic is essential for the development of automated assessment of L2 speech, as phonetic fluency measures can be used for predicting a speaker's fluency and proficiency level automatically. Although the effect of different fluency measures on perceived fluency level has been widely studied during the last decades, research on phonetic fluency in Finnish as L2 is still limited. Phonetic fluency in high school students' speech in L2 Finnish has not been studied before.
The speech samples and ratings used in this thesis are a part of a larger dataset collected in the DigiTala research project. The analyzed data contained spontaneous speech samples in L2 Finnish from 53 high school students of different language backgrounds. All samples were assessed by expert raters for fluency and overall proficiency. The speech samples were annotated by marking intervals containing silent pauses, filled pauses, corrections and repetitions, and individual words. Several phonetic fluency measures were calculated for each sample from the durations of the annotated intervals.
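As a rough illustration of how such measures can be derived from annotated interval durations, here is a minimal sketch. The label set, the use of words as the counting unit, and the 1.0 s boundary between short and long silent pauses are assumptions made for the example; the abstract does not state the definitions used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    label: str    # hypothetical label set: "word", "silent", "filled", "repair"
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


# Assumed boundary between "short" and "long" silent pauses (not from the thesis).
LONG_PAUSE_S = 1.0


def fluency_measures(intervals, long_pause_s=LONG_PAUSE_S):
    """Compute a few temporal fluency measures from annotated intervals."""
    words = [i for i in intervals if i.label == "word"]
    if not words:
        raise ValueError("at least one word interval is required")
    silent = [i for i in intervals if i.label == "silent"]
    long_silent = [p for p in silent if p.duration >= long_pause_s]
    short_silent = [p for p in silent if p.duration < long_pause_s]

    # Total sample time includes pauses; speaking time counts word intervals only.
    total = max(i.end for i in intervals) - min(i.start for i in intervals)
    speaking = sum(w.duration for w in words)
    long_total = sum(p.duration for p in long_silent)

    return {
        "speech_rate": len(words) / total * 60,           # words per minute, pauses included
        "articulation_rate": len(words) / speaking * 60,  # words per minute, pauses excluded
        "long_pause_portion": long_total / total,         # share of time in long silent pauses
        "mean_long_pause": long_total / len(long_silent) if long_silent else 0.0,
        "short_pauses_per_min": len(short_silent) / total * 60,
    }


# Invented toy annotation, for illustration only.
sample = [
    Interval("word", 0.0, 0.5),
    Interval("word", 0.5, 1.0),
    Interval("silent", 1.0, 2.5),
    Interval("word", 2.5, 3.0),
    Interval("silent", 3.0, 3.2),
    Interval("word", 3.2, 4.0),
]
m = fluency_measures(sample)
```

Speech rate divides by the total sample time, while articulation rate divides by speaking time only, so a speaker who pauses often has a low speech rate even when articulation is fast.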
The contribution of phonetic fluency measures to human ratings of fluency and proficiency was studied using simple and multiple linear regression models. Speech rate was found to be the strongest predictor for both fluency and proficiency ratings in simple linear regression. Articulation rate, portion of long silent pauses, mean duration of long silent pauses, mean duration of breaks between utterances, and rate of short silent pauses per minute were also statistically significant predictors of both fluency and proficiency ratings. Multiple linear regression models improved the simple models for both fluency and proficiency: for fluency, a model with a combination of articulation rate and the portion of long silent pauses performed the best, and for proficiency, a model with a combination of speech rate and mean duration of short silent pauses.
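To make the simple linear regression step concrete, the following sketch fits a one-predictor least-squares line and reports R². The function name and the toy numbers are hypothetical and are not taken from the thesis data; it only illustrates the kind of model the abstract describes.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has zero variance")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - SS_res / SS_tot: the share of rating variance the predictor explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared


# Invented values: a fluency measure (x) and a human rating (y) per speaker.
measure = [1.0, 2.0, 3.0, 4.0]
rating = [2.0, 4.0, 5.0, 8.0]
slope, intercept, r2 = simple_linear_regression(measure, rating)
```

Comparing R² across one-predictor fits is one way to rank individual measures. A multiple regression extends the same idea to several predictors at once.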
Perceived fluency level is often affected by a combination of different phonetic fluency measures, and it seems that human raters ground their assessments on this combination, although some phonetic fluency measures might be more important on their own than others. The findings of this thesis expand previous knowledge on phonetic fluency in L2 Finnish and can benefit both language learners and teachers, as well as developers of automatic assessment of L2 speech.
Fluency-related Temporal Features and Syllable Prominence as Prosodic Proficiency Predictors for Learners of English with Different Language Backgrounds
Prosodic features are important in achieving intelligibility, comprehensibility, and fluency in a second or foreign language (L2). However, research on the assessment of prosody as part of oral proficiency remains scarce. Moreover, the acoustic analysis of L2 prosody has often focused on fluency-related temporal measures, neglecting language-dependent stress features that can be quantified in terms of syllable prominence. Introducing the evaluation of prominence-related measures can be of use in developing both teaching and assessment of L2 speaking skills. In this study we compare temporal measures and syllable prominence estimates as predictors of prosodic proficiency in non-native speakers of English with respect to the speaker's native language (L1). The predictive power of temporal and prominence measures was evaluated for utterance-sized samples produced by language learners from four different L1 backgrounds: Czech, Slovak, Polish, and Hungarian. Firstly, the speech samples were assessed using the revised Common European Framework of Reference scale for prosodic features. The assessed speech samples were then analyzed to derive articulation rate and three fluency measures. Syllable-level prominence was estimated by a continuous wavelet transform analysis using combinations of F0, energy, and syllable duration. The results show that the temporal measures serve as reliable predictors of prosodic proficiency in the L2, with prominence measures providing a small but significant improvement to prosodic proficiency predictions. The predictive power of the individual measures varies both quantitatively and qualitatively depending on the L1 of the speaker. We conclude that the possible effects of the speaker's L1 on the production of L2 prosody in terms of temporal features as well as syllable prominence deserve more attention in applied research and in developing teaching and assessment methods for spoken L2. Peer reviewed.
Reading Development in Adolescent First and Second Language English Learners: A Comparison Using Age Match Design
Fourteen Iranian-Canadian bilingual students were tested for language ability as well as cognitive and phonological processing skills in two languages: Farsi and English. They were compared to 30 Iranian monolingual chronological-age-matched students and 30 Canadian chronological-age-matched peers. Because no standardized tests existed in Farsi, one aim of this study was to begin creating language ability measures in Farsi and to test their reliability. Of the six Farsi tasks that were developed and translated, three were found to be reliable. The bilingual students performed better on memory tasks than the two monolingual groups. There were no group differences on English measures of reading comprehension and word reading between the Iranian bilingual students and their English age-matched peers. Additionally, the Iranian bilinguals performed better on the measure of receptive vocabulary, knowing more English words than the Canadian monolinguals. This finding could be explained by the higher socio-economic status and greater number of English books that Iranian bilinguals have. The final key finding is that the Iranian bilinguals performed more poorly on Farsi tasks, and better on English measures, than the Iranian monolinguals.
Automatic Proficiency Evaluation of Spoken English by Japanese Learners for Dialogue-Based Language Learning System Based on Deep Learning
Tohoku University, 伊藤彰則, 課
The development of automatic speech evaluation system for learners of English
Degree system: New ; Report number: Kō No. 3183 ; Degree type: Doctorate (Education) ; Date conferred: 2010/11/30 ; Waseda University degree certificate number: New 547
Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors
Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is an increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability.
However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variation in child speech data, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. Therefore, this thesis investigates two types of detection systems that can be achieved with minimal dependency on annotated mispronounced speech data.
First, a novel approach that adopts paralinguistic features which represent the prosodic, spectral, and voice quality characteristics of the speech was proposed to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech using a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal.
Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors made due to SSDs and provide low-level diagnostic information that can be used in constructing formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method achieved MDD at the speech attribute level, namely the manners and places of articulation. The speech attribute features describe the involved articulators and their interactions when making a speech sound, allowing a low-level description of the pronunciation error to be provided. Two novel methods to model speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free jointly-learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion.
The proposed techniques have been evaluated using standard and publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.