    Applications of Text Analysis Tools for Spoken Response Grading

    Shallow Analysis Based Assessment of Syntactic Complexity for Automated Speech Scoring

    Designing measures that capture various aspects of language ability is a central task in the design of systems for automatic scoring of spontaneous speech. In this study, we address a key aspect of language proficiency assessment: syntactic complexity. We propose a novel measure of syntactic complexity for spontaneous speech that shows optimal empirical performance on real-world data in multiple respects. First, it is both robust and reliable, producing automatic scores that agree well with human ratings compared to the state of the art. Second, the measure is theoretically motivated, from both an algorithmic and a native language acquisition point of view.

    Using Ontology-Based Approaches to Representing Speech Transcripts for Automated Speech Scoring

    Text representation is the process of transforming text into a format that computer systems can use for subsequent information-related tasks such as text classification. Representing text faces two main challenges: the meaningfulness of the representation and unknown terms. Research has shown that these challenges can be addressed by using the rich semantics in ontologies. This study aims to address these challenges by using ontology-based representation and unknown-term reasoning approaches in the context of content scoring of speech, which is less explored than common tasks such as categorizing text corpora (e.g., 20 Newsgroups and Reuters). From the perspective of language assessment, the growing number of language learners taking second language tests makes automatic scoring an attractive alternative to human scoring for delivering rapid and objective scores for written and spoken test responses. This study focuses on the speaking section of second language tests and investigates ontology-based approaches to speech scoring. Most previous automated scoring systems for test takers' spontaneous spoken responses assess speech primarily with acoustic features such as fluency and pronunciation, while text features are less involved and exploited. As content is an integral part of speech, the study is motivated by the lack of rich text features in speech scoring and is designed to examine the effects of different text features on scoring performance. A central question of the study is how speech transcript content can be represented appropriately for speech scoring. Approaches previously used in essay and speech scoring systems include bag-of-words and latent semantic analysis representations, which serve as baselines in this study; the experimental approaches are ontology-based, which can help improve the meaningfulness of representation units and estimate the importance of unknown terms. 
Two general-domain ontologies, WordNet and Wikipedia, are each used for ontology-based representations. In addition to comparing representation approaches, the author analyzes which parameter option leads to the best performance within each representation. The experimental results show that, on average, ontology-based representations slightly enhance speech scoring performance on all measurements when combined with the bag-of-words representation; reasoning about unknown terms can increase performance on one measurement (cos.w4) but decreases it on the others. Because of the small data size, the significance test (t-test) shows that the improvement from ontology-based representations is inconclusive. The contributions of the study are: 1) it examines the effects of different representation approaches on speech scoring tasks; 2) it deepens the understanding of the mechanisms of representation approaches and their parameter options through in-depth analysis; and 3) the representation methodology and framework can be applied to other tasks such as automated essay scoring.
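The abstract names bag-of-words baselines and a measurement called cos.w4 without defining them. As a minimal sketch, assuming cos.w4 follows the essay-scoring convention of comparing a response with the pooled training responses at score point 4 (our assumption, not stated in the abstract), a bag-of-words cosine feature could be computed as below. The whitespace tokenizer and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Naive bag-of-words: lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty transcript shares no terms with anything
    return dot / (norm_a * norm_b)


def score_point_similarities(response: str, training: dict) -> dict:
    """Cosine between one transcript and the pooled transcripts of each score point.

    `training` maps a score point (e.g. 1..4) to a list of training transcripts.
    Under our assumption, the value at key 4 plays the role of a cos.w4-style feature.
    """
    response_vec = bag_of_words(response)
    similarities = {}
    for score, transcripts in training.items():
        pooled = Counter()
        for transcript in transcripts:
            pooled.update(bag_of_words(transcript))
        similarities[score] = cosine(response_vec, pooled)
    return similarities
```

For example, `score_point_similarities("the cat sat", {1: ["a dog ran"], 4: ["the cat sat on the mat"]})` gives 0.0 for score point 1 and about 0.82 for score point 4. An ontology-based variant would change only `bag_of_words`, mapping tokens to concepts (for instance WordNet synsets) before counting.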

    New and not so new methods for assessing oral communication

    The assessment of oral communication has continued to evolve over the past few decades. The construct being assessed has broadened to include interactional competence, and technology has played a role in the types of tasks that are currently popular. In this paper, we discuss the factors that affect the process of oral communication assessment, current conceptualizations of the construct to be assessed, and five tasks that are used to assess this construct. These tasks include oral proficiency interviews, paired/group oral discussion tasks, simulated tasks, integrated oral communication tasks, and elicited imitation tasks. We evaluate these tasks based on current conceptualizations of the construct of oral communication and conclude that they do not assess a broad construct of oral communication equally. Based on our evaluation, we advise test developers to consider the aspects of oral communication that they aim to include or exclude in their assessment when they select one of these task types.

    To What Extent is Collocation Knowledge Associated with Oral Proficiency? A Corpus-Based Approach to Word Association

    This study examined the relationship between second language (L2) learners’ collocation knowledge and oral proficiency. A new approach to measuring collocation was adopted by eliciting responses through a word association task and using corpus-based measures (absolute frequency count, t-score, MI score) to analyze the degree to which stimulus words and responses were collocated. Oral proficiency was measured using human judgements and objective measures of fluency (articulation rate, silent pause ratio, filled pause ratio) and lexical richness (diversity, frequency, range). Forty Japanese university students completed a word association task and a spontaneous speaking task (picture narrative). Results indicated that speakers who used more low-frequency collocations in the word association task (i.e., lower collocation frequency scores) spoke faster with fewer silent pauses and were perceived to be more fluent. Speakers who provided more strongly associated collocations (as measured by MI) used more sophisticated lexical items and were perceived to be lexically proficient. Collocation knowledge remained a unique predictor after the influence of learners’ vocabulary size (i.e., knowledge of single-word items) was taken into account. These findings support the key role that collocation plays in oral proficiency and provide important insights into L2 speech development from the perspective of phraseological competence.
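The t-score and MI score named above have standard corpus-linguistics definitions. The sketch below uses the common forms MI = log2(O / E) and t = (O − E) / √O, where O is the observed co-occurrence count (the absolute frequency count) and E is the count expected if the two words were independent. The study's exact variants (window-size corrections, reference corpus) are not given in the abstract, and the counts in the usage lines are invented.

```python
import math


def expected_cooccurrence(f_node: int, f_collocate: int, corpus_size: int) -> float:
    """Co-occurrence count expected if the two words occurred independently."""
    return f_node * f_collocate / corpus_size


def mi_score(observed: int, f_node: int, f_collocate: int, corpus_size: int) -> float:
    """Mutual information: log2 of observed over expected co-occurrence."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return math.log2(observed / expected)


def t_score(observed: int, f_node: int, f_collocate: int, corpus_size: int) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Invented counts: a pair seen 8 times, each word seen 100 times, 10,000-word corpus.
print(mi_score(8, 100, 100, 10_000))  # 3.0
print(round(t_score(8, 100, 100, 10_000), 3))  # 2.475
```

Because MI divides by the expected count, it rewards rare but exclusive pairings, whereas the t-score grows with raw frequency; this is one reason the two measures can relate to different aspects of proficiency.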

    Factors Affecting Grammatical and Lexical Complexity of Long-Term L2 Speakers’ Oral Proficiency

    There remains considerable disagreement about which factors drive second language (L2) ultimate attainment. Age of onset (AO) appears to be a robust factor, lending support to theories of maturational constraints on L2 acquisition. The present study investigates factors that influence grammatical and lexical complexity at the stage of L2 ultimate attainment. Grammatical and lexical complexity were assessed in 102 spontaneous oral interviews. Interviewees' AOs ranged from 7 to 17 years. Multifactorial analyses yielded consistently significant effects of gender and level of education on grammatical and lexical complexity. Additionally, native language use at work was a significant predictor of lexical complexity; conversely, AO did not emerge as a significant factor. We conclude that grammatical and lexical complexity at the stage of L2 ultimate attainment results from a complex interplay of variables that are general to language learning and performance rather than specific to L2.

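    Foneettinen sujuvuus suomessa toisena kielenä: Lukiolaisten spontaanin puheen akustinen analyysi (Phonetic fluency in Finnish as a second language: An acoustic analysis of high school students' spontaneous speech)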

    Speaking fluently is an important goal for second language (L2) learners. In L2 research, fluency is often studied by measuring temporal features in speech. These features include speed (rate of speech), breakdown (use of silent and filled pauses), and repair (self-corrections and repetitions) phenomena. Fluent speakers generally have a higher rate of speech and fewer hesitations and interruptions than beginner language learners. In this thesis, the phonetic fluency of high school students’ L2 Finnish speech is studied in relation to human ratings of fluency and overall proficiency. The topic is important for the development of automated assessment of L2 speech, as phonetic fluency measures can be used to predict a speaker’s fluency and proficiency level automatically. Although the effect of different fluency measures on perceived fluency has been widely studied over the past decades, research on phonetic fluency in Finnish as an L2 is still limited. Phonetic fluency in high school students’ speech in L2 Finnish has not been studied before. The speech samples and ratings used in this thesis are part of a larger dataset collected in the DigiTala research project. The analyzed data contained spontaneous speech samples in L2 Finnish from 53 high school students of different language backgrounds. All samples were assessed by expert raters for fluency and overall proficiency. The speech samples were annotated by marking intervals containing silent pauses, filled pauses, corrections and repetitions, and individual words. Several phonetic fluency measures were calculated for each sample from the durations of the annotated intervals. The contribution of phonetic fluency measures to human ratings of fluency and proficiency was studied using simple and multiple linear regression models. Speech rate was found to be the strongest predictor of both fluency and proficiency ratings in simple linear regression. 
Articulation rate, portion of long silent pauses, mean duration of long silent pauses, mean duration of breaks between utterances, and rate of short silent pauses per minute were also statistically significant predictors of both fluency and proficiency ratings. Multiple linear regression models improved on the simple models for both fluency and proficiency: for fluency, the best model combined articulation rate and the portion of long silent pauses, and for proficiency, the best model combined speech rate and the mean duration of short silent pauses. Perceived fluency is often affected by a combination of phonetic fluency measures, and human raters appear to base their assessments on this combination, although some measures may be more important on their own than others. The findings of this thesis expand previous knowledge of phonetic fluency in L2 Finnish and can benefit language learners and teachers as well as developers of automatic assessment of L2 speech.
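The thesis computes its measures from the durations of annotated intervals. A minimal sketch of that step follows, assuming a single tier of non-overlapping intervals tagged "word", "silent" or "filled", word-based rates, and a hypothetical 1-second boundary between short and long silent pauses; the abstract does not state the thesis's units, thresholds, or exact definition of "portion", so each of these is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    label: str    # hypothetical tag set: "word", "silent", "filled"

    @property
    def duration(self) -> float:
        return self.end - self.start


def fluency_measures(intervals, long_pause_s: float = 1.0) -> dict:
    """Temporal fluency measures for one annotated sample.

    Assumes non-overlapping intervals; long_pause_s is a hypothetical threshold.
    Filled pauses count as speaking time here, neither as words nor as silence.
    """
    total_s = max(i.end for i in intervals) - min(i.start for i in intervals)
    n_words = sum(1 for i in intervals if i.label == "word")
    silent = [i.duration for i in intervals if i.label == "silent"]
    long_pauses = [d for d in silent if d >= long_pause_s]
    short_pauses = [d for d in silent if d < long_pause_s]
    speaking_s = total_s - sum(silent)  # sample time not spent in silent pauses
    return {
        # words per minute over the whole sample, pauses included
        "speech_rate": n_words / total_s * 60,
        # words per minute over speaking time only (silent pauses removed)
        "articulation_rate": n_words / speaking_s * 60,
        # one reading of "portion": share of total time spent in long silent pauses
        "long_pause_share": sum(long_pauses) / total_s,
        "mean_long_pause_s": sum(long_pauses) / len(long_pauses) if long_pauses else 0.0,
        "short_pauses_per_min": len(short_pauses) / total_s * 60,
    }
```

Each sample then yields one row of measures, which can be regressed on the expert ratings, one predictor at a time or in combination, as the thesis describes.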

    The relationship between task difficulty and second language fluency in French: a mixed-methods approach

    While there is a considerable body of literature on task difficulty and second language (L2) fluency in English as a second language (ESL), there has been little investigation with French learners. This mixed-methods study examines learner appraisals of task difficulty and their relationship to automated utterance fluency measures in French under three different task conditions. Participants were 40 adult learners of French at varying levels of proficiency studying in a university immersion context in Québec. Appraisal of task difficulty was assessed quantitatively through participants’ self-reports on a five-item questionnaire and qualitatively through retrospective interviews. Utterance fluency was operationalized by four temporal variables and measured with Praat, a speech analysis software program. Across tasks, the quantitative results indicate that appraisals of lexical retrieval difficulty and fluency difficulty were most strongly related to perceived overall task difficulty. The qualitative analysis shows how L2 speakers evaluated the difficulty of each task as well as the features that either contributed to or limited their L2 fluency. Students’ fluency in performing the three tasks was found to differ in articulation rate and average pause time, but not in pause frequency or phonation-time ratio.
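The four temporal variables were measured in Praat. As a sketch of how such variables are commonly derived once a syllable count and silent-pause durations have been extracted, consider the function below. The extraction step, the syllable-based unit, the per-minute pause frequency, and the 250 ms minimum pause length are all assumptions here, not details given in the abstract.

```python
def utterance_fluency(total_s: float, n_syllables: int, pauses_s: list,
                      min_pause_s: float = 0.25) -> dict:
    """Four temporal fluency variables for one task performance.

    pauses_s: candidate silent-pause durations in seconds (e.g. read from a
    Praat TextGrid); gaps shorter than the hypothetical min_pause_s are ignored.
    """
    pauses = [p for p in pauses_s if p >= min_pause_s]
    phonation_s = total_s - sum(pauses)  # time spent speaking
    return {
        # syllables per second of phonation time
        "articulation_rate": n_syllables / phonation_s,
        # mean duration of counted pauses, in seconds
        "average_pause_time": sum(pauses) / len(pauses) if pauses else 0.0,
        # counted pauses per minute of total time
        "pause_frequency": len(pauses) / total_s * 60,
        # proportion of total time spent speaking
        "phonation_time_ratio": phonation_s / total_s,
    }
```

Computing this once per task and participant gives the within-learner comparison the study reports, in which the tasks differed in articulation rate and average pause time but not in pause frequency or phonation-time ratio.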
