474 research outputs found

    Autenttisiin teksteihin perustuva tietokoneavusteinen kielen oppiminen: sovelluksia italian kielelle (Computer-assisted language learning based on authentic texts: applications for Italian)

    Computer-Assisted Language Learning (CALL) is one of the sub-disciplines within the area of Second Language Acquisition. Clozes, also called fill-in-the-blank exercises, are widely used in language learning applications. A cloze is an exercise where the learner is asked to provide a fragment that has been removed from the text. For language learning purposes, in addition to open-ended clozes, where one or more words are removed and the student must fill the gap, another type is commonly used: the multiple-choice cloze. In a multiple-choice cloze, a fragment is removed from the text and the student must choose the correct answer from multiple options. Multiple-choice exercises are a common way of practicing and testing grammatical knowledge. The aim of this work is to identify relevant learning constructs for Italian to be applied to automatic exercise creation based on authentic texts in the Revita framework. Learning constructs are units that represent language knowledge. Revita is a free-to-use online platform designed to provide language learning tools with the aim of revitalizing endangered languages, including several Finno-Ugric languages such as North Saami. Non-endangered languages were added later. Italian is the first majority language to be added in a principled way. This work paves the way towards adding new languages in the future. Its purpose is threefold: it contributes to raising Italian from its beta status towards a full development stage; it formulates best practices for defining support for a new language; and it documents what has been done, how, and what remains to be done. Grammars and linguistic resources were consulted to compile an inventory of learning constructs for Italian. 
Analytic and pronominal verbs, verb government with prepositions, and noun phrase agreement were implemented by designing pattern rules that match sequences of tokens with specific parts of speech, surface forms and morphological tags. The rules were tested on test sentences, which allowed them to be further refined and corrected. The current precision of the 47 rules for analytic and pronominal verbs on 177 test sentences is 100%; recall is 96.4%. Both precision and recall for the 5 noun phrase agreement rules are 96.0% on the 34 test sentences. Analytic and pronominal verb patterns, as well as noun phrase agreement patterns, were used to generate open-ended clozes. Verb government pattern rules were implemented as multiple-choice exercises in which one of the four presented options is the correct preposition and the other three are prepositions that do not fit the context. The patterns were designed based on colligations, combinations of tokens (collocations) that are also explained by grammatical constraints. Verb government exercises were generated on a specifically collected corpus of 29,074 words. The corpus included three types of text: biography sections from Wikipedia, Italian news articles and Italian language matriculation exams. The last text type generated the most exercises, at a rate of 19 exercises per 10,000 words, suggesting that the semi-authentic text best matched the level of the verb government exercises because of appropriate vocabulary frequency and sentence structure complexity. Four native language experts, either teachers of Italian as L2 or linguists, evaluated the usability of the generated multiple-choice clozes, yielding a usability score of 93.55%. This result suggests that minor adjustments, i.e., the exclusion of target verbs that cause multiple admissibility, are sufficient to consider verb government patterns usable until the possibility of dealing with multiple admissible answers is addressed. 
The implementation of some of the most important learning constructs for Italian proved feasible with current NLP tools, although quantitative evaluation of the precision and recall of the designed rules is needed to evaluate the generation of exercises on authentic text. This work paves the way towards a full development stage of Italian in Revita and enables further pilot studies with actual learners, which will make it possible to measure learning outcomes in quantitative terms.
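The verb government exercises described in this abstract can be illustrated with a minimal sketch. It is not the Revita implementation: the `Token` class, the `GOVERNMENT` lexicon, the `PREPOSITIONS` pool and both function names are hypothetical, and the distractors are sampled at random rather than checked for whether they fit the context, which a real system would need to do.

```python
import random
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # word form as it appears in the text
    lemma: str    # dictionary form
    pos: str      # coarse part-of-speech tag, e.g. "VERB", "ADP"


# Hypothetical government lexicon: verb lemma -> preposition it governs.
GOVERNMENT = {"pensare": "a", "dipendere": "da", "contare": "su"}

# Hypothetical pool of simple Italian prepositions used as distractors.
PREPOSITIONS = ["a", "da", "su", "di", "in", "con", "per"]


def find_government(tokens):
    """Return indices of prepositions that directly follow a verb governing them."""
    hits = []
    for i in range(len(tokens) - 1):
        verb, prep = tokens[i], tokens[i + 1]
        if (verb.pos == "VERB" and prep.pos == "ADP"
                and GOVERNMENT.get(verb.lemma) == prep.surface.lower()):
            hits.append(i + 1)
    return hits


def make_cloze(tokens, gap_index, rng):
    """Build a four-option multiple-choice cloze with the preposition removed."""
    answer = tokens[gap_index].surface.lower()
    # Assumption: any other preposition counts as a distractor (no context check).
    distractors = rng.sample([p for p in PREPOSITIONS if p != answer], 3)
    options = distractors + [answer]
    rng.shuffle(options)
    text = " ".join("____" if i == gap_index else t.surface
                    for i, t in enumerate(tokens))
    return {"text": text, "options": options, "answer": answer}


# Hand-tagged example sentence; a real pipeline would take tags from a parser.
sentence = [
    Token("Tutto", "tutto", "PRON"),
    Token("dipende", "dipendere", "VERB"),
    Token("da", "da", "ADP"),
    Token("te", "tu", "PRON"),
]
exercises = [make_cloze(sentence, i, random.Random(0))
             for i in find_government(sentence)]
```

Here the rule only matches a verb immediately followed by its governed preposition; patterns that allow intervening tokens, or that use morphological tags, would need richer rules than this sketch shows.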

    Sensitivity to syntax in visual cortex

    One of the most intriguing findings on language comprehension is that violations of syntactic predictions can affect event-related potentials as early as 120 ms, in the same time-window as early sensory processing. This effect, the so-called early left-anterior negativity (ELAN), has been argued to reflect word category access and initial syntactic structure building (Friederici, 2002). In two experiments, we used magnetoencephalography to investigate whether (a) rapid word category identification relies on overt category-marking closed-class morphemes and (b) violations of word category predictions affect modality-specific sensory responses. Participants read sentences containing violations of word category predictions. Unexpected items varied in whether or not their word category was marked by an overt function morpheme. In Experiment 1, the amplitude of the visual evoked M100 component was increased for unexpected items, but only when word category was overtly marked by a function morpheme. Dipole modeling localized the generator of this effect to the occipital cortex. Experiment 2 replicated the main results of Experiment 1 and eliminated two non-morphology-related explanations of the M100 contrast we observed between targets containing overt category-marking and targets that lacked such morphology. Our results show that during reading, syntactically relevant cues in the input can affect activity in occipital regions at around 125 ms, a finding that may shed new light on the remarkable rapidity of language processing.

    Automatic correction of grammatical errors in non-native English text

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 99-107). Learning a foreign language requires much practice outside of the classroom. Computer-assisted language learning systems can help fill this need, and one desirable capability of such systems is the automatic correction of grammatical errors in texts written by non-native speakers. This dissertation concerns the correction of non-native grammatical errors in English text, and the closely related task of generating test items for language learning, using a combination of statistical and linguistic methods. We show that syntactic analysis enables extraction of more salient features. We address issues concerning robustness in feature extraction from non-native texts, and also design a framework for simultaneous correction of multiple error types. Our proposed methods are applied to some of the most common usage errors, including prepositions, verb forms, and articles. The methods are evaluated on sentences with synthetic and real errors, and in both restricted and open domains. A secondary theme of this dissertation is that of user customization. We perform a detailed analysis on a non-native corpus, illustrating the utility of an error model based on the mother tongue. We study the benefits of adjusting the correction models based on the quality of the input text, and also present novel methods to generate high-quality multiple-choice items that are tailored to the interests of the user. By John Sie Yuen Lee. Ph.D.

    Learning from Partially Annotated Data: Example-aware Creation of Gap-filling Exercises for Language Learning

    Since performing exercises (including, e.g., practice tests) forms a crucial component of learning, and creating such exercises requires non-trivial effort from the teacher, there is great value in automatic exercise generation in digital tools in education. In this paper, we focus in particular on the automatic creation of gap-filling exercises for language learning, specifically grammar exercises. Since providing any annotation in this domain requires human expert effort, we aim to avoid it entirely and explore the task of converting existing texts into new gap-filling exercises, purely based on an example exercise, without explicit instruction or detailed annotation of the intended grammar topics. We contribute (i) a novel neural network architecture specifically designed for the aforementioned gap-filling exercise generation task, and (ii) a real-world benchmark dataset for French grammar. We show that our model for this French grammar gap-filling exercise generation outperforms a competitive baseline classifier by 8 F1 percentage points, achieving an average F1 score of 82%. Our model implementation and the dataset are made publicly available to foster future research, thus offering a standardized evaluation and baseline solution for the proposed partially annotated data prediction task in grammar exercise creation. Comment: 12 pages; accepted at the 18th Workshop on Innovative Use of NLP for Building Educational Applications.

    AGReE: A system for generating Automated Grammar Reading Exercises

    We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. We find that for 95% of items, a majority of the five raters were able to identify the correct answer, and that for 85% of cases, raters agreed that there is only one correct answer among the choices. Finally, the error analysis shows that raters made the most mistakes on punctuation and conjunctions. Comment: Accepted to EMNLP 2022 Demonstration Track.

    Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions

    In this paper, we propose a novel configurable framework to automatically generate distractive choices for open-domain cloze-style multiple-choice questions, which incorporates a general-purpose knowledge base to effectively create a small distractor candidate set, and a feature-rich learning-to-rank model to select distractors that are both plausible and reliable. Experimental results on datasets across four domains show that our framework yields distractors that are more plausible and reliable than previous methods. This dataset can also be used as a benchmark for distractor generation in the future. Comment: To appear at AAAI 202

    Revita: a System for Language Learning and Supporting Endangered Languages

    We describe a computational system for language learning and supporting endangered languages. The platform gives the user an opportunity to improve her competency through active language use. The platform currently works with several endangered Finno-Ugric languages, as well as with Yakut, Finnish, Swedish, and Russian. This paper describes the current stage of ongoing development. Peer reviewed.

    Aspects of First Language Attrition: A Case Study of German Immigrants in East Tennessee

    This dissertation examines aspects of first language attrition (L1 = German) in a second language (L2 = English) environment. It sheds light on language contact and attrition research and focuses on first-generation German immigrants to East Tennessee, who were administered a series of tests to ascertain their language attrition and to establish extralinguistic factors promoting or inhibiting it. The Study Group consisted of 22 German immigrants to the U.S., both men and women, aged between 27 and 68, who emigrated as late teens or adults and have been in the U.S. for more than three years. The Control Group consisted of 12 German native speakers in Germany similar to the American informants in education level, age and gender. The informants from both groups were interviewed, given a questionnaire and asked to describe pictures into an audio recorder. They were also given a cloze/fill-in text targeting lexical items and the correct usage of specific L1 grammatical structures such as gender articles, formation of plurals and cases. The quantitative and qualitative analysis of the data from the Study Group revealed that L1 attrition is not severe, although extralinguistic variables such as age, time since immigration, level of education and amount of L1 contact affect lexical retrieval, gender assignment, and case and plural marking. Statistical analysis of the cloze test data, picture description and interview indicated significant differences at the p<.05 level in both the lexical and morphological domains between subgroups (organized by variable) in the Study Group versus parallel ones in the Control Group. The qualitative data analysis showed that mostly social domains, such as shopping, daily routine, working settings or leisure activities, were affected by L2 transfer, borrowings or loan shifts. The lexical density test performed on the data revealed group differences between the Study and the Control groups. 
All the informants spontaneously used English words, phrases and loan translations in their German speech, and all are aware of their code-switching, but only 17% view it negatively, while 40% have a neutral attitude towards this practice. The Study Group still highly values German language and culture.