
    Compiling and annotating a learner corpus for a morphologically rich language: CzeSL, a corpus of non-native Czech

    Learner corpora, linguistic collections documenting a language as used by learners, provide an important empirical foundation for language acquisition research and teaching practice. This book presents CzeSL, a corpus of non-native Czech, against the background of theoretical and practical issues in current learner corpus research. Languages with rich morphology and relatively free word order, including Czech, are particularly challenging for the analysis of learner language. The authors address both the complexity of learner error annotation, describing three complementary annotation schemes, and the complexity of describing non-native Czech in terms of standard linguistic categories. The book discusses in detail the practical aspects of corpus creation: the process of collection and annotation itself, the supporting tools, the resulting data, their formats and search platforms. The chapter on use cases exemplifies the usefulness of learner corpora for teaching, language acquisition research, and computational linguistics. Any researcher developing learner corpora will surely appreciate the concluding chapter listing lessons learned and pitfalls to avoid.

    Teaching for progression: writing


    Building the Arabic Learner Corpus and a System for Arabic Error Annotation

    Recent developments in learner corpora have highlighted the growing role they play in some linguistic and computational research areas such as language teaching and natural language processing. However, there is a lack of a well-designed Arabic learner corpus that can be used for studies in the aforementioned research areas. This thesis aims to introduce a detailed and original methodology for developing a new learner corpus. This methodology, which represents the major contribution of the thesis, includes a combination of resources, proposed standards and tools developed for the Arabic Learner Corpus project. The resources include the Arabic Learner Corpus (ALC), which is the largest learner corpus for Arabic based on systematic design criteria. The resources also include the Error Tagset of Arabic, which was designed for annotating errors in Arabic and covers 29 types of errors under five broad categories. The Guide on Design Criteria for Learner Corpus is an example of the proposed standards; it was created based on a review of previous work and focuses on 11 aspects of corpus design criteria. The tools include the Computer-aided Error Annotation Tool for Arabic, which provides functions that facilitate error annotation, such as the smart-selection function and the auto-tagging function. Additionally, the tools include the ALC Search Tool, which was developed to enable searching the ALC and downloading the source files based on a number of determinants. The project successfully recruited 992 people including language learners, data collectors, evaluators, annotators and collaborators from more than 30 educational institutions in Saudi Arabia and the UK. The data of the Arabic Learner Corpus was used in a number of projects for different purposes including error detection and correction, native language identification, evaluation of Arabic analysers, applied linguistics studies and data-driven Arabic learning. The use of the ALC highlights the extent to which it is important to develop this project.

    A critical analysis of Herman Charles Bosman’s juvenilia

    M.A. (English). The broad scope of this dissertation is the collection, editing and publishing of Herman Charles Bosman’s juvenilia with the purpose of re-introducing these stories into the public domain. The project involves creating a critical edition of Bosman’s juvenilia through careful and diplomatic editorial processes. The resultant typescript is the first presentation of what is now posited as the entire collection of Herman Charles Bosman’s juvenilia. The project adds a total of seven previously un-credited stories to the already published collections of Bosman’s juvenilia. The dissertation extends into an in-depth analysis of what juvenilia is, and focuses on the problems relating to the delineation of works as juvenilia. Additionally, there is a discussion of the theory and practice of textual criticism, where a general background and overview of the history and practice of textual criticism is presented, including the textual history of Bosman’s juvenilia and the processes involved in the production of the critical edition. Beyond this, there is also a general analysis of Bosman’s juvenilia, focusing on themes, narrative modes and point of view, imagery and language.

    WRITTEN CORRECTIVE FEEDBACK: EFFECTS OF FOCUSED AND UNFOCUSED GRAMMAR CORRECTION ON THE CASE ACQUISITION IN L2 GERMAN

    Thirty-three students of fourth-semester German at the University of Kansas participated in the study, which sought to investigate whether focused written corrective feedback (WCF) promoted the acquisition of German case morphology over the course of a semester. Participants received teacher WCF on five two-draft essay assignments under three treatment conditions: group (1) received focused WCF on German case errors; group (2) received unfocused WCF on a variety of German grammar errors; and group (3) did not receive WCF on specific grammar errors. Combining quantitative and qualitative analyses, the study found that the focused group improved significantly in the accuracy of case forms while the unfocused and the control groups did not make any apparent progress. The results indicate that focused WCF was effective in improving case accuracy in subjects' writing in a German as a foreign language (GFL) context. WCF did not negatively affect writing fluency or students' attitude toward writing.

    Grammatical Error Correction: A Survey of the State of the Art

    Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes not only the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.

    Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains


    MS Paris, Bibliothèque des Missions étrangères 1069: The French-Arabic Dictionary of François Pétis de la Croix (1653–1713)?

    This paper analyses an anonymous French-Arabic dictionary preserved in Paris, Bibliothèque des Missions étrangères. I argue that it seems to be a copy of a dictionary compiled by the early modern French Orientalist and diplomat François Pétis de la Croix, the younger. Beyond the question of authorship, I survey the themes and structure of the dictionary and discuss the compiler’s cultural insights into Ottoman and Safavid societies and the cultural barriers that his translations reveal.