28,533 research outputs found

    Semi-automatically Annotated Learner Corpus for Russian

    We present ReLCo (the Revita Learner Corpus), a new semi-automatically annotated learner corpus for Russian. The corpus was collected while several hundred L2 learners were performing exercises using the Revita language-learning system. All errors were detected automatically by the system and annotated by type. Part of the corpus was annotated manually; this part was created for further experiments on automatic assessment of grammatical correctness. The Learner Corpus provides valuable data for studying patterns of grammatical errors, experimenting with grammatical error detection and grammatical error correction, and developing new exercises for language learners. Automating the collection and annotation makes building the learner corpus much cheaper and faster than the traditional approach to building learner corpora. We make the data publicly available. Peer reviewed.

    A trajetória da compilação de um corpus de aprendiz: um desafio metodológico compensador [The trajectory of compiling a learner corpus: a rewarding methodological challenge]

    Corpus compilation is a challenging research endeavor that many researchers decide to pursue. Few learner corpora, however, can be easily accessed (e.g., the International Corpus of Learner English), and none of them carry a variety of text registers written by English learners at different proficiency levels studying in the Brazilian university context. Therefore, the aim of this paper is to present the compilation of a learner corpus, much needed in our research and teaching context, pointing out the advantages of building this type of corpus for the understanding of learners’ needs as well as for pedagogical decision-making based on sound data. Presenting a detailed rationale of the corpus compilation, this article reveals the various decisions made in order to guarantee that fair comparisons can be made. To exemplify the value of building a carefully designed corpus, results of previous studies are compared. Some of the conclusions reached refer to the need for discipline-specific tasks to propel writing proficiency and for authorship skills to be developed in English for Academic Purposes classes to foster academic success.

    Learner Corpora and Embedded Assessment of Undergraduate EFL Writing: The Case of Metadiscourse Markers

    The present contribution discusses how a learner corpus can be used to identify learning gaps and plan assessments embedded in teaching and learning activities both inside and outside of the classroom. The learner corpus under investigation is a collection of opinion articles written by undergraduate students of English as a foreign language. Concordancing software was used to generate frequency lists from this collection and perform related searches. A first look at the list of the most frequent n-grams prompted us to consider specific clusters, which seem to relate to the organisation dimension of writing and the use of so-called ‘metadiscourse’. A closer look at the concordance lines and the collocates for these clusters elicited initial “writing questions,” such as “what patterns of co-occurrence can be found for the search terms?” and “what is the role of these patterns in topic development and argument building?” These same questions can be passed on to the students as part of hands-on activities aimed at encouraging observation, such as short guided searches on the learner corpus, related searches on reference corpora and other learner corpora, and learning logs based on these searches. Ultimately, a learner corpus can be employed to generate continuous formative assessment (including peer- and self-assessment), thus providing students with feedback for improvement and at the same time encouraging them to reflect on their own learning process.
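
The n-gram frequency lists and concordance lines mentioned in this abstract can be illustrated with a minimal sketch. It is not the concordancer used in the study, which the abstract does not name; the function names, the simple regex tokenizer, and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a crude lowercase word tokenizer is enough for a demo.
    return re.findall(r"[a-z']+", text.lower())


def ngram_frequencies(texts, n=3):
    """Count every contiguous n-token sequence across a list of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def concordance(texts, phrase, width=3):
    """Return (left context, match, right context) for each hit of phrase."""
    target = tokenize(phrase)
    k = len(target)
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + k:i + k + width])
                lines.append((left, " ".join(target), right))
    return lines


# Hypothetical learner sentences, invented for this example only.
essays = [
    "On the other hand, social media can help students learn.",
    "First of all, I think that homework is useful. "
    "On the other hand, it takes time.",
]

frequencies = ngram_frequencies(essays, n=4)
hits = concordance(essays, "on the other hand", width=2)
```

Sorting `frequencies.most_common()` gives the kind of cluster list the abstract describes, and `hits` shows each cluster with its neighbouring words, which is the starting point for the "writing questions" about co-occurrence.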

    VALICO-UD: Treebanking an Italian Learner Corpus in Universal Dependencies

    This article describes an ongoing project for the development of a novel Italian treebank in Universal Dependencies format: VALICO-UD. It consists of texts written by Italian L2 learners of different mother tongues (German, French, Spanish and English) drawn from VALICO, an Italian learner corpus elicited by comic strips. Aiming at building a parallel treebank currently missing for Italian L2, comparable with those exploited in Natural Language Processing tasks, we associated each learner sentence with a target hypothesis (i.e. a corrected version of the learner sentence written by an Italian native speaker), which is in turn annotated in Universal Dependencies. The treebank VALICO-UD is composed of 237 texts written by non-native speakers of Italian (2,234 sentences) and the related target hypotheses, all automatically annotated using UDPipe. A portion of this resource (36 texts corresponding to 398 learner sentences and related target hypotheses), first released in May 2021 in the Universal Dependencies repository, is associated with error annotation, and its automatic output has been fully manually checked. In this article, we focus especially on the challenges addressed in treebanking a resource composed of learner texts. In addition, we report on a preliminary data exploration that makes use of three quantitative measures for assessing the quality of the data and for better understanding the role that this resource can play in tasks lying at the intersection of Computational Linguistics and learner corpus studies.

    The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

    We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously. Comment: 10 pages, The 6th Workshop on Vision and Language (VL'17)
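
The idea of an n-gram user simulation can be sketched as follows. This is not the framework released with the BURCHAK corpus; it is a generic bigram model over turn labels, and the label names (`tutor_ask` and so on), function names, and toy dialogues are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_ngram_model(dialogues, n=2):
    """Count which action follows each context of n-1 previous actions."""
    model = defaultdict(Counter)
    for turns in dialogues:
        padded = ["<s>"] * (n - 1) + list(turns) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            model[context][padded[i]] += 1
    return model


def sample_next(model, context, rng):
    """Sample the next action in proportion to its observed count."""
    counts = model.get(tuple(context))
    if not counts:
        return "</s>"  # unseen context: end the simulated dialogue
    actions = list(counts)
    weights = [counts[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def simulate(model, n=2, max_turns=10, seed=0):
    """Generate one simulated action sequence from the trained counts."""
    rng = random.Random(seed)
    history = ["<s>"] * (n - 1)
    output = []
    for _ in range(max_turns):
        context = history[len(history) - (n - 1):]
        action = sample_next(model, context, rng)
        if action == "</s>":
            break
        output.append(action)
        history.append(action)
    return output


# Hypothetical toy dialogues, expressed as sequences of turn labels.
dialogues = [
    ["tutor_ask", "learner_answer", "tutor_confirm"],
    ["tutor_ask", "learner_answer", "tutor_correct"],
]

model = train_ngram_model(dialogues, n=2)
simulated = simulate(model, n=2, seed=1)
```

A simulation of this kind can be compared turn by turn against held-out dialogues, which is the sort of similarity measure the abstract reports; the real framework works on incremental, word-level data rather than coarse labels like these.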

    A graph-based approach for learner-tailored teaching of Korean grammar constructions
