3,352 research outputs found

    Annotating article errors in Spanish learner texts: design and evaluation of an annotation scheme

    VALICO-UD: annotating an Italian learner corpus

    Previous work on learner language has highlighted the importance of having annotated resources to describe the development of interlanguage. Despite this, few learner resources, mainly for English L2, feature error and syntactic annotation. This thesis describes the development of a novel parallel learner Italian treebank, VALICO-UD. Its name suggests two main points: where the data comes from (the corpus VALICO, a collection of non-native Italian texts elicited by comic strips) and what formalism is used for linguistic annotation, namely Universal Dependencies (UD). It is a parallel treebank because the resource provides, for each learner sentence (LS), a target hypothesis (TH), i.e., a parallel corrected version written by an Italian native speaker, which is in turn annotated in UD. We developed this treebank to be exploitable for interlanguage research and comparable with the resources employed in Natural Language Processing tasks such as Native Language Identification or Grammatical Error Identification and Correction. VALICO-UD is composed of 237 texts written by English, French, German and Spanish native speakers, which correspond to 2,234 LSs, each associated with a single TH. While all LSs and THs were automatically annotated using UDPipe, only a portion of the treebank, made up of 398 LSs plus their corresponding THs, has been manually corrected; it was released in May 2021 in the UD repository. This core section also features an explicit XML-based annotation of the errors occurring in each sentence. Thus, the treebank is currently organized in two sections: the core gold standard, comprising 398 LSs and their corresponding THs, and the silver standard, consisting of 1,836 LSs and their corresponding THs.
In order to contribute to the computational investigation of the peculiar type of texts included in VALICO-UD, this thesis describes the annotation schema of the resource, provides some preliminary tests of the performance of UDPipe models on this treebank, reports on inter-annotator agreement results for both error and linguistic annotation, and suggests some possible applications.

    Towards error annotation in a learner corpus of Portuguese

    In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by foreign learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, it is our aim to enhance the learning data of Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools to support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, normalization of the tokens, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows searching the corpus by different fields, such as words, lemmas, POS tags or error tags. We will describe the work in progress regarding the constitution and linguistic annotation of this corpus, particularly focusing on error annotation.

    Intelligent CALL

    This chapter describes the provision of corrective feedback in Tutorial CALL, sketching the challenges in the research and development of computational parsers and grammars. The automatic evaluation and assessment of free-form learner texts, paying attention to linguistic accuracy, rhetorical structures, textual complexity, and written fluency, is at the centre of attention in the section on Automatic Writing Evaluation. The section on Reading and Incidental Vocabulary Learning Aids looks at the advantages of lexical glosses, or look-up information in electronic dictionaries, for reading material aimed at language learners. The conclusion looks at the role of ICALL in the context of general trends in CALL.

    The brain signature for reading in high-skilled deaf adults: behavior and electrophysiological evidence

    327 p. This thesis investigates how syntactic and semantic information is processed by skilled deaf readers. First, we investigate what similarities and/or differences deaf readers share with native hearing readers. Second, since we know that linguistic experience affects language processing in the brain, we also compare the same group of deaf readers with a group of late bilinguals of Spanish. To this end, we test these proposals using electroencephalography (EEG) and event-related potentials (ERPs) to understand the physiological response of deaf readers during a sentence-reading task. The answers to these questions will contribute knowledge about the cognitive mechanisms of good deaf readers and carry practical implications for the creation of new teaching methods.

    Interpreting language-learning data

    This book provides a forum for methodological discussions emanating from researchers engaged in studying how individuals acquire an additional language. Whereas publications in the field of second language acquisition generally report on empirical studies with relatively little space dedicated to questions of method, this book gave authors the opportunity to develop more fully a discussion piece around a methodological issue in connection with the interpretation of language-learning data. The result is a set of seven thought-provoking contributions from researchers with diverse interests. Three main topics are addressed in these chapters: the role of native-speaker norms in second-language analyses, the impact of epistemological stance on experimental design and/or data interpretation, and the challenges of transcription and annotation of language-learning data, with a focus on data ambiguity. Authors expand on these crucial issues, reflect on best practices, and in many instances provide concrete examples of the impact they have on data interpretation.

    The COPLE2 Corpus: a Learner Corpus for Portuguese

    We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently includes a total of 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML format and keep a record of all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment enables different views of the same document (XML, student version, corrected version), a CQP-based search interface, POS tagging, lemmatization and normalization of the tokens, and will soon be used for error annotation in stand-off format. The corpus has already been a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.

    Building the Arabic Learner Corpus and a System for Arabic Error Annotation

    Recent developments in learner corpora have highlighted the growing role they play in some linguistic and computational research areas such as language teaching and natural language processing. However, there is a lack of a well-designed Arabic learner corpus that can be used for studies in the aforementioned research areas. This thesis aims to introduce a detailed and original methodology for developing a new learner corpus. This methodology, which represents the major contribution of the thesis, includes a combination of resources, proposed standards and tools developed for the Arabic Learner Corpus project. The resources include the Arabic Learner Corpus (ALC), which is the largest learner corpus for Arabic based on systematic design criteria. The resources also include the Error Tagset of Arabic, which was designed for annotating errors in Arabic and covers 29 types of errors under five broad categories. The Guide on Design Criteria for Learner Corpus is an example of the proposed standards; it was created based on a review of previous work and focuses on 11 aspects of corpus design criteria. The tools include the Computer-aided Error Annotation Tool for Arabic, which provides some functions facilitating error annotation, such as the smart-selection function and the auto-tagging function. Additionally, the tools include the ALC Search Tool, which was developed to enable searching the ALC and downloading the source files based on a number of determinants. The project successfully recruited 992 people, including language learners, data collectors, evaluators, annotators and collaborators, from more than 30 educational institutions in Saudi Arabia and the UK. The data of the Arabic Learner Corpus was used in a number of projects for different purposes, including error detection and correction, native language identification, evaluation of Arabic analysers, applied linguistics studies and data-driven Arabic learning.
The use of the ALC highlights the extent to which it is important to develop this project.