7 research outputs found

    Sentence-level grammatical error identification as sequence-to-sequence correction

    We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder models can be used for the generation of corrections, in addition to error identification, which is of interest for certain end-user applications. We show that a character-based encoder-decoder model is particularly effective, outperforming other results on the AESW Shared Task on its own, and showing gains over a word-based counterpart. Our final model, a combination of three character-based encoder-decoder models, one word-based encoder-decoder model, and a sentence-level CNN, is the highest performing system on the AESW 2016 binary prediction Shared Task.
    Engineering and Applied Science

    On Methods of Data Standardization of German Social Media Comments

    This article is part of a larger project that aims to identify discursive strategies in social media discourses revolving around the topic of gender diversity, for which roughly 350,000 comments were scraped from the comments sections below YouTube videos relating to the topic. The article focuses on different methods of standardizing social media data in order to ease further processing. More specifically, the data are corrected in terms of casing, spelling, and punctuation. Different tools and models (LanguageTool, T5, seq2seq, GPT-2) were tested. The best outcome was achieved by the German GPT-2 model: it scored highest on all of the applied metrics (ROUGE, GLEU, BLEU), making it the best of the tested models for the task of Grammatical Error Correction in German social media data.
    Melnyk, L.; Feld, L. (2023). On Methods of Data Standardization of German Social Media Comments. Journal of Computer-Assisted Linguistic Research. 7:22-42. https://doi.org/10.4995/jclr.2023.19907

    Students and teachers’ perception about the writing process

    The main aim of this research was to identify tenth-grade students’ and teachers’ perceptions of the teaching of English writing skills. A 14-question survey was administered to fifty-eight tenth graders at the Jorge Icaza Educative Unit, Latacunga, Ecuador, in order to identify their perceptions of the writing skill and of the methodology teachers use to teach it. In addition, a 17-question survey was administered to 10 English teachers from institutions in Eloy Alfaro Parish, Latacunga, Ecuador, in order to learn their beliefs about how and when to teach writing. Data were processed in Excel and presented in graphs. The analysis drew on the main findings, concepts from the literature review, and the researchers’ points of view. Results of the students’ survey show that they are not able to write simple pieces of writing such as paragraphs, letters, stories, and personal experiences because there is no appropriate process of writing practice in the classroom. Writing is given little attention by teachers and students, and it is not practiced regularly in the classroom…

    Robust Text Correction for Grammar and Fluency

    Grammar is one of the most important properties of natural language. It is a set of structural (i.e., syntactic and morphological) rules that are shared among native speakers in order to engage in smooth communication. Automated grammatical error correction (GEC) is a natural language processing (NLP) application, which aims to correct grammatical errors in a given source sentence by computational models. Since data-driven statistical methods began in the 1990s and early 2000s, the GEC community has worked on establishing a common framework for its evaluation (i.e., dataset and metric for benchmarking) in order to compare GEC models’ performance quantitatively. A series of shared tasks since the early 2010s is a good example of this. In the first half of this thesis, I propose character-level and token-level error correction algorithms. For character-level error correction, I introduce a semi-character recurrent neural network, which is motivated by a finding in psycholinguistics, called the Cmabrigde Uinervtisy (Cambridge University) effect or typoglycemia. For word-level error correction, I propose an error-repair dependency parsing algorithm for ungrammatical texts. The algorithm can parse sentences and correct grammatical errors simultaneously. However, it is important to note that grammatical errors are not usually limited to morphological or syntactic errors. For example, collocational errors such as *quick/fast food and *fast/quick meal are not fully explained by only syntactic rules. This is another important property of natural language, called fluency (or acceptability). Fluency is a level of mastery that goes beyond knowledge of how to follow the rules, and includes knowing when they can be broken or flouted. In fact, the GEC community has also extended the scope of error types from closed class errors (e.g., noun numbers, verb forms) to fluency-oriented errors.
The second half of this thesis investigates GEC while considering fluency as well as grammaticality. When it comes to “whole-sentence” correction, by extending the scope of errors considering fluency as well as grammaticality, the GEC community has overlooked the reliability and validity of the task scheme (i.e., evaluation metric and dataset for benchmarking). Thus, I reassess the goals of GEC as a “whole-sentence” rewriting task while considering fluency. Following the fluency-oriented GEC framework, I introduce a new benchmark corpus that is more diverse in various aspects such as proficiency, topics, and learners’ native languages. Based on the fluency-oriented metric and dataset, I propose a new “whole-sentence” error correction model with neural reinforcement learning. Unlike conventional maximum likelihood estimation (MLE), the model directly optimizes toward an objective that considers a sentence-level, task-specific evaluation metric. I demonstrate that the proposed model outperforms MLE in human and automated evaluation metrics. Finally, I conclude the thesis and outline ideas and suggestions for future GEC research.