Grammatical error correction for Basque using a seq2seq neural architecture and synthetic examples
Sequence-to-sequence neural architectures are the state of the art for correcting grammatical errors, but they require large training datasets. This paper studies the use of sequence-to-sequence neural models for the correction of grammatical errors in Basque. As no training data exists for this language, we have developed a rule-based method to generate grammatically incorrect sentences from a collection of correct sentences extracted from a corpus of 500,000 Basque news articles. We have built different training datasets according to different strategies for combining the synthetic examples. From these datasets, different models based on the Transformer architecture have been trained and evaluated in terms of precision, recall, and F0.5 score. The best model reaches an F0.5 score of 0.87
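The F0.5 metric reported above is the weighted harmonic mean of precision and recall, with precision weighted twice as heavily. A minimal sketch of the computation (the precision and recall figures below are hypothetical, chosen only to land near the paper's reported score, not taken from the paper):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 favours precision; beta = 0.5 weights precision
    twice as much as recall, as in the F0.5 score above.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical operating point: precision 0.90, recall 0.77.
print(round(f_beta(0.90, 0.77), 2))  # → 0.87
```

F0.5 is the conventional choice for grammatical error correction because flagging a correct sentence as wrong (a precision failure) is usually considered more harmful to the writer than missing an error.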
Computational Linguistics Models and Language Technologies for Indonesian
The purpose of this research is to describe computational linguistics as a field of study that deserves full attention in linguistics research. The study combines a literature review with experimental research that designs a software model for Bahasa Indonesia. The results show that computational linguistics is a branch of linguistics that can solve problems of spelling and grammar correction for language users. This branch of linguistics is tied to software engineering designed to help the public produce language, whether Bahasa Indonesia, a regional language, or English as a foreign language for Indonesians. The public can learn the standard conventions for writing a language, and accurate translation equivalence between the source and target languages can also be obtained. The writer also provides several practical examples of how computational linguistics can be applied to the development of writing skills. For instance, a concordance enables us to see any word or phrase in context, so that one can see what sort of company it keeps. Users can thus see, for example, which of two often-confused words is the correct form under Bahasa Indonesia rules (e.g., gadget vs. gawai)
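The concordance idea described above can be sketched as a keyword-in-context (KWIC) display: every occurrence of a word is shown with a window of surrounding words. This is a minimal illustrative implementation; the function name, sample sentence, and window size are assumptions for the example, not from the paper:

```python
def concordance(text: str, keyword: str, window: int = 3) -> list[str]:
    """Return each occurrence of keyword with `window` words of context."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

sample = "Kata gawai kini dipakai sebagai padanan untuk gadget dalam Bahasa Indonesia"
for line in concordance(sample, "gawai"):
    print(line)  # → Kata [gawai] kini dipakai sebagai
```

Real concordancers work over large corpora rather than a single sentence, which is what lets learners compare the typical contexts of confusable pairs such as gadget and gawai.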
Conceptualizations of language errors, standards, norms and nativeness in English for research publication purposes: An analysis of journal submission guidelines
Adherence to standards in English for research publication purposes (ERPP) can be a substantial barrier for second language (L2) writers and is an area of renewed debate in L2 writing research. This study presents a qualitative text analysis of author guidelines in 210 leading academic journals across 27 disciplines. It explores conceptualizations of language errors, standards, norms and nativeness in journal submission guidelines, and identifies key concepts related to so-called error-free writing. Findings indicate that most of the journal guidelines are inflexible in their acceptance of variant uses of English. Some guidelines state a requirement of meeting an unclear standard of good English, sometimes described as American or British English. Many guidelines specifically position L2 writers as deficient in native standards, which raises ethical considerations of access to publication in top journals. This study leads to a discussion of a need to re-conceptualize error-free writing in ERPP, and to decouple it from concepts such as nativeness. It focuses on a need to relax some author guidelines to encourage all authors to write using an English that can easily be understood by a broad, heterogeneous, global, and multilingual audience
‘Calling on the classical phone’: a distributional model of adjective-noun errors in learners’ English
In this paper we discuss three key points related to error detection (ED) in learners’ English. We focus on content word ED as one of the most challenging tasks in this area, illustrating our claims on adjective–noun (AN) combinations. In particular, we (1) investigate the role of context in accurately capturing semantic anomalies and implement a system based on distributional topic coherence, which achieves state-of-the-art accuracy on a standard test set; (2) thoroughly investigate our system’s performance across individual adjective classes, concluding that a class-dependent approach is beneficial to the task; (3) discuss the data size bottleneck in this area, and highlight the challenges of automatic error generation for content words. Ekaterina Kochmar’s research is supported by Cambridge English Language Assessment via the ALTA Institute. Aurélie Herbelot’s contribution to this paper was similarly supported by ALTA
Problems in Evaluating Grammatical Error Detection Systems
Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer's sentence, the annotator's correction, and the system's output makes evaluation more complex than in some other NLP tasks, which we address by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data (the low frequency of errors compared to non-errors), which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection concern defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices in reporting system evaluation results for these cases, recommendations which depend upon making clear one's assumptions and applications for error detection. By highlighting the problems with current error detection evaluation, the field will be better able to move forward
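The skew problem the abstract describes can be sketched in a few lines (all counts below are invented for illustration): when errors are rare, accuracy looks strong even for a system that flags nothing, which is why reporting the raw counts matters:

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Derive standard measures from the four raw counts the paper
    recommends reporting: true/false positives and negatives."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical corpus: 50 real errors among 1000 tokens.
# A degenerate system that never flags anything:
do_nothing = metrics(tp=0, fp=0, fn=50, tn=950)
print(do_nothing["accuracy"])  # → 0.95, despite zero recall
```

Publishing the four raw counts lets readers recompute whichever measure suits their application, instead of being locked into a single summary number that skew can distort.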
Article error correction for English writing support, with presentation of the evidence for corrections
Tohoku University, 乾健太
A framework for applying language testing methods to support systems for second language use
Degree type: doctorate by coursework (課程博士). University of Tokyo (東京大学)