27 research outputs found

    Neural Network Approaches for Writing Assistant Tasks (Нейромережеві підходи для задач письмового асистента)

    The article analyzes the tasks involved in building a writing assistant, one of the most prominent fields of natural language processing and of artificial intelligence in general. Specifically, we explore monolingual local sequence transduction tasks: grammatical and spelling error correction, text simplification, and paraphrase generation. To give a better understanding of these tasks, we show examples of expected rewrites. We then take a close look at key aspects such as the publicly available datasets and their training splits, the quality metrics used for evaluation, and modern solutions based primarily on neural networks. For each task, we analyze its main peculiarities and how they influence the state-of-the-art models. Finally, we investigate the most notable features shared by the whole group of tasks and by the approaches that solve them.
    Pages of the article in the issue: 232-238. Language of the article: Ukrainian.
    The article is devoted to the study and analysis of the tasks involved in building a writing assistant: grammatical and spelling error correction, text simplification, and paraphrasing. It covers annotated datasets, metrics for measuring system quality, and leading practices for solving these tasks with neural networks. For each task, its specifics and their influence on the proposed methods are considered. The features shared by writing assistant tasks and by the approaches to solving them are analyzed.

    JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

    We present a new parallel corpus, the JHU FLuency-Extended GUG corpus (JFLEG), for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC. Comment: To appear in EACL 2017 (short papers)

    Adapting Sequence Models for Sentence Correction

    In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches. Our strongest sequence-to-sequence model improves over our strongest phrase-based statistical machine translation model, with access to the same data, by 6 M2 (0.5 GLEU) points. Additionally, in the data environment of the standard CoNLL-2014 setup, we demonstrate that modeling (and tuning against) diffs yields similar or better M2 scores with simpler models and/or significantly less data than previous sequence-to-sequence approaches. Comment: EMNLP 201
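The idea of "modeling the output data as a series of diffs" can be illustrated with a small sketch. This is not the paper's implementation: the `<del>`/`<ins>` tag names, the word-level tokenization by whitespace, and the helper names `to_diff_sequence` and `apply_diff_sequence` are all assumptions made for illustration, and the alignment comes from Python's standard `difflib` rather than from any method described in the abstract.

```python
from difflib import SequenceMatcher


def to_diff_sequence(source: str, target: str) -> list[str]:
    """Encode a (source, target) pair as one token sequence with diff tags.

    Unchanged tokens are copied as-is; removed tokens are wrapped in
    <del>...</del> and added tokens in <ins>...</ins> (hypothetical tags).
    """
    src, tgt = source.split(), target.split()
    out: list[str] = []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(src[i1:i2])
            continue
        if i2 > i1:  # tokens present only in the source
            out += ["<del>"] + src[i1:i2] + ["</del>"]
        if j2 > j1:  # tokens present only in the target
            out += ["<ins>"] + tgt[j1:j2] + ["</ins>"]
    return out


def apply_diff_sequence(tokens: list[str]) -> str:
    """Recover the corrected sentence from a diff-tagged sequence."""
    out: list[str] = []
    mode = "keep"
    for tok in tokens:
        if tok == "<del>":
            mode = "del"
        elif tok == "<ins>":
            mode = "ins"
        elif tok in ("</del>", "</ins>"):
            mode = "keep"
        elif mode != "del":  # drop deleted tokens, keep everything else
            out.append(tok)
    return " ".join(out)


diff = to_diff_sequence("She go to school yesterday",
                        "She went to school yesterday")
print(" ".join(diff))
print(apply_diff_sequence(diff))
```

In a representation like this, most of the output is a copy of the input and only the edited spans carry new information, which is one plausible reading of why a diff-style target could be easier to learn than a full rewrite.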

    A Nested Attention Neural Hybrid Model for Grammatical Error Correction

    Grammatical error correction (GEC) systems strive to correct both global errors in word order and usage, and local errors in spelling and inflection. Building on recent work on neural machine translation, we propose a new hybrid neural model with nested attention layers for GEC. Experiments show that the new model can effectively correct errors of both types by incorporating word and character-level information, and that the model significantly outperforms previous neural models for GEC as measured on the standard CoNLL-14 benchmark dataset. Further analysis also shows that the superiority of the proposed model can be largely attributed to the use of the nested attention mechanism, which has proven particularly effective in correcting local errors that involve small edits in orthography.

    An Analysis of Source-Side Grammatical Errors in NMT

    The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise. We present the first large-scale study of state-of-the-art English-to-German NMT on real grammatical noise, by evaluating on several Grammar Correction corpora. We present methods for evaluating NMT robustness without true references, and we use them for extensive analysis of the effects that different grammatical errors have on the NMT output. We also introduce a technique for visualizing the divergence distribution caused by a source-side error, which allows for additional insights. Comment: Accepted and to be presented at BlackboxNLP 201