32 research outputs found

    Noisy Channel for Low Resource Grammatical Error Correction

    Neural Network Approaches for Writing Assistant Tasks

    The article analyzes the tasks involved in building a writing assistant, one of the most prominent applications of natural language processing and artificial intelligence in general. Specifically, we explore monolingual local sequence transduction tasks: grammatical and spelling error correction, text simplification, and paraphrase generation. To give a better understanding of the considered tasks, we show examples of expected rewrites. We then examine key aspects such as existing publicly available datasets and their training splits, metrics for reliable quality evaluation, and modern solutions based primarily on neural networks. For each task, we analyze its main peculiarities and how they influence the state-of-the-art models. Finally, we investigate the most salient features shared across the whole group of tasks and across the approaches that solve them. Pages of the article in the issue: 232–238. Language of the article: Ukrainian.
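    The abstract frames these writing-assistant tasks as monolingual local sequence transduction, which in practice is commonly handled with a pretrained encoder-decoder that rewrites an erroneous source sentence. A minimal sketch follows, assuming a Hugging Face seq2seq checkpoint fine-tuned for GEC; the checkpoint name and the "grammar:" task prefix are illustrative assumptions, not details from the article.

```python
# Minimal sketch: grammatical error correction as monolingual
# sequence transduction with a pretrained encoder-decoder.
# The checkpoint name and the "grammar:" prefix are assumptions
# for illustration; any seq2seq model fine-tuned on GEC data would do.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "vennify/t5-base-grammar-correction"  # hypothetical choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def correct(sentence: str) -> str:
    # Encode the erroneous source sentence and decode the rewrite.
    inputs = tokenizer("grammar: " + sentence, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(correct("She go to school every days."))
```

    Text simplification and paraphrase generation fit the same mold: only the training data and, where used, the task prefix change.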

    The Automatic Generation of Contextual Questions and Answers for English Learners

    Understanding context is essential for ESL (English as a Second Language) students to become skilled in English. While there is an abundance of existing contextual questions, they are not tailored to ESL teachers’ course objectives and reading materials. For this reason, ESL teachers must continuously create their own contextual questions. NLP question and answer generation can lighten ESL teachers’ workload by creating MCQs (Multiple Choice Questions), T/F (True or False) questions, and fill-in-the-blank questions, along with answers. We deployed a model that automatically generates MC and Wh- questions with answers, display several examples, and explain the generation process. For our research methods, we first performed text preprocessing on the CoNLL-2014 and BEA-2019 datasets, which consist of essays written by native and non-native English students. We then deployed GPT-2, BERT, and T5 to complete the question and answer generation task. The contextual question and answer generation model will contribute specifically to ESL teachers who manually create MC and Wh- questions for ESL students, as well as to the fields of education, digital humanities, and computer science. In addition, we share tutorials for this task with the public so that anyone can make use of our research.
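    As a rough illustration of the T5 branch of such a pipeline, the sketch below generates a Wh- question from a passage by highlighting the intended answer span, a common convention for answer-aware T5 question generation. The checkpoint name and the "<hl>" highlight format are assumptions for illustration; the authors' exact models and prompts are not specified in the abstract.

```python
# Sketch: answer-aware Wh- question generation with a T5-style model.
# The checkpoint and the "<hl>" answer-highlight convention are
# illustrative assumptions, not the authors' published setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "valhalla/t5-base-qg-hl"  # hypothetical choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def generate_question(context: str, answer: str) -> str:
    # Mark the answer span so the model knows what to ask about.
    highlighted = context.replace(answer, f"<hl> {answer} <hl>", 1)
    inputs = tokenizer("generate question: " + highlighted,
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=48, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

passage = "The essay was written by a non-native English student in 2019."
print(generate_question(passage, "2019"))
# Distractors for an MCQ would be sampled separately, e.g. via
# BERT masked-token predictions (not shown here).
```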

    Spoken language 'grammatical error correction'

    Spoken language ‘grammatical error correction’ (GEC) is an important mechanism to help learners of a foreign language, here English, improve their spoken grammar. GEC is challenging for non-native spoken language due to interruptions from disfluent speech events such as repetitions and false starts, and due to the difficulty of strictly defining what is acceptable in spoken language. Furthermore, there is little labelled data to train models. One way to mitigate the impact of speech events is to use a disfluency detection (DD) model. Removing the detected disfluencies brings the speech transcript closer to written language, for which significantly more labelled training data exists. This paper considers two types of approaches to leveraging DD models to boost spoken GEC performance. The first is sequential: a separately trained DD model acts as a pre-processing module, providing a more structured input to the GEC model. The second is to train the DD and GEC models in an end-to-end fashion, simultaneously optimising both modules; embeddings give end-to-end models a richer information flow. Experimental results show that DD effectively regulates GEC input, that end-to-end training works well when fine-tuned on limited labelled in-domain data, and that improving DD by incorporating acoustic information helps improve spoken GEC.
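    The sequential approach is straightforward to prototype: a DD stage strips disfluencies from the transcript, and the cleaned text is passed to a written-language GEC model. The sketch below uses a toy rule-based disfluency remover as a stand-in for the paper's trained DD model, and a hypothetical seq2seq GEC checkpoint; neither is the authors' actual system.

```python
# Sketch of a sequential DD -> GEC pipeline for spoken transcripts.
# The rule-based disfluency remover is a toy stand-in for a trained
# DD model; the GEC checkpoint name is an illustrative assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

FILLERS = {"um", "uh", "uhm", "erm"}

def remove_disfluencies(transcript: str) -> str:
    # Drop filler words and collapse immediate word repetitions
    # ("the the") -- crude proxies for repetitions and false starts.
    cleaned = []
    for word in transcript.lower().split():
        if word in FILLERS:
            continue
        if cleaned and cleaned[-1] == word:
            continue
        cleaned.append(word)
    return " ".join(cleaned)

MODEL_NAME = "vennify/t5-base-grammar-correction"  # hypothetical choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def spoken_gec(transcript: str) -> str:
    fluent = remove_disfluencies(transcript)            # DD stage
    inputs = tokenizer(fluent, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)  # GEC stage
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(spoken_gec("um i i goes to the the market yesterday"))
```

    The end-to-end variant replaces the hard text handoff with shared embeddings, so DD decisions are optimised jointly with the GEC objective rather than fixed up front.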