605 research outputs found

Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank

Author: Huang Liang
Zhao Kai
Publication date: 01/01/2017

Discourse parsing has long been treated as a stand-alone problem independent from constituency or dependency parsing. Most attempts at this problem are pipelined rather than end-to-end, sophisticated, and not self-contained: they assume gold-standard text segmentations (Elementary Discourse Units), and use external parsers for syntactic features. In this paper we propose the first end-to-end discourse parser that jointly parses at both the syntax and discourse levels, as well as the first syntacto-discourse treebank, built by integrating the Penn Treebank with the RST Treebank. Built upon our recent span-based constituency parser, this joint syntacto-discourse parser requires no preprocessing whatsoever (such as segmentation or feature extraction) and achieves state-of-the-art end-to-end discourse parsing accuracy. Comment: Accepted at EMNLP 201

arXiv.org e-Print Archive

Crossref

Analysis of Discourse Structure and Logical Structure in Argumentative Texts

Author: Mim Farjana Sultana
Publication date: 26/09/2022

Tohoku University, Doctor of Information Science, thesis

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Neural approaches to discourse coherence: modeling, evaluation and application

Author: Farag Youmna
Publication venue: University of Cambridge
Publication date: 01/09/2020

Discourse coherence is an important aspect of text quality that refers to the way different textual units relate to each other. In this thesis, I investigate neural approaches to modeling discourse coherence. I present a multi-task neural network where the main task is to predict a document-level coherence score and the secondary task is to learn word-level syntactic features. Additionally, I examine the effect of using contextualised word representations in single-task and multi-task setups. I evaluate my models on a synthetic dataset where incoherent documents are created by shuffling the sentence order in coherent original documents. The results show the efficacy of my multi-task learning approach, particularly when enhanced with contextualised embeddings, achieving new state-of-the-art results in ranking the coherent documents higher than the incoherent ones (96.9%). Furthermore, I apply my approach to the realistic domain of people’s everyday writing, such as emails and online posts, and further demonstrate its ability to capture various degrees of coherence. In order to further investigate the linguistic properties captured by coherence models, I create two datasets that exhibit syntactic and semantic alterations. Evaluating different models on these datasets reveals their ability to capture syntactic perturbations but their inability to detect semantic changes. I find that semantic alterations are instead captured by models that first build sentence representations from averaged word embeddings, then apply a set of linear transformations over input sentence pairs. Finally, I present an application for coherence models in the pedagogical domain. I first demonstrate that state-of-the-art neural approaches to automated essay scoring (AES) are not robust to adversarially created, grammatical, but incoherent sequences of sentences.
Accordingly, I propose a framework for integrating and jointly training a coherence model with a state-of-the-art neural AES system in order to enhance its ability to detect such adversarial input. I show that this joint framework maintains a performance comparable to the state-of-the-art AES system in predicting a holistic essay score while significantly outperforming it in adversarial detection.
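The synthetic evaluation mentioned in this abstract (shuffling sentence order, then checking whether the original is ranked above the shuffled copy) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: `make_incoherent`, `pairwise_ranking_accuracy`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a trained coherence model.

```python
import random


def make_incoherent(sentences, rng, max_tries=20):
    """Return a reordered copy of `sentences` that differs from the original."""
    if len(set(sentences)) < 2:
        raise ValueError("need at least two distinct sentences to shuffle")
    for _ in range(max_tries):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        if shuffled != sentences:
            return shuffled
    # Fallback: rotating by one position always changes the order
    # when the document has at least two distinct sentences.
    return sentences[1:] + sentences[:1]


def pairwise_ranking_accuracy(score, documents, rng, permutations_per_doc=5):
    """Fraction of (original, shuffled) pairs where the original scores higher."""
    wins = 0
    total = 0
    for doc in documents:
        for _ in range(permutations_per_doc):
            shuffled = make_incoherent(doc, rng)
            wins += score(doc) > score(shuffled)
            total += 1
    return wins / total


def toy_score(sentences):
    """Hypothetical scorer (assumption): counts adjacent pairs whose
    leading index numbers are in increasing order."""
    ids = [int(s.split(":")[0]) for s in sentences]
    return sum(a < b for a, b in zip(ids, ids[1:]))


documents = [
    ["0: It rained all night.", "1: The river rose.", "2: The bridge closed."],
    ["0: She opened the file.", "1: It was empty.", "2: She called support.",
     "3: They restored a backup."],
]
accuracy = pairwise_ranking_accuracy(toy_score, documents, random.Random(0))
```

In a real setup, `toy_score` would be replaced by the model's document-level coherence score, and the accuracy would be computed on held-out documents.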

Apollo (Cambridge)

Automatic text scoring using neural networks

Author: Alikaniotis D
Rei M
Yannakoudakis H
Publication venue: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Publication date: 01/01/2016

Automated Text Scoring (ATS) provides a cost-effective and consistent alternative to human marking. However, in order to achieve good performance, the predictive features of the system need to be manually engineered by human experts. We introduce a model that forms word representations by learning the extent to which specific words contribute to the text’s score. Using Long Short-Term Memory networks to represent the meaning of texts, we demonstrate that a fully automated framework is able to achieve excellent results over similar approaches. In an attempt to make our results more interpretable, and inspired by recent advances in visualizing neural networks, we introduce a novel method for identifying the regions of the text that the model has found more discriminative. This is the accepted manuscript. It is currently embargoed pending publication.

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Apollo (Cambridge)

King's Research Portal

Rhetorical Structure Theory and coherence break identification

Author: Skoufaki Sophia
Publication venue: Walter de Gruyter GmbH
Publication date: 28/01/2020

This article examines the claim of Rhetorical Structure Theory (RST) that violations of RST diagram formation principles indicate coherence breaks. In doing so, this article makes a significant contribution to the testing of RST. More broadly, it indicates that examining the coherence-break identification potential of coherence theories could help specify each theory’s purview and, in the long term, lead to the creation of hybrid models of coherence. Moreover, it paves the way for the development of training resources on discourse (in)coherence for language teachers, exam markers and language learners. 84 paragraphs written by Taiwanese learners of English were analysed according to RST and coherence measures were calculated on the basis of this analysis. The results suggest that the violation of any diagram-formation principle indicates coherence breaks, thus corroborating this RST claim. Inter- and intrajudge agreement in terms of both RST coding and coherence measures calculated on the basis of coherence breaks are reported and discussed. The kinds of coherence breaks which are and are not located by RST analysis are discussed and exemplified. The paper concludes with a discussion of implications for pedagogy and future research.

University of Essex Research Repository

Domain Adaptation for Automated Essay Scoring

Author: Peter Phandi
Publication date: 27/07/2016

Master's, Master of Science

ScholarBank@NUS

A robust methodology for automated essay grading

Author: Fazal Anhar
Publication venue: Curtin University
Publication date: 01/01/2013

None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric by using heuristics and rules based on the English language, together with neural network modelling.

espace@Curtin

Using data mining to repurpose German language corpora. An evaluation of data-driven analysis methods for corpus linguistics

Author: Frey Jennifer Carmen
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 03/04/2020

A growing number of studies report interesting insights gained from existing data resources. Among those, there are analyses on textual data, giving reason to consider such methods for linguistics as well. However, the field of corpus linguistics usually works with purposefully collected, representative language samples that aim to answer only a limited set of research questions. This thesis aims to shed some light on the potentials of data-driven analysis based on machine learning and predictive modelling for corpus linguistic studies, investigating the possibility to repurpose existing German language corpora for linguistic inquiry by using methodologies developed for data science and computational linguistics. The study focuses on predictive modelling and machine-learning-based data mining and gives a detailed overview and evaluation of currently popular strategies and methods for analysing corpora with computational methods. After the thesis introduces strategies and methods that have already been used on language data, discusses how they can assist corpus linguistic analysis and refers to available toolkits and software as well as to state-of-the-art research and further references, the introduced methodological toolset is applied in two differently shaped corpus studies that utilize readily available corpora for German. The first study explores linguistic correlates of holistic text quality ratings on student essays, while the second deals with age-related language features in computer-mediated communication and interprets age prediction models to answer a set of research questions that are based on previous research in the field. 
While both studies give linguistic insights that integrate into the current understanding of the investigated phenomena in the German language, they systematically test the methodological toolset introduced beforehand, allowing a detailed discussion of the added value and remaining challenges of machine-learning-based data mining methods in corpus linguistics at the end of the thesis.

AMS Tesi di Dottorato

DeepEval: An Integrated Framework for the Evaluation of Student Responses in Dialogue Based Intelligent Tutoring Systems

Author: Banjade Rajendra
Publication venue: University of Memphis Digital Commons
Publication date: 02/12/2014

The automatic assessment of student answers is one of the critical components of an Intelligent Tutoring System (ITS) because accurate assessment of student input is needed in order to provide effective feedback that leads to learning. But this is a very challenging task because it requires natural language understanding capabilities. The process requires various components, such as concept identification, co-reference resolution, and ellipsis handling. As part of this thesis, we thoroughly analyzed a set of student responses obtained from an experiment with the intelligent tutoring system DeepTutor in which college students interacted with the tutor to solve conceptual physics problems, designed an automatic answer assessment framework (DeepEval), and evaluated the framework after implementing several important components. To evaluate our system, we annotated 618 responses from 41 students for correctness. Our system performs better than the typical similarity calculation method. We also discuss various issues in automatic answer evaluation.

University of Memphis Digital Commons