Neural approaches to discourse coherence: modeling, evaluation and application
Discourse coherence is an important aspect of text quality that refers to the way different textual units relate to each other. In this thesis, I investigate neural approaches to modeling discourse coherence. I present a multi-task neural network where the main task is to predict a document-level coherence score and the secondary task is to learn word-level syntactic features. Additionally, I examine the effect of using contextualised word representations in single-task and multi-task setups. I evaluate my models on a synthetic dataset where incoherent documents are created by shuffling the sentence order in coherent original documents. The results show the efficacy of my multi-task learning approach, particularly when enhanced with contextualised embeddings, achieving new state-of-the-art results in ranking the coherent documents higher than the incoherent ones (96.9%). Furthermore, I apply my approach to the realistic domain of people’s everyday writing, such as emails and online posts, and further demonstrate its ability to capture various degrees of coherence. In order to further investigate the linguistic properties captured by coherence models, I create two datasets that exhibit syntactic and semantic alterations. Evaluating different models on these datasets reveals their ability to capture syntactic perturbations but their failure to detect semantic changes. I find that semantic alterations are instead captured by models that first build sentence representations from averaged word embeddings, then apply a set of linear transformations over input sentence pairs. Finally, I present an application for coherence models in the pedagogical domain. I first demonstrate that state-of-the-art neural approaches to automated essay scoring (AES) are not robust to adversarially created, grammatical, but incoherent sequences of sentences. Accordingly, I propose a framework for integrating and jointly training a coherence model with a state-of-the-art neural AES system in order to enhance its ability to detect such adversarial input. I show that this joint framework maintains a performance comparable to the state-of-the-art AES system in predicting a holistic essay score while significantly outperforming it in adversarial detection.
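As a concrete illustration of the synthetic evaluation this abstract describes, the sketch below builds an incoherent copy of a document by shuffling its sentence order and measures how often a coherence model ranks originals above their shuffled counterparts. `score_coherence` is a hypothetical stand-in for any trained scorer, not the thesis's actual model.

```python
# Sketch of the shuffle-based ranking evaluation described above.
# `score_coherence` is a hypothetical stand-in for a trained coherence model.
import random

def make_shuffled_copy(sentences, seed=0, max_tries=10):
    """Return a sentence-order permutation that differs from the original."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    for _ in range(max_tries):
        rng.shuffle(shuffled)
        if shuffled != list(sentences):
            break
    return shuffled

def pairwise_ranking_accuracy(documents, score_coherence):
    """Fraction of originals scored strictly higher than their shuffled copies."""
    wins = sum(
        score_coherence(doc) > score_coherence(make_shuffled_copy(doc))
        for doc in documents
    )
    return wins / len(documents)
```

A 96.9% result in this setup means the model preferred the original ordering in 96.9% of such pairs.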
Using student experience as a model for designing an automatic feedback system for short essays
The SAFeSEA project (Supportive Automated Feedback for Short Essay Answers) aims to develop an automated feedback system to support university students as they write summative essays. Empirical studies carried out in the initial phase of the system’s development illuminated students’ approaches to and understandings of the essay-writing process. Findings from these studies suggested that, regardless of their experience of higher education, students consider essay-writing as: 1) a sequential set of activities, 2) a process that is enhanced through particular sources of support and 3) a skill that requires the development of personal strategies. Further data collected from tutors offered insight into the feedback and reflection stages of essay-writing. These perspectives offered a fundamental model of essay-writing and feedback to inform the ongoing, iterative development of this automated feedback system and, indeed, the work of any institution developing tools to support students’ writing.
Do We Need Neural Models to Explain Human Judgments of Acceptability?
Native speakers can judge whether a sentence is an acceptable instance of their language. Acceptability therefore provides a means of evaluating whether computational language models are processing language in a human-like manner. We test the ability of computational language models, simple language features, and word embeddings to predict native English speakers' judgments of acceptability on English-language essays written by non-native speakers. We find that much of the variance in sentence acceptability can be captured by a combination of features including misspellings, word order, and word similarity (Pearson's r = 0.494). While predictive neural models fit acceptability judgments well (r = 0.527), we find that a 4-gram model with statistical smoothing is just as good (r = 0.528). Thanks to incorporating a count of misspellings, our 4-gram model surpasses both the previous unsupervised state-of-the-art (Lau et al., 2015; r = 0.472) and the average non-expert native speaker (r = 0.46). Our results demonstrate that acceptability is well captured by n-gram statistics and simple language features.
Comment: 10 pages (8 pages + 2 pages of references), 1 figure, 7 tables.
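For readers curious what such a baseline looks like in practice, here is a self-contained sketch of an add-k (Lidstone) smoothed 4-gram log-probability scorer. The paper's exact smoothing scheme and its weighting of the misspelling feature are not given here, so the constants below are assumptions for illustration.

```python
# Minimal add-k smoothed 4-gram scorer in the spirit of the abstract's
# baseline; the paper's exact smoothing and feature weights may differ.
import math
from collections import Counter

def train_ngram_counts(corpus, n=4):
    """corpus: iterable of token lists. Returns n-gram/context counts and |V|."""
    grams, contexts, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        vocab.update(padded)
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts, len(vocab)

def sentence_logprob(tokens, grams, contexts, vocab_size, n=4, k=0.1):
    """Add-k smoothed log-probability of a token list under the n-gram model."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    logp = 0.0
    for i in range(len(padded) - n + 1):
        gram = tuple(padded[i:i + n])
        logp += math.log((grams[gram] + k) /
                         (contexts[gram[:-1]] + k * vocab_size))
    return logp
```

Because every count is offset by k, unseen 4-grams receive a small but nonzero probability, which is what lets such a model score novel learner sentences at all.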
Machine translation evaluation resources and methods: a survey
We introduce a survey of Machine Translation (MT) evaluation covering both manual and automatic evaluation methods. Traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness; more advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features divide into syntactic and semantic features: the syntactic features include part-of-speech tags, phrase types, and sentence structures, while the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. We also introduce methods for assessing MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT.
This paper differs from existing works (GALEprogram2009; EuroMatrixProject2007) in several respects: it introduces recent developments in MT evaluation measures, classifies measures from manual to automatic, covers the recent QE tasks of MT, and organises the content concisely.
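To make the lexical-similarity family concrete, the sketch below computes clipped unigram precision, recall, and F-measure between a hypothesis and a reference. This is only the common core: real metrics such as BLEU and METEOR add higher-order n-grams, brevity penalties, stemming, and synonym matching on top of it.

```python
# Clipped unigram precision/recall/F-measure, the core of the
# lexical-similarity metrics surveyed above.
from collections import Counter

def unigram_prf(hypothesis, reference):
    """hypothesis, reference: lists of tokens. Returns (P, R, F)."""
    hyp, ref = Counter(hypothesis), Counter(reference)
    # Clip each hypothesis count at the reference count, as in BLEU.
    overlap = sum(min(count, ref[word]) for word, count in hyp.items())
    precision = overlap / max(len(hypothesis), 1)
    recall = overlap / max(len(reference), 1)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Example: unigram_prf("the cat sat".split(), "the cat sat down".split())
# gives precision 1.0, recall 0.75, and F-measure ~0.857.
```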
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume that the essays to be graded share the prompt used in training, and they assign only a holistic score. However, such settings conflict with real educational situations: pre-graded essays for a particular prompt are lacking, and detailed trait scores for sub-rubrics are required. Thus, predicting various trait scores of unseen-prompt essays (called cross-prompt essay trait scoring) remains a challenge for AES. In this paper, we propose a robust model: a prompt- and trait relation-aware cross-prompt essay trait scorer. We encode a prompt-aware essay representation through essay-prompt attention and by utilizing a topic-coherence feature extracted by a topic-modeling mechanism without access to labeled data; therefore, our model considers the prompt adherence of an essay even in a cross-prompt setting. To facilitate multi-trait scoring, we design a trait-similarity loss that encapsulates the correlations between traits. Experiments demonstrate the efficacy of our model, showing state-of-the-art results for all prompts and traits. Significant improvements on low-resource prompts and on traits with previously inferior performance further indicate our model's strength.
Comment: Accepted at ACL 2023 (Findings, long paper).
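Since the abstract does not spell out the trait-similarity loss, the sketch below shows one plausible reading: per-trait MSE plus a penalty that pulls the correlation structure of the predicted trait scores toward the correlations observed in the gold scores. This formulation and the name `lambda_sim` are assumptions for illustration, not the paper's definition.

```python
# Illustrative trait-similarity loss: per-trait MSE plus a penalty aligning
# predicted trait correlations with gold trait correlations. One plausible
# formulation only, not the paper's exact loss; `lambda_sim` is assumed.
import numpy as np

def trait_similarity_loss(pred, gold, lambda_sim=0.1):
    """pred, gold: (batch, num_traits) arrays; the batch must have variation
    in every trait column for the correlations to be defined."""
    mse = np.mean((pred - gold) ** 2)
    pred_corr = np.corrcoef(pred, rowvar=False)  # (num_traits, num_traits)
    gold_corr = np.corrcoef(gold, rowvar=False)
    penalty = np.mean((pred_corr - gold_corr) ** 2)
    return mse + lambda_sim * penalty
```

The intuition is that traits such as organization and coherence tend to move together in human scoring, so a model whose predictions reproduce those inter-trait correlations is preferred.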