2,886 research outputs found
Evaluating Centering for Information Ordering Using Corpora
In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate empirically which is the most promising metric and how useful this metric is using a general methodology applied on several corpora. Our main result is that the simplest metric (which relies exclusively on NOCB transitions) sets a robust baseline that cannot be outperformed by other metrics which make use of additional centering-based features. This baseline can be used for the development of both text-to-text and concept-to-text generation systems. </jats:p
Centering Theory in natural text: a large-scale corpus study
We present an extensive corpus study of Centering Theory (CT), examining how adequately CT models coherence in a large body of natural text. A novel analysis of transition bigrams provides strong empirical support for several CT-related linguistic claims which so far have been investigated only on various small data sets. The study also reveals genre-based differences in textsâ degrees of entity coherence. Previous work has shown unsupervised CT-based coherence metrics to be unable to outperform a simple baseline. We identify two reasons: 1) these metrics assume that some transition types are more coherent and that they occur more frequently than others, but in our corpus the latter is not the case; and 2) the original sentence order of a document and a random permutation of its sentences differ mostly in the fraction of entity-sharing sentence pairs, exactly the factor measured by the baseline
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56]
Entity Coherence for Descriptive Text Structuring
Institute for Communicating and Collaborative SystemsAlthough entity coherence, i.e. the coherence that arises from certain patterns of references to
entities, is of attested importance for characterising a descriptive text structure, whether and how current formal models of entity coherence such as Centering Theory can be used for the purposes of natural language generation remains unclear. This thesis investigates this issue and sets out to explore which of the many formulations of Centering best suits text structuring. In doing this, we assume text
structuring to be a search task where different orderings of propositions are evaluated according to scores assigned by a metric.
The main question behind this study is how to choose a metric of entity coherence among many
alternatives as the only guidance to the text structuring component of a system that produces descriptions of objects. Different ways of defining metrics of entity coherence using Centeringâs notions are discussed and a general corpus-based methodology is introduced to identify which of these metrics constitute the most promising candidates for search-based text structuring before the actual generation
of the descriptive structure takes place.
The performance of a large set of metrics is estimated empirically in a series of computational
experiments using two kinds of data: (i) a reliably annotated corpus representing the genre of interest and (ii) data derived from an existing natural language generation system and ordered according to the instructions of a domain expert.
A final experiment supplements our main methodology by automatically evaluating the best scoring orderings of some of the best performing metrics in comparison to an upper bound defined by orderings produced by multiple experts on additional application-specific data and a lower bound defined by a random baseline.
The main findings are summarised as follows: In general, the simplest metric of entity coherence
constitutes a very robust baseline for both datasets. However, when the metrics are modified
according to an additional constraint on entity coherence, then the baseline is beaten in domain (ii).
The employed modification is supported by the subsidiary evaluation which renders all employed
metrics superior to the random baseline and helps identify the metric which overall constitutes the
most suitable candidate (among the ones investigated) for search-based descriptive text structuring in
domain (ii).
This thesis provides substantial insight into the role of entity coherence as a descriptive text structuring
constraint. Viewing Centering from an NLG perspective raises a series of interesting challenges
that the thesis identifies and attempts to investigate to a certain extent. The general evaluation methodology
and the results of the empirical studies are useful for any subsequent attempt to generate a descriptive
text structure in the context of an application that makes use of the notion of entity coherence
as modelled by Centering
Centering theory in natural text: a large-scale corpus study
We present an extensive corpus study of Centering Theory (CT), examining how adequately CT models coherence in a large body of natural text. A novel analysis of transition bigrams provides strong empirical support for several CT-related linguistic claims which so far have been investigated only on various small data sets. The
study also reveals genre-based differences in textsâ degrees of entity coherence. Previous work has shown unsupervised CTbased coherence metrics to be unable to outperform a simple baseline. We identify
two reasons: 1) these metrics assume that some transition types are more coherent and that they occur more frequently than others, but in our corpus the latter is not the case; and 2) the original sentence order of a document and a random permutation of its sentences differ mostly in the fraction of entity-sharing sentence pairs, exactly the
factor measured by the baseline
- âŠ