
    Coreference in dialogue

    Since the early days of discourse analysis, coreference has been considered a major factor in the formation of texts and dialogues. The repetition of nominal elements and the anaphoric use of pronouns in successive sentences are fundamental cohesive patterns which tie sentences together and contribute to the coherence of sequences. "La cohérence transphrastique trouve dans la pronominalisation un des procédés les plus efficaces" ["Trans-sentential coherence finds in pronominalisation one of its most effective devices"] (Stati 1990, 160). The basic structural pattern on which linguists focused their interest in the early 1970s is captured by the following examples: (1) A man entered the house. After closing the door, the man sat down. He was tired. (2) Peter / The man entered the house. He was tired. He …
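
    For illustration only (this sketch is not from the article): a toy recency heuristic in Python that links each pronoun to the most recently mentioned noun phrase in the example sequence above. Real coreference resolution in dialogue additionally needs agreement checks, syntax, and discourse information.

        # Toy sketch: link a pronoun to the most recently mentioned noun phrase.
        import re

        SENTENCES = [
            "A man entered the house.",
            "After closing the door, the man sat down.",
            "He was tired.",
        ]

        PRONOUNS = {"he", "she", "it", "they"}
        # Extremely rough NP detection: a determiner followed by one word.
        NP_PATTERN = re.compile(r"\b(?:a|an|the)\s+(\w+)", re.IGNORECASE)

        def resolve_pronouns(sentences):
            """Link each pronoun to the most recent noun phrase seen so far."""
            antecedents = []   # noun phrases in order of mention
            links = []         # (pronoun, sentence_index, antecedent)
            for i, sent in enumerate(sentences):
                for token in re.findall(r"\w+", sent):
                    if token.lower() in PRONOUNS and antecedents:
                        links.append((token, i, antecedents[-1]))
                antecedents.extend(m.group(0) for m in NP_PATTERN.finditer(sent))
            return links

        for pronoun, idx, antecedent in resolve_pronouns(SENTENCES):
            print(f"'{pronoun}' in sentence {idx} -> '{antecedent}'")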

    All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch

    Readability research has a long and rich tradition, but there has been too little focus on general readability prediction that does not target a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding ever more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for generic English and Dutch text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial features to features requiring deep linguistic processing, resulting in ten feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization by using a wrapper-based genetic-algorithm optimization approach is promising and provides considerable insight into which feature combinations contribute to overall readability prediction. Since we also have gold-standard information available for the features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.
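
    As a rough illustration of the regression setup (not the authors' pipeline; scikit-learn is assumed, and the texts and readability scores below are made up), the sketch computes a few superficial features and fits a ridge regressor to score individual texts. The paper's full system adds lexical, syntactic, and semantic feature groups on top of this, plus wrapper-based genetic-algorithm feature selection.

        # Minimal sketch: superficial features + a regression setup for scoring texts.
        import re
        import numpy as np
        from sklearn.linear_model import Ridge

        def superficial_features(text):
            """Cheap surface features: avg sentence length, avg word length, TTR."""
            sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
            words = re.findall(r"\w+", text.lower())
            return [
                len(words) / max(len(sentences), 1),        # average sentence length
                sum(map(len, words)) / max(len(words), 1),  # average word length
                len(set(words)) / max(len(words), 1),       # type-token ratio
            ]

        # Toy corpus with made-up readability scores (higher = harder to read).
        train_texts = [
            "The cat sat. The dog ran. It was fun.",
            "Readability research has a long and rich tradition in applied linguistics.",
            "Supervised machine learning enables generic readability prediction systems.",
            "We went home. We ate. We slept well.",
        ]
        train_scores = [1.0, 4.0, 5.0, 1.5]

        X = np.array([superficial_features(t) for t in train_texts])
        model = Ridge().fit(X, train_scores)

        new_text = "Wrapper-based genetic algorithms optimise feature combinations."
        print("predicted readability:", model.predict([superficial_features(new_text)])[0])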

    Centering Theory in natural text: a large-scale corpus study

    We present an extensive corpus study of Centering Theory (CT), examining how adequately CT models coherence in a large body of natural text. A novel analysis of transition bigrams provides strong empirical support for several CT-related linguistic claims which so far have been investigated only on various small data sets. The study also reveals genre-based differences in texts’ degrees of entity coherence. Previous work has shown unsupervised CT-based coherence metrics to be unable to outperform a simple baseline. We identify two reasons: 1) these metrics assume that some transition types are both more coherent and more frequent than others, but in our corpus the latter is not the case; and 2) the original sentence order of a document and a random permutation of its sentences differ mostly in the fraction of entity-sharing sentence pairs, exactly the factor measured by the baseline.
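
    A small sketch of the baseline referred to above, under the assumption that the entities mentioned in each sentence are already given as sets (the toy document below is invented): it measures the fraction of adjacent entity-sharing sentence pairs for the original sentence order versus a random permutation.

        # Sketch: fraction of adjacent sentence pairs that share at least one entity.
        import random

        def entity_sharing_fraction(entity_sets):
            """Fraction of adjacent sentence pairs sharing at least one entity."""
            pairs = list(zip(entity_sets, entity_sets[1:]))
            if not pairs:
                return 0.0
            return sum(bool(a & b) for a, b in pairs) / len(pairs)

        # One entity set per sentence (toy document).
        doc = [{"mary"}, {"mary", "book"}, {"book"}, {"library"}, {"library", "mary"}]

        shuffled = doc[:]
        random.shuffle(shuffled)

        print("original order:", entity_sharing_fraction(doc))
        print("random permutation:", entity_sharing_fraction(shuffled))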

    Dynamic Entity Representations in Neural Language Models

    Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, EntityNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction. Experimental results on all these tasks demonstrate that our model consistently outperforms strong baselines and prior work. Comment: EMNLP 2017 camera-ready version.
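
    A heavily simplified sketch of the idea (PyTorch assumed; the class, names, and gating scheme below are ours, not the released EntityNLM): a recurrent language model keeps one vector per entity and updates it with a learned gate whenever that entity is mentioned. The actual model additionally predicts whether a token starts an entity mention, which entity it refers to, and the mention's length.

        # Simplified sketch of dynamic entity representations in a recurrent LM.
        import torch
        import torch.nn as nn

        class TinyEntityLM(nn.Module):
            def __init__(self, vocab_size, hidden=64, max_entities=16):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, hidden)
                self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
                self.out = nn.Linear(hidden, vocab_size)
                self.gate = nn.Linear(2 * hidden, hidden)
                # Dynamic entity memory: one vector per tracked entity.
                self.register_buffer("entities", torch.zeros(max_entities, hidden))

            def forward(self, tokens, entity_ids):
                """tokens: (1, T) token ids; entity_ids: (T,) entity index or -1."""
                h, _ = self.rnn(self.embed(tokens))
                h = h.squeeze(0)                              # (T, hidden)
                for t, e in enumerate(entity_ids):
                    if e >= 0:                                # token t mentions entity e
                        g = torch.sigmoid(self.gate(torch.cat([h[t], self.entities[e]])))
                        # Gated update of the entity's dynamic representation.
                        self.entities[e] = (g * h[t] + (1 - g) * self.entities[e]).detach()
                return self.out(h)                            # per-step next-token logits

        model = TinyEntityLM(vocab_size=100)
        tokens = torch.randint(0, 100, (1, 8))
        entity_ids = torch.tensor([-1, 0, -1, -1, 0, -1, 1, -1])  # -1 = no entity mention
        with torch.no_grad():                                 # demo only; no training here
            logits = model(tokens, entity_ids)
        print(logits.shape)                                   # torch.Size([8, 100])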

    Text Coherence Analysis Based on Deep Neural Network

    In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture text coherence. The text coherence problem is investigated from a new perspective of learning sentence distributional representations and modeling text coherence simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise in coherence assessment, with a significant improvement over the state of the art. Comment: 4 pages, 2 figures, CIKM 2017.
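
    A simplified sketch of the general approach (PyTorch assumed; names and hyperparameters are illustrative, not the published DCM): sentences are encoded with a 1-D convolution over word embeddings plus max-pooling, and a document is scored from the similarities of adjacent sentence representations. The actual model learns these representations end to end on the Sentence Ordering task.

        # Simplified sketch: CNN sentence encoder + similarity-based coherence score.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SentenceCNN(nn.Module):
            def __init__(self, vocab_size, emb=50, channels=64, kernel=3):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb)
                self.conv = nn.Conv1d(emb, channels, kernel, padding=1)

            def forward(self, token_ids):                  # (num_sents, seq_len)
                x = self.embed(token_ids).transpose(1, 2)  # (num_sents, emb, seq_len)
                x = torch.relu(self.conv(x))               # (num_sents, channels, seq_len)
                return x.max(dim=2).values                 # max-pool -> (num_sents, channels)

        def coherence_score(encoder, sentences):
            """Average cosine similarity of adjacent sentence representations."""
            reps = encoder(sentences)                      # (num_sents, channels)
            sims = F.cosine_similarity(reps[:-1], reps[1:], dim=1)
            return sims.mean().item()

        encoder = SentenceCNN(vocab_size=1000)
        # Toy document: 4 sentences, each padded/truncated to 6 token ids.
        doc = torch.randint(0, 1000, (4, 6))
        print("coherence score:", coherence_score(encoder, doc))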