
    Toward a Discourse Theory for Annotating Causal Relations in Japanese

    We present a revised discourse theory based on segmented discourse representation theory and provide a method for building a Japanese corpus suitable for causal relation extraction. This extends and refines the framework proposed in Kaneko and Bekki (2014), and we evaluate our corpus and compare it with that work.

    A discourse-based approach for Arabic question answering

    The treatment of complex questions with explanatory answers involves searching for arguments in texts. Because discourse relations play a prominent role in reflecting text producers' intentions, capturing the underlying structure of a text is a good guide for this task. Our extensive review found that a system for automatic discourse analysis that creates full rhetorical structures in large-scale Arabic texts is currently unavailable. This is due to the high computational complexity involved in processing the large number of hypothesized relations associated with large texts. Therefore, more practical approaches should be investigated. This paper presents a new Arabic Text Parser oriented toward question answering systems dealing with لماذا “why” and كيف “how to” questions. The Text Parser considers the sentence as the basic unit of text and incorporates a set of heuristics to avoid computational explosion. With this approach, the developed question answering system achieved a significant improvement over the baseline, with a recall of 68% and an MRR of 0.62.

    Corpus-driven Semantics of Concession: Where do Expectations Come from?

    Concession is one of the trickiest semantic discourse relations appearing in natural language. Many have tried to sub-categorize Concession and to define formal criteria both for distinguishing its subtypes and for distinguishing Concession from the (similar) semantic relation of Contrast. But there is still a lack of consensus among the different proposals. In this paper, we focus on those approaches, e.g. (Lagerwerf 1998), (Winter & Rimon 1994), and (Korbayova & Webber 2007), that assume Concession features two primary interpretations, "direct" and "indirect". We argue that this two-way classification falls short of accounting for the full range of variants identified in naturally occurring data. Our investigation of one thousand Concession tokens in the Penn Discourse Treebank (PDTB) reveals that the interpretation of concessive relations varies according to the source of expectation. Four sources of expectation are identified. Each is characterized by a different relation holding between the eventuality that raises the expectation and the eventuality describing the expectation. We report a) a reliable inter-annotator agreement on the four types of sources identified in the PDTB data, b) a significant improvement on the annotation of previous disagreements on Concession-Contrast in the PDTB and c) a novel logical account of Concession using basic constructs from Hobbs' (1998) logic. Our proposal offers a uniform framework for the interpretation of Concession while accounting for the different sources of expectation by modifying a single predicate in the proposed formulae.

    The Penn Discourse Treebank 2.0 Annotation Manual

    This report contains the guidelines for the annotation of discourse relations in the Penn Discourse Treebank (http://www.seas.upenn.edu/~pdtb), PDTB. Discourse relations in the PDTB are annotated in a bottom-up fashion, and capture both lexically realized relations and implicit relations. Guidelines in this report are provided for all aspects of the annotation, including the annotation of explicit discourse connectives, implicit relations, arguments of relations, senses of relations, and the attribution of relations and their arguments. The report also provides descriptions of the annotation format representation.

    Reflections on the Penn Discourse TreeBank, Comparable Corpora and Complementary Annotation

    The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but also has spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.