Toward a Discourse Theory for Annotating Causal Relations in Japanese
We present a revised discourse theory based on segmented discourse representation theory and provide a method for building a Japanese corpus suitable for causal relation extraction. This extends and refines the framework proposed in Kaneko and Bekki (2014), and we evaluate our corpus and compare it with that work.
A discourse-based approach for Arabic question answering
The treatment of complex questions with explanatory answers involves searching for arguments in texts. Because discourse relations play a prominent role in reflecting text-producers’ intentions, capturing the underlying structure of text is a useful guide for this task. Our extensive review found that no system for automatic discourse analysis that creates full rhetorical structures in large-scale Arabic texts is currently available. This is due to the high computational complexity involved in processing the large number of hypothesized relations associated with large texts. Therefore, more practical approaches should be investigated. This paper presents a new Arabic Text Parser oriented toward question answering systems dealing with لماذا “why” and كيف “how to” questions. The Text Parser presented here considers the sentence as the basic unit of text and incorporates a set of heuristics to avoid computational explosion. With this approach, the developed question answering system achieved a significant improvement over the baseline, with a Recall of 68% and an MRR of 0.62.
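The abstract above reports Recall and MRR (mean reciprocal rank) without defining them. As a reading aid, here is a minimal sketch of how these two metrics are commonly computed for ranked answers. The function names and the toy data are illustrative assumptions, not taken from the paper, and the paper's own definition of Recall for question answering may differ.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all (ranked answers, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def recall(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant items that appear among the retrieved ones."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical toy data: three questions, each with a ranked candidate list.
toy_runs = [
    (["a", "b", "c"], {"a"}),  # correct answer at rank 1
    (["a", "b", "c"], {"b"}),  # correct answer at rank 2
    (["a", "b", "c"], {"z"}),  # correct answer not retrieved
]
toy_mrr = mean_reciprocal_rank(toy_runs)
```

Under this definition, an MRR of 0.62 would mean that the first correct answer tends to appear between ranks 1 and 2 on average.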
Corpus-driven Semantics of Concession: Where do Expectations Come from?
Concession is one of the trickiest semantic discourse relations appearing in natural language. Many have tried to sub-categorize Concession and to define formal criteria both for distinguishing its subtypes and for distinguishing Concession from the (similar) semantic relation of Contrast. But there is still a lack of consensus among the different proposals. In this paper, we focus on those approaches, e.g. (Lagerwerf 1998), (Winter & Rimon 1994), and (Korbayova & Webber 2007), assuming that Concession features two primary interpretations, "direct" and "indirect". We argue that this two-way classification falls short of accounting for the full range of variants identified in naturally occurring data. Our investigation of one thousand Concession tokens in the Penn Discourse Treebank (PDTB) reveals that the interpretation of concessive relations varies according to the source of expectation. Four sources of expectation are identified. Each is characterized by a different relation holding between the eventuality that raises the expectation and the eventuality describing the expectation. We report a) a reliable inter-annotator agreement on the four types of sources identified in the PDTB data, b) a significant improvement on the annotation of previous disagreements on Concession-Contrast in the PDTB and c) a novel logical account of Concession using basic constructs from Hobbs' (1998) logic. Our proposal offers a uniform framework for the interpretation of Concession while accounting for the different sources of expectation by modifying a single predicate in the proposed formulae.
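The abstract above reports inter-annotator agreement but does not say which statistic was used. One common choice for two annotators is Cohen's kappa, sketched below purely as an illustration. The use of kappa, the label names, and the sample annotations are all assumptions and do not come from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label]
                   for label in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four tokens (the names are placeholders only).
annotator_1 = ["source_A", "source_A", "source_B", "source_B"]
annotator_2 = ["source_A", "source_A", "source_B", "source_A"]
toy_kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, so two annotators who agree on three of four items can still score well below 0.75.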
The Penn Discourse Treebank 2.0 Annotation Manual
This report contains the guidelines for the annotation of discourse relations in the Penn Discourse Treebank (http://www.seas.upenn.edu/~pdtb), PDTB. Discourse relations in the PDTB are annotated in a bottom-up fashion, and capture both lexically realized relations and implicit relations. Guidelines in this report are provided for all aspects of the annotation, including the annotation of explicit discourse connectives, implicit relations, arguments of relations, senses of relations, and the attribution of relations and their arguments. The report also provides descriptions of the annotation format representation.
Reflections on the Penn Discourse TreeBank, Comparable Corpora and Complementary Annotation
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but has also spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.
Problem-solving recognition in scientific text
As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. Therefore, they play a significant role in the understanding of academic texts from the scientific domain. Capturing knowledge of such problem-solving utterances would provide a deep insight into text understanding. In this dissertation, I present the task of problem-solving recognition in scientific text.
To date, work on problem-solving recognition has received both theoretical and computational treatment. However, theories of problem-solving put forward by applied linguists lack practical adaptation to the domain of scientific text, and computational analyses have been narrow in scope.
This dissertation provides a new model of problem-solving. It is an adaptation of Hoey's (2001) model, tailored to the scientific domain. As far as modelling problems is concerned, I divided the text string expressing the statement of a problem into sub-components; this is one of my main contributions. I have mapped these sub-components to functional roles, and thus operationalised the model in such a way that it can be annotated by humans reliably. As far as the problem-solving relationship between problems and solutions is concerned, my model takes into account the local network of relationships existing between problems.
In order to validate this new model, a large-scale annotation study was conducted. The annotation study shows significant agreement amongst the annotators. The model is automated in two stages using a blend of classical machine learning and state-of-the-art deep learning methods. The first stage involves the implementation of problem and solution recognisers which operate at the sentence level. The second stage is more complex in that it recognises problems and solutions jointly at the token-level, and also establishes whether there is a problem-solving relationship between each of them. One of the best performers at this stage was a Neural Relational Topic Model. The results from automation show that the model is able to recognise problem-solving utterances in text to a high degree of accuracy.
My work has already shown a positive impact in both industry and academia. One start-up is currently using the model for representing academic articles, and a Japanese collaborator has received a grant to adapt my model to Japanese text.