682 research outputs found
Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank
Discourse parsing has long been treated as a stand-alone problem independent
from constituency or dependency parsing. Most attempts at this problem are
pipelined rather than end-to-end, sophisticated, and not self-contained: they
assume gold-standard text segmentations (Elementary Discourse Units), and use
external parsers for syntactic features. In this paper we propose the first
end-to-end discourse parser that jointly parses in both syntax and discourse
levels, as well as the first syntacto-discourse treebank by integrating the
Penn Treebank with the RST Treebank. Built upon our recent span-based
constituency parser, this joint syntacto-discourse parser requires no
preprocessing whatsoever (such as segmentation or feature extraction), achieves
the state-of-the-art end-to-end discourse parsing accuracy.Comment: Accepted at EMNLP 201
Learning Recursive Segments for Discourse Parsing
Automatically detecting discourse segments is an important preliminary step
towards full discourse parsing. Previous research on discourse segmentation
have relied on the assumption that elementary discourse units (EDUs) in a
document always form a linear sequence (i.e., they can never be nested).
Unfortunately, this assumption turns out to be too strong, for some theories of
discourse like SDRT allows for nested discourse units. In this paper, we
present a simple approach to discourse segmentation that is able to produce
nested EDUs. Our approach builds on standard multi-class classification
techniques combined with a simple repairing heuristic that enforces global
coherence. Our system was developed and evaluated on the first round of
annotations provided by the French Annodis project (an ongoing effort to create
a discourse bank for French). Cross-validated on only 47 documents (1,445
EDUs), our system achieves encouraging performance results with an F-score of
73% for finding EDUs.Comment: published at LREC 201
Resolving anaphoric references on deficient syntactic descriptions
Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency. In this paper, two approaches are presented which generalize the verification of coindexing constraints to de cient descriptions. At first, a partly heuristic method is described, which has been implemented. Secondly, a provable complete method is specified. It provides the means to exploit the results of anaphor resolution for a further structural disambiguation. By rendering possible a parallel processing model, this method exhibits, in a general sense, a higher degree of robustness. As a practically optimal solution, a combination of the two approaches is suggested
Towards unification of discourse annotation frameworks
Funding: The author is funded by University of St Andrews-China Scholarship Council joint scholarship (NO.202008300012).Discourse information is difficult to represent and annotate. Among the major frameworks for annotating discourse information, RST, PDTB and SDRT are widely discussed and used, each having its own theoretical foundation and focus. Corpora annotated under different frameworks vary considerably. To make better use of the existing discourse corpora and achieve the possible synergy of different frameworks, it is worthwhile to investigate the systematic relations between different frameworks and devise methods of unifying the frameworks. Although the issue of framework unification has been a topic of discussion for a long time, there is currently no comprehensive approach which considers unifying both discourse structure and discourse relations and evaluates the unified framework intrinsically and extrinsically. We plan to use automatic means for the unification task and evaluate the result with structural complexity and downstream tasks. We will also explore the application of the unified framework in multi-task learning and graphical models.Publisher PD
Parsing Argumentation Structures in Persuasive Essays
In this article, we present a novel approach for parsing argumentation
structures. We identify argument components using sequence labeling at the
token level and apply a new joint model for detecting argumentation structures.
The proposed model globally optimizes argument component types and
argumentative relations using integer linear programming. We show that our
model considerably improves the performance of base classifiers and
significantly outperforms challenging heuristic baselines. Moreover, we
introduce a novel corpus of persuasive essays annotated with argumentation
structures. We show that our annotation scheme and annotation guidelines
successfully guide human annotators to substantial agreement. This corpus and
the annotation guidelines are freely available for ensuring reproducibility and
to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26
October 2015. Revised submission: 15 July 201
- …