8,582 research outputs found
Learning Recursive Segments for Discourse Parsing
Automatically detecting discourse segments is an important preliminary step
towards full discourse parsing. Previous research on discourse segmentation
have relied on the assumption that elementary discourse units (EDUs) in a
document always form a linear sequence (i.e., they can never be nested).
Unfortunately, this assumption turns out to be too strong, for some theories of
discourse like SDRT allows for nested discourse units. In this paper, we
present a simple approach to discourse segmentation that is able to produce
nested EDUs. Our approach builds on standard multi-class classification
techniques combined with a simple repairing heuristic that enforces global
coherence. Our system was developed and evaluated on the first round of
annotations provided by the French Annodis project (an ongoing effort to create
a discourse bank for French). Cross-validated on only 47 documents (1,445
EDUs), our system achieves encouraging performance results with an F-score of
73% for finding EDUs.Comment: published at LREC 201
Neural Discourse Structure for Text Categorization
We show that discourse structure, as defined by Rhetorical Structure Theory
and provided by an existing discourse parser, benefits text categorization. Our
approach uses a recursive neural network and a newly proposed attention
mechanism to compute a representation of the text that focuses on salient
content, from the perspective of both RST and the task. Experiments consider
variants of the approach and illustrate its strengths and weaknesses.Comment: ACL 2017 camera ready versio
Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this socalled discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline)
Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims
Biological knowledge is increasingly represented as a collection of (entity-relationship-entity) triplets. These are queried, mined, appended to papers, and published. However, this representation ignores the argumentation contained within a paper and the relationships between hypotheses, claims and evidence put forth in the article. In this paper, we propose an alternate view of the research article as a network of 'hypotheses and evidence'. Our knowledge representation focuses on scientific discourse as a rhetorical activity, which leads to a different direction in the development of tools and processes for modeling this discourse. We propose to extract knowledge from the article to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. We discuss some current efforts and future plans in this area
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry at the DISRPT
2019 Shared Task on automatic discourse unit segmentation and connective
detection. Our approach relies on model stacking, creating a heterogeneous
ensemble of classifiers, which feed into a metalearner for each final task. The
system encompasses three trainable component stacks: one for sentence
splitting, one for discourse unit segmentation and one for connective
detection. The flexibility of each ensemble allows the system to generalize
well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking
(DISRPT2019
Cross-lingual and cross-domain discourse segmentation of entire documents
Discourse segmentation is a crucial step in building end-to-end discourse
parsers. However, discourse segmenters only exist for a few languages and
domains. Typically they only detect intra-sentential segment boundaries,
assuming gold standard sentence and token segmentation, and relying on
high-quality syntactic parses and rich heuristics that are not generally
available across languages and domains. In this paper, we propose statistical
discourse segmenters for five languages and three domains that do not rely on
gold pre-annotations. We also consider the problem of learning discourse
segmenters when no labeled data is available for a language. Our fully
supervised system obtains 89.5% F1 for English newswire, with slight drops in
performance on other domains, and we report supervised and unsupervised
(cross-lingual) results for five languages in total.Comment: To appear in Proceedings of ACL 201
Parsing Argumentation Structures in Persuasive Essays
In this article, we present a novel approach for parsing argumentation
structures. We identify argument components using sequence labeling at the
token level and apply a new joint model for detecting argumentation structures.
The proposed model globally optimizes argument component types and
argumentative relations using integer linear programming. We show that our
model considerably improves the performance of base classifiers and
significantly outperforms challenging heuristic baselines. Moreover, we
introduce a novel corpus of persuasive essays annotated with argumentation
structures. We show that our annotation scheme and annotation guidelines
successfully guide human annotators to substantial agreement. This corpus and
the annotation guidelines are freely available for ensuring reproducibility and
to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26
October 2015. Revised submission: 15 July 201
- âŠ