Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser that is
simpler than, yet competitive with, the state of the art for English
(significantly better on two of three metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 pages
A Deep Sequential Model for Discourse Parsing on Multi-Party Dialogues
Discourse structures are beneficial for various NLP tasks such as dialogue
understanding, question answering, sentiment analysis, and so on. This paper
presents a deep sequential model for parsing discourse dependency structures of
multi-party dialogues. The proposed model aims to construct a discourse
dependency tree by predicting dependency relations and constructing the
discourse structure jointly and alternately. It makes a sequential scan of the
Elementary Discourse Units (EDUs) in a dialogue. For each EDU, the model
decides to which previous EDU the current one should link and what the
corresponding relation type is. The predicted link and relation type are then
used to build the discourse structure incrementally with a structured encoder.
During link prediction and relation classification, the model utilizes not only
local information that represents the concerned EDUs, but also global
information that encodes the EDU sequence and the discourse structure that is
already built at the current step. Experiments show that the proposed model
outperforms all the state-of-the-art baselines.
Comment: Accepted to AAAI 2019
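The sequential scan described above can be sketched as a greedy loop over EDUs. This is a hedged illustration, not the authors' implementation: the scorer functions (`link_score`, `rel_score`) and relation labels are hypothetical stand-ins for the paper's neural link-prediction and relation-classification components.

```python
# Toy sketch of sequential dependency-tree construction (NOT the authors'
# model): for each EDU, choose the best previous EDU to link to, then the
# best relation type, and grow the tree incrementally.

def sequential_parse(edus, link_score, relation_types, rel_score):
    """Return a list of (head, dependent, relation) arcs; EDU 0 is a dummy root."""
    arcs = []
    for j in range(1, len(edus)):
        # score every earlier EDU as a candidate head; in the paper this uses
        # local EDU features plus global encodings of the partial structure
        best_head = max(range(j), key=lambda i: link_score(edus, i, j, arcs))
        best_rel = max(relation_types,
                       key=lambda r: rel_score(edus, best_head, j, r))
        arcs.append((best_head, j, best_rel))
    return arcs

# Usage with stand-in scorers that prefer adjacent EDUs and one relation:
edus = ["<root>", "I booked a flight", "because prices dropped"]
link = lambda edus, i, j, arcs: -(j - i)
rel = lambda edus, i, j, r: 1 if r == "Explanation" else 0
arcs = sequential_parse(edus, link, ["Explanation", "Comment"], rel)
# arcs == [(0, 1, "Explanation"), (1, 2, "Explanation")]
```

The key property the sketch preserves is that each decision can condition on the arcs already built (`arcs` is passed into the scorer), mirroring the paper's joint, alternating construction.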
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry at the DISRPT
2019 Shared Task on automatic discourse unit segmentation and connective
detection. Our approach relies on model stacking, creating a heterogeneous
ensemble of classifiers, which feed into a metalearner for each final task. The
system encompasses three trainable component stacks: one for sentence
splitting, one for discourse unit segmentation and one for connective
detection. The flexibility of each ensemble allows the system to generalize
well to datasets of different sizes and with varying levels of homogeneity.
Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019)
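The stacking idea — base classifiers feeding a metalearner — can be illustrated with a minimal standard-library sketch. This is not GumDrop's code: the base-model probabilities and the logistic-regression metalearner below are generic stand-ins for its heterogeneous ensembles.

```python
import math

# Minimal sketch of model stacking (NOT GumDrop itself): each base model
# outputs a probability that a token starts a discourse unit; a logistic-
# regression metalearner learns how much to trust each base model.

def train_metalearner(base_probs, labels, epochs=200, lr=0.5):
    """SGD logistic regression over base-model outputs; returns (weights, bias)."""
    w = [0.0] * len(base_probs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(base_probs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def stack_predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Two base models: the first is informative, the second anti-correlated;
# the metalearner should learn to weight the first model positively.
probs = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.7)]
labels = [1, 0, 1, 0]
w, b = train_metalearner(probs, labels)
preds = [stack_predict(x, w, b) for x in probs]
# preds == labels on this separable toy data
```

Unlike simple voting, the learned weights let the ensemble discount unreliable members per task, which is what allows a stacked system to adapt to datasets of different sizes.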
RST-style Discourse Parsing Guided by Document-level Content Structures
Rhetorical Structure Theory-based Discourse Parsing (RST-DP) explores how
clauses, sentences, and large text spans compose a whole discourse and presents
the rhetorical structure as a hierarchical tree. Existing RST parsing pipelines
construct rhetorical structures without the knowledge of document-level content
structures, which causes relatively low performance when predicting the
discourse relations for large text spans. Recognizing the value of high-level
content-related information in facilitating discourse relation recognition, we
propose a novel pipeline for RST-DP that incorporates structure-aware news
content sentence representations derived from the task of News Discourse
Profiling. By incorporating only a few additional layers, this enhanced
pipeline exhibits promising performance across various RST parsing metrics.
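The "few additional layers" idea — fusing a structure-aware content representation into the parser's span representation — can be sketched abstractly. All names here are hypothetical; the actual pipeline uses learned neural layers over News Discourse Profiling representations.

```python
# Abstract sketch of feature fusion (hypothetical, not the paper's code):
# concatenate the parser's base span vector with a content-structure vector
# from an auxiliary task, then apply one extra linear layer on top.

def fuse(base_vec, content_vec, W, b):
    """One linear layer over the concatenation [base ; content]."""
    x = list(base_vec) + list(content_vec)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

# With an all-ones weight row, the layer simply sums all features:
out = fuse([1.0, 0.0], [0.0, 1.0], W=[[1, 1, 1, 1]], b=[0.0])
# out == [2.0]
```

The point of the sketch is the interface: the auxiliary representation enters through a small projection, so the rest of the RST pipeline is unchanged.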
Predicting Discourse Structure using Distant Supervision from Sentiment
Discourse parsing has not yet taken full advantage of the neural NLP
revolution, largely due to the lack of annotated datasets. We propose a novel
approach that uses distant supervision on an auxiliary task (sentiment
classification), to generate abundant data for RST-style discourse structure
prediction. Our approach combines a neural variant of multiple-instance
learning, using document-level supervision, with an optimal CKY-style tree
generation algorithm. In a series of experiments, we train a discourse parser
(for only structure prediction) on our automatically generated dataset and
compare it with parsers trained on human-annotated corpora (news domain RST-DT
and Instructional domain). Results indicate that while our parser does not yet
match the performance of a parser trained and tested on the same dataset
(intra-domain), it does perform remarkably well on the much more difficult and
arguably more useful task of inter-domain discourse structure prediction, where
the parser is trained on one domain and tested/applied on another.
Comment: Accepted to EMNLP 2019, 9 pages
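The CKY-style tree-generation step can be sketched with a standard dynamic program. This is a generic illustration, not the authors' exact algorithm; `span_score` stands in for the span scores derived from the distant sentiment supervision.

```python
# Generic CKY-style search for the best binary discourse tree (NOT the
# authors' exact algorithm): span_score(i, k, j) scores merging spans
# [i, k) and [k, j); dynamic programming returns the globally optimal tree.

def cky_tree(n, span_score):
    """Best (score, tree) over EDUs 0..n-1; leaves are EDU indices."""
    best = {(i, i + 1): (0.0, i) for i in range(n)}  # single EDUs are leaves
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            candidates = [
                (best[(i, k)][0] + best[(k, j)][0] + span_score(i, k, j),
                 (best[(i, k)][1], best[(k, j)][1]))
                for k in range(i + 1, j)]
            best[(i, j)] = max(candidates, key=lambda c: c[0])
    return best[(0, n)]

# A stand-in scorer that rewards splitting off the last EDU yields a
# left-branching tree over three EDUs:
score, tree = cky_tree(3, lambda i, k, j: 1.0 if k == j - 1 else 0.0)
# tree == ((0, 1), 2), score == 2.0
```

Because every split point is scored and charted, the search is exact in the chosen scores, which is what makes the automatically generated supervision usable for structure prediction despite its noise.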