1,965 research outputs found
Automatic Identification of AltLexes using Monolingual Parallel Corpora
The automatic identification of discourse relations is still a challenging
task in natural language processing. Discourse connectives, such as "since" or
"but", are the most informative cues to identify explicit relations; however
discourse parsers typically use a closed inventory of such connectives. As a
result, discourse relations signaled by markers outside these inventories (i.e.
AltLexes) are not detected as effectively. In this paper, we propose a novel
method to leverage parallel corpora in text simplification and lexical
resources to automatically identify alternative lexicalizations that signal
discourse relation. When applied to the Simple Wikipedia and Newsela corpora
along with WordNet and the PPDB, the method allowed the automatic discovery of
91 AltLexes.Comment: 6 pages, Proceedings of Recent Advances in Natural Language
Processing (RANLP 2017
All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch
Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding more complex readability features there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts
and a crowd, we implement different types of text characteristics ranging from easy-to-compute superficial text characteristics to features requiring a deep linguistic processing, resulting in ten
different feature groups. Both a regression and classification setup are investigated reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization using a wrapper-based genetic algorithm optimization approach is a promising task which provides considerable insights in which feature combinations contribute to the overall readability prediction. Since we also have gold standard information available for those features requiring deep processing we are able to investigate the true upper bound of our Dutch system. Interestingly, we will observe that the performance of our fully-automatic readability prediction pipeline is on par with the pipeline using golden deep syntactic and semantic information
Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
Implicit discourse relation classification is of great challenge due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve the recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark.Comment: To appear in ACL201
Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics regarding correlation with
human judgments both at the segment- and at the system-level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTKparty. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse
analysis. Computational Linguistics, 201
A PDTB-Styled End-to-End Discourse Parser
We have developed a full discourse parser in the Penn Discourse Treebank
(PDTB) style. Our trained parser first identifies all discourse and
non-discourse relations, locates and labels their arguments, and then
classifies their relation types. When appropriate, the attribution spans to
these relations are also determined. We present a comprehensive evaluation from
both component-wise and error-cascading perspectives.Comment: 15 pages, 5 figures, 7 table
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry at the DISRPT
2019 Shared Task on automatic discourse unit segmentation and connective
detection. Our approach relies on model stacking, creating a heterogeneous
ensemble of classifiers, which feed into a metalearner for each final task. The
system encompasses three trainable component stacks: one for sentence
splitting, one for discourse unit segmentation and one for connective
detection. The flexibility of each ensemble allows the system to generalize
well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking
(DISRPT2019
- …