Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics regarding correlation with
human judgments both at the segment- and at the system-level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTKparty. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular, we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.
Comment: machine translation, machine translation evaluation, discourse
analysis. Computational Linguistics, 201
Summarizing Dialogic Arguments from Social Media
Online argumentative dialog is a rich source of information on popular
beliefs and opinions that could be useful to companies as well as governmental
or public policy agencies. Compact, easy-to-read summaries of these dialogues
would thus be highly valuable. A priori, it is not even clear what form such a
summary should take. Previous work on summarization has primarily focused on
summarizing written texts, where the notion of an abstract of the text is well
defined. We collect gold standard training data consisting of five human
summaries for each of 161 dialogues on the topics of Gay Marriage, Gun Control
and Abortion. We present several different computational models aimed at
identifying segments of the dialogues whose content should be used for the
summary, using linguistic features and Word2vec features with both SVMs and
Bidirectional LSTMs. We show that we can identify the most important arguments
by using the dialog context with a best F-measure of 0.74 for gun control, 0.71
for gay marriage, and 0.67 for abortion.
Comment: Proceedings of the 21st Workshop on the Semantics and Pragmatics of
Dialogue (SemDial 2017
Theory and Applications for Advanced Text Mining
Advances in computer and web technologies have made it easy to collect and store large amounts of text data, which presumably contain useful knowledge. Text mining techniques for extracting that knowledge have been studied intensively since the late 1990s. Although many important techniques have been developed, the field continues to expand to meet the needs of various application areas. This book comprises 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research directions.