Better Document-level Sentiment Analysis from RST Discourse Parsing
Discourse structure is the hidden link between surface features and
document-level properties, such as sentiment polarity. We show that the
discourse analyses produced by Rhetorical Structure Theory (RST) parsers can
improve document-level sentiment analysis, via composition of local information
up the discourse tree. First, we show that reweighting discourse units
according to their position in a dependency representation of the rhetorical
structure can yield substantial improvements on lexicon-based sentiment
analysis. Next, we present a recursive neural network over the RST structure,
which offers significant improvements over classification-based methods.

Comment: Published at Empirical Methods in Natural Language Processing (EMNLP 2015)
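The depth-based reweighting of discourse units described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the decay parameter `lam`, the `(score, depth)` input format, and the exponential weighting scheme are all hypothetical choices.

```python
# Sketch: reweight elementary discourse units (EDUs) for lexicon-based
# sentiment analysis by their depth in a dependency-style discourse tree.
# Units nearer the root get exponentially more weight (hypothetical scheme).

def weighted_document_sentiment(edus, lam=0.5):
    """edus: list of (lexicon_score, depth) pairs; lam in (0, 1] controls
    how quickly weight decays with depth below the root."""
    total = 0.0
    norm = 0.0
    for score, depth in edus:
        w = lam ** depth          # depth 0 = root nucleus, weight 1.0
        total += w * score
        norm += w
    return total / norm if norm else 0.0

# Toy document: a positive nucleus at the root, a negative satellite deep
# in the tree; the deep negative unit is discounted, so the score stays
# positive.
doc = [(0.8, 0), (-0.6, 2), (0.3, 1)]
print(round(weighted_document_sentiment(doc), 3))  # prints 0.457
```

The point of the weighting is that satellites buried deep in the rhetorical structure contribute less to the document-level polarity than material near the root.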
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser that is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2 of 3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.

Comment: To be published in EACL 2017, 13 pages
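The harmonization step amounts to mapping each treebank's relation inventory onto a shared label set. The sketch below is purely illustrative: the treebank names, relation labels, and mappings are invented for the example and are not the harmonization actually used in the paper.

```python
# Sketch: harmonize relation labels from different RST treebanks onto a
# shared coarse inventory (all names and mappings here are hypothetical).

COARSE_MAP = {
    "rstdt": {"Elaboration-additional": "Elaboration", "Contrast": "Contrast"},
    "spanish": {"elaboración": "Elaboration", "contraste": "Contrast"},
}

def harmonize(treebank, instances):
    """instances: list of (edu_pair, relation_label) from one treebank.
    Labels outside the mapping fall back to a catch-all class."""
    mapping = COARSE_MAP[treebank]
    return [(pair, mapping.get(label, "Other")) for pair, label in instances]
```

With a shared label set, instances from all treebanks can be pooled to train a single cross-lingual parser.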
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Discourse Structure in Machine Translation Evaluation
In this article, we explore the potential of using sentence-level discourse
structure for machine translation evaluation. We first design discourse-aware
similarity measures, which use all-subtree kernels to compare discourse parse
trees in accordance with the Rhetorical Structure Theory (RST). Then, we show
that a simple linear combination with these measures can help improve various
existing machine translation evaluation metrics regarding correlation with
human judgments both at the segment- and at the system-level. This suggests
that discourse information is complementary to the information used by many of
the existing evaluation metrics, and thus it could be taken into account when
developing richer evaluation metrics, such as the WMT-14 winning combined
metric DiscoTK-party. We also provide a detailed analysis of the relevance of
various discourse elements and relations from the RST parse trees for machine
translation evaluation. In particular we show that: (i) all aspects of the RST
tree are relevant, (ii) nuclearity is more useful than relation type, and (iii)
the similarity of the translation RST tree to the reference tree is positively
correlated with translation quality.

Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201
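The idea of comparing translation and reference RST trees with an all-subtree kernel can be sketched with a simplified Collins–Duffy-style subtree count. This is not the exact kernel formulation from the article; the tree encoding and the delta recursion are assumptions made for the illustration.

```python
# Sketch: a subtree-counting kernel over two small RST-like trees.
# Trees are nested tuples: (label, child, child, ...); leaves are strings.

def kernel(t1, t2):
    """Sum of matching-subtree counts over every pair of nodes (one from
    each tree); higher values mean more shared structure."""
    return sum(_delta(a, b) for a in _nodes(t1) for b in _nodes(t2))

def _nodes(t):
    # Collect every node (subtree) of t, including t itself and leaves.
    if isinstance(t, str):
        return [t]
    out = [t]
    for child in t[1:]:
        out.extend(_nodes(child))
    return out

def _delta(a, b):
    # Leaves match if identical; internal nodes contribute the product of
    # (1 + delta) over aligned children when labels and arity agree.
    if isinstance(a, str) or isinstance(b, str):
        return 1 if a == b else 0
    if a[0] != b[0] or len(a) != len(b):
        return 0
    prod = 1
    for ca, cb in zip(a[1:], b[1:]):
        prod *= 1 + _delta(ca, cb)
    return prod

hyp = ("Elaboration", ("Nucleus", "edu1"), ("Satellite", "edu2"))
ref = ("Elaboration", ("Nucleus", "edu1"), ("Satellite", "edu3"))
score = kernel(hyp, ref)  # higher when the translation tree resembles the reference
```

A kernel score like this can then be linearly combined with existing metric scores, which is the kind of combination the article evaluates.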
ALens: An Adaptive Domain-Oriented Abstract Writing Training Tool for Novice Researchers
The significance of novice researchers acquiring proficiency in writing
abstracts has been extensively documented in the field of higher education,
where they often encounter challenges in this process. Traditionally, students
have been advised to enroll in writing training courses as a means to develop
their abstract writing skills. Nevertheless, this approach frequently falls
short in providing students with personalized and adaptable feedback on their
abstract writing. To address this gap, we initially conducted a formative study
to ascertain the user requirements for an abstract writing training tool.
Subsequently, we proposed a domain-specific abstract writing training tool
called ALens, which employs rhetorical structure parsing to identify key
concepts, evaluates abstract drafts based on linguistic features, and employs
visualization techniques to analyze the writing patterns of exemplary
abstracts. A comparative user study involving an alternative abstract writing
training tool has been conducted to demonstrate the efficacy of our approach.

Comment: Accepted by HHME/CHCI 202
Sentence Centrality Revisited for Unsupervised Summarization
Single document summarization has enjoyed renewed interest in recent years
thanks to the popularity of neural network models and the availability of
large-scale datasets. In this paper we develop an unsupervised approach arguing
that it is unrealistic to expect large-scale and high-quality training data to
be available or created for different types of summaries, domains, or
languages. We revisit a popular graph-based ranking algorithm and modify how
node (aka sentence) centrality is computed in two ways: (a) we employ BERT, a
state-of-the-art neural representation learning model to better capture
sentential meaning and (b) we build graphs with directed edges, arguing that the
contribution of any two nodes to their respective centrality is influenced by
their relative position in a document. Experimental results on three news
summarization datasets representative of different languages and writing styles
show that our approach outperforms strong baselines by a wide margin.

Comment: ACL 201
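The directed, position-aware centrality idea can be sketched as below. The paper computes sentence similarities with BERT; here a simple word-overlap similarity stands in for it, and the `lambda_back` / `lambda_fwd` weights are hypothetical values, not the tuned ones.

```python
# Sketch: position-aware sentence centrality over directed edges.
# Forward edges (toward later sentences) count positively; backward edges
# can count negatively, penalising sentences that merely repeat earlier
# content (hypothetical weights).

def similarity(s1, s2):
    # Jaccard word overlap as a stand-in for BERT-based similarity.
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1 | w2), 1)

def directed_centrality(sentences, lambda_back=-0.3, lambda_fwd=1.0):
    scores = []
    for i, si in enumerate(sentences):
        back = sum(similarity(si, sj) for sj in sentences[:i])
        fwd = sum(similarity(si, sj) for sj in sentences[i + 1:])
        scores.append(lambda_back * back + lambda_fwd * fwd)
    return scores

doc = [
    "the parser improves sentiment analysis",
    "sentiment analysis benefits from discourse",
    "discourse structure is hidden",
]
scores = directed_centrality(doc)
ranking = sorted(range(len(doc)), key=lambda i: -scores[i])
```

Sentences supported by many later sentences rank highest, which encodes the intuition that a document's central claims tend to appear early and be elaborated afterwards.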
Rhetorical relations for information retrieval
Typically, every part of a coherent text has some plausible reason for its
presence, some function that it serves in the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this so-called discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline).
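One simple reading of such a language-model modification is a Dirichlet-smoothed query-likelihood model in which term counts are weighted by the rhetorical relation of the discourse unit they occur in. The sketch below follows that reading; the relation weights, the `(relation, text)` unit format, and the smoothing constants are hypothetical, not the tuned values from the work.

```python
# Sketch: query-likelihood retrieval where term frequencies are weighted by
# the rhetorical relation of the containing discourse unit (hypothetical
# weights; Dirichlet smoothing with a fixed collection probability).
import math
from collections import defaultdict

RELATION_WEIGHT = {"Elaboration": 1.2, "Background": 0.8, "Contrast": 1.0}

def score(query_terms, doc_units, mu=2000, collection_prob=1e-4):
    """doc_units: list of (relation, text) pairs. Returns log P(q|d) under
    Dirichlet smoothing, with relation-weighted term frequencies."""
    tf = defaultdict(float)
    doc_len = 0.0
    for relation, text in doc_units:
        w = RELATION_WEIGHT.get(relation, 1.0)
        for term in text.lower().split():
            tf[term] += w
            doc_len += w
    return sum(
        math.log((tf[t] + mu * collection_prob) / (doc_len + mu))
        for t in query_terms
    )
```

Under this model, a query term occurring in a highly weighted relation (e.g. Elaboration) raises the document's score more than the same term in a down-weighted relation, which is the mechanism by which relation knowledge can influence ranking.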