19 research outputs found
Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis
This paper fills a gap in aspect-based sentiment analysis and aims to present
a new method for preparing and analysing texts concerning opinion and
generating user-friendly descriptive reports in natural language. We present a
comprehensive set of techniques derived from Rhetorical Structure Theory and
sentiment analysis to extract aspects from textual opinions and then build an
abstractive summary of a set of opinions. Moreover, we propose aspect-aspect
graphs to evaluate the importance of aspects and to filter out unimportant ones
from the summary. Additionally, the paper presents a prototype solution of data
flow with interesting and valuable results. The proposed method's results
proved the high accuracy of aspect detection when applied to the gold standard
dataset
Discourse Indicators for Content Selection in Summaization
We present analyses aimed at eliciting which specific aspects of discourse provide the strongest indication for text importance. In the context of content selection for single document summarization of news, we examine the benefits of both the graph structure of text provided by discourse relations and the semantic sense of these relations. We find that structure information is the most robust indicator of importance. Semantic sense only provides constraints on content selection but is not indicative of important content by itself. However, sense features complement structure information and lead to improved performance. Further, both types of discourse information prove complementary to non-discourse features. While our results establish the usefulness of discourse features, we also find that lexical overlap provides a simple and cheap alternative to discourse for computing text structure with comparable performance for the task of content selection
Jointly Modeling Topics and Intents with Global Order Structure
Modeling document structure is of great importance for discourse analysis and
related applications. The goal of this research is to capture the document
intent structure by modeling documents as a mixture of topic words and
rhetorical words. While the topics are relatively unchanged through one
document, the rhetorical functions of sentences usually change following
certain orders in discourse. We propose GMM-LDA, a topic modeling based
Bayesian unsupervised model, to analyze the document intent structure
cooperated with order information. Our model is flexible that has the ability
to combine the annotations and do supervised learning. Additionally, entropic
regularization can be introduced to model the significant divergence between
topics and intents. We perform experiments in both unsupervised and supervised
settings, results show the superiority of our model over several
state-of-the-art baselines.Comment: Accepted by AAAI 201
Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations
Case Law has a significant impact on the proceedings of legal cases.
Therefore, the information that can be obtained from previous court cases is
valuable to lawyers and other legal officials when performing their duties.
This paper describes a methodology of applying discourse relations between
sentences when processing text documents related to the legal domain. In this
study, we developed a mechanism to classify the relationships that can be
observed among sentences in transcripts of United States court cases. First, we
defined relationship types that can be observed between sentences in court case
transcripts. Then we classified pairs of sentences according to the
relationship type by combining a machine learning model and a rule-based
approach. The results obtained through our system were evaluated using human
judges. To the best of our knowledge, this is the first study where discourse
relationships between sentences have been used to determine relationships among
sentences in legal court case transcripts.Comment: Conference: 2018 International Conference on Advances in ICT for
Emerging Regions (ICTer
Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this socalled discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline)
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler, yet competitive (significantly better on 2/3 metrics) to state of the
art for English, (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.Comment: To be published in EACL 2017, 13 page