Some Reflections on the Task of Content Determination in the Context of Multi-Document Summarization of Evolving Events
Despite its importance, the task of summarizing evolving events has received
little attention from researchers in the field of multi-document summarization. In
a previous paper (Afantenos et al. 2007) we presented a methodology for
the automatic summarization of documents, emitted by multiple sources, which
describe the evolution of an event. At the heart of this methodology lies the
identification of similarities and differences between the various documents,
along two axes: the synchronic and the diachronic. This is achieved by
introducing the notion of Synchronic and Diachronic Relations. These
relations connect the messages found in the documents, yielding
a graph which we call a grid. Although the creation of the grid completes the
Document Planning phase of a typical NLG architecture, the number of messages
contained in a grid can be very large, thus exceeding the
required compression rate. In this paper we provide some initial thoughts on a
probabilistic model which can be applied at the Content Determination stage,
and which tries to alleviate this problem.
Comment: 5 pages, 2 figures
Testing SDRT's Right Frontier
The Right Frontier Constraint (RFC), as a constraint on the attachment of new
constituents to an existing discourse structure, has important implications for
the interpretation of anaphoric elements in discourse and for Machine Learning
(ML) approaches to learning discourse structures. In this paper we provide
strong empirical support for SDRT's version of RFC. The analysis of about 100
doubly annotated documents by five different naive annotators shows that SDRT's
RFC is respected about 95% of the time. Our qualitative analysis of the presumed
violations shows that they are either click errors or
structural misconceptions.
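The constraint tested in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified reading of the RFC: the right frontier is the last-attached unit plus every unit that dominates it through subordinating relations, and complex discourse units are ignored. The edge encoding and the names `right_frontier` and `respects_rfc` are hypothetical and are not taken from the paper or from any annotation tool.

```python
from typing import Iterable, List, Tuple

# (source, target, kind), where kind is "subordinating" or "coordinating".
Edge = Tuple[str, str, str]


def right_frontier(edges: Iterable[Edge], last: str) -> List[str]:
    """Return the last unit plus all units reaching it via subordinating edges."""
    edges = list(edges)
    frontier = [last]
    stack = [last]
    while stack:
        node = stack.pop()
        for source, target, kind in edges:
            # Only subordinating relations keep their source open for attachment.
            if target == node and kind == "subordinating" and source not in frontier:
                frontier.append(source)
                stack.append(source)
    return frontier


def respects_rfc(edges: Iterable[Edge], last: str, attachment_site: str) -> bool:
    """True if attaching a new unit at attachment_site stays on the right frontier."""
    return attachment_site in right_frontier(edges, last)


# Toy structure: b and c both elaborate a, and c continues b (coordinating).
toy_edges = [
    ("a", "b", "subordinating"),
    ("b", "c", "coordinating"),
    ("a", "c", "subordinating"),
]
print(right_frontier(toy_edges, "c"))     # units open for attachment
print(respects_rfc(toy_edges, "c", "b"))  # b was closed off by the coordinating edge
```

Under this simplification, an annotator who attaches a new unit to `b` after `c` has been attached would be counted as a presumed violation.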
What's in a Message?
In this paper we present the first step in a larger series of experiments for the induction of predicate/argument structures. The structures that we are inducing are very similar to the conceptual structures used in Frame Semantics (such as FrameNet). These structures are called messages, and they were previously used in a multi-document summarization system for evolving events. The series of experiments that we propose consists of two stages. In the first stage we try to extract a representative vocabulary of words. This vocabulary is then used in the second stage, during which we apply various clustering approaches to it in order to identify the clusters of predicates and arguments (or frames and semantic roles, to use the jargon of Frame Semantics). This paper presents and evaluates the first stage in detail.
Comment: 8 pages
Counter-Argumentation and Discourse: A Case Study
Despite the central role that argumentation plays in human communication, the computational linguistics community has paid relatively little attention to proposing a methodology for automatically identifying arguments and their relations in texts. Argumentation is intimately related to discourse structure, since an argument often spans more than one phrase, thus forming an entity with its own coherent internal structure. Moreover, arguments are linked to one another by a support, an attack, or a rebuttal relation. These argumentation relations are often realized via a discourse relation. Unfortunately, most discourse representation theories use trees to represent discourse, a format which is incapable of representing phenomena such as long-distance attachments and crossed dependencies, which are crucial for argumentation. A notable exception is Segmented Discourse Representation Theory (SDRT) (Asher and Lascarides, 2003). In this paper we show how SDRT can help identify arguments and their relations. We use counter-argumentation as our case study, following Apotheloz (1989) and Amgoud and Prade (2012), and show how the identification of the discourse structure can greatly benefit the identification of the argumentation structure.
Let's get the student into the driver's seat
Speaking a language and achieving proficiency in another one is a highly
complex process which requires the acquisition of various kinds of knowledge
and skills, like the learning of words, rules and patterns and their connection
to communicative goals (intentions), the usual starting point. To help the
learner to acquire these skills we propose an enhanced, electronic version of
an age-old method: pattern drills (henceforth PDs). Although highly regarded
in the fifties, PDs have since become unpopular, partly because of
their rigidity and their lack of grounding (natural context). Despite these
shortcomings we do believe in the virtues of this approach, at least with
regard to the acquisition of basic linguistic reflexes or skills (automatisms),
necessary to survive in the new language. Of course, the method needs
improvement, and we will show here how this can be achieved. Unlike tapes or
books, computers are open media, allowing for dynamic changes, taking users'
performances and preferences into account. Building an electronic version of
PDs amounts to building an open resource that can be adapted to the users'
ever-changing needs.
Comment: 6 pages
Learning Recursive Segments for Discourse Parsing
Automatically detecting discourse segments is an important preliminary step
towards full discourse parsing. Previous research on discourse segmentation
has relied on the assumption that elementary discourse units (EDUs) in a
document always form a linear sequence (i.e., they can never be nested).
Unfortunately, this assumption turns out to be too strong, since some theories of
discourse, such as SDRT, allow for nested discourse units. In this paper, we
present a simple approach to discourse segmentation that is able to produce
nested EDUs. Our approach builds on standard multi-class classification
techniques combined with a simple repairing heuristic that enforces global
coherence. Our system was developed and evaluated on the first round of
annotations provided by the French Annodis project (an ongoing effort to create
a discourse bank for French). Cross-validated on only 47 documents (1,445
EDUs), our system achieves encouraging performance results with an F-score of
73% for finding EDUs.
Comment: published at LREC 201
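The idea of combining per-token classification with a repair step can be sketched as follows. The abstract does not specify the label set or the repair heuristic, so everything here is an assumption made for illustration: each token is assumed to carry a predicted label `"open"`, `"close"`, or `"none"`, and the repair simply drops unmatched closing brackets and closes any EDUs still open at the end. The function name `bracket_edus` is hypothetical.

```python
from typing import List


def bracket_edus(tokens: List[str], labels: List[str]) -> str:
    """Turn per-token boundary predictions into a well-nested bracketing.

    Illustrative repair only (not the paper's heuristic):
    - a "close" with no open EDU is ignored;
    - EDUs left open at the end of the text are closed there.
    """
    out: List[str] = []
    depth = 0  # number of EDUs currently open
    for token, label in zip(tokens, labels):
        if label == "open":
            out.append("[")
            depth += 1
            out.append(token)
        elif label == "close":
            out.append(token)
            if depth > 0:
                out.append("]")
                depth -= 1
        else:
            out.append(token)
    # Close whatever the classifier left open.
    out.extend("]" * depth)
    return " ".join(out)


tokens = ["Max", ",", "who", "was", "late", ",", "ran", "."]
labels = ["open", "none", "open", "none", "close", "none", "none", "none"]
print(bracket_edus(tokens, labels))
```

In this toy input the classifier never closes the outer EDU, so the repair step adds the final bracket and the embedded relative clause ends up nested inside the main unit.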