2,523 research outputs found
Learning Recursive Segments for Discourse Parsing
Automatically detecting discourse segments is an important preliminary step
towards full discourse parsing. Previous research on discourse segmentation
have relied on the assumption that elementary discourse units (EDUs) in a
document always form a linear sequence (i.e., they can never be nested).
Unfortunately, this assumption turns out to be too strong, for some theories of
discourse like SDRT allows for nested discourse units. In this paper, we
present a simple approach to discourse segmentation that is able to produce
nested EDUs. Our approach builds on standard multi-class classification
techniques combined with a simple repairing heuristic that enforces global
coherence. Our system was developed and evaluated on the first round of
annotations provided by the French Annodis project (an ongoing effort to create
a discourse bank for French). Cross-validated on only 47 documents (1,445
EDUs), our system achieves encouraging performance results with an F-score of
73% for finding EDUs.Comment: published at LREC 201
Recommended from our members
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010)
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain specific, textual resources that report on facts and events which have
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. This
is based on the semantic notions of events, participants and roles. We evaluate
quantitatively each of the key-steps of our approach and provide a graph-based
representation of the extracted knowledge, which allows to move between a Close
and a Distant Reading of the collection.Comment: 23 pages, 6 figure
Forest Driven Dependency Analysis Enhanced by Japanese Clause Structure Estimation
PACLIC 20 / Wuhan, China / 1-3 November, 200
Parsing Argumentation Structures in Persuasive Essays
In this article, we present a novel approach for parsing argumentation
structures. We identify argument components using sequence labeling at the
token level and apply a new joint model for detecting argumentation structures.
The proposed model globally optimizes argument component types and
argumentative relations using integer linear programming. We show that our
model considerably improves the performance of base classifiers and
significantly outperforms challenging heuristic baselines. Moreover, we
introduce a novel corpus of persuasive essays annotated with argumentation
structures. We show that our annotation scheme and annotation guidelines
successfully guide human annotators to substantial agreement. This corpus and
the annotation guidelines are freely available for ensuring reproducibility and
to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26
October 2015. Revised submission: 15 July 201
- …