Learning Recursive Segments for Discourse Parsing
Automatically detecting discourse segments is an important preliminary step
towards full discourse parsing. Previous research on discourse segmentation
has relied on the assumption that elementary discourse units (EDUs) in a
document always form a linear sequence (i.e., they can never be nested).
Unfortunately, this assumption turns out to be too strong, since some theories of
discourse, such as SDRT, allow for nested discourse units. In this paper, we
present a simple approach to discourse segmentation that is able to produce
nested EDUs. Our approach builds on standard multi-class classification
techniques combined with a simple repairing heuristic that enforces global
coherence. Our system was developed and evaluated on the first round of
annotations provided by the French Annodis project (an ongoing effort to create
a discourse bank for French). Cross-validated on only 47 documents (1,445
EDUs), our system achieves encouraging performance results with an F-score of
73% for finding EDUs.
Comment: published at LREC 201
Strategies in German-to-Greek Simultaneous Interpreting: A Corpus-Based Approach
This paper reports on a corpus study that focuses on the strategies employed during German-to-Greek (DE-EL) simultaneous interpreting (SI). A 15-minute interpreting corpus is analysed in order to investigate the use of interpreting strategies and to record their frequency. Subsequently, an attempt is made to determine whether the syntactic differences characterising DE and EL have an influence on the interpreting strategies employed during SI. The conclusion drawn is that strategies are indeed used in DE-EL SI; it seems that, as suggested by Riccardi (1999: 171-173), the strategies identified can be assigned to two categories: "general" strategies, which do not seem to be influenced by the language combination of the interpretation, and "specific" strategies, which seem to be linked to the particularities of the language pair involved.
Cross-lingual and cross-domain discourse segmentation of entire documents
Discourse segmentation is a crucial step in building end-to-end discourse
parsers. However, discourse segmenters only exist for a few languages and
domains. Typically they only detect intra-sentential segment boundaries,
assuming gold standard sentence and token segmentation, and relying on
high-quality syntactic parses and rich heuristics that are not generally
available across languages and domains. In this paper, we propose statistical
discourse segmenters for five languages and three domains that do not rely on
gold pre-annotations. We also consider the problem of learning discourse
segmenters when no labeled data is available for a language. Our fully
supervised system obtains 89.5% F1 for English newswire, with slight drops in
performance on other domains, and we report supervised and unsupervised
(cross-lingual) results for five languages in total.
Comment: To appear in Proceedings of ACL 201
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler, yet competitive with the state of the art for English (significantly
better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 pages
Emotirob: from understanding to cognitive interaction
The ANR project EmotiRob aims to design and build a soft toy robot that can interact emotionally and cognitively with disabled and vulnerable children. The MAPH project (Active Media For the Handicap), an extension of EmotiRob, extends the cognitive abilities of the robot in order to implement linguistic interaction with the child. This article presents our work for both projects: speech understanding, emotional interaction, and cognitive interaction.