    Learning Recursive Segments for Discourse Parsing

    Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, since some theories of discourse, such as SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repair heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging results, with an F-score of 73% for finding EDUs. Comment: published at LREC 201
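    To make the idea of nested EDUs and a coherence-enforcing repair step concrete, here is a minimal sketch. It is not the paper's method: the per-token label set ("B", "E", "BE", "O"), the repair rules, and the function names `repair_and_extract` and `span_f1` are hypothetical, chosen only for illustration. A stack pairs opening and closing boundaries, so the resulting spans are always well nested. Exact-match span F1 is assumed as the scoring rule.

```python
from typing import List, Tuple

# Hypothetical per-token labels (an assumption, not the Annodis scheme):
#   "B"  -> a discourse unit opens at this token
#   "E"  -> a discourse unit closes at this token
#   "BE" -> a one-token unit (opens and closes here)
#   "O"  -> no boundary at this token
Span = Tuple[int, int]  # inclusive token indices (start, end)


def repair_and_extract(labels: List[str]) -> List[Span]:
    """Turn per-token boundary predictions into well-nested spans.

    Assumed repair heuristic:
      * a closing boundary with no open unit is dropped;
      * units still open at the end are closed at the last token.
    """
    stack: List[int] = []  # start indices of currently open units
    spans: List[Span] = []
    for i, label in enumerate(labels):
        if "B" in label:
            stack.append(i)
        if "E" in label:
            if stack:
                # LIFO pairing guarantees proper nesting
                spans.append((stack.pop(), i))
            # else: unmatched closing boundary is ignored
    last = len(labels) - 1
    while stack:
        spans.append((stack.pop(), last))
    return sorted(spans)


def span_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Exact-match F1 over spans (an assumed evaluation rule)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# repair_and_extract(["B", "O", "B", "O", "E", "O", "E"]) returns [(0, 6), (2, 4)],
# i.e. the unit spanning tokens 2-4 is nested inside the unit spanning 0-6.
```

    The stack is the design choice that makes the repair "global": whatever the classifier predicts locally, the output can never contain crossing spans.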

    Strategies in German-to-Greek Simultaneous Interpreting: A Corpus-Based Approach

    This paper reports on a corpus study that focuses on the strategies employed during German-to-Greek (DE-EL) simultaneous interpreting (SI). A 15-minute interpreting corpus is analysed in order to investigate the use of interpreting strategies and to record their frequency. Subsequently, an attempt is made to determine whether the syntactic differences characterising DE and EL have an influence on the interpreting strategies employed during SI. The conclusion drawn is that strategies are indeed used in DE-EL SI; it seems that, as suggested by Riccardi (1999: 171-173), the strategies identified can be assigned to two categories: "general" strategies, which do not seem to be influenced by the language combination of the interpretation, and "specific" strategies, which seem to be linked to the particularities of the language pair involved.

    Cross-lingual and cross-domain discourse segmentation of entire documents

    Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters exist for only a few languages and domains. Typically, they detect only intra-sentential segment boundaries, assume gold-standard sentence and token segmentation, and rely on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total. Comment: To appear in Proceedings of ACL 201

    Cross-lingual RST Discourse Parsing

    Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks exist for other languages, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on 2 of 3 metrics), (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing. Comment: To be published in EACL 2017, 13 page

    EmotiRob: from understanding to cognitive interaction

    The ANR project EmotiRob aims to design and build a soft toy robot that can interact emotionally and cognitively with handicapped and fragile children. The MAPH project (Active Media For the Handicap), an extension of EmotiRob, further extends the cognitive abilities of the robot in order to implement linguistic interaction with the child. This article presents our work for both projects: speech understanding, emotional interaction, and cognitive interaction.