
    Fast Rhetorical Structure Theory Discourse Parsing

    In recent years, there has been a variety of research on discourse parsing, particularly RST discourse parsing. Most of the recent work on RST parsing has focused on implementing new types of features or learning algorithms in order to improve accuracy, with relatively little focus on efficiency, robustness, or practical use. Also, most implementations are not widely available. Here, we describe an RST segmentation and parsing system that adapts models and feature sets from various previous work, as described below. Its accuracy is near state-of-the-art, and it was developed to be fast, robust, and practical. For example, it can process short documents such as news articles or essays in less than a second.

    Syntax and Semantics of Italian Poetry in the First Half of the 20th Century

    In this paper we study, analyse and comment on the rhetorical figures present in some of the most interesting poetry of the first half of the twentieth century. These figures are first traced back to famous poets of the past and then compared to classical Latin prose. Linguistic theory is then called in to show how they can be represented in syntactic structures and classified as noncanonical structures, by positioning discontinuous or displaced linguistic elements in Spec XP projections at various levels of constituency. Then we introduce LFG (Lexical Functional Grammar) as the theory that allows us to connect noncanonical syntactic structures with informational structure and with psycholinguistic theories for complexity evaluation. We end with two computational linguistics experiments and evaluate the results. The first one uses the best online parsers of Italian to parse poetic structures; the second one uses Getarun, the system created at the Ca' Foscari Computational Linguistics Laboratory. As will be shown, the first approach is unable to cope with these structures because it relies only on statistical probabilistic information. By contrast, the second one, being a symbolic rule-based system, is by far superior and also makes it possible to complete both semantic and pragmatic analysis. Comment: To appear in Proceedings of AIUCD 2016 (revised version as of March 19, 2018)

    HILDA: A Discourse Parser Using Support Vector Machine Classification

    Discourse structures have a central role in several computational tasks, such as question answering or dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree building approach, we are able to create accurate discourse trees with linear time complexity. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu 2003) is limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%.

    What's Hard in English RST Parsing? Predictive Models for Error Analysis

    Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test sets with explicit correct and distracting discourse markers associated with gold-standard RST relations. Our results show that, as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser. Comment: SIGDIAL 2023 camera-ready; 12 pages

    Best-First Surface Realization

    Current work in surface realization concentrates on the use of general, abstract algorithms that interpret large, reversible grammars. Little attention has been paid so far to the many small and simple applications that require coverage of a small sublanguage at different degrees of sophistication. The system TG/2 described in this paper can be smoothly integrated with deep generation processes; it integrates canned text, templates, and context-free rules into a single formalism; it allows for both textual and tabular output; and it can be parameterized according to linguistic preferences. These features are based on suitably restricted production system techniques and on a generic backtracking regime. Comment: 10 pages, LaTeX source, one EPS figure