Fast Rhetorical Structure Theory Discourse Parsing
In recent years, there has been a variety of research on discourse parsing,
particularly RST discourse parsing. Most of the recent work on RST parsing has
focused on implementing new types of features or learning algorithms in order
to improve accuracy, with relatively little focus on efficiency, robustness, or
practical use. Also, most implementations are not widely available. Here, we
describe an RST segmentation and parsing system that adapts models and feature
sets from prior work, as described below. Its accuracy is near
state-of-the-art, and it was developed to be fast, robust, and practical. For
example, it can process short documents such as news articles or essays in less
than a second.
Syntax and Semantics of Italian Poetry in the First Half of the 20th Century
In this paper we study, analyse, and comment on the rhetorical figures present
in some of the most interesting poetry of the first half of the twentieth
century. These figures are first traced back to some famous poets of the past and then
compared to classical Latin prose. Linguistic theory is then called in to show
how they can be represented in syntactic structures and classified as
noncanonical structures, by positioning discontinuous or displaced linguistic
elements in Spec XP projections at various levels of constituency. Then we
introduce LFG (Lexical Functional Grammar) as the theory that allows us to
connect syntactic noncanonical structures with informational structure and
psycholinguistic theories for complexity evaluation. We end up with two
computational linguistics experiments and then evaluate the results. The first
one uses the best online parsers of Italian to parse poetic structures; the second
one uses Getarun, the system created at the Ca' Foscari Computational Linguistics
Laboratory. As will be shown, the first approach is unable to cope with these
structures due to the use of only statistical probabilistic information. On the
contrary, the second one, being a symbolic rule-based system, is by far
superior and also allows completing both semantic and pragmatic analysis.
Comment: To appear in Proceedings of AIUCD 2016 (revised version as of March
19, 2018)
HILDA: A Discourse Parser Using Support Vector Machine Classification
Discourse structures have a central role in several computational tasks, such as question answering or dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree-building approach, we are able to create accurate discourse trees in linear time complexity. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu 2003) is limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%.
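The greedy bottom-up tree building described in the HILDA abstract can be sketched in a few lines. This is a minimal illustration under assumptions: the trained SVM that scores candidate merges is replaced by a stub scorer, and relation labels are omitted.

```python
# Sketch of HILDA-style greedy bottom-up tree building.
# A trained SVM would supply merge scores; here a stub scorer stands in.

def score(left, right):
    # Stand-in for the SVM scorer: prefer merging smaller adjacent subtrees.
    return -(len(left) + len(right))

def greedy_parse(edus):
    """Repeatedly merge the highest-scoring adjacent pair of subtrees."""
    nodes = [(edu,) for edu in edus]  # leaves: one tuple per EDU
    while len(nodes) > 1:
        best = max(range(len(nodes) - 1),
                   key=lambda i: score(nodes[i], nodes[i + 1]))
        merged = (nodes[best], nodes[best + 1])  # new internal node
        nodes[best:best + 2] = [merged]
    return nodes[0]

tree = greedy_parse(["EDU1", "EDU2", "EDU3", "EDU4"])
```

Because each iteration commits to the single best merge rather than searching over all trees, the number of classifier calls grows linearly in the number of EDUs, which is the source of the linear-time claim.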
What's Hard in English RST Parsing? Predictive Models for Error Analysis
Despite recent advances in Natural Language Processing (NLP), hierarchical
discourse parsing in the framework of Rhetorical Structure Theory remains
challenging, and our understanding of the reasons for this is as yet limited.
In this paper, we examine and model some of the factors associated with parsing
difficulties in previous work: the existence of implicit discourse relations,
challenges in identifying long-distance relations, out-of-vocabulary items, and
more. In order to assess the relative importance of these variables, we also
release two annotated English test-sets with explicit correct and distracting
discourse markers associated with gold standard RST relations. Our results show
that as in shallow discourse parsing, the explicit/implicit distinction plays a
role, but that long-distance dependencies are the main challenge, while lack of
lexical overlap is less of a problem, at least for in-domain parsing. Our final
model is able to predict where errors will occur with an accuracy of 76.3% for
the bottom-up parser and 76.6% for the top-down parser.
Comment: SIGDIAL 2023 camera-ready; 12 pages
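The abstract's idea of a model that predicts parser errors from relation-level properties can be illustrated with a minimal sketch. The feature set (implicitness, relation distance, out-of-vocabulary rate) and the logistic-regression model below are assumptions for illustration, not the paper's actual setup or data.

```python
# Hypothetical sketch: predicting where an RST parser will err from
# relation-level features. Toy data encodes the abstract's finding that
# long-distance relations (large second feature) drive errors.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.5):
    """Logistic regression by SGD over features
    (is_implicit, normalized_distance, oov_rate)."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

data = [([0, 0.1, 0.0], 0), ([1, 0.2, 0.1], 0),   # short-distance: parsed OK
        ([0, 0.9, 0.0], 1), ([1, 0.8, 0.2], 1)]   # long-distance: errors
w, b = train(data)
predict = lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5
```

On this toy data the learned weights put most of the mass on the distance feature, mirroring the abstract's conclusion that long-distance dependencies, not lexical overlap, are the main predictor.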
Best-First Surface Realization
Current work in surface realization concentrates on the use of general,
abstract algorithms that interpret large, reversible grammars. Little
attention has been paid so far to the many small and simple applications that
require coverage of a small sublanguage at different degrees of sophistication.
The system TG/2 described in this paper can be smoothly integrated with deep
generation processes. It combines canned text, templates, and context-free
rules into a single formalism, allows for both textual and tabular output,
and can be parameterized according to linguistic preferences. These features
are based on suitably restricted production system techniques and on a generic
backtracking regime.
Comment: 10 pages, LaTeX source, one EPS figure
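The combination of canned text, templates, and context-free rules under a backtracking regime, as described for TG/2, can be sketched as follows. This is a minimal illustration under assumptions; the grammar, symbols, and preference ordering below are invented, not TG/2's actual formalism.

```python
# Hypothetical sketch of a TG/2-style realizer mixing canned text,
# templates with slots, and context-free rules, backtracking over
# alternatives in preference order.

RULES = {
    "GREETING": [["Hello,", "NAME", "!"],   # template with a slot
                 ["CANNED_GREETING"]],      # canned fallback
    "NAME": [["world"]],
    "CANNED_GREETING": [["Hi there!"]],     # canned text: a single literal
}

def realize(symbol):
    """Yield every realization of a symbol, backtracking over rule choices."""
    if symbol not in RULES:          # terminal: emit the literal text
        yield symbol
        return
    for expansion in RULES[symbol]:  # alternatives tried in preference order
        def expand(parts):
            if not parts:
                yield []
                return
            for head in realize(parts[0]):
                for tail in expand(parts[1:]):
                    yield [head] + tail
        for parts in expand(expansion):
            yield " ".join(parts)

outputs = list(realize("GREETING"))
```

Taking the first yielded result gives the most preferred realization, while the generator protocol provides the backtracking: if a consumer rejects one output, the next alternative is produced on demand.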