2,015 research outputs found
Neural Discourse Structure for Text Categorization
We show that discourse structure, as defined by Rhetorical Structure Theory
and provided by an existing discourse parser, benefits text categorization. Our
approach uses a recursive neural network and a newly proposed attention
mechanism to compute a representation of the text that focuses on salient
content, from the perspective of both RST and the task. Experiments consider
variants of the approach and illustrate its strengths and weaknesses.Comment: ACL 2017 camera ready versio
Applying the bundle-move connection approach to the development of an online writing support tool for research articles
With advances in information and computer technology, genre-based writing pedagogy has developed greatly in recent years. In order to further this growth in technology-enhanced genre writing pedagogy, the current study developed a data-driven and theory-based practical writing support tool for research articles (RAs). This web-based, innovative tool, powered by the combination of rhetorical moves and lexical bundles, has an auto-complete feature that suggests the most frequent lexical bundles in a move within an RA section. It was developed based on the proof-of-concept of the bundle-move connection approach. Preliminary user feedback was positive was overall, and it was found that the writing support tool brought about beneficial effects that genre writing pedagogy explicitly aims to achieve. In light of these findings, the pedagogical implications of the developed tool are discussed, with particular focus on the potential role that it could play in the teaching and learning of technology-enhanced genre writing.The study was funded by the 2015-2016 Kansai University Outlay for Establishing Research Centers and JSPS KAKENHI Grant Numbers 17H02369 and 15K02717
A corpus analysis of discourse relations for Natural Language Generation
We are developing a Natural Language Generation (NLG) system that generates texts tailored for the reading ability of individual readers. As part of building the system, GIRL (Generator for Individual Reading Levels), we carried out an analysis of the RST Discourse Treebank Corpus to find out how human writers linguistically realise discourse relations. The goal of the analysis was (a) to create a model of the choices that need to be made when realising discourse relations, and (b) to understand how these choices were typically made for “normal” readers, for a variety of discourse relations. We present our results for discourse relations: concession, condition, elaboration additional, evaluation, example, reason and restatement. We discuss the results and how they were used in GIRL
A PDTB-Styled End-to-End Discourse Parser
We have developed a full discourse parser in the Penn Discourse Treebank
(PDTB) style. Our trained parser first identifies all discourse and
non-discourse relations, locates and labels their arguments, and then
classifies their relation types. When appropriate, the attribution spans to
these relations are also determined. We present a comprehensive evaluation from
both component-wise and error-cascading perspectives.Comment: 15 pages, 5 figures, 7 table
HILDA: A Discourse Parser Using Support Vector Machine Classification
Discourse structures have a central role in several computational tasks, such as question-answering or dialogue generation. In particular, the framework of the Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree building approach, we are able to create accurate discourse trees in linear time complexity. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu 2003) is limited to sentence level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%
Annotation of Scientific Summaries for Information Retrieval.
International audienceWe present a methodology combining surface NLP and Machine Learning techniques for ranking asbtracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence is bearing (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi- abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised in order to take into account the distribution of the tags in the corpus. Results, although still preliminary, are encouraging us to pursue this line of work and find better ways of building IR systems that can take into account semantic annotations in a corpus
- …