A Hierarchical Neural Autoencoder for Paragraphs and Documents

Author: Jurafsky Dan
Li Jiwei
Luong Minh-Thang
Publication date: 01/01/2015

Generating coherent long texts, such as paragraphs or longer documents, is a challenging problem for recurrent network models. In this paper, we explore an important step toward this generation task: training an LSTM (Long Short-Term Memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization. Code for the three models described in this paper can be found at www.stanford.edu/~jiweil/

arXiv.org e-Print Archive

CiteSeerX

Language choice models for microplanning and readability

Author: Williams Sandra
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2003

This paper describes the construction of language choice models for the microplanning of discourse relations in a Natural Language Generation system that attempts to generate appropriate texts for users with varying levels of literacy. The models consist of constraint satisfaction problem graphs that have been derived from the results of a corpus analysis. The corpus that the models are based on was written for good readers. We adapted the models for poor readers by allowing certain constraints to be tightened, based on psycholinguistic evidence. We describe how the design of the microplanner is evolving. We discuss the compromises involved in generating more readable textual output and the implications of our design for NLG architectures. Finally, we describe plans for future work.

CiteSeerX

Crossref

Open Research Online (The Open University)

The E2E Dataset: New Challenges For End-to-End Generation

Author: Dušek Ondřej
Novikova Jekaterina
Rieser Verena
Publication date: 01/01/2017

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data. Comment: Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material)

arXiv.org e-Print Archive

Crossref

Heriot Watt Pure

Supporting process model validation through natural language generation

Author: Leopold Henrik
Mendling Jan
Polyvyanyy Artem
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01/01/2014

The design and development of process-aware information systems is often supported by specifying requirements as business process models. Although this approach is generally accepted as an effective strategy, it remains a fundamental challenge to adequately validate these models given the diverging skill sets of domain experts and system analysts. As domain experts often do not feel confident in judging the correctness and completeness of process models that system analysts create, the validation often has to regress to a discourse using natural language. In order to support such a discourse appropriately, so-called verbalization techniques have been defined for different types of conceptual models. However, there is currently no sophisticated technique available that is capable of generating natural-looking text from process models. In this paper, we address this research gap and propose a technique for generating natural language texts from business process models. A comparison with manually created process descriptions demonstrates that the generated texts are superior in terms of completeness, structure, and linguistic complexity. An evaluation with users further demonstrates that the texts are very understandable and effectively allow the reader to infer the process model semantics. Hence, the generated texts represent a useful input for process model validation.

ZENODO

Queensland University of Technology ePrints Archive

Elektronische Publikationen der Wirtschaftsuniversität Wien

University of Melbourne Institutional Repository

Generating Feedback Reports for Adults Taking Basic Skills Tests

Author: Reiter Ehud
Williams Sandra
Publication date: 01/12/2005

SkillSum is an Artificial Intelligence (AI) and Natural Language Generation (NLG) system that produces short feedback reports for people who are taking online tests that check their basic literacy and numeracy skills. In this paper, we describe the SkillSum system and application, focusing on three challenges that we believe are important for many systems that try to generate feedback reports from Web-based tests: choosing content based on very limited data, generating appropriate texts for people with varied levels of literacy and knowledge, and integrating the Web-based system with existing assessment and support procedures.

Open Research Online (The Open University)

Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis

Author: JR Martin
L Danlos
L Page
M Taboada
S Joty
Ł Augustyniak
Publication venue: Springer Science and Business Media LLC
Publication date: 13/09/2017

This paper fills a gap in aspect-based sentiment analysis and presents a new method for preparing and analysing opinion texts and generating user-friendly descriptive reports in natural language. We present a comprehensive set of techniques derived from Rhetorical Structure Theory and sentiment analysis to extract aspects from textual opinions and then build an abstractive summary of a set of opinions. Moreover, we propose aspect-aspect graphs to evaluate the importance of aspects and to filter out unimportant ones from the summary. Additionally, the paper presents a prototype solution of data flow with interesting and valuable results. The results show that the proposed method detects aspects with high accuracy when applied to the gold standard dataset.

arXiv.org e-Print Archive

Crossref