A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent network models. In this
paper, we explore an important step toward this generation task: training an
LSTM (long short-term memory) auto-encoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserves
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and
summarization. Code for the three models described in this paper can be
found at www.stanford.edu/~jiweil/
Language choice models for microplanning and readability
This paper describes the construction of language choice models for the microplanning of discourse relations in a Natural Language Generation system that attempts to generate appropriate texts for users with varying levels of literacy. The models consist of constraint satisfaction problem graphs that have been derived from the results of a corpus analysis. The corpus that the models are based on was written for good readers. We adapted the models for poor readers by allowing certain constraints to be tightened, based on psycholinguistic evidence. We describe how the design of the microplanner is evolving. We discuss the compromises involved in generating more readable textual output and the implications of our design for NLG architectures. Finally, we describe plans for future work
The E2E Dataset: New Challenges For End-to-End Generation
This paper describes the E2E data, a new dataset for training end-to-end,
data-driven natural language generation systems in the restaurant domain, which
is ten times bigger than existing, frequently used datasets in this area. The
E2E dataset poses new challenges: (1) its human reference texts show more
lexical richness and syntactic variation, including discourse phenomena; (2)
generating from this set requires content selection. As such, learning from
this dataset promises more natural, varied and less template-like system
utterances. We also establish a baseline on this dataset, which illustrates
some of the difficulties associated with this data.
Comment: Accepted as a short paper for SIGDIAL 2017 (final submission
including supplementary material)
Supporting process model validation through natural language generation
The design and development of process-aware information systems is often supported by specifying requirements as business process models. Although this approach is generally accepted as an effective strategy, it remains a fundamental challenge to adequately validate these models given the diverging skill set of domain experts and system analysts. As domain experts often do not feel confident in judging the correctness and completeness of process models that system analysts create, the validation often has to regress to a discourse using natural language. In order to support such a discourse appropriately, so-called verbalization techniques have been defined for different types of conceptual models. However, there is currently no sophisticated technique available that is capable of generating natural-looking text from process models. In this paper, we address this research gap and propose a technique for generating natural language texts from business process models. A comparison with manually created process descriptions demonstrates that the generated texts are superior in terms of completeness, structure, and linguistic complexity. An evaluation with users further demonstrates that the texts are very understandable and effectively allow the reader to infer the process model semantics. Hence, the generated texts represent a useful input for process model validation
Generating Feedback Reports for Adults Taking Basic Skills Tests
SkillSum is an Artificial Intelligence (AI) and Natural Language Generation (NLG) system that produces short feedback reports for people who are taking online tests which check their basic literacy and numeracy skills. In this paper, we describe the SkillSum system and application, focusing on three challenges which we believe are important ones for many systems which try to generate feedback reports from Web-based tests: choosing content based on very limited data, generating appropriate texts for people with varied levels of literacy and knowledge, and integrating the web-based system with existing assessment and support procedures
Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis
This paper fills a gap in aspect-based sentiment analysis and aims to present
a new method for preparing and analysing texts concerning opinion and
generating user-friendly descriptive reports in natural language. We present a
comprehensive set of techniques derived from Rhetorical Structure Theory and
sentiment analysis to extract aspects from textual opinions and then build an
abstractive summary of a set of opinions. Moreover, we propose aspect-aspect
graphs to evaluate the importance of aspects and to filter out unimportant ones
from the summary. Additionally, the paper presents a prototype solution of data
flow with interesting and valuable results. The proposed method's results
proved the high accuracy of aspect detection when applied to the gold standard
dataset