Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation
We argue for a performance-based design of natural language grammars and
their associated parsers in order to meet the constraints imposed by real-world
NLP. Our approach incorporates declarative and procedural knowledge about
language and language use within an object-oriented specification framework. We
discuss several message-passing protocols for parsing and provide reasons for
sacrificing completeness of the parse in favor of efficiency based on a
preliminary empirical evaluation.
Comment: 12 pages, uses epsfig.st
A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent network models. In this
paper, we explore an important step toward this generation task: training an
LSTM (long short-term memory) auto-encoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserves
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and
summarization. Code for the three models described in this paper can be
found at www.stanford.edu/~jiweil/
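The hierarchical composition described in this abstract (words to sentence embeddings, sentence embeddings to one paragraph embedding) can be illustrated with a minimal sketch. This is not the authors' model: mean pooling stands in for the LSTM encoders, the decoder is omitted, and the names `mean_pool`, `encode_paragraph`, and `toy_vecs` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average equal-length vectors.

    Assumption: this is a simple stand-in for an LSTM's final hidden state,
    not the encoder used in the paper.
    """
    if not vectors:
        raise ValueError("cannot pool an empty list")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode_paragraph(paragraph: List[List[str]],
                     word_vecs: Dict[str, Vector]) -> Vector:
    """Build a paragraph embedding in two levels.

    Level 1: pool word vectors into one vector per sentence.
    Level 2: pool sentence vectors into one vector for the paragraph.
    """
    sentence_vecs = [
        mean_pool([word_vecs[word] for word in sentence])
        for sentence in paragraph
    ]
    return mean_pool(sentence_vecs)


# Hypothetical toy vocabulary with 2-dimensional word embeddings.
toy_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

# A paragraph is a list of sentences; a sentence is a list of tokens.
paragraph = [["a", "b"], ["c"]]
embedding = encode_paragraph(paragraph, toy_vecs)
print(embedding)
```

Pooling per sentence first means each sentence contributes equally to the paragraph vector regardless of its length, which is the structural point of a hierarchical encoder; a learned recurrent encoder at each level would additionally capture word and sentence order.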
Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank
Discourse parsing has long been treated as a stand-alone problem independent
from constituency or dependency parsing. Most attempts at this problem are
pipelined rather than end-to-end, sophisticated, and not self-contained: they
assume gold-standard text segmentations (Elementary Discourse Units), and use
external parsers for syntactic features. In this paper we propose the first
end-to-end discourse parser that jointly parses at both the syntax and discourse
levels, as well as the first syntacto-discourse treebank by integrating the
Penn Treebank with the RST Treebank. Built upon our recent span-based
constituency parser, this joint syntacto-discourse parser requires no
preprocessing whatsoever (such as segmentation or feature extraction) and achieves
state-of-the-art end-to-end discourse parsing accuracy.
Comment: Accepted at EMNLP 201