2,180 research outputs found
Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena
Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis;
and it contains many sorts of disfluency such as mid-utterance/sentence
hesitations, interruptions, and self-corrections. But training data for machine
learning approaches to dialogue processing is often either cleaned-up or wholly
synthetic in order to avoid such phenomena. The question then arises of how
well systems trained on such clean data generalise to real spontaneous
dialogue, or indeed whether they are trainable at all on naturally occurring
dialogue data. To answer this question, we created a new corpus called bAbI+ by
systematically adding natural spontaneous incremental dialogue phenomena such
as restarts and self-corrections to the Facebook AI Research's bAbI dialogues
dataset. We then explore the performance of a state-of-the-art retrieval model,
MemN2N, on this more natural dataset. Results show that the semantic accuracy
of the MemN2N model drops drastically; and that although it is in principle
able to learn to process the constructions in bAbI+, it needs an impractical
amount of training data to do so. Finally, we go on to show that an
incremental, semantic parser -- DyLan -- shows 100% semantic accuracy on both
bAbI and bAbI+, highlighting the generalisation properties of linguistically
informed dialogue models.Comment: 9 pages, 3 figures, 2 tables. Accepted as a full paper for SemDial
201
Visual analytics of academic writing
This paper describes a novel analytics dashboard which visualises the key features of scholarly documents. The Dashboard aggregates the salient sentences of scholarly papers, their rhetorical types and the key concepts mentioned within these sentences. These features are extracted from papers through a Natural Language Processing (NLP) technology, called Xerox Incremental Parser (XIP). The XIP Dashboard is a set of visual analytics modules based on the XIP output. In this paper, we briefly introduce the XIP technology and demonstrate an example visualisation of the XIP Dashboard
Parsing of Spoken Language under Time Constraints
Spoken language applications in natural dialogue settings place serious
requirements on the choice of processing architecture. Especially under adverse
phonetic and acoustic conditions parsing procedures have to be developed which
do not only analyse the incoming speech in a time-synchroneous and incremental
manner, but which are able to schedule their resources according to the varying
conditions of the recognition process. Depending on the actual degree of local
ambiguity the parser has to select among the available constraints in order to
narrow down the search space with as little effort as possible.
A parsing approach based on constraint satisfaction techniques is discussed.
It provides important characteristics of the desired real-time behaviour and
attempts to mimic some of the attention focussing capabilities of the human
speech comprehension mechanism.Comment: 19 pages, LaTe
A Transition-Based Directed Acyclic Graph Parser for UCCA
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.Comment: 16 pages; Accepted as long paper at ACL201
- …