5,241 research outputs found

    Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

    Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact both on usability and perceived quality. Most NLG systems in common use employ rules and heuristics and tend to generate rigid and stylised responses without the natural variation of human language. They are also not easily scaled to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates. With fewer heuristics, an objective evaluation in two differing test domains showed the proposed method improved performance compared to previous methods. Human judges scored the LSTM system higher on informativeness and naturalness and overall preferred it to the other systems. Comment: To appear in EMNLP 201

    Optimising Spoken Dialogue Strategies within the Reinforcement Learning Paradigm


    Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking

    The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware. The most natural response should be directly learned from data rather than depending on predefined syntaxes or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces utterances that are not only of high quality but also linguistically varied, and that are preferred to those of n-gram and rule-based systems. Comment: To appear in SigDial 201

    Crowd-sourcing NLG Data: Pictures Elicit Better Data

    Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission. Comment: The 9th International Natural Language Generation conference INLG, 2016. 10 pages, 2 figures, 3 tables

    Using multimedia to enhance the accessibility of the learning environment for disabled students: reflections from the Skills for Access project

    As educators' awareness of their responsibilities towards ensuring the accessibility of the learning environment to disabled students increases, significant debate surrounds the implications of accessibility requirements on educational multimedia. There would appear to be widespread concern that the fundamental principles of creating accessible web‐based materials seem at odds with the creative and innovative use of multimedia to support learning and teaching, as well as concerns over the time and cost of providing accessibility features that can hold back resource development and application. Yet, effective use of multimedia offers a way of enhancing the accessibility of the learning environment for many groups of disabled students. Using the development of ‘Skills for Access’, a web resource supporting the dual aims of creating optimally accessible multimedia for learning, as an example, the attitudinal, practical and technical challenges facing the effective use of multimedia as an accessibility aid in a learning environment will be explored. Reasons why a holistic approach to accessibility may be the most effective in ensuring that multimedia reaches its full potential in enabling and supporting students in learning, regardless of any disability they may have, will be outlined and discussed.

    Scaling up deep reinforcement learning for multi-domain dialogue systems

    Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems due to large search spaces. This paper proposes a three-stage method for multi-domain dialogue policy learning, termed NDQN, and applies it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. In this method, the first stage does multi-policy learning via a network of DQN agents; the second makes use of compact state representations by compressing raw inputs; and the third stage applies a pre-training phase for bootstrapping the behaviour of agents in the network. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that the proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems. An additional evaluation reports that the NDQN agents outperformed a K-Nearest Neighbour baseline in task success and dialogue length, yielding more efficient and successful dialogues.

    Introduction for speech and language for interactive robots

    This special issue includes research articles which apply spoken language processing to robots that interact with human users through speech, possibly combined with other modalities. Robots that can listen to human speech, understand it, interact according to the conveyed meaning, and respond represent major research and technological challenges. Their common aim is to equip robots with natural interaction abilities. However, robotics and spoken language processing are areas that are typically studied within their respective communities with limited communication across disciplinary boundaries. The articles in this special issue represent examples that address the need for an increased multidisciplinary exchange of ideas.

    Conversational natural language interaction for place-related knowledge acquisition

    We focus on the problems of using Natural Language interaction to support pedestrians in their place-related knowledge acquisition. Our case study for this discussion is a smartphone-based Natural Language interface that allows users to acquire spatial and cultural knowledge of a city. The framework consists of a spoken dialogue-based information system and a smartphone client. The system is novel in combining geographic information system (GIS) modules such as a visibility engine with a question-answering (QA) system. Users can use the smartphone client to engage in a variety of interleaved conversations such as navigating from A to B, using the QA functionality to learn more about points of interest (PoI) nearby, and searching for amenities and tourist attractions. This system explores a variety of research questions involving Natural Language interaction for acquisition of knowledge about space and place.

    Generating multimedia presentations: from plain text to screenplay

    In many Natural Language Generation (NLG) applications, the output is limited to plain text – i.e., a string of words with punctuation and paragraph breaks, but no indications for layout, or pictures, or dialogue. In several projects, we have begun to explore NLG applications in which these extra media are brought into play. This paper gives an informal account of what we have learned. For coherence, we focus on the domain of patient information leaflets, and follow an example in which the same content is expressed first in plain text, then in formatted text, then in text with pictures, and finally in a dialogue script that can be performed by two animated agents. We show how the same meaning can be mapped to realisation patterns in different media, and how the expanded options for expressing meaning are related to the perceived style and tone of the presentation. Throughout, we stress that the extra media are not simply added to plain text, but integrated with it: thus the use of formatting, or pictures, or dialogue, may require radical rewording of the text itself.