
    Spoken dialog systems based on online generated stochastic finite-state transducers

    In this paper, we present an approach for the development of spoken dialog systems based on statistical modeling of the dialog manager. This work focuses on three points: modeling the dialog manager using Stochastic Finite-State Transducers, an unsupervised way to generate training corpora, and a mechanism, based on the online generation of synthetic dialogs, to address the problem of coverage. Our proposal has been developed and applied to a sports facilities booking task at the university. We present experiments evaluating the system's behavior on a set of dialogs acquired using the Wizard of Oz technique, as well as experiments with real users. The experiments show that the method proposed to increase the coverage of the dialog system was useful for finding new valid paths in the model that achieve the user goals, providing good results with real users. This work is partially supported by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MINECO TIN2014-54288-C4-3-R). Published in Speech Communication 83 (2016) 81–93. DOI: 10.1016/j.specom.2016.07.011.
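    As a rough illustration of how a dialog manager driven by a stochastic finite-state transducer can operate, the Python sketch below encodes dialog states, user acts, and probability-weighted arcs to system acts for a toy booking scenario. All state names, dialog acts, and probabilities are invented for illustration and are not the paper's actual models.

    # A toy stochastic finite-state transducer for dialog management.
    # Transitions: state -> user act -> list of (probability, system act, next state).
    TRANSITIONS = {
        "start": {
            "request_booking": [(0.8, "ask_sport", "sport?"), (0.2, "ask_date", "date?")],
        },
        "sport?": {
            "inform_sport": [(1.0, "ask_date", "date?")],
        },
        "date?": {
            "inform_date": [(0.9, "ask_confirmation", "confirm?"), (0.1, "ask_date", "date?")],
        },
        "confirm?": {
            "affirm": [(1.0, "close_booking", "end")],
            "deny": [(1.0, "ask_sport", "sport?")],
        },
    }

    def next_system_act(state, user_act):
        """Follow the most probable arc for the observed user act."""
        prob, system_act, next_state = max(TRANSITIONS[state][user_act])
        return system_act, next_state

    state = "start"
    for user_act in ["request_booking", "inform_sport", "inform_date", "affirm"]:
        system_act, state = next_system_act(state, user_act)
        print(f"user: {user_act:15s} -> system: {system_act:16s} (state: {state})")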

    Finstreder: simple and fast spoken language understanding with finite state transducers using modern speech-to-text models

    In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what the user wants the system to do and specific entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose speech-to-text model, allows building SLU models without any additional training. Building these models is very fast, taking only a few seconds, and the method is completely language independent. A comparison on different benchmarks shows that this method can outperform several other, more resource-demanding SLU approaches.
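    As a loose, illustrative sketch of the idea (not the Finstreder implementation), the snippet below treats each intent as a small finite-state acceptor over the token sequence produced by a speech-to-text model, with entity arcs that capture slot values. The grammar, intents, and entity lists are assumptions made up for this example.

    # Each intent is a sequence of arcs; "<...>" arcs accept any listed entity value.
    INTENTS = {
        "turn_on_light": ["turn", "on", "the", "light", "in", "<room>"],
        "set_timer":     ["set", "a", "timer", "for", "<number>", "minutes"],
    }
    ENTITIES = {
        "<room>":   {"kitchen", "bedroom", "bathroom"},
        "<number>": {str(n) for n in range(1, 61)},
    }

    def parse(tokens):
        """Walk each intent's arc sequence over the token stream."""
        for intent, pattern in INTENTS.items():
            if len(tokens) != len(pattern):
                continue
            slots, ok = {}, True
            for tok, arc in zip(tokens, pattern):
                if arc in ENTITIES:            # entity arc: accept any listed value
                    if tok in ENTITIES[arc]:
                        slots[arc.strip("<>")] = tok
                    else:
                        ok = False
                        break
                elif tok != arc:               # word arc: must match exactly
                    ok = False
                    break
            if ok:
                return intent, slots
        return None, {}

    # Text as it might come from a pretrained speech-to-text model:
    print(parse("turn on the light in kitchen".split()))  # ('turn_on_light', {'room': 'kitchen'})
    print(parse("set a timer for 15 minutes".split()))    # ('set_timer', {'number': '15'})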

    Giving voice to the Internet by means of conversational agents

    Proceedings of the 15th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2014), Salamanca, Spain. In this paper we present a proposal to develop conversational agents that avoids the effort of manually defining the dialog strategy for the agent and also takes into account the benefits of using current standards. In our proposal, the dialog manager is trained by means of a POMDP-based methodology using a labeled dialog corpus automatically acquired with a user modeling technique. The statistical dialog model automatically selects the next system response. Thus, system developers only need to define a set of files, each including a system prompt and the associated grammar to recognize user responses. We have applied this technique to develop a conversational agent in VoiceXML that provides information for planning a trip. This work has been supported in part by the Spanish Government under the i-Support (Intelligent Agent Based Driver Decision Support) project (TRA2011-29454-C03-03), and projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, and CAM CONTEXTS (S2009/TIC-1485).
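    A minimal sketch of the belief tracking at the heart of a POMDP-based dialog manager appears below. It assumes a fixed user goal (so the transition term of the full POMDP update drops out) and uses an invented two-goal domain and observation model; in the proposal above, these distributions would instead be estimated from the automatically acquired corpus.

    GOALS = ["book_flight", "book_hotel"]

    # P(observed user act | goal): a toy observation model for a noisy recognizer.
    OBS_MODEL = {
        "mention_flight": {"book_flight": 0.7, "book_hotel": 0.1},
        "mention_hotel":  {"book_flight": 0.1, "book_hotel": 0.7},
    }

    def update_belief(belief, observed_act):
        """Bayesian belief update: b'(g) proportional to P(o | g) * b(g)."""
        new = {g: OBS_MODEL[observed_act][g] * belief[g] for g in GOALS}
        total = sum(new.values())
        return {g: p / total for g, p in new.items()}

    belief = {g: 1.0 / len(GOALS) for g in GOALS}  # uniform prior over goals
    for act in ["mention_flight", "mention_flight", "mention_hotel"]:
        belief = update_belief(belief, act)
        print(act, {g: round(p, 3) for g, p in belief.items()})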

    A Neural Network Approach to Intention Modeling for User-Adapted Conversational Agents

    Spoken dialogue systems have been proposed to enable a more natural and intuitive interaction with the environment and human-computer interfaces. In this contribution, we present a framework based on neural networks that allows modeling the user's intention during the dialogue and uses this prediction to dynamically adapt the dialogue model of the system, taking into consideration the user's needs and preferences. We have evaluated our proposal to develop a user-adapted spoken dialogue system that facilitates tourist information and services, and provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the quality perceived by the users.
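    To make the neural modeling concrete, here is a self-contained sketch (NumPy only) that encodes the dialogue history as a small count vector and trains a one-hidden-layer network to predict the user's intention. The features, intention labels, and toy data are invented for illustration and do not reflect the paper's actual architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    INTENTIONS = ["find_restaurant", "find_hotel", "ask_directions"]
    # Toy training set: each row counts dialogue acts observed so far, as
    # [n_food_mentions, n_lodging_mentions, n_location_mentions].
    X = np.array([[3, 0, 1], [2, 1, 0], [0, 3, 1], [1, 2, 0], [0, 1, 3], [1, 0, 2]], float)
    y = np.array([0, 0, 1, 1, 2, 2])

    # One-hidden-layer network trained with plain gradient descent.
    W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
    W2 = rng.normal(0, 0.5, (8, 3)); b2 = np.zeros(3)

    def forward(X):
        h = np.tanh(X @ W1 + b1)
        logits = h @ W2 + b2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return h, e / e.sum(axis=1, keepdims=True)

    for step in range(500):
        h, p = forward(X)
        grad = p.copy(); grad[np.arange(len(y)), y] -= 1; grad /= len(y)
        dW2 = h.T @ grad; db2 = grad.sum(0)
        dh = grad @ W2.T * (1 - h**2)          # backprop through tanh
        dW1 = X.T @ dh; db1 = dh.sum(0)
        for param, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            param -= 0.5 * g                   # in-place gradient step

    _, p = forward(np.array([[0.0, 2.0, 1.0]]))  # history leaning toward lodging
    print(INTENTIONS[int(p.argmax())], p.round(2))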

    A proposal to manage multi-task dialogs in conversational interfaces

    The emergence of smart devices and recent advances in spoken language technology are currently extending the use of conversational interfaces and spoken interaction to perform many tasks. The dialog management task of a conversational interface consists of selecting the next system response considering the user's actions, the dialog history, and the results of accessing the data repositories. In this paper we describe a dialog management technique adapted to multi-task conversational systems. In our proposal, specialized dialog models are used to deal with each specific subtask or dialog objective for which the dialog system has been designed. The practical application of the proposed technique to develop a dialog system acting as a customer support service shows that the use of these specialized dialog models increases the quality and number of successful interactions with the system in comparison with developing a single dialog model.
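    The sketch below illustrates the general dispatching pattern: a task detector routes each user turn to a specialized dialog model for the corresponding subtask. The task names, the keyword-based detector, and the canned responses are illustrative assumptions; the paper's specialized models are statistical, not hand-written.

    class OrdersModel:
        def respond(self, user_turn):
            return "Let me look up your order. What is the order number?"

    class BillingModel:
        def respond(self, user_turn):
            return "I can help with billing. Which invoice is this about?"

    SPECIALIZED_MODELS = {"orders": OrdersModel(), "billing": BillingModel()}

    def detect_task(user_turn):
        """Toy task classifier; a real system would use a statistical model."""
        if "order" in user_turn or "delivery" in user_turn:
            return "orders"
        if "invoice" in user_turn or "charge" in user_turn:
            return "billing"
        return None

    def dialog_manager(user_turn):
        task = detect_task(user_turn)
        if task is None:
            return "Could you tell me whether this is about an order or a bill?"
        return SPECIALIZED_MODELS[task].respond(user_turn)

    print(dialog_manager("my order has not arrived"))
    print(dialog_manager("I was charged twice"))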

    Spoken language understanding with kernels for syntactic/semantic structures

    Automatic concept segmentation and labeling are the fundamental problems of Spoken Language Understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to the dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general-purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with Support Vector Machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with state-of-the-art models when combined with n-gram based models.
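    For intuition, the snippet below computes a simplified tree kernel that counts complete subtrees shared by two parse trees; the kernels in the paper operate on richer tree fragments, but the mechanics, a similarity score over tree pairs that can be plugged into an SVM, are the same. The example parses are invented.

    from collections import Counter

    def subtrees(tree, bag=None):
        """Collect every complete subtree (a nested tuple) of a parse tree."""
        if bag is None:
            bag = []
        bag.append(tree)
        for child in tree[1:]:
            if isinstance(child, tuple):
                subtrees(child, bag)
        return bag

    def tree_kernel(t1, t2):
        """Count matching complete-subtree pairs; usable as an SVM kernel."""
        c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
        return sum(c1[s] * c2[s] for s in c1 if s in c2)

    # Toy parses of two ATIS-like requests, as (label, children...) tuples.
    flights = ("S", ("VP", ("V", "show"), ("NP", ("N", "flights"))))
    fares   = ("S", ("VP", ("V", "show"), ("NP", ("N", "fares"))))
    print(tree_kernel(flights, fares))  # 1: only the ("V", "show") subtree is shared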

    A proposal for the development of adaptive spoken interfaces to access the Web

    Spoken dialog systems have been proposed as a solution to facilitate a more natural human–machine interaction. In this paper, we propose a framework to model the user's intention during the dialog and adapt the dialog model dynamically to the user's needs and preferences, thus developing more efficient, adapted, and usable spoken dialog systems. Our framework employs statistical models based on neural networks that take into account the history of the dialog up to the current dialog state in order to predict the user's intention and the next system response. We describe our proposal and detail its application in the Let's Go spoken dialog system. Work partially supported by projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, and CAM CONTEXTS (S2009/TIC-1485).
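    A stripped-down sketch of the response-selection step is shown below: given a summarized dialog state, the manager picks the system response with the highest estimated probability. The bus-information states and the probability table are invented stand-ins for the distribution that the neural models above would produce.

    # P(next system response | summarized dialog state), as a toy lookup table.
    POLICY = {
        ("bus_route_known", "departure_unknown"): {
            "ask_departure_place": 0.85, "confirm_route": 0.10, "give_schedule": 0.05,
        },
        ("bus_route_known", "departure_known"): {
            "ask_departure_place": 0.05, "confirm_route": 0.15, "give_schedule": 0.80,
        },
    }

    def next_response(state):
        dist = POLICY[state]
        return max(dist, key=dist.get)  # most probable response for this state

    print(next_response(("bus_route_known", "departure_unknown")))  # ask_departure_place
    print(next_response(("bus_route_known", "departure_known")))    # give_schedule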

    Spontaneous speech recognition using visual context-aware language models

    Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003. Includes bibliographical references (p. 83-88). The thesis presents a novel situationally-aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. An experimental task was created in which people were asked to refer, using speech alone, to objects arranged on a table top. During training, Fuse acquires a grammar and vocabulary from a "show-and-tell" procedure in which visual scenes are paired with verbal descriptions of individual objects. Fuse determines a set of visually salient words and phrases and associates them with a set of visual features. Given a new scene, Fuse uses the acquired knowledge to generate class-based language models conditioned on the objects present in the scene, as well as a spatial language model that predicts the occurrence of spatial terms conditioned on target and landmark objects. The speech recognizer in Fuse uses a weighted mixture of these language models to search for more likely interpretations of user speech in the context of the current scene. During decoding, the weights are updated using a visual attention model which redistributes attention over objects based on partially decoded utterances. The dynamic situationally-aware language models enable Fuse to jointly infer the spoken utterances underlying speech signals as well as the identities of the target objects they refer to. In an evaluation of the system, visual situationally-aware language modeling shows a significant (more than 30%) decrease in speech recognition and understanding error rates. The underlying ideas of situation-aware speech understanding developed in Fuse may be applied in numerous areas, including assistive and mobile human-machine interfaces. By Niloy Mukherjee.
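    The core mixture idea can be sketched in a few lines: per-object class-based unigram models are interpolated with weights supplied by the visual attention model, so that shifting attention reweights word probabilities during decoding. The vocabulary, objects, and probabilities below are invented for illustration.

    # P(word | object): toy class-based unigram models for objects in the scene.
    OBJECT_LMS = {
        "red_cup":    {"red": 0.4, "cup": 0.4, "ball": 0.05, "green": 0.05, "the": 0.1},
        "green_ball": {"green": 0.4, "ball": 0.4, "cup": 0.05, "red": 0.05, "the": 0.1},
    }

    def mixture_prob(word, attention):
        """P(word) = sum over objects of attention(obj) * P(word | obj)."""
        return sum(attention[obj] * lm.get(word, 1e-6)
                   for obj, lm in OBJECT_LMS.items())

    attention = {"red_cup": 0.5, "green_ball": 0.5}  # uniform before decoding
    print(round(mixture_prob("red", attention), 3))

    # After partially decoding "the red ...", attention shifts to the red cup,
    # raising the probability of words that describe it.
    attention = {"red_cup": 0.9, "green_ball": 0.1}
    print(round(mixture_prob("cup", attention), 3))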