Spoken dialog systems based on online generated stochastic finite-state transducers
This is the author’s version of a work that was accepted for publication in Speech Communication. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Speech Communication 83 (2016) 81–93. DOI 10.1016/j.specom.2016.07.011. In this paper, we present an approach for the development of spoken dialog systems based on the statistical modeling of the dialog manager. This work focuses on three points: the modeling of the dialog manager using Stochastic Finite-State Transducers, an unsupervised way to generate training corpora, and a mechanism to address the problem of coverage that is based on the online generation of synthetic dialogs. Our proposal has been developed and applied to a sport facilities booking task at the university. We present experimentation evaluating the system behavior on a set of dialogs that was acquired using the Wizard of Oz technique as well as experimentation with real users. The experimentation shows that the method proposed to increase the coverage of the Dialog System was useful to find new valid paths in the model to achieve the user goals, providing good results with real users.
© 2016 Elsevier B.V. All rights reserved. This work is partially supported by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MINECO TIN2014-54288-C4-3-R). Hurtado Oliver, LF.; Planells Lerma, J.; Segarra Soriano, E.; Sanchís Arnal, E. (2016). Spoken dialog systems based on online generated stochastic finite-state transducers. Speech Communication. 83:81-93. https://doi.org/10.1016/j.specom.2016.07.011
Towards structured neural spoken dialogue modelling.
195 p. In this thesis, we try to alleviate some of the weaknesses of the current approaches to dialogue modelling, one of the most challenging areas of Artificial Intelligence. We target three different types of dialogues (open-domain, task-oriented and coaching sessions), and use mainly machine learning algorithms to train dialogue models. One challenge of open-domain chatbots is their lack of response variety, which can be tackled using Generative Adversarial Networks (GANs). We present two methodological contributions in this regard. On the one hand, we develop a method to circumvent the non-differentiability of text-processing GANs. On the other hand, we extend the conventional task of discriminators, which often operate at a single response level, to the batch level. Meanwhile, two crucial aspects of task-oriented systems are their understanding capabilities (because they need to correctly interpret what the user is looking for and their constraints), and the dialogue strategy. We propose a simple yet powerful way to improve spoken understanding and adapt the dialogue strategy by explicitly processing the user's speech signal through audio-processing transformer neural networks. Finally, coaching dialogues share properties of open-domain and task-oriented dialogues. They are somewhat task-oriented, but there is no rush to complete the task, and it is more important to calmly converse to make the users aware of their own problems. In this context, we describe our collaboration in the EMPATHIC project, where a Virtual Coach capable of carrying out coaching dialogues about nutrition was built, using a modular Spoken Dialogue System. Second, we model such dialogues with an end-to-end system based on Transfer Learning.
Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents
We describe the acquisition of a dialog corpus for French based on multi-task human-machine interactions in a serious game setting. We present a tool for data collection that is configurable for multiple games; describe the data collected using this tool and the annotation schema used to annotate it; and report on the results obtained when training a classifier on the annotated data to associate each player turn with a dialog move usable by a rule-based dialog manager. The collected data consists of approximately 1250 dialogs, 10454 utterances and 168509 words and will be made freely available to academic and nonprofit research.
Data-Driven Policy Optimisation for Multi-Domain Task-Oriented Dialogue
Recent developments in machine learning, along with a general shift in the public attitude towards digital personal assistants, have opened new frontiers for conversational systems. Nevertheless, building data-driven multi-domain conversational agents that act optimally given a dialogue context is an open challenge. The first step towards that goal is developing an efficient way of learning a dialogue policy in new domains. Secondly, it is important to have the ability to collect and utilise human-human conversational data to bootstrap an agent's knowledge. The work presented in this thesis demonstrates how a neural dialogue manager fine-tuned with reinforcement learning presents a viable approach for learning a dialogue policy efficiently and across many domains.
The thesis starts by introducing a dialogue management module that learns through interactions to act optimally given the current context of a conversation. The current shift towards neural, parameter-rich systems does not fully address the problem of error noise coming from speech recognition or natural language understanding components. A Bayesian approach is therefore proposed to learn more robust and effective policy management in direct interactions without any prior data. By putting a distribution over model weights, the learning agent is less prone to overfit to particular dialogue realizations and a more efficient exploration policy can therefore be employed. The results show that deep reinforcement learning performs on par with non-parametric models even in a low data regime while significantly reducing the computational complexity compared with the previous state-of-the-art.
The deployment of a dialogue manager without any pre-training on human conversations is not a viable option from an industry perspective. However, the progress in building statistical systems, particularly dialogue managers, is hindered by the scale of data available. To address this fundamental obstacle, a novel data-collection pipeline entirely based on crowdsourcing without the need for hiring professional annotators is introduced. The validation of the approach results in the collection of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labeled collection of human-human written conversations spanning over multiple domains and topics. The proposed dataset creates a set of new benchmarks (belief tracking, policy optimisation, and response generation) significantly raising the complexity of analysed dialogues.
The collected dataset serves as a foundation for a novel reinforcement learning (RL)-based approach for training a multi-domain dialogue manager. A Multi-Action and Slot Dialogue Agent (MASDA) is proposed to combat some limitations: 1) handling complex multi-domain dialogues with multiple concurrent actions present in a single turn; and 2) lack of interpretability, which consequently impedes the use of intermediate signals (e.g., dialogue turn annotations) if such signals are available. MASDA explicitly models system acts and slots using intermediate signals, resulting in an improved task-based end-to-end framework. The model can also select concurrent actions in a single turn, thus enriching the representation of the generated responses. The proposed framework allows for RL training of dialogue task completion metrics when dealing with concurrent actions. The results demonstrate the advantages of both 1) handling concurrent actions and 2) exploiting intermediate signals: MASDA outperforms previous end-to-end frameworks while also offering improved scalability.
Robust Dialog Management Through A Context-centric Architecture
This dissertation presents and evaluates a method of managing spoken dialog interactions with a robust attention to fulfilling the human user’s goals in the presence of speech recognition limitations. Assistive speech-based embodied conversation agents are computer-based entities that interact with humans to help accomplish a certain task or communicate information via spoken input and output. A challenging aspect of this task involves open dialog, where the user is free to converse in an unstructured manner. With this style of input, the machine’s ability to communicate may be hindered by poor reception of utterances, caused by a user’s inadequate command of a language and/or faults in the speech recognition facilities. Since a speech-based input is emphasized, this endeavor involves the fundamental issues associated with natural language processing, automatic speech recognition and dialog system design. Driven by Context-Based Reasoning, the presented dialog manager features a discourse model that implements mixed-initiative conversation with a focus on the user’s assistive needs. The discourse behavior must maintain a sense of generality, where the assistive nature of the system remains constant regardless of its knowledge corpus. The dialog manager was encapsulated into a speech-based embodied conversation agent platform for prototyping and testing purposes. A battery of user trials was performed on this agent to evaluate its performance as a robust, domain-independent, speech-based interaction entity capable of satisfying the needs of its users.
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: in Italian as an example of a new language, for ticket reservation as an example of a new domain. Finally, the work is complemented by the proposition of a new high-level semantic annotation scheme well-suited to dialogue data.
Students' language in computer-assisted tutoring of mathematical proofs
Truth and proof are central to mathematics. Proving (or disproving) seemingly simple statements often turns out to be one of the hardest mathematical tasks. Yet, doing proofs is rarely taught in the classroom. Studies on cognitive difficulties in learning to do proofs have shown that pupils and students not only often do not understand or cannot apply basic formal reasoning techniques and do not know how to use formal mathematical language, but, at a far more fundamental level, they also do not understand what it means to prove a statement or even do not see the purpose of proof at all. Since insight into the importance of proof and doing proofs as such cannot be learnt other than by practice, learning support through individualised tutoring is in demand.
This volume presents a part of an interdisciplinary project, set at the intersection of pedagogical science, artificial intelligence, and (computational) linguistics, which investigated issues involved in provisioning computer-based tutoring of mathematical proofs through dialogue in natural language. The ultimate goal in this context, addressing the above-mentioned need for learning support, is to build intelligent automated tutoring systems for mathematical proofs. The research presented here has been focused on the language that students use while interacting with such a system: its linguistic properties and computational modelling. Contribution is made at three levels: first, an analysis of language phenomena found in students' input to a (simulated) proof tutoring system is conducted and the variety of students' verbalisations is quantitatively assessed; second, a general computational processing strategy for informal mathematical language and methods of modelling prominent language phenomena are proposed; and third, the prospects for natural language as an input modality for proof tutoring systems are evaluated based on collected corpora.
- …