Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
This work investigates embeddings for representing dialog history in
spoken language understanding (SLU) systems. We focus on the scenario where the
semantic information is extracted directly from the speech signal by means of a
single end-to-end neural network model. We propose to integrate dialogue
history into an end-to-end signal-to-concept SLU system. The dialog history is
represented in the form of dialog history embedding vectors (so-called
h-vectors) and is provided as additional information to end-to-end SLU
models in order to improve system performance. The following three types of
h-vectors are proposed and experimentally evaluated in this paper: (1)
supervised-all embeddings, which predict the bag of concepts expected in the
user's answer from the last dialog system response; (2) supervised-freq embeddings,
which focus on predicting only a selected set of semantic concepts (corresponding
to the most frequent errors in our experiments); and (3) unsupervised
embeddings. Experiments on the MEDIA corpus for the semantic slot filling task
demonstrate that the proposed h-vectors improve the model performance.
Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
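As a rough illustration of the supervised-all idea in the abstract above, the prediction target can be pictured as a multi-hot "bag of concepts" vector over a fixed concept inventory. The sketch below only builds such a target. The concept names and the `bag_of_concepts` helper are hypothetical, and the abstract does not say how the h-vector itself is derived from the predictor.

```python
# Hypothetical concept inventory: the actual MEDIA concept set is not given
# in the abstract, so these names are placeholders for illustration only.
CONCEPTS = ["command", "hotel_name", "city", "date", "room_count"]


def bag_of_concepts(concepts_in_answer, inventory=CONCEPTS):
    """Return a multi-hot vector over `inventory`.

    An entry is 1.0 if the concept occurs at least once in the user's
    answer and 0.0 otherwise; repeated concepts are not counted twice.
    """
    present = set(concepts_in_answer)
    return [1.0 if concept in present else 0.0 for concept in inventory]


# Concepts annotated in a (made-up) user answer following a system prompt.
target = bag_of_concepts(["city", "date", "date"])
print(target)
```

Under this assumption, a model reading the last system response would be trained to predict `target`, which is one plausible reading of "predicting bag-of-concepts expected in the answer of the user".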
Data-Driven Policy Optimisation for Multi-Domain Task-Oriented Dialogue
Recent developments in machine learning, along with a general shift in the public attitude towards digital personal assistants, have opened new frontiers for conversational systems. Nevertheless, building data-driven multi-domain conversational agents that act optimally given a dialogue context is an open challenge. The first step towards that goal is developing an efficient way of learning a dialogue policy in new domains. Secondly, it is important to have the ability to collect and utilise human-human conversational data to bootstrap an agent's knowledge. The work presented in this thesis demonstrates how a neural dialogue manager fine-tuned with reinforcement learning presents a viable approach for learning a dialogue policy efficiently and across many domains.
The thesis starts by introducing a dialogue management module that learns through interactions to act optimally given the current context of a conversation. The current shift towards neural, parameter-rich systems does not fully address the problem of noise coming from speech recognition or natural language understanding components. A Bayesian approach is therefore proposed to learn a more robust and effective dialogue policy in direct interactions without any prior data. By putting a distribution over model weights, the learning agent is less prone to overfit to particular dialogue realizations, and a more efficient exploration policy can therefore be employed. The results show that deep reinforcement learning performs on par with non-parametric models even in a low data regime while significantly reducing the computational complexity compared with the previous state-of-the-art.
The deployment of a dialogue manager without any pre-training on human conversations is not a viable option from an industry perspective. However, the progress in building statistical systems, particularly dialogue managers, is hindered by the scale of data available. To address this fundamental obstacle, a novel data-collection pipeline entirely based on crowdsourcing, without the need for hiring professional annotators, is introduced. The validation of the approach results in the collection of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labeled collection of human-human written conversations spanning multiple domains and topics. The proposed dataset creates a set of new benchmarks (belief tracking, policy optimisation, and response generation), significantly raising the complexity of analysed dialogues.
The collected dataset serves as a foundation for a novel reinforcement learning (RL)-based approach for training a multi-domain dialogue manager. A Multi-Action and Slot Dialogue Agent (MASDA) is proposed to address two limitations: 1) difficulty in handling complex multi-domain dialogues with multiple concurrent actions present in a single turn; and 2) lack of interpretability, which consequently impedes the use of intermediate signals (e.g., dialogue turn annotations) if such signals are available. MASDA explicitly models system acts and slots using intermediate signals, resulting in an improved task-based end-to-end framework. The model can also select concurrent actions in a single turn, thus enriching the representation of the generated responses. The proposed framework allows for RL training on dialogue task completion metrics when dealing with concurrent actions. The results demonstrate the advantages of both 1) handling concurrent actions and 2) exploiting intermediate signals: MASDA outperforms previous end-to-end frameworks while also offering improved scalability.
EPSR
Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data
Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly more positive user experience. To that end, an inherently dynamic view that goes beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D.
Automatic Summarization
It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain- and genre-specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.
Dynamic Extension of ASR Lexicon Using Wikipedia Data
Despite recent progress in developing Large Vocabulary Continuous Speech Recognition (LVCSR) systems, these systems suffer from Out-Of-Vocabulary (OOV) words. In many cases, the OOV words are Proper Nouns (PNs). The correct recognition of PNs is essential for broadcast news, audio indexing, etc. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR. We focus on dynamic (document-dependent) extension of the LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus: Wikipedia. This corpus contains a huge number of PNs. These PNs are grouped into semantically similar classes using word embeddings. We use a two-step approach: first, we select the classes pertinent to OOV PNs with a multi-class Deep Neural Network (DNN). Secondly, we rank the OOVs of the selected classes. The experiments on French broadcast news show that the Bi-GRU model outperforms other studied models. Speech recognition experiments demonstrate the effectiveness of the proposed methodology.
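The two-step retrieval described in the abstract above (select pertinent classes, then rank the OOV proper nouns inside them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the class scores stand in for the output of the multi-class DNN, ranking by cosine similarity to a document vector is an assumed criterion that the abstract does not specify, and all names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_oov_pns(class_scores, class_members, embeddings, doc_vector,
                     top_classes=1, top_words=2):
    """Two-step retrieval sketch.

    class_scores  : class name -> score (assumed to come from a classifier)
    class_members : class name -> list of candidate OOV proper nouns
    embeddings    : proper noun -> embedding vector
    doc_vector    : vector representing the current document (assumption)
    """
    # Step 1: keep the highest-scoring classes.
    selected = sorted(class_scores, key=class_scores.get, reverse=True)[:top_classes]
    # Step 2: rank the proper nouns of the selected classes by similarity
    # to the document vector (an assumed ranking criterion).
    candidates = [word for cls in selected for word in class_members[cls]]
    ranked = sorted(candidates,
                    key=lambda word: cosine(embeddings[word], doc_vector),
                    reverse=True)
    return ranked[:top_words]


# Toy, made-up data for illustration only.
class_scores = {"class_a": 0.7, "class_b": 0.2, "class_c": 0.1}
class_members = {"class_a": ["Alpha", "Bravo", "Charlie"],
                 "class_b": ["Delta"],
                 "class_c": ["Echo"]}
embeddings = {"Alpha": [1.0, 0.0], "Bravo": [0.8, 0.6], "Charlie": [0.0, 1.0],
              "Delta": [0.5, 0.5], "Echo": [0.1, 0.9]}
doc_vector = [1.0, 0.1]

selected_pns = retrieve_oov_pns(class_scores, class_members, embeddings, doc_vector)
print(selected_pns)
```

The words returned by `retrieve_oov_pns` would then be the candidates added to the recognizer's lexicon for that document; how many classes and words to keep is a tuning choice left open here.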