Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
In this paper, gating mechanisms are applied in deep neural network (DNN)
training for x-vector-based text-independent speaker verification. First, a
gated convolution neural network (GCNN) is employed for modeling the
frame-level embedding layers. Compared with the time-delay DNN (TDNN), the GCNN
can obtain more expressive frame-level representations through carefully
designed memory cell and gating mechanisms. Moreover, we propose a novel
gated-attention statistics pooling strategy in which the attention scores are
shared with the output gate. The gated-attention statistics pooling combines
both gating and attention mechanisms into one framework; therefore, we can
capture more useful information in the temporal pooling layer. Experiments are
carried out using the NIST SRE16 and SRE18 evaluation datasets. The results
demonstrate the effectiveness of the GCNN and show that the proposed
gated-attention statistics pooling can further improve the performance.
Comment: 5 pages, 3 figures, submitted to INTERSPEECH 201
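The pooling idea above can be sketched in plain numpy. This is a minimal illustration of attention-weighted statistics pooling, assuming a simple dot-product scoring function; the paper's full model would additionally reuse the same scores as the output gate, which is only noted in a comment here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_attention_stats_pool(H, w):
    """Pool frame-level features H (T, D) into one utterance vector.

    The attention scores are computed once; in the full model the same
    scores would also serve as the output gate (score sharing). The
    scoring function here is illustrative, not the paper's exact one.
    """
    scores = softmax(H @ w)                            # (T,) over frames
    mu = (scores[:, None] * H).sum(axis=0)             # weighted mean
    var = (scores[:, None] * (H - mu) ** 2).sum(axis=0)
    return np.concatenate([mu, np.sqrt(var + 1e-8)])   # (2D,) embedding

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 8))   # 50 frames of 8-dim features
w = rng.normal(size=8)         # attention parameter vector
emb = gated_attention_stats_pool(H, w)
print(emb.shape)  # (16,)
```

The concatenated weighted mean and standard deviation give a fixed-size utterance embedding regardless of the number of frames.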
A Self-Attention Joint Model for Spoken Language Understanding in Situational Dialog Applications
Spoken language understanding (SLU) acts as a critical component in
goal-oriented dialog systems. It typically involves identifying the speaker's
intent and extracting semantic slots from user utterances, two tasks known as
intent detection (ID) and slot filling (SF). The SLU problem has been
intensively investigated in recent years. However, existing methods either
constrain SF results only grammatically, solve ID and SF independently, or do
not fully exploit the mutual impact of the two tasks. This paper proposes a multi-head self-attention
joint model with a conditional random field (CRF) layer and a prior mask. The
experiments show the effectiveness of our model, as compared with
state-of-the-art models. Meanwhile, online education in China has made great
progress in the last few years. But there are few intelligent educational
dialog applications for students to learn foreign languages. Hence, we design
an intelligent dialog robot equipped with different scenario settings to help
students learn communication skills.
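The "prior mask" mentioned above can be illustrated with a toy example. The intent and slot inventories below are hypothetical, and the mechanism (suppressing slot labels that are implausible under the predicted intent) is a sketch of the idea rather than the paper's exact parameterization.

```python
import numpy as np

# Hypothetical intent/slot inventories; the mask encodes which slot
# labels are plausible under each intent (the "prior mask" idea).
intents = ["play_music", "get_weather"]
slots = ["O", "artist", "song", "city", "date"]

prior_mask = np.array([
    # O  artist song city date
    [1,  1,     1,   0,   0],   # play_music
    [1,  0,     0,   1,   1],   # get_weather
], dtype=float)

def mask_slot_logits(slot_logits, intent_id):
    """Suppress slot labels incompatible with the predicted intent."""
    return np.where(prior_mask[intent_id] > 0, slot_logits, -np.inf)

logits = np.array([0.1, 2.0, 0.5, 3.0, 0.2])    # raw per-label scores
masked = mask_slot_logits(logits, intent_id=0)  # intent: play_music
print(slots[int(np.argmax(logits))])   # city (implausible for play_music)
print(slots[int(np.argmax(masked))])   # artist (city is masked out)
```

Without the mask the tagger would pick "city" for a play_music utterance; the prior knowledge rules it out before decoding.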
Multi-Domain Spoken Language Understanding Using Domain- and Task-Aware Parameterization
Spoken language understanding has been addressed as a supervised learning
problem, where a set of training data is available for each domain. However,
annotating data for each domain is both financially costly and non-scalable so
we should fully utilize information across all domains. One existing approach
solves the problem by conducting multi-domain learning, using shared parameters
for joint training across domains. We propose to improve the parameterization
of this method by using domain-specific and task-specific model parameters to
improve knowledge learning and transfer. Experiments on 5 domains show that our
model is more effective for multi-domain SLU and obtains the best results. In
addition, we show its transferability by outperforming the prior best model by
12.4\% when adapting to a new domain with little data.
Comment: Accepted by Transactions on Asian and Low-Resource Language
Information Processing (TALLIP)
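The domain- and task-aware parameterization can be sketched as a decomposition of weights into shared, per-domain, and per-task parts. The domain and task inventories below are invented for illustration, and the additive combination is one simple way to realize the idea, not necessarily the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(5)
D = 4   # feature size (toy)

# Shared weights plus per-domain and per-task weights. The additive
# decomposition mirrors the parameterization idea from the abstract.
shared_W = rng.normal(size=(D, D))
domain_W = {d: rng.normal(size=(D, D)) for d in ["hotel", "flight"]}
task_W = {t: rng.normal(size=(D, D)) for t in ["intent", "slot"]}

def encode(x, domain, task):
    """Combine shared, domain-aware, and task-aware transformations."""
    return (shared_W + domain_W[domain] + task_W[task]) @ x

x = rng.normal(size=D)
h = encode(x, "hotel", "slot")
print(h.shape)  # (4,)
```

Sharing `shared_W` across all domains lets knowledge transfer, while the small per-domain and per-task matrices keep the specialization cheap.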
Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog
Recent studies have shown remarkable success in end-to-end task-oriented
dialog systems. However, most neural models rely on large training data, which
are only available for a certain number of task domains, such as navigation and
scheduling. This makes it difficult to scale to a new domain with limited
labeled data. Moreover, there has been relatively little research on how to
effectively
use data from all domains to improve the performance of each domain and also
unseen domains. To this end, we investigate methods that can make explicit use
of domain knowledge and introduce a shared-private network to learn shared and
specific knowledge. In addition, we propose a novel Dynamic Fusion Network
(DF-Net) which automatically exploits the relevance between the target domain
and each domain. Results show that our model outperforms existing methods on
multi-domain dialogue, giving the state-of-the-art in the literature. Besides,
with little training data, we show its transferability by outperforming the
prior best model by 13.9\% on average.
Comment: ACL202
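The shared-private idea above can be sketched with a simple fusion gate: a learned scalar decides how much domain-private knowledge to mix with the shared representation. The gate parameterization here is an assumption (random weights stand in for learned ones).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(shared, private, w_gate):
    """Dynamically fuse shared and domain-private features.

    A scalar gate (learned in the real model; random weights here)
    decides how much domain-specific knowledge to mix in.
    """
    g = sigmoid(np.dot(w_gate, np.concatenate([shared, private])))
    return g * private + (1.0 - g) * shared

rng = np.random.default_rng(1)
shared = rng.normal(size=4)    # representation shared across domains
private = rng.normal(size=4)   # representation for the target domain
w = rng.normal(size=8)
h = fuse(shared, private, w)
print(h.shape)  # (4,)
```

Because the gate is computed from the input itself, the mixture adapts per utterance, which is the "dynamic" part of dynamic fusion.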
CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots
Natural Language Understanding (NLU) is a core component of dialog systems.
It typically involves two tasks - intent classification (IC) and slot labeling
(SL), which are then followed by a dialogue management (DM) component. Such NLU
systems cater to utterances in isolation, thus pushing the problem of context
management to DM. However, contextual information is critical to the correct
prediction of intents and slots in a conversation. Prior work on contextual NLU
has been limited in terms of the types of contextual signals used and the
understanding of their impact on the model. In this work, we propose a
context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals,
such as previous intents, slots, dialog acts and utterances over a variable
context window, in addition to the current user utterance. CASA-NLU outperforms
a recurrent contextual NLU baseline on two conversational datasets, yielding a
gain of up to 7% on the IC task for one of the datasets. Moreover, a
non-contextual variant of CASA-NLU achieves state-of-the-art performance for IC
task on standard public datasets - Snips and ATIS.
Comment: To appear at EMNLP 201
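Feeding multiple contextual signals over a variable window can be sketched as simple feature assembly. The embedding sizes and zero-padding scheme below are illustrative assumptions; the actual model consumes these signals through self-attention rather than flat concatenation.

```python
import numpy as np

def build_context_input(utt_emb, history, window=2):
    """Concatenate the current utterance embedding with signal embeddings
    (e.g. previous intents, slots, dialog acts) from the last `window`
    turns. Missing turns are zero-padded; all shapes are illustrative."""
    d = utt_emb.shape[0]
    recent = history[-window:]
    pad = [np.zeros(d)] * (window - len(recent))
    return np.concatenate([utt_emb] + pad + recent)

rng = np.random.default_rng(6)
utt = rng.normal(size=8)         # current user utterance embedding
history = [rng.normal(size=8)]   # only one previous turn so far
x = build_context_input(utt, history)
print(x.shape)  # (24,) = current + 2 context slots
```

Zero-padding keeps the input size fixed even early in a conversation, so the same downstream network handles every turn.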
Injecting Word Information with Multi-Level Word Adapter for Chinese Spoken Language Understanding
In this paper, we improve Chinese spoken language understanding (SLU) by
injecting word information. Previous studies on Chinese SLU do not consider the
word information, failing to detect word boundaries that are beneficial for
intent detection and slot filling. To address this issue, we propose a
multi-level word adapter to inject word information for Chinese SLU, which
consists of (1) sentence-level word adapter, which directly fuses the sentence
representations of the word information and character information to perform
intent detection and (2) character-level word adapter, which is applied at each
character for selectively controlling weights on word information as well as
character information. Experimental results on two Chinese SLU datasets show
that our model can capture useful word information and achieve state-of-the-art
performance.
Comment: Accepted at ICASSP 202
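The character-level word adapter can be sketched as a per-character gate mixing each character's own representation with that of the word it belongs to. The gate parameterization and alignment below are simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def char_level_word_adapter(char_reprs, word_reprs, W):
    """For each character, gate between its own representation and the
    representation of the word containing it (illustrative weights W)."""
    fused = []
    for c, w in zip(char_reprs, word_reprs):
        g = sigmoid(W @ np.concatenate([c, w]))   # per-character gate (D,)
        fused.append(g * c + (1 - g) * w)
    return np.stack(fused)

rng = np.random.default_rng(2)
T, D = 6, 5                       # 6 characters, 5-dim features
chars = rng.normal(size=(T, D))   # character-level encoder output
words = rng.normal(size=(T, D))   # word features aligned to characters
W = rng.normal(size=(D, 2 * D))
out = char_level_word_adapter(chars, words, W)
print(out.shape)  # (6, 5)
```

The sentence-level adapter from the abstract would apply the same gating once to pooled sentence vectors instead of per character.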
PIN: A Novel Parallel Interactive Network for Spoken Language Understanding
Spoken Language Understanding (SLU) is an essential part of the spoken
dialogue system, which typically consists of intent detection (ID) and slot
filling (SF) tasks. Recently, recurrent neural networks (RNNs) based methods
achieved the state-of-the-art for SLU. In the existing RNN-based approaches,
ID and SF tasks are often jointly modeled to utilize the correlation
information between them. However, supporting bidirectional and explicit
information exchange between ID and SF to obtain better performance has not
been well studied. In addition, few studies attempt to capture the local
context information to enhance the performance of SF. Motivated by these
findings, in this paper, a Parallel
Interactive Network (PIN) is proposed to model the mutual guidance between ID
and SF. Specifically, given an utterance, a Gaussian self-attentive encoder is
introduced to generate the context-aware feature embedding of the utterance
which is able to capture local context information. Taking the feature
embedding of the utterance, Slot2Intent module and Intent2Slot module are
developed to capture the bidirectional information flow for ID and SF tasks.
Finally, a cooperation mechanism is constructed to fuse the information
obtained from Slot2Intent and Intent2Slot modules to further reduce the
prediction bias. The experiments on two benchmark datasets, i.e., SNIPS and
ATIS, demonstrate the effectiveness of our approach, which achieves a
competitive result with state-of-the-art models. More encouragingly, by using
the feature embedding of the utterance generated by the pre-trained language
model BERT, our method achieves the state-of-the-art among all comparison
approaches.
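The Gaussian self-attentive encoder's locality bias can be sketched by adding a Gaussian penalty over positional distance to ordinary attention scores. Dot-product scoring and the fixed bandwidth are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def gaussian_self_attention(H, sigma=1.5):
    """Self-attention whose scores are biased by a Gaussian over the
    distance between positions, favouring local context."""
    T, d = H.shape
    scores = H @ H.T / np.sqrt(d)                   # (T, T) dot-product
    pos = np.arange(T)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    scores = scores - dist2 / (2.0 * sigma ** 2)    # Gaussian locality bias
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ H                              # context-aware features

rng = np.random.default_rng(3)
H = rng.normal(size=(7, 4))   # 7 tokens, 4-dim embeddings
out = gaussian_self_attention(H)
print(out.shape)  # (7, 4)
```

Distant positions receive an increasingly negative bias, so each token's context vector is dominated by its neighbours, which is what helps slot filling.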
A Co-Interactive Transformer for Joint Slot Filling and Intent Detection
Intent detection and slot filling are two main tasks for building a spoken
language understanding (SLU) system. The two tasks are closely related and the
information of one task can be utilized in the other task. Previous studies
either model the two tasks separately or only consider the single information
flow from intent to slot. None of the prior approaches model the bidirectional
connection between the two tasks simultaneously. In this paper, we propose a
Co-Interactive Transformer to consider the cross-impact between the two tasks.
Instead of adopting the self-attention mechanism in vanilla Transformer, we
propose a co-interactive module to consider the cross-impact by building a
bidirectional connection between the two related tasks. In addition, the
proposed co-interactive module can be stacked to incrementally enhance each
other with mutual features. The experimental results on two public datasets
(SNIPS and ATIS) show that our model achieves the state-of-the-art performance
with considerable improvements (+3.4% and +0.9% on overall accuracy). Extensive
experiments empirically verify that our model successfully captures the mutual
interaction knowledge.
Comment: Accepted at ICASSP 202
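One co-interactive layer can be sketched as plain cross-attention in both directions, with each task's features attending over the other's. Projection matrices and layer normalization are omitted for brevity, so this is a simplification of the module described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_interact(H_intent, H_slot):
    """One co-interactive layer: each task's features attend over the
    other task's features (bare cross-attention; projections omitted)."""
    d = H_intent.shape[1]
    a_is = softmax(H_intent @ H_slot.T / np.sqrt(d))   # intent -> slot
    a_si = softmax(H_slot @ H_intent.T / np.sqrt(d))   # slot -> intent
    new_intent = H_intent + a_is @ H_slot   # residual cross-update
    new_slot = H_slot + a_si @ H_intent
    return new_intent, new_slot

rng = np.random.default_rng(4)
Hi = rng.normal(size=(5, 6))   # intent-side token features
Hs = rng.normal(size=(5, 6))   # slot-side token features
for _ in range(2):             # layers can be stacked, as in the abstract
    Hi, Hs = co_interact(Hi, Hs)
print(Hi.shape, Hs.shape)
```

The residual connections let stacked layers incrementally enrich each task with the other's features, which is the stacking behaviour the abstract highlights.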
Joint Intent Detection and Slot Filling with Wheel-Graph Attention Networks
Intent detection and slot filling are two fundamental tasks for building a
spoken language understanding (SLU) system. Multiple deep learning-based joint
models have demonstrated excellent results on the two tasks. In this paper, we
propose a new joint model with a wheel-graph attention network (Wheel-GAT)
which is able to model interrelated connections directly for intent detection
and slot filling. To construct a graph structure for utterances, we create
intent nodes, slot nodes, and directed edges. Intent nodes can provide
utterance-level semantic information for slot filling, while slot nodes can
also provide local keyword information for intent. Experiments show that our
model outperforms multiple baselines on two public datasets. Besides, we also
demonstrate that using the Bidirectional Encoder Representations from
Transformers (BERT) model further boosts the performance in the SLU task.
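The wheel-shaped graph can be sketched by its adjacency matrix: one intent node in the hub connected to every slot (token) node, plus chain edges between neighbouring slot nodes. The exact edge set (e.g. directedness, self-loops) is a guess from the abstract's description.

```python
import numpy as np

def wheel_graph_adjacency(n_tokens):
    """Adjacency for a hypothetical wheel graph: node 0 is the intent hub
    connected to every token node, and adjacent token nodes are chained.
    The paper's precise edge set may differ from this sketch."""
    n = n_tokens + 1                   # node 0 is the intent node
    A = np.zeros((n, n), dtype=int)
    for i in range(1, n):
        A[0, i] = A[i, 0] = 1          # intent <-> every slot node
    for i in range(1, n - 1):
        A[i, i + 1] = A[i + 1, i] = 1  # chain between adjacent tokens
    return A

A = wheel_graph_adjacency(4)
print(A)
```

A graph attention network restricted to this adjacency lets utterance-level intent information flow to every slot node and local keyword information flow back, as the abstract describes.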
Towards Open Intent Discovery for Conversational Text
Detecting and identifying user intent from text, both written and spoken,
plays an important role in modelling and understanding dialogs. Existing
research on intent discovery models it as a classification task with a
predefined set of known categories. To generalize beyond these preexisting
classes, we define a
new task of \textit{open intent discovery}. We investigate how intent can be
generalized to those not seen during training. To this end, we propose a
two-stage approach to this task - predicting whether an utterance contains an
intent, and then tagging the intent in the input utterance. Our model consists
of a bidirectional LSTM with a CRF on top to capture contextual semantics,
subject to some constraints. Self-attention is used to learn long distance
dependencies. Further, we adapt an adversarial training approach to improve
robustness and performance across domains. We also present a dataset of 25k
real-life utterances that have been labelled via crowdsourcing. Our
experiments across different domains and real-world datasets show the
effectiveness of our approach, with less than 100 annotated examples needed per
unique domain to recognize diverse intents. The approach outperforms
state-of-the-art baselines by 5-15% F1 score points.
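The two-stage pipeline above (decide whether an intent exists, then tag the intent span) can be sketched with keyword-rule stand-ins. The lexicon and the verb-plus-object span heuristic below are purely illustrative placeholders for the paper's BiLSTM-CRF models.

```python
# Stage 1: does the utterance express any intent at all?
# Stage 2: if so, tag the intent span with BIO labels.
# Both stages are keyword-rule stand-ins for the learned models.

ACTION_VERBS = {"book", "play", "find", "order"}   # hypothetical lexicon

def has_intent(tokens):
    return any(t.lower() in ACTION_VERBS for t in tokens)

def tag_intent(tokens):
    tags = ["O"] * len(tokens)
    for i, t in enumerate(tokens):
        if t.lower() in ACTION_VERBS:
            tags[i] = "B-INTENT"
            if i + 1 < len(tokens):
                tags[i + 1] = "I-INTENT"   # verb + object as the span
            break
    return tags

utt = "please book a flight to Boston".split()
if has_intent(utt):
    print(list(zip(utt, tag_intent(utt))))
```

Splitting detection from tagging lets the system say "no intent" cleanly on chit-chat, and tag open intents it never saw as a closed class during training.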