Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
In this paper, gating mechanisms are applied in deep neural network (DNN)
training for x-vector-based text-independent speaker verification. First, a
gated convolution neural network (GCNN) is employed for modeling the
frame-level embedding layers. Compared with the time-delay DNN (TDNN), the GCNN
can obtain more expressive frame-level representations through carefully
designed memory cell and gating mechanisms. Moreover, we propose a novel
gated-attention statistics pooling strategy in which the attention scores are
shared with the output gate. The gated-attention statistics pooling combines
both gating and attention mechanisms into one framework; therefore, we can
capture more useful information in the temporal pooling layer. Experiments are
carried out using the NIST SRE16 and SRE18 evaluation datasets. The results
demonstrate the effectiveness of the GCNN and show that the proposed
gated-attention statistics pooling can further improve the performance.
Comment: 5 pages, 3 figures, submitted to INTERSPEECH 201
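The pooling idea above can be sketched in plain numpy. This is a minimal illustration of attention-weighted statistics pooling, assuming a simple dot-product scoring function; the paper's full model would additionally reuse the same scores as the output gate, which is only noted in a comment here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_attention_stats_pool(H, w):
    """Pool frame-level features H (T, D) into one utterance vector.

    The attention scores are computed once; in the full model the same
    scores would also serve as the output gate (score sharing). The
    scoring function here is illustrative, not the paper's exact one.
    """
    scores = softmax(H @ w)                            # (T,) over frames
    mu = (scores[:, None] * H).sum(axis=0)             # weighted mean
    var = (scores[:, None] * (H - mu) ** 2).sum(axis=0)
    return np.concatenate([mu, np.sqrt(var + 1e-8)])   # (2D,) embedding

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 8))   # 50 frames of 8-dim features
w = rng.normal(size=8)         # attention parameter vector
emb = gated_attention_stats_pool(H, w)
print(emb.shape)  # (16,)
```

The concatenated weighted mean and standard deviation give a fixed-size utterance embedding regardless of the number of frames.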
A Self-Attention Joint Model for Spoken Language Understanding in Situational Dialog Applications
Spoken language understanding (SLU) acts as a critical component in
goal-oriented dialog systems. It typically involves identifying the speaker's
intent and extracting semantic slots from user utterances, two tasks known as
intent detection (ID) and slot filling (SF). The SLU problem has been
intensively investigated in recent years. However, existing methods either
constrain SF results only grammatically, solve ID and SF independently, or do
not fully exploit the mutual impact of the two tasks. This paper proposes a multi-head self-attention
joint model with a conditional random field (CRF) layer and a prior mask. The
experiments show the effectiveness of our model, as compared with
state-of-the-art models. Meanwhile, online education in China has made great
progress in the last few years. But there are few intelligent educational
dialog applications for students to learn foreign languages. Hence, we design
an intelligent dialog robot equipped with different scenario settings to help
students learn communication skills.
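The "prior mask" mentioned above can be illustrated with a toy example. The intent and slot inventories below are hypothetical, and the mechanism (suppressing slot labels that are implausible under the predicted intent) is a sketch of the idea rather than the paper's exact parameterization.

```python
import numpy as np

# Hypothetical intent/slot inventories; the mask encodes which slot
# labels are plausible under each intent (the "prior mask" idea).
intents = ["play_music", "get_weather"]
slots = ["O", "artist", "song", "city", "date"]

prior_mask = np.array([
    # O  artist song city date
    [1,  1,     1,   0,   0],   # play_music
    [1,  0,     0,   1,   1],   # get_weather
], dtype=float)

def mask_slot_logits(slot_logits, intent_id):
    """Suppress slot labels incompatible with the predicted intent."""
    return np.where(prior_mask[intent_id] > 0, slot_logits, -np.inf)

logits = np.array([0.1, 2.0, 0.5, 3.0, 0.2])    # raw per-label scores
masked = mask_slot_logits(logits, intent_id=0)  # intent: play_music
print(slots[int(np.argmax(logits))])   # city (implausible for play_music)
print(slots[int(np.argmax(masked))])   # artist (city is masked out)
```

Without the mask the tagger would pick "city" for a play_music utterance; the prior knowledge rules it out before decoding.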
Multi-Domain Spoken Language Understanding Using Domain- and Task-Aware Parameterization
Spoken language understanding has been addressed as a supervised learning
problem, where a set of training data is available for each domain. However,
annotating data for each domain is both financially costly and non-scalable so
we should fully utilize information across all domains. One existing approach
solves the problem by conducting multi-domain learning, using shared parameters
for joint training across domains. We propose to improve the parameterization
of this method by using domain-specific and task-specific model parameters to
improve knowledge learning and transfer. Experiments on 5 domains show that our
model is more effective for multi-domain SLU and obtains the best results. In
addition, we show its transferability by outperforming the prior best model by
12.4\% when adapting to a new domain with little data.
Comment: Accepted by Transactions on Asian and Low-Resource Language
Information Processing (TALLIP)
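The domain- and task-aware parameterization can be sketched as a decomposition of weights into shared, per-domain, and per-task parts. The domain and task inventories below are invented for illustration, and the additive combination is one simple way to realize the idea, not necessarily the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(5)
D = 4   # feature size (toy)

# Shared weights plus per-domain and per-task weights. The additive
# decomposition mirrors the parameterization idea from the abstract.
shared_W = rng.normal(size=(D, D))
domain_W = {d: rng.normal(size=(D, D)) for d in ["hotel", "flight"]}
task_W = {t: rng.normal(size=(D, D)) for t in ["intent", "slot"]}

def encode(x, domain, task):
    """Combine shared, domain-aware, and task-aware transformations."""
    return (shared_W + domain_W[domain] + task_W[task]) @ x

x = rng.normal(size=D)
h = encode(x, "hotel", "slot")
print(h.shape)  # (4,)
```

Sharing `shared_W` across all domains lets knowledge transfer, while the small per-domain and per-task matrices keep the specialization cheap.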
Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog
Recent studies have shown remarkable success in end-to-end task-oriented
dialog systems. However, most neural models rely on large training data, which
are only available for a certain number of task domains, such as navigation and
scheduling. This makes it difficult to scale to a new domain with limited
labeled data. Moreover, there has been relatively little research on how to
effectively
use data from all domains to improve the performance of each domain and also
unseen domains. To this end, we investigate methods that can make explicit use
of domain knowledge and introduce a shared-private network to learn shared and
specific knowledge. In addition, we propose a novel Dynamic Fusion Network
(DF-Net) which automatically exploits the relevance between the target domain
and each domain. Results show that our model outperforms existing methods on
multi-domain dialogue, giving the state-of-the-art in the literature. Besides,
with little training data, we show its transferability by outperforming the
prior best model by 13.9\% on average.
Comment: ACL202
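The shared-private idea above can be sketched with a simple fusion gate: a learned scalar decides how much domain-private knowledge to mix with the shared representation. The gate parameterization here is an assumption (random weights stand in for learned ones).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(shared, private, w_gate):
    """Dynamically fuse shared and domain-private features.

    A scalar gate (learned in the real model; random weights here)
    decides how much domain-specific knowledge to mix in.
    """
    g = sigmoid(np.dot(w_gate, np.concatenate([shared, private])))
    return g * private + (1.0 - g) * shared

rng = np.random.default_rng(1)
shared = rng.normal(size=4)    # representation shared across domains
private = rng.normal(size=4)   # representation for the target domain
w = rng.normal(size=8)
h = fuse(shared, private, w)
print(h.shape)  # (4,)
```

Because the gate is computed from the input itself, the mixture adapts per utterance, which is the "dynamic" part of dynamic fusion.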
CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots
Natural Language Understanding (NLU) is a core component of dialog systems.
It typically involves two tasks - intent classification (IC) and slot labeling
(SL), which are then followed by a dialogue management (DM) component. Such NLU
systems cater to utterances in isolation, thus pushing the problem of context
management to DM. However, contextual information is critical to the correct
prediction of intents and slots in a conversation. Prior work on contextual NLU
has been limited in terms of the types of contextual signals used and the
understanding of their impact on the model. In this work, we propose a
context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals,
such as previous intents, slots, dialog acts and utterances over a variable
context window, in addition to the current user utterance. CASA-NLU outperforms
a recurrent contextual NLU baseline on two conversational datasets, yielding a
gain of up to 7% on the IC task for one of the datasets. Moreover, a
non-contextual variant of CASA-NLU achieves state-of-the-art performance for IC
task on standard public datasets - Snips and ATIS.
Comment: To appear at EMNLP 201
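Feeding multiple contextual signals over a variable window can be sketched as simple feature assembly. The embedding sizes and zero-padding scheme below are illustrative assumptions; the actual model consumes these signals through self-attention rather than flat concatenation.

```python
import numpy as np

def build_context_input(utt_emb, history, window=2):
    """Concatenate the current utterance embedding with signal embeddings
    (e.g. previous intents, slots, dialog acts) from the last `window`
    turns. Missing turns are zero-padded; all shapes are illustrative."""
    d = utt_emb.shape[0]
    recent = history[-window:]
    pad = [np.zeros(d)] * (window - len(recent))
    return np.concatenate([utt_emb] + pad + recent)

rng = np.random.default_rng(6)
utt = rng.normal(size=8)         # current user utterance embedding
history = [rng.normal(size=8)]   # only one previous turn so far
x = build_context_input(utt, history)
print(x.shape)  # (24,) = current + 2 context slots
```

Zero-padding keeps the input size fixed even early in a conversation, so the same downstream network handles every turn.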
Injecting Word Information with Multi-Level Word Adapter for Chinese Spoken Language Understanding
In this paper, we improve Chinese spoken language understanding (SLU) by
injecting word information. Previous studies on Chinese SLU do not consider the
word information, failing to detect word boundaries that are beneficial for
intent detection and slot filling. To address this issue, we propose a
multi-level word adapter to inject word information for Chinese SLU, which
consists of (1) sentence-level word adapter, which directly fuses the sentence
representations of the word information and character information to perform
intent detection and (2) character-level word adapter, which is applied at each
character for selectively controlling weights on word information as well as
character information. Experimental results on two Chinese SLU datasets show
that our model can capture useful word information and achieve state-of-the-art
performance.
Comment: Accepted at ICASSP 202
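The character-level word adapter can be sketched as a per-character gate mixing each character's own representation with that of the word it belongs to. The gate parameterization and alignment below are simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def char_level_word_adapter(char_reprs, word_reprs, W):
    """For each character, gate between its own representation and the
    representation of the word containing it (illustrative weights W)."""
    fused = []
    for c, w in zip(char_reprs, word_reprs):
        g = sigmoid(W @ np.concatenate([c, w]))   # per-character gate (D,)
        fused.append(g * c + (1 - g) * w)
    return np.stack(fused)

rng = np.random.default_rng(2)
T, D = 6, 5                       # 6 characters, 5-dim features
chars = rng.normal(size=(T, D))   # character-level encoder output
words = rng.normal(size=(T, D))   # word features aligned to characters
W = rng.normal(size=(D, 2 * D))
out = char_level_word_adapter(chars, words, W)
print(out.shape)  # (6, 5)
```

The sentence-level adapter from the abstract would apply the same gating once to pooled sentence vectors instead of per character.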
PIN: A Novel Parallel Interactive Network for Spoken Language Understanding
Spoken Language Understanding (SLU) is an essential part of the spoken
dialogue system, which typically consists of intent detection (ID) and slot
filling (SF) tasks. Recently, recurrent neural networks (RNNs) based methods
achieved the state-of-the-art for SLU. In the existing RNN-based approaches,
ID and SF tasks are often jointly modeled to utilize the correlation
information between them. However, supporting bidirectional and explicit
information exchange between ID and SF to obtain better performance has not
been well studied. In addition, few studies attempt to capture the local
context information to enhance the performance of SF. Motivated by these
findings, in this paper, a Parallel
Interactive Network (PIN) is proposed to model the mutual guidance between ID
and SF. Specifically, given an utterance, a Gaussian self-attentive encoder is
introduced to generate the context-aware feature embedding of the utterance
which is able to capture local context information. Taking the feature
embedding of the utterance, Slot2Intent module and Intent2Slot module are
developed to capture the bidirectional information flow for ID and SF tasks.
Finally, a cooperation mechanism is constructed to fuse the information
obtained from Slot2Intent and Intent2Slot modules to further reduce the
prediction bias. The experiments on two benchmark datasets, i.e., SNIPS and
ATIS, demonstrate the effectiveness of our approach, which achieves a
competitive result with state-of-the-art models. More encouragingly, by using
the feature embedding of the utterance generated by the pre-trained language
model BERT, our method achieves the state-of-the-art among all comparison
approaches.
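The Gaussian self-attentive encoder's locality bias can be sketched by adding a Gaussian penalty over positional distance to ordinary attention scores. Dot-product scoring and the fixed bandwidth are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def gaussian_self_attention(H, sigma=1.5):
    """Self-attention whose scores are biased by a Gaussian over the
    distance between positions, favouring local context."""
    T, d = H.shape
    scores = H @ H.T / np.sqrt(d)                   # (T, T) dot-product
    pos = np.arange(T)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    scores = scores - dist2 / (2.0 * sigma ** 2)    # Gaussian locality bias
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ H                              # context-aware features

rng = np.random.default_rng(3)
H = rng.normal(size=(7, 4))   # 7 tokens, 4-dim embeddings
out = gaussian_self_attention(H)
print(out.shape)  # (7, 4)
```

Distant positions receive an increasingly negative bias, so each token's context vector is dominated by its neighbours, which is what helps slot filling.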
A Co-Interactive Transformer for Joint Slot Filling and Intent Detection
Intent detection and slot filling are two main tasks for building a spoken
language understanding (SLU) system. The two tasks are closely related and the
information of one task can be utilized in the other task. Previous studies
either model the two tasks separately or only consider the single information
flow from intent to slot. None of the prior approaches model the bidirectional
connection between the two tasks simultaneously. In this paper, we propose a
Co-Interactive Transformer to consider the cross-impact between the two tasks.
Instead of adopting the self-attention mechanism in vanilla Transformer, we
propose a co-interactive module to consider the cross-impact by building a
bidirectional connection between the two related tasks. In addition, the
proposed co-interactive module can be stacked to incrementally enhance each
other with mutual features. The experimental results on two public datasets
(SNIPS and ATIS) show that our model achieves the state-of-the-art performance
with considerable improvements (+3.4% and +0.9% on overall accuracy). Extensive
experiments empirically verify that our model successfully captures the mutual
interaction knowledge.
Comment: Accepted at ICASSP 202
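One co-interactive layer can be sketched as plain cross-attention in both directions, with each task's features attending over the other's. Projection matrices and layer normalization are omitted for brevity, so this is a simplification of the module described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_interact(H_intent, H_slot):
    """One co-interactive layer: each task's features attend over the
    other task's features (bare cross-attention; projections omitted)."""
    d = H_intent.shape[1]
    a_is = softmax(H_intent @ H_slot.T / np.sqrt(d))   # intent -> slot
    a_si = softmax(H_slot @ H_intent.T / np.sqrt(d))   # slot -> intent
    new_intent = H_intent + a_is @ H_slot   # residual cross-update
    new_slot = H_slot + a_si @ H_intent
    return new_intent, new_slot

rng = np.random.default_rng(4)
Hi = rng.normal(size=(5, 6))   # intent-side token features
Hs = rng.normal(size=(5, 6))   # slot-side token features
for _ in range(2):             # layers can be stacked, as in the abstract
    Hi, Hs = co_interact(Hi, Hs)
print(Hi.shape, Hs.shape)
```

The residual connections let stacked layers incrementally enrich each task with the other's features, which is the stacking behaviour the abstract highlights.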
Joint Intent Detection and Slot Filling with Wheel-Graph Attention Networks
Intent detection and slot filling are two fundamental tasks for building a
spoken language understanding (SLU) system. Multiple deep learning-based joint
models have demonstrated excellent results on the two tasks. In this paper, we
propose a new joint model with a wheel-graph attention network (Wheel-GAT)
which is able to model interrelated connections directly for intent detection
and slot filling. To construct a graph structure for utterances, we create
intent nodes, slot nodes, and directed edges. Intent nodes can provide
utterance-level semantic information for slot filling, while slot nodes can
also provide local keyword information for intent. Experiments show that our
model outperforms multiple baselines on two public datasets. Besides, we also
demonstrate that using the Bidirectional Encoder Representations from
Transformers (BERT) model further boosts the performance in the SLU task.
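The wheel-shaped graph can be sketched by its adjacency matrix: one intent node in the hub connected to every slot (token) node, plus chain edges between neighbouring slot nodes. The exact edge set (e.g. directedness, self-loops) is a guess from the abstract's description.

```python
import numpy as np

def wheel_graph_adjacency(n_tokens):
    """Adjacency for a hypothetical wheel graph: node 0 is the intent hub
    connected to every token node, and adjacent token nodes are chained.
    The paper's precise edge set may differ from this sketch."""
    n = n_tokens + 1                   # node 0 is the intent node
    A = np.zeros((n, n), dtype=int)
    for i in range(1, n):
        A[0, i] = A[i, 0] = 1          # intent <-> every slot node
    for i in range(1, n - 1):
        A[i, i + 1] = A[i + 1, i] = 1  # chain between adjacent tokens
    return A

A = wheel_graph_adjacency(4)
print(A)
```

A graph attention network restricted to this adjacency lets utterance-level intent information flow to every slot node and local keyword information flow back, as the abstract describes.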
Towards Open Intent Discovery for Conversational Text
Detecting and identifying user intent from text, both written and spoken,
plays an important role in modelling and understanding dialogs. Existing
research on intent discovery models it as a classification task with a
predefined set of known categories. To generalize beyond these preexisting
classes, we define a
new task of \textit{open intent discovery}. We investigate how intent can be
generalized to those not seen during training. To this end, we propose a
two-stage approach to this task - predicting whether an utterance contains an
intent, and then tagging the intent in the input utterance. Our model consists
of a bidirectional LSTM with a CRF on top to capture contextual semantics,
subject to some constraints. Self-attention is used to learn long distance
dependencies. Further, we adapt an adversarial training approach to improve
robustness and performance across domains. We also present a dataset of 25k
real-life utterances that have been labelled via crowdsourcing. Our
experiments across different domains and real-world datasets show the
effectiveness of our approach, with less than 100 annotated examples needed per
unique domain to recognize diverse intents. The approach outperforms
state-of-the-art baselines by 5-15% F1 score points.
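The two-stage pipeline above (decide whether an intent exists, then tag the intent span) can be sketched with keyword-rule stand-ins. The lexicon and the verb-plus-object span heuristic below are purely illustrative placeholders for the paper's BiLSTM-CRF models.

```python
# Stage 1: does the utterance express any intent at all?
# Stage 2: if so, tag the intent span with BIO labels.
# Both stages are keyword-rule stand-ins for the learned models.

ACTION_VERBS = {"book", "play", "find", "order"}   # hypothetical lexicon

def has_intent(tokens):
    return any(t.lower() in ACTION_VERBS for t in tokens)

def tag_intent(tokens):
    tags = ["O"] * len(tokens)
    for i, t in enumerate(tokens):
        if t.lower() in ACTION_VERBS:
            tags[i] = "B-INTENT"
            if i + 1 < len(tokens):
                tags[i + 1] = "I-INTENT"   # verb + object as the span
            break
    return tags

utt = "please book a flight to Boston".split()
if has_intent(utt):
    print(list(zip(utt, tag_intent(utt))))
```

Splitting detection from tagging lets the system say "no intent" cleanly on chit-chat, and tag open intents it never saw as a closed class during training.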