Zero-Shot Learning for Semantic Utterance Classification
We propose a novel zero-shot learning method for semantic utterance
classification (SUC). It learns a classifier for problems where
none of the semantic categories are present in the training set. The
framework uncovers the link between categories and utterances using a semantic
space. We show that this semantic space can be learned by deep neural networks
trained on large amounts of search engine query log data. More precisely, we
propose a novel method that can learn discriminative semantic features without
supervision. It uses the zero-shot learning framework to guide the learning of
the semantic features. We demonstrate the effectiveness of the zero-shot
semantic learning algorithm on the SUC dataset collected by Tur (2012).
Furthermore, we achieve state-of-the-art results by combining the semantic
features with a supervised method.
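The idea of linking utterances to unseen categories through a shared semantic space can be illustrated with a minimal sketch. It is not the authors' method: the abstract describes a space learned by deep neural networks from query logs, whereas the `TOY_VECTORS` table below is hand-made, and the names `embed`, `cosine`, and `zero_shot_classify` are hypothetical. The sketch only shows the classification step, assigning an utterance to the category whose name lies closest to it in the space.

```python
import math

# Hypothetical 3-d word vectors, hand-made for illustration only.
# In the abstract, the semantic space is learned from query log data instead.
TOY_VECTORS = {
    "flight": [1.0, 0.0, 0.0],
    "fly": [0.9, 0.1, 0.0],
    "plane": [0.8, 0.0, 0.2],
    "restaurant": [0.0, 1.0, 0.0],
    "eat": [0.1, 0.9, 0.0],
    "pizza": [0.0, 0.8, 0.2],
    "weather": [0.0, 0.0, 1.0],
    "rain": [0.1, 0.0, 0.9],
}


def embed(text):
    """Average the toy vectors of the known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(utterance, category_names):
    """Pick the category whose name is closest to the utterance in the space.

    No labeled utterance for any category is needed: only the category
    names are embedded, which is what makes the setup zero-shot.
    """
    u = embed(utterance)
    return max(category_names, key=lambda c: cosine(u, embed(c)))


if __name__ == "__main__":
    categories = ["flight", "restaurant", "weather"]
    print(zero_shot_classify("i want to fly on a plane", categories))
```

Because the categories enter only through their names, a new category can be added at test time without retraining, provided its name can be embedded in the same space.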
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling
State-of-the-art slot filling models for goal-oriented human/machine
conversational language understanding systems rely on deep learning methods.
While multi-task training of such models alleviates the need for large
in-domain annotated datasets, bootstrapping a semantic parsing model for a new
domain using only the semantic frame, such as the back-end API or knowledge
graph schema, is still one of the holy grail tasks of language understanding
for dialogue systems. This paper proposes a deep learning based approach that
can utilize only the slot description in context without the need for any
labeled or unlabeled in-domain examples, to quickly bootstrap a new domain. The
main idea of this paper is to leverage the encoding of the slot names and
descriptions within a multi-task deep learned slot filling model, to implicitly
align slots across domains. The proposed approach is promising for solving the
domain scaling problem and eliminating the need for any manually annotated data
or explicit schema alignment. Furthermore, our experiments on multiple domains
show that this approach results in significantly better slot-filling
performance when compared to using only in-domain data, especially in the low
data regime.
Comment: 4 pages + 1 reference
A neural network approach to audio-assisted movie dialogue detection
A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function specifies whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons, are tested. Experiments are carried out to validate the feasibility of the aforementioned approach by using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%.
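The two input features named in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the indicator functions are short 0/1 lists, the cross-correlation is taken as circular, the spectrum is unnormalised, and the helper names (`cross_correlation`, `dft`, `cross_psd_magnitude`, `dialogue_features`) are hypothetical. The abstract does not specify any of these choices.

```python
import cmath


def cross_correlation(a, b):
    """Circular cross-correlation c[k] = sum_n a[n] * b[(n + k) % N].

    Assumption: circular (periodic) lags; the abstract does not say
    whether the correlation is linear or circular.
    """
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for short toy signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def cross_psd_magnitude(a, b):
    """Magnitude of the (unnormalised) cross-power spectrum |conj(A[f]) * B[f]|.

    For real signals this equals the magnitude of the DFT of the circular
    cross-correlation computed above.
    """
    spec_a, spec_b = dft(a), dft(b)
    return [abs(x.conjugate() * y) for x, y in zip(spec_a, spec_b)]


def dialogue_features(a, b):
    """Concatenate both feature sets into one input vector for a classifier."""
    return cross_correlation(a, b) + cross_psd_magnitude(a, b)


if __name__ == "__main__":
    # Two actors who alternate turns: each speaks for two time steps.
    actor_1 = [1, 1, 0, 0, 1, 1, 0, 0]
    actor_2 = [0, 0, 1, 1, 0, 0, 1, 1]
    print(cross_correlation(actor_1, actor_2))
```

In the alternating example, the correlation is zero at lag 0, since the actors never speak at the same time, and peaks at the lag equal to the turn length. Regular turn-taking of this kind is the sort of pattern a downstream classifier could pick up from these features.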
Sequential Dialogue Context Modeling for Spoken Language Understanding
Spoken Language Understanding (SLU) is a key component of goal-oriented
dialogue systems that parses user utterances into semantic frame
representations. Traditionally, SLU does not utilize the dialogue history beyond
the previous system turn and contextual ambiguities are resolved by the
downstream components. In this paper, we explore novel approaches for modeling
dialogue context in a recurrent neural network (RNN) based language
understanding system. We propose the Sequential Dialogue Encoder Network, which
allows encoding context from the dialogue history in chronological order. We
compare the performance of our proposed architecture with two context models,
one that uses just the previous turn context and another that encodes dialogue
context in a memory network, but loses the order of utterances in the dialogue
history. Experiments with a multi-domain dialogue dataset demonstrate that the
proposed architecture results in reduced semantic frame error rates.
Comment: 8 + 2 pages, Updated 10/17: Updated typos in abstract, Updated 07/07:
Updated title, abstract and few minor changes
Multi-View Zero-Shot Open Intent Induction from Dialogues: Multi Domain Batch and Proxy Gradient Transfer
In Task-Oriented Dialogue (TOD) systems, detecting and inducing new intents
are two main challenges in applying the system in the real world. In this paper,
we propose a semantic multi-view model to address these two challenges, comprising (1)
SBERT for General Embedding (GE), (2) Multi Domain Batch (MDB) for dialogue
domain knowledge, and (3) Proxy Gradient Transfer (PGT) for cluster-specialized
semantics. MDB feeds diverse dialogue datasets to the model at once to tackle
the multi-domain problem by learning knowledge from multiple domains. We
introduce a novel method, PGT, which employs a Siamese network to fine-tune
the model directly with a clustering method. Our model can learn how to cluster
dialogue utterances by using PGT. Experimental results demonstrate that our
multi-view model with MDB and PGT significantly improves Open Intent
Induction performance compared to baseline systems.
Comment: 8 pages, 3 figures, SIGDIAL DSTC 2023 workshop
IntenDD: A Unified Contrastive Learning Approach for Intent Detection and Discovery
Identifying intents from dialogue utterances forms an integral component of
task-oriented dialogue systems. Intent-related tasks are typically formulated
either as a classification task, where the utterances are classified into
predefined categories, or as a clustering task, when new and previously unknown
intent categories need to be discovered from these utterances. Further, the
intent classification may be modeled in a multiclass (MC) or multilabel (ML)
setup. While typically these tasks are modeled as separate tasks, we propose
IntenDD, a unified approach leveraging a shared utterance encoding backbone.
IntenDD uses an entirely unsupervised contrastive learning strategy for
representation learning, where pseudo-labels for the unlabeled utterances are
generated based on their lexical features. Additionally, we introduce a
two-step post-processing setup for the classification tasks using modified
adsorption. Here, the residuals in the training data are first propagated and
the labels are then smoothed, with both steps modeled in a transductive setting.
Through extensive evaluations on various benchmark datasets, we find that our
approach consistently outperforms competitive baselines across all three tasks.
On average, IntenDD reports percentage improvements of 2.32%, 1.26%, and 1.52%
in the respective metrics for the few-shot MC, few-shot ML, and intent
discovery tasks.
Comment: EMNLP 2023 Findings