Zero-shot User Intent Detection via Capsule Neural Networks
User intent detection plays a critical role in question-answering and dialog
systems. Most previous works treat intent detection as a classification problem
where utterances are labeled with predefined intents. However, it is
labor-intensive and time-consuming to label users' utterances, since intents are diversely expressed and novel intents continually emerge. Instead, we
study the zero-shot intent detection problem, which aims to detect emerging
user intents where no labeled utterances are currently available. We propose
two capsule-based architectures: INTENTCAPSNET, which extracts semantic features from utterances and aggregates them to discriminate existing intents, and INTENTCAPSNET-ZSL, which gives INTENTCAPSNET the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents.
Experiments on two real-world datasets show that our model can not only better discriminate diversely expressed existing intents, but also discriminate emerging intents when no labeled utterances are available.
Comment: In EMNLP 2018 as a long paper. Previously available at http://doi.org/10.13140/RG.2.2.11739.4688
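To make the capsule aggregation concrete: both architectures rely on dynamic routing-by-agreement to combine low-level semantic-feature capsules into high-level intent capsules. Below is a minimal NumPy sketch of that mechanism; the shapes, iteration count, and initialization are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: preserves direction, maps the norm into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def route_by_agreement(u_hat, num_iters=3):
    """u_hat: prediction vectors, shape (num_in, num_out, dim).
    Returns output capsules of shape (num_out, dim)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                           # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        v = squash((c[..., None] * u_hat).sum(axis=0))        # candidate output capsules
        b = b + np.einsum('iod,od->io', u_hat, v)             # reward agreement
    return v

# Toy usage: 10 semantic-feature capsules routed to 5 intent capsules;
# the norm of each output capsule then acts as that intent's activation.
intent_caps = route_by_agreement(np.random.randn(10, 5, 16))
print(intent_caps.shape)  # (5, 16)
```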
Joint Slot Filling and Intent Detection via Capsule Neural Networks
Recognizing words as slots and detecting the intent of an utterance are long-standing problems in natural language understanding. Existing works
either treat slot filling and intent detection separately in a pipeline manner,
or adopt joint models which sequentially label slots while summarizing the
utterance-level intent without explicitly preserving the hierarchical
relationship among words, slots, and intents. To exploit the semantic hierarchy
for effective modeling, we propose a capsule-based neural network model which
accomplishes slot filling and intent detection via a dynamic
routing-by-agreement schema. A re-routing schema is proposed to further
synergize the slot filling performance using the inferred intent
representation. Experiments on two real-world datasets show the effectiveness
of our model when compared with other alternative model architectures, as well
as existing natural language understanding services.
Comment: In ACL 2019 as a long paper. Code and data available at https://github.com/czhang99/Capsule-NL
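The re-routing schema can be pictured as follows: once the intent capsules are inferred, the winning intent's representation is fed back to re-weight the word-to-slot routing logits. This is only a sketch of the feedback idea; the projection W, the blend weight alpha, and the function names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def rerouted_slot_logits(word_vecs, slot_protos, intent_vec, W, alpha=0.5):
    """word_vecs: (T, d) word capsules; slot_protos: (S, d) slot capsules;
    intent_vec: (d,) the inferred (norm-maximal) intent capsule; W: (d, d).
    Returns (T, S) word-to-slot logits refined by the inferred intent."""
    base = word_vecs @ slot_protos.T            # plain agreement, no intent feedback
    intent_bias = word_vecs @ (W @ intent_vec)  # per-word affinity with the intent
    return base + alpha * intent_bias[:, None]  # broadcast the bias over slots

T, S, d = 7, 4, 16
logits = rerouted_slot_logits(np.random.randn(T, d), np.random.randn(S, d),
                              np.random.randn(d), np.random.randn(d, d))
print(logits.shape)  # (7, 4)
```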
Induction Networks for Few-Shot Text Classification
Text classification tends to struggle when data is scarce or when it needs
to adapt to unseen classes. In such challenging scenarios, recent studies have
used meta-learning to simulate the few-shot task, in which new queries are
compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the varied expressions within the same class. Therefore, we should learn a general representation of each class in the support set and then compare it to new queries. In this
paper, we propose a novel Induction Network to learn such a generalized
class-wise representation by leveraging the dynamic routing algorithm within a meta-learning framework. In this way, the model is able to induce and generalize better. We evaluate the proposed model on a well-studied
sentiment classification dataset (English) and a real-world dialogue intent
classification dataset (Chinese). Experiment results show that on both
datasets, the proposed model significantly outperforms the existing
state-of-the-art approaches, proving the effectiveness of class-wise
generalization in few-shot text classification.
Comment: 7 pages, 3 figures
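To make the induction step concrete, here is a minimal sketch of aggregating one class's support examples into a class vector via dynamic routing and scoring a query against it. A cosine score stands in for the paper's learned relation module, and all dimensions are assumptions.

```python
import numpy as np

def squash(s, eps=1e-8):
    n2 = float(np.sum(s ** 2))
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def induce_class_vector(support, num_iters=3):
    """support: (K, d) encoded support examples of one class -> (d,) class vector."""
    b = np.zeros(len(support))            # per-example routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum()   # coupling coefficients
        v = squash((c[:, None] * support).sum(axis=0))
        b = b + support @ v               # reward examples that agree
    return v

def cosine_score(query, class_vec):
    return query @ class_vec / (np.linalg.norm(query) * np.linalg.norm(class_vec))

support = np.random.randn(5, 64)          # a 5-shot support set
query = np.random.randn(64)
print(cosine_score(query, induce_class_vector(support)))
```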
Learning Class-Transductive Intent Representations for Zero-shot Intent Detection
Zero-shot intent detection (ZSID) aims to deal with the continuously emerging
intents without annotated training data. However, existing ZSID systems suffer
from two limitations: 1) They struggle to model the relationship between seen and unseen intents. 2) They cannot effectively recognize unseen intents
under the generalized intent detection (GZSID) setting. A critical problem
behind these limitations is that the representations of unseen intents cannot
be learned in the training stage. To address this problem, we propose a novel
framework that utilizes unseen class labels to learn Class-Transductive Intent
Representations (CTIR). Specifically, we allow the model to predict unseen
intents during training, with the corresponding label names serving as input
utterances. On this basis, we introduce a multi-task learning objective, which
encourages the model to learn the distinctions among intents, and a similarity
scorer, which estimates the connections among intents more accurately. CTIR is
easy to implement and can be integrated with existing methods. Experiments on
two real-world datasets show that CTIR brings considerable improvement to the
baseline systems.
Comment: IJCAI-202
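The central CTIR trick, letting the model predict unseen intents during training by feeding their label names as pseudo-utterances, can be sketched roughly as below. The stand-in encoder, the plain softmax classifier, and every hyperparameter here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

seen = ["book_flight", "play_music"]
unseen = ["transfer_money", "order_food"]
all_intents = seen + unseen

def encode(text, d=32):
    # Stand-in sentence encoder: a hash-seeded random vector (demo only).
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(d)

W = np.random.randn(len(all_intents), 32) * 0.01   # classifier over ALL intents

def train_step(text, label_idx, lr=0.1):
    h = encode(text)
    logits = W @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    grad = np.outer(p, h)
    grad[label_idx] -= h                           # softmax cross-entropy gradient
    W[...] -= lr * grad

# Seen utterances train their intent rows as usual; unseen label names,
# used as pseudo-utterances, give the unseen rows a training signal too.
train_step("get me a plane ticket to Boston", all_intents.index("book_flight"))
train_step("transfer money", all_intents.index("transfer_money"))
```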
Joint Training Capsule Network for Cold Start Recommendation
This paper proposes a novel neural network, joint training capsule network
(JTCN), for the cold-start recommendation task. We propose to mimic the high-level user preference, rather than the raw interaction history, based on side information for cold-start users. Specifically, an attentive capsule layer
is proposed to aggregate high-level user preference from the low-level
interaction history via a dynamic routing-by-agreement mechanism. Moreover,
JTCN jointly optimizes the loss for mimicking the user preference and the softmax loss for recommendation in an end-to-end manner. Experiments on
two publicly available datasets demonstrate the effectiveness of the proposed
model. JTCN improves over other state-of-the-art methods by at least 7.07% on CiteULike and 16.85% on Amazon in terms of Recall@100 for cold-start recommendation.
Comment: Accepted by SIGIR'2
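A rough sketch of such an attentive capsule aggregation over a user's interaction history appears below; the attention form, squash non-linearity, and iteration count are assumptions rather than JTCN's exact layer.

```python
import numpy as np

def attentive_aggregate(history, num_iters=3, eps=1e-8):
    """history: (N, d) embeddings of items the user interacted with.
    Returns a (d,) high-level preference vector."""
    b = np.zeros(len(history))                         # attention logits
    v = np.zeros(history.shape[1])
    for _ in range(num_iters):
        a = np.exp(b) / np.exp(b).sum()                # attention over the history
        s = (a[:, None] * history).sum(axis=0)
        n2 = s @ s
        v = (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)  # squash
        b = b + history @ v                            # routing-style agreement update
    return v

preference = attentive_aggregate(np.random.randn(20, 64))  # 20 past interactions
print(preference.shape)  # (64,)
```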
Towards Open Intent Discovery for Conversational Text
Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understanding dialogs. Existing research on intent discovery models it as a classification task with a predefined set of known categories. To generalize beyond these preexisting classes, we define a new task of \textit{open intent discovery}. We investigate how intent can be generalized to those not seen during training. To this end, we propose a two-stage approach to this task: predicting whether an utterance contains an intent, and then tagging the intent in the input utterance. Our model consists
of a bidirectional LSTM with a CRF on top to capture contextual semantics,
subject to some constraints. Self-attention is used to learn long-distance dependencies. Further, we adapt an adversarial training approach to improve robustness and performance across domains. We also present a dataset of 25k real-life utterances that have been labelled via crowdsourcing. Our
experiments across different domains and real-world datasets show the
effectiveness of our approach, with less than 100 annotated examples needed per
unique domain to recognize diverse intents. The approach outperforms
state-of-the-art baselines by 5-15 F1 points.
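A minimal sketch of the two-stage design: stage one decides whether an utterance expresses any intent, stage two tags the intent words. A plain BiLSTM tagger stands in for the paper's BiLSTM-CRF with self-attention and adversarial training, so the CRF and attention layers are deliberately omitted here.

```python
import torch
import torch.nn as nn

class OpenIntentModel(nn.Module):
    def __init__(self, vocab_size, emb=64, hidden=64, num_tags=3):  # O / B / I tags
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.exist = nn.Linear(2 * hidden, 1)          # stage 1: is an intent present?
        self.tagger = nn.Linear(2 * hidden, num_tags)  # stage 2: tag the intent span

    def forward(self, token_ids):                      # (B, T) token ids
        h, _ = self.lstm(self.emb(token_ids))          # (B, T, 2*hidden)
        has_intent = torch.sigmoid(self.exist(h.mean(dim=1)))  # (B, 1)
        tag_logits = self.tagger(h)                    # (B, T, num_tags)
        return has_intent, tag_logits

model = OpenIntentModel(vocab_size=1000)
has_intent, tags = model(torch.randint(0, 1000, (2, 10)))
print(has_intent.shape, tags.shape)  # (2, 1) and (2, 10, 3)
```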
A Survey on Spoken Language Understanding: Recent Advances and New Frontiers
Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries and is a core component of task-oriented dialog systems. With the rise of deep neural networks and the evolution of pre-trained language models, research on SLU has achieved significant breakthroughs. However,
there remains a lack of a comprehensive survey summarizing existing approaches
and recent trends, which motivated the work presented in this article. In this
paper, we survey recent advances and new frontiers in SLU. Specifically, we
give a thorough review of this research field, covering different aspects
including (1) new taxonomy: we provide a new perspective on the SLU field, including single model vs. joint model, implicit vs. explicit joint modeling within joint models, and non-pre-trained vs. pre-trained paradigms; (2) new frontiers: some emerging areas in complex SLU as well as the
corresponding challenges; (3) abundant open-source resources: to help the
community, we have collected and organized the related papers, baseline projects, and a leaderboard on a public website where SLU researchers can directly access the most recent progress. We hope that this survey can shed light on future research in the SLU field.
Comment: Survey for SLU Direction. Resources in \url{https://github.com/yizhen20133868/Awesome-SLU-Survey}
Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection
Detecting the user's intent and finding the corresponding slots among the
utterance's words are important tasks in natural language understanding. Their
interconnected nature makes their joint modeling a standard part of training
such models. Moreover, data scarcity and specialized vocabularies pose
additional challenges. Recently, the advances in pre-trained language models,
namely contextualized models such as ELMo and BERT, have revolutionized the field by tapping the potential of very large models fine-tuned on a task-specific dataset in just a few steps. Here, we leverage one such model, namely BERT, and design a novel architecture on top of it. Moreover, we propose
an intent pooling attention mechanism, and we reinforce the slot filling task
by fusing intent distributions, word features, and token representations. The
experimental results on standard datasets show that our model outperforms both
the current non-BERT state of the art as well as some stronger BERT-based
baselines.
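One plausible reading of the intent pooling attention and the fusion step is sketched below: an attention-pooled utterance vector yields an intent distribution, which is then concatenated back onto every token representation for slot classification. The layer sizes and the fusion-by-concatenation choice are assumptions about the architecture, not its published details.

```python
import torch
import torch.nn as nn

class IntentPoolingSlotHead(nn.Module):
    def __init__(self, dim=768, num_intents=7, num_slots=20):
        super().__init__()
        self.attn = nn.Linear(dim, 1)                     # intent pooling attention
        self.intent_clf = nn.Linear(dim, num_intents)
        self.slot_clf = nn.Linear(dim + num_intents, num_slots)

    def forward(self, token_reprs):                       # (B, T, dim), e.g. BERT outputs
        a = torch.softmax(self.attn(token_reprs), dim=1)  # (B, T, 1) attention weights
        pooled = (a * token_reprs).sum(dim=1)             # (B, dim) utterance vector
        intent_logits = self.intent_clf(pooled)
        intent_dist = torch.softmax(intent_logits, dim=-1)
        # Fuse the intent distribution into every token for slot filling.
        expanded = intent_dist.unsqueeze(1).expand(-1, token_reprs.size(1), -1)
        slot_logits = self.slot_clf(torch.cat([token_reprs, expanded], dim=-1))
        return intent_logits, slot_logits

head = IntentPoolingSlotHead()
intent_logits, slot_logits = head(torch.randn(2, 12, 768))
print(intent_logits.shape, slot_logits.shape)
```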
Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Task-oriented dialog systems increasingly rely on deep learning-based slot
filling models, usually needing extensive labeled training data for target
domains. Often, however, little to no target domain training data may be
available, or the training and target domain schemas may be misaligned, as is
common for web forms on similar websites. Prior zero-shot slot filling models
use slot descriptions to learn concepts, but are not robust to misaligned
schemas. We propose utilizing both the slot description and a small number of
examples of slot values, which may be easily available, to learn semantic
representations of slots which are transferable across domains and robust to
misaligned schemas. Our approach outperforms state-of-the-art models on two
multi-domain datasets, especially in the low-data setting.
Comment: To appear in ACL 201
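The core idea, representing a slot by encoding its description together with a few example values and scoring tokens against that representation, can be sketched as follows. The stand-in encoder, mean pooling, and dot-product scorer are assumptions for illustration; the paper trains a dedicated tagging network on top of such representations.

```python
import numpy as np

def encode(text, d=64):
    # Stand-in text encoder: a hash-seeded random vector (demo only).
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(d)

def slot_representation(description, example_values):
    """Pool the slot description together with a few example values."""
    vecs = [encode(description)] + [encode(v) for v in example_values]
    return np.mean(vecs, axis=0)

# A slot never seen in training, defined only by its schema entry:
slot = slot_representation("departure city of the flight",
                           ["Boston", "San Francisco", "Tokyo"])
tokens = ["book", "a", "flight", "from", "Boston"]
scores = np.array([encode(t) @ slot for t in tokens])
print(tokens[int(scores.argmax())])  # the token most similar to the slot
```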
Detecting Fake News with Capsule Neural Networks
Fake news has increased dramatically on social media in recent years. This has
prompted the need for effective fake news detection algorithms. Capsule neural
networks have been successful in computer vision and are receiving attention
for use in Natural Language Processing (NLP). This paper aims to use capsule
neural networks in the fake news detection task. We use different embedding
models for news items of different lengths. Static word embedding is used for
short news items, whereas non-static word embeddings, which allow incremental up-training and updating during the training phase, are used for medium-length or long news statements. Moreover, we apply different levels of n-grams for
feature extraction. Our proposed architectures are evaluated on two recent
well-known datasets in the field, namely ISOT and LIAR. The results show
encouraging performance, outperforming state-of-the-art methods by 7.8% on ISOT, and by 3.1% on the validation set and 1% on the test set of the LIAR dataset.
Comment: 25 pages, 4 figures
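The different levels of n-grams can be pictured as parallel convolutions with different kernel widths whose feature maps would feed the capsule layers; the filter counts and widths below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NGramFeatures(nn.Module):
    def __init__(self, emb_dim=100, num_filters=32, ngram_sizes=(2, 3, 4)):
        super().__init__()
        # One Conv1d per n-gram size: each kernel width n extracts n-gram features.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=n, padding=n // 2)
            for n in ngram_sizes)

    def forward(self, embedded):          # (B, T, emb_dim) word embeddings
        x = embedded.transpose(1, 2)      # (B, emb_dim, T) for Conv1d
        return [torch.relu(conv(x)) for conv in self.convs]  # one map per n

feature_maps = NGramFeatures()(torch.randn(4, 50, 100))
print([f.shape for f in feature_maps])   # these maps would feed the capsule layers
```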