Search CORE

582 research outputs found

Zero-shot User Intent Detection via Capsule Neural Networks

Author: Chang Yi
Xia Congying
Yan Xiaohui
Yu Philip S.
Zhang Chenwei
Publication venue
Publication date: 02/09/2018
Field of study

User intent detection plays a critical role in question-answering and dialog systems. Most previous works treat intent detection as a classification problem where utterances are labeled with predefined intents. However, it is labor-intensive and time-consuming to label users' utterances as intents are diversely expressed and novel intents will continually be involved. Instead, we study the zero-shot intent detection problem, which aims to detect emerging user intents where no labeled utterances are currently available. We propose two capsule-based architectures: INTENT-CAPSNET that extracts semantic features from utterances and aggregates them to discriminate existing intents, and INTENTCAPSNET-ZSL which gives INTENTCAPSNET the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents. Experiments on two real-world datasets show that our model not only can better discriminate diversely expressed existing intents, but is also able to discriminate emerging intents when no labeled utterances are available.Comment: In EMNLP 2018 as a long paper. Previously available on http://doi.org/10.13140/RG.2.2.11739.4688

arXiv.org e-Print Archive

Joint Slot Filling and Intent Detection via Capsule Neural Networks

Author: Du Nan
Fan Wei
Li Yaliang
Yu Philip S.
Zhang Chenwei
Publication venue
Publication date: 07/07/2019
Field of study

Being able to recognize words as slots and detect the intent of an utterance has been a keen issue in natural language understanding. The existing works either treat slot filling and intent detection separately in a pipeline manner, or adopt joint models which sequentially label slots while summarizing the utterance-level intent without explicitly preserving the hierarchical relationship among words, slots, and intents. To exploit the semantic hierarchy for effective modeling, we propose a capsule-based neural network model which accomplishes slot filling and intent detection via a dynamic routing-by-agreement schema. A re-routing schema is proposed to further synergize the slot filling performance using the inferred intent representation. Experiments on two real-world datasets show the effectiveness of our model when compared with other alternative model architectures, as well as existing natural language understanding services.Comment: In ACL 2019 as a long paper. Code and data available at https://github.com/czhang99/Capsule-NL

arXiv.org e-Print Archive

Induction Networks for Few-Shot Text Classification

Author: Geng Ruiying
Jian Ping
Li Binhua
Li Yongbin
Sun Jian
Zhu Xiaodan
Publication venue
Publication date: 29/09/2019
Field of study

Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies have used meta-learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the various expressions in the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such a generalized class-wise representation, by innovatively leveraging the dynamic routing algorithm in meta-learning. In this way, we find the model is able to induce and generalize better. We evaluate the proposed model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experiment results show that on both datasets, the proposed model significantly outperforms the existing state-of-the-art approaches, proving the effectiveness of class-wise generalization in few-shot text classification.Comment: 7 pages, 3 figure

arXiv.org e-Print Archive

Learning Class-Transductive Intent Representations for Zero-shot Intent Detection

Author: Fu Peng
Li Jiangnan
Lin Zheng
Liu Yuanxin
Si Qingyi
Wang Weiping
Publication venue
Publication date: 08/06/2021
Field of study

Zero-shot intent detection (ZSID) aims to deal with the continuously emerging intents without annotated training data. However, existing ZSID systems suffer from two limitations: 1) They are not good at modeling the relationship between seen and unseen intents. 2) They cannot effectively recognize unseen intents under the generalized intent detection (GZSID) setting. A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage. To address this problem, we propose a novel framework that utilizes unseen class labels to learn Class-Transductive Intent Representations (CTIR). Specifically, we allow the model to predict unseen intents during training, with the corresponding label names serving as input utterances. On this basis, we introduce a multi-task learning objective, which encourages the model to learn the distinctions among intents, and a similarity scorer, which estimates the connections among intents more accurately. CTIR is easy to implement and can be integrated with existing methods. Experiments on two real-world datasets show that CTIR brings considerable improvement to the baseline systems.Comment: IJCAI-202

arXiv.org e-Print Archive

Joint Training Capsule Network for Cold Start Recommendation

Author: Liang Tingting
Xia Congying
Yin Yuyu
Yu Philip S.
Publication venue
Publication date: 23/05/2020
Field of study

This paper proposes a novel neural network, joint training capsule network (JTCN), for the cold start recommendation task. We propose to mimic the high-level user preference other than the raw interaction history based on the side information for the fresh users. Specifically, an attentive capsule layer is proposed to aggregate high-level user preference from the low-level interaction history via a dynamic routing-by-agreement mechanism. Moreover, JTCN jointly trains the loss for mimicking the user preference and the softmax loss for the recommendation together in an end-to-end manner. Experiments on two publicly available datasets demonstrate the effectiveness of the proposed model. JTCN improves other state-of-the-art methods at least 7.07% for CiteULike and 16.85% for Amazon in terms of Recall@100 in cold start recommendation.Comment: Accepted by SIGIR'2

arXiv.org e-Print Archive

Towards Open Intent Discovery for Conversational Text

Author: Lipka Nedim
Maneriker Pranav
Parthasarathy Srinivasan
Vedula Nikhita
Publication venue
Publication date: 17/04/2019
Field of study

Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understand dialogs. Existing research for intent discovery model it as a classification task with a predefined set of known categories. To generailze beyond these preexisting classes, we define a new task of \textit{open intent discovery}. We investigate how intent can be generalized to those not seen during training. To this end, we propose a two-stage approach to this task - predicting whether an utterance contains an intent, and then tagging the intent in the input utterance. Our model consists of a bidirectional LSTM with a CRF on top to capture contextual semantics, subject to some constraints. Self-attention is used to learn long distance dependencies. Further, we adapt an adversarial training approach to improve robustness and perforamce across domains. We also present a dataset of 25k real-life utterances that have been labelled via crowd sourcing. Our experiments across different domains and real-world datasets show the effectiveness of our approach, with less than 100 annotated examples needed per unique domain to recognize diverse intents. The approach outperforms state-of-the-art baselines by 5-15% F1 score points

arXiv.org e-Print Archive

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

Author: Che Wanxiang
Liu Ting
Qin Libo
Xie Tianbao
Publication venue
Publication date: 04/03/2021
Field of study

Spoken Language Understanding (SLU) aims to extract the semantics frame of user queries, which is a core component in a task-oriented dialog system. With the burst of deep neural networks and the evolution of pre-trained language models, the research of SLU has obtained significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for SLU filed, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. pre-trained paradigm;(2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected, organized the related papers, baseline projects and leaderboard on a public website where SLU researchers could directly access to the recent progress. We hope that this survey can shed a light on future research in SLU field.Comment: Survey for SLU Direction. Resources in \url{https://github.com/yizhen20133868/Awesome-SLU-Survey

arXiv.org e-Print Archive

Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection

Author: Hardalov Momchil
Koychev Ivan
Nakov Preslav
Publication venue
Publication date: 30/04/2020
Field of study

Detecting the user's intent and finding the corresponding slots among the utterance's words are important tasks in natural language understanding. Their interconnected nature makes their joint modeling a standard part of training such models. Moreover, data scarceness and specialized vocabularies pose additional challenges. Recently, the advances in pre-trained language models, namely contextualized models such as ELMo and BERT have revolutionized the field by tapping the potential of training very large models with just a few steps of fine-tuning on a task-specific dataset. Here, we leverage such model, namely BERT, and we design a novel architecture on top it. Moreover, we propose an intent pooling attention mechanism, and we reinforce the slot filling task by fusing intent distributions, word features, and token representations. The experimental results on standard datasets show that our model outperforms both the current non-BERT state of the art as well as some stronger BERT-based baselines

arXiv.org e-Print Archive

Robust Zero-Shot Cross-Domain Slot Filling with Example Values

Author: Fayazi Amir A
Gupta Raghav
Hakkani-Tur Dilek
Shah Darsh J
Publication venue
Publication date: 17/06/2019
Field of study

Task-oriented dialog systems increasingly rely on deep learning-based slot filling models, usually needing extensive labeled training data for target domains. Often, however, little to no target domain training data may be available, or the training and target domain schemas may be misaligned, as is common for web forms on similar websites. Prior zero-shot slot filling models use slot descriptions to learn concepts, but are not robust to misaligned schemas. We propose utilizing both the slot description and a small number of examples of slot values, which may be easily available, to learn semantic representations of slots which are transferable across domains and robust to misaligned schemas. Our approach outperforms state-of-the-art models on two multi-domain datasets, especially in the low-data setting.Comment: To appear in ACL 201

arXiv.org e-Print Archive

Detecting Fake News with Capsule Neural Networks

Author: Goldani Mohammad Hadi
Momtazi Saeedeh
Safabakhsh Reza
Publication venue
Publication date: 03/02/2020
Field of study

Fake news is dramatically increased in social media in recent years. This has prompted the need for effective fake news detection algorithms. Capsule neural networks have been successful in computer vision and are receiving attention for use in Natural Language Processing (NLP). This paper aims to use capsule neural networks in the fake news detection task. We use different embedding models for news items of different lengths. Static word embedding is used for short news items, whereas non-static word embeddings that allow incremental up-training and updating in the training phase are used for medium length or large news statements. Moreover, we apply different levels of n-grams for feature extraction. Our proposed architectures are evaluated on two recent well-known datasets in the field, namely ISOT and LIAR. The results show encouraging performance, outperforming the state-of-the-art methods by 7.8% on ISOT and 3.1% on the validation set, and 1% on the test set of the LIAR dataset.Comment: 25 pages, 4 figure

arXiv.org e-Print Archive