764 research outputs found

    An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking

    We highlight a practical yet rarely discussed problem in dialogue state tracking (DST): handling unknown slot values. Previous approaches generally assume predefined candidate lists and thus are not designed to output unknown values, especially when the spoken language understanding (SLU) module is absent, as in many end-to-end (E2E) systems. In this paper we describe an E2E architecture based on the pointer network (PtrNet) that can effectively extract unknown slot values while still obtaining state-of-the-art accuracy on the standard DSTC2 benchmark. We also provide extensive empirical evidence showing that tracking unknown values can be challenging and that our approach brings significant improvement with the help of an effective feature dropout technique.
    Comment: Accepted by ACL 2018.
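
    To make the mechanism concrete, here is a minimal sketch of a pointer readout that scores every dialogue token as a candidate start or end of a slot value, so the tracker can emit values absent from any candidate list. The names and shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PointerReadout(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # One scoring head per pointer: start and end of the value span.
        self.start_query = nn.Linear(hidden_dim, 1)
        self.end_query = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden) encodings of the dialogue.
        start_logits = self.start_query(token_states).squeeze(-1)  # (batch, seq_len)
        end_logits = self.end_query(token_states).squeeze(-1)
        # Pointing at dialogue tokens lets the tracker emit values that
        # never appeared in any predefined candidate list.
        return start_logits, end_logits

enc = torch.randn(2, 20, 256)                    # stand-in encoder output
start, end = PointerReadout(256)(enc)
value_span = (start.argmax(-1), end.argmax(-1))  # token positions of the value
```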

    BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer

    An important yet rarely tackled problem in dialogue state tracking (DST) is scalability for dynamic ontologies (e.g., movie, restaurant) and unseen slot values. We focus on a specific condition, where the ontology is unknown to the state tracker, but the target slot value (except for none and dontcare), possibly unseen during training, can be found as a word segment in the dialogue context. Prior approaches often rely on candidate generation from n-gram enumeration or slot tagger outputs, which can be inefficient or suffer from error propagation. We propose BERT-DST, an end-to-end dialogue state tracker which directly extracts slot values from the dialogue context. We use BERT as the dialogue context encoder, whose contextualized language representations are suitable for scalable DST that identifies slot values from their semantic context. Furthermore, we employ encoder parameter sharing across all slots, with two advantages: (1) the number of parameters does not grow linearly with the ontology; (2) language representation knowledge can be transferred among slots. Empirical evaluation shows BERT-DST with cross-slot parameter sharing outperforms prior work on the benchmark scalable DST datasets Sim-M and Sim-R, and achieves competitive performance on the standard DSTC2 and WOZ 2.0 datasets.
    Comment: Published in Interspeech 2019.
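
    The scalability argument rests on a single encoder serving every slot. Below is a hedged sketch of that layout using Hugging Face's BertModel: the encoder is shared, while small per-slot heads decide among none/dontcare/span and point at a span. Class and attribute names are assumptions for illustration, not BERT-DST's released code.

```python
import torch.nn as nn
from transformers import BertModel

class SharedEncoderDST(nn.Module):
    def __init__(self, slots, hidden=768):
        super().__init__()
        # A single encoder is shared by all slots, so encoder parameters
        # do not grow with the ontology.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # Per-slot 3-way gate: none / dontcare / extract a span.
        self.gate = nn.ModuleDict({s: nn.Linear(hidden, 3) for s in slots})
        self.span = nn.ModuleDict({s: nn.Linear(hidden, 2) for s in slots})

    def forward(self, input_ids, attention_mask, slot):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        gate_logits = self.gate[slot](out.pooler_output)                   # (B, 3)
        start_logits, end_logits = self.span[slot](out.last_hidden_state).unbind(-1)
        return gate_logits, start_logits, end_logits                      # (B,), (B, T)
```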

    Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

    This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices. The embedded inference is fast and accurate while enforcing privacy by design, as no personal user data is ever collected. Focusing on Automatic Speech Recognition and Natural Language Understanding, we detail our approach to training high-performance Machine Learning models that are small enough to run in real-time on small devices. Additionally, we describe a data generation procedure that provides sufficient, high-quality training data without compromising user privacy.
    Comment: 29 pages, 9 figures, 17 tables.
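
    The abstract does not detail the data generation procedure, but a common pattern for producing slot-filling training data without collecting user logs is template expansion over slot vocabularies. The sketch below only illustrates that general pattern; the templates, slots, and values are invented for the example.

```python
import random

# Hypothetical templates and slot vocabularies for a smart-home assistant.
TEMPLATES = ["turn the lights {state} in the {room}",
             "set the {room} lights {state}"]
SLOT_VALUES = {"state": ["on", "off"], "room": ["kitchen", "bedroom"]}

def generate(n: int, seed: int = 0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        filling = {slot: rng.choice(vals) for slot, vals in SLOT_VALUES.items()}
        # Keep the slot annotation alongside the surface utterance.
        samples.append((template.format(**filling), filling))
    return samples

for utterance, slots in generate(3):
    print(utterance, slots)
```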

    STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++

    Slot-filling, Translation, Intent classification, and Language identification, or STIL, is a newly proposed task for multilingual Natural Language Understanding (NLU). By performing simultaneous slot filling and translation into a single output language (English in this case), some portion of downstream system components can be monolingual, reducing development and maintenance cost. Results are given using the multilingual BART model (Liu et al., 2020) fine-tuned on 7 languages using the MultiATIS++ dataset. When no translation is performed, mBART's performance is comparable to the current state-of-the-art system (Cross-Lingual BERT by Xu et al. (2020)) for the languages tested, with better average intent classification accuracy (96.07% versus 95.50%) but worse average slot F1 (89.87% versus 90.81%). When simultaneous translation is performed, average intent classification accuracy degrades by only 1.7% relative and average slot F1 by only 1.2% relative.
    Comment: 4 pages; to be published at AACL 2020; for code, see: https://github.com/jgmfitz/stil-mbart-multiatispp-aacl202
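
    Once slots, intent, and translation are serialized into a single target string, the task reduces to ordinary sequence-to-sequence fine-tuning. The sketch below shows such a setup with Hugging Face mBART; the serialization format and tag names are assumptions for illustration, not the paper's actual scheme.

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25",
                                           src_lang="fr_XX", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

src = "vols de paris à denver"  # source-language utterance
# Assumed target serialization: translated text with inline slot tags + intent.
tgt = "flights from [fromloc paris] to [toloc denver] <intent:atis_flight>"

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(text_target=tgt, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # standard seq2seq fine-tuning loss
```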

    STN4DST: A Scalable Dialogue State Tracking based on Slot Tagging Navigation

    Scalability for handling unknown slot values is an important problem in dialogue state tracking (DST). To the best of our knowledge, previous scalable DST approaches generally rely on either candidate generation from slot tagging output or span extraction in the dialogue context. However, candidate-generation-based DST often suffers from error propagation due to its pipelined two-stage process, while span-extraction-based DST risks generating invalid spans in the absence of semantic constraints between the start and end position pointers. To tackle these drawbacks, in this paper we propose a novel scalable dialogue state tracking method based on slot tagging navigation, which implements an end-to-end single-step pointer to locate and extract slot values quickly and accurately through the joint learning of slot tagging and slot value position prediction in the dialogue context, especially for unknown slot values. Extensive experiments on several benchmark datasets show that the proposed model substantially outperforms state-of-the-art baselines.
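
    The joint learning can be pictured as two heads over the same token encodings: a slot-tagging head and a single-step position pointer trained together, so that the pointers are constrained by the tag sequence. The sketch below illustrates that coupling with assumed names; it is not the STN4DST code.

```python
import torch
import torch.nn as nn

class TagAndPoint(nn.Module):
    def __init__(self, hidden: int, num_tags: int):
        super().__init__()
        self.tagger = nn.Linear(hidden, num_tags)  # BIO-style slot tags per token
        self.pointer = nn.Linear(hidden, 2)        # start/end position scores

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden) dialogue encodings.
        tag_logits = self.tagger(token_states)                   # (B, T, num_tags)
        start_logits, end_logits = self.pointer(token_states).unbind(-1)
        # Training both heads jointly encourages the pointers to land on
        # tokens the tagger marks as value tokens, avoiding invalid spans.
        return tag_logits, start_logits, end_logits

tags, start, end = TagAndPoint(256, 5)(torch.randn(2, 20, 256))
```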

    Copy-Enhanced Heterogeneous Information Learning for Dialogue State Tracking

    Dialogue state tracking (DST) is an essential component of task-oriented dialogue systems, which estimates user goals at every dialogue turn. However, most previous approaches suffer from the following problems: many discriminative models, especially end-to-end (E2E) models, struggle to extract unknown values that are not in the candidate ontology, while previous generative models, which can extract unknown values from utterances, degrade performance by ignoring the semantic information of the pre-defined ontology. Besides, previous generative models usually need a hand-crafted list to normalize the generated values. How to integrate the semantic information of the pre-defined ontology and the dialogue text (heterogeneous texts) to generate unknown values and improve performance thus becomes a severe challenge. In this paper, we propose a Copy-Enhanced Heterogeneous Information Learning model with multiple encoder-decoders for DST (CEDST), which can effectively generate all possible values, including unknown values, by copying values from heterogeneous texts. Meanwhile, CEDST can effectively decompose the large state space into several small state spaces through its multiple encoders, and employ multiple decoders to make full use of the reduced spaces when generating values. This multi-encoder-decoder architecture can significantly improve performance. Experiments show that CEDST achieves state-of-the-art results on two datasets and on our constructed datasets with many unknown values.
    Comment: 12 pages, 4 figures.
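
    The copy mechanism at the heart of this design mixes a vocabulary distribution with attention over source tokens, so values that appear only in the ontology or the dialogue can still be produced. Below is a sketch of one such pointer-generator decoding step; shapes and names are illustrative assumptions, not CEDST's implementation.

```python
import torch
import torch.nn.functional as F

def copy_step(dec_state, src_states, src_token_ids, vocab_proj, gen_gate):
    # dec_state: (B, H); src_states: (B, S, H) encodings of heterogeneous
    # sources (ontology entries concatenated with dialogue text);
    # src_token_ids: (B, S) vocabulary ids of those source tokens.
    attn = torch.softmax(torch.einsum("bh,bsh->bs", dec_state, src_states), -1)
    p_vocab = F.softmax(vocab_proj(dec_state), dim=-1)  # generate from vocab
    p_gen = torch.sigmoid(gen_gate(dec_state))          # (B, 1) mixing gate
    # Scatter attention mass onto the vocabulary ids of the source tokens.
    p_copy = torch.zeros_like(p_vocab).scatter_add_(1, src_token_ids, attn)
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy     # final distribution

B, S, H, V = 2, 12, 128, 1000
dist = copy_step(torch.randn(B, H), torch.randn(B, S, H),
                 torch.randint(0, V, (B, S)),
                 torch.nn.Linear(H, V), torch.nn.Linear(H, 1))
```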

    CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots

    Natural Language Understanding (NLU) is a core component of dialog systems. It typically involves two tasks - intent classification (IC) and slot labeling (SL) - which are then followed by a dialogue management (DM) component. Such NLU systems process utterances in isolation, thus pushing the problem of context management onto the DM. However, contextual information is critical to the correct prediction of intents and slots in a conversation. Prior work on contextual NLU has been limited in terms of the types of contextual signals used and the understanding of their impact on the model. In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance. CASA-NLU outperforms a recurrent contextual NLU baseline on two conversational datasets, yielding a gain of up to 7% on the IC task for one of the datasets. Moreover, a non-contextual variant of CASA-NLU achieves state-of-the-art performance for the IC task on the standard public Snips and ATIS datasets.
    Comment: To appear at EMNLP 2019.
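
    One way to picture the model is self-attention over a sequence of signal embeddings (previous intents, slots, dialog acts, and utterances) with the current utterance attending to all of them. The sketch below shows only that fusion step; the dimensions, signal set, and head names are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, dim=128, heads=4, num_intents=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intent_head = nn.Linear(dim, num_intents)

    def forward(self, signals):
        # signals: (B, N, dim) embeddings of previous intents, slots, dialog
        # acts and utterances over the context window, with the current
        # utterance occupying the last position.
        fused, _ = self.attn(signals, signals, signals)
        return self.intent_head(fused[:, -1])  # classify from the current turn

logits = ContextFusion()(torch.randn(2, 6, 128))  # toy batch of 2 dialogs
```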

    Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding

    Traditional slot filling in natural language understanding (NLU) predicts a one-hot vector for each word. This form of label representation lacks semantic correlation modelling, which leads to a severe data sparsity problem, especially when adapting an NLU model to a new domain. To address this issue, a novel label-embedding-based slot filling framework is proposed in this paper. Here, a distributed label embedding is constructed for each slot using prior knowledge. Three encoding methods are investigated to incorporate different kinds of prior knowledge about slots: atomic concepts, slot descriptions, and slot exemplars. The proposed label embeddings tend to share text patterns and reuse data across different slot labels, which makes them useful for adaptive NLU with limited data. Also, since the label embedding is independent of the NLU model, it is compatible with almost all deep-learning-based slot filling models. The proposed approaches are evaluated on three datasets. Experiments on single-domain and domain adaptation tasks show that label embedding achieves significant performance improvement over the traditional one-hot label representation as well as advanced zero-shot approaches.
    Comment: 11 pages, 6 figures; accepted for IEEE/ACM Transactions on Audio, Speech, and Language Processing.
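
    The key idea is to replace the one-hot output layer with similarity scores against slot-label vectors built from prior knowledge, here from averaged slot-description word embeddings (one of the three encoding methods mentioned). The sketch is a minimal illustration with assumed shapes, not the paper's framework.

```python
import torch
import torch.nn as nn

def label_embeddings(slot_desc_ids, word_emb):
    # slot_desc_ids: (num_slots, desc_len) token ids of each slot description.
    # Averaging description word vectors gives one vector per slot label.
    return word_emb(slot_desc_ids).mean(dim=1)  # (num_slots, dim)

word_emb = nn.Embedding(5000, 64)
labels = label_embeddings(torch.randint(0, 5000, (8, 4)), word_emb)  # 8 slots
tokens = torch.randn(2, 15, 64)   # token states from any slot-filling encoder
logits = tokens @ labels.T        # (B, T, num_slots) tag scores by similarity
# Because label vectors share word-level semantics, a new domain's slots can
# be scored without retraining a one-hot output layer.
```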

    Measuring and Reducing Gendered Correlations in Pre-trained Models

    Pre-trained models have revolutionized natural language understanding. However, researchers have found that they can encode artifacts that are undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that models with similar accuracy can encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs of different strategies. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.
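
    As a toy illustration of probing such correlations (not the metrics defined in the paper), one can compare how close a profession's embedding sits to gendered pronoun embeddings:

```python
import torch
import torch.nn.functional as F

def gender_score(word_vec, he_vec, she_vec):
    # Positive => closer to "he"; negative => closer to "she".
    return (F.cosine_similarity(word_vec, he_vec, dim=0)
            - F.cosine_similarity(word_vec, she_vec, dim=0)).item()

# Toy random vectors stand in for embeddings extracted from a real model.
emb = {w: torch.randn(64) for w in ["he", "she", "nurse", "engineer"]}
for profession in ["nurse", "engineer"]:
    print(profession, gender_score(emb[profession], emb["he"], emb["she"]))
```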

    Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN

    Video reviews are the natural evolution of written product reviews. In this paper we target this phenomenon and introduce the first dataset created from closed captions of YouTube product review videos, as well as a new attention-RNN model for aspect extraction and for joint aspect extraction and sentiment classification. Our model provides state-of-the-art performance on aspect extraction without requiring hand-crafted features on the SemEval ABSA corpus, while it outperforms the baseline on the joint task. On our dataset, the attention-RNN model outperforms the baseline for both tasks, but we observe important performance drops for all models in comparison to SemEval. These results, as well as further experiments on domain adaptation for aspect extraction, suggest that differences between speech and written text, which have been discussed extensively in the literature, also extend to the domain of product reviews, where they are relevant for fine-grained opinion mining.
    Comment: 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA).
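
    A minimal attention-RNN tagger for aspect extraction might combine a bidirectional LSTM with self-attention before per-token BIO predictions. The sketch below is an assumed configuration for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class AttnRNNTagger(nn.Module):
    def __init__(self, vocab=5000, dim=64, hidden=64, tags=3):  # BIO tags
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, 2, batch_first=True)
        self.out = nn.Linear(2 * hidden, tags)

    def forward(self, token_ids):
        h, _ = self.rnn(self.emb(token_ids))  # (B, T, 2*hidden)
        ctx, _ = self.attn(h, h, h)           # attend over the whole caption
        return self.out(ctx)                  # per-token BIO logits

logits = AttnRNNTagger()(torch.randint(0, 5000, (2, 12)))  # toy caption batch
```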