
    Deep Open Intent Classification with Adaptive Decision Boundary

    Open intent classification is a challenging task in dialogue systems. On the one hand, it must ensure accurate identification of known intents; on the other hand, it must detect the open (unknown) intent without prior knowledge. Current models are limited in finding an appropriate decision boundary that balances performance on the known intents and on the open intent. In this paper, we propose a post-processing method that learns an adaptive decision boundary (ADB) for open intent classification. We first use the labeled known-intent samples to pre-train the model. Then, with the aid of the well-trained features, we automatically learn an adaptive spherical decision boundary for each known class. Specifically, we propose a new loss function that balances the empirical risk and the open space risk. Our method does not need open intent samples and requires no modification of the model architecture. Moreover, our approach remains surprisingly robust with less labeled data and fewer known intents. Extensive experiments on three benchmark datasets show that our method yields significant improvements over state-of-the-art methods. The code is released at https://github.com/thuiar/Adaptive-Decision-Boundary. Comment: Accepted by AAAI 2021 (Main Track, Long Paper).
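    The abstract describes the adaptive spherical boundary only in words. The following is a minimal pure-Python sketch under stated assumptions: each known class k has a centroid and a radius kept positive by a softplus, and a hinge-style loss charges d - r to samples outside their class sphere (empirical risk) and r - d to samples inside it (open space risk). These choices are our reading of the idea, not the released code at the repository above, and the function names are hypothetical.

```python
import math
from typing import Sequence

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps every radius strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def boundary_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                  centroids: Sequence[Sequence[float]],
                  raw_radii: Sequence[float]) -> float:
    """Mean boundary loss over a batch of fixed, pre-trained features.

    Outside its class sphere (d > r) a sample contributes d - r, which pushes
    the radius outward (empirical risk); inside (d <= r) it contributes r - d,
    which pulls the radius inward (open space risk)."""
    total = 0.0
    for z, y in zip(features, labels):
        d = math.dist(z, centroids[y])
        r = softplus(raw_radii[y])
        total += (d - r) if d > r else (r - d)
    return total / len(features)

def predict(z: Sequence[float], centroids: Sequence[Sequence[float]],
            raw_radii: Sequence[float], open_label: int = -1) -> int:
    # Assign the nearest centroid, then reject as open intent if the sample
    # falls outside that class's learned radius.
    k = min(range(len(centroids)), key=lambda i: math.dist(z, centroids[i]))
    return open_label if math.dist(z, centroids[k]) > softplus(raw_radii[k]) else k
```

    Because the two branches together equal |d - r| per sample, minimizing this loss over a class's radius pulls it toward the median distance of that class's samples from the centroid, which is one way to see how the boundary balances the two risks.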

    A Hybrid Architecture for Out of Domain Intent Detection and Intent Discovery

    Intent detection is one of the tasks of the natural language understanding (NLU) unit in task-oriented dialogue systems. Out-of-scope (OOS) and out-of-domain (OOD) inputs can cause these systems to fail. In addition, training an intent detection model for a task-oriented dialogue system requires a labeled dataset, and creating one is time-consuming and labor-intensive. This article addresses both problems. The task of identifying OOD/OOS inputs is called OOD/OOS intent detection, and the task of discovering new intents and pseudo-labeling OOD inputs is known as intent discovery. For OOD intent detection, we use a variational autoencoder to distinguish between known and unknown intents independently of the input data distribution. An unsupervised clustering method is then used to discover the different unknown intents underlying the OOD/OOS inputs. We also apply non-linear dimensionality reduction to the OOD/OOS representations so that distances between them are more meaningful for clustering. Our results show that the proposed model achieves strong results on both OOD/OOS intent detection and intent discovery and outperforms the baselines in English and Persian.
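    The detect-then-cluster pipeline above can be sketched in a few lines. This is an illustration under loud assumptions: we assume the VAE yields a per-input anomaly score (for example, reconstruction error) that is compared against a threshold, and we swap in plain k-means for the unnamed "unsupervised clustering method". The dimensionality-reduction step is omitted, and `flag_ood` and `kmeans` are hypothetical helpers, not the authors' code.

```python
import math
import random
from typing import List, Sequence, Tuple

def flag_ood(scores: Sequence[float], threshold: float) -> List[int]:
    # Hypothetical rule: inputs whose VAE anomaly score (e.g. reconstruction
    # error) exceeds the threshold are treated as OOD/OOS.
    return [i for i, s in enumerate(scores) if s > threshold]

def kmeans(points: List[List[float]], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign
```

    In use, one would run `flag_ood` on the scores of a batch, collect the representations of the flagged inputs, and pass them to `kmeans` to obtain pseudo-labels for the discovered intents.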

    Open World Classification with Adaptive Negative Samples

    Open world classification is a natural language processing task of key practical relevance and impact. Because data from the open, or unknown, category appears only in the inference phase, it is challenging to find a model whose decision boundary supports both the identification of known classes and the discrimination of the open category. Existing models are limited by the lack of effective open-category data during training or by the lack of a good mechanism for learning appropriate decision boundaries. We propose an approach based on adaptive negative samples (ANS), designed to generate effective synthetic open-category samples during training without requiring any prior knowledge or external datasets. Empirically, we find a significant advantage in using auxiliary one-versus-rest binary classifiers, which make effective use of the generated negative samples and avoid the complex threshold-seeking stage of previous works. Extensive experiments on three benchmark datasets show that ANS achieves significant improvements over state-of-the-art methods. Comment: Accepted by EMNLP 2021 (Main Track, Long Paper).
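    To make the one-versus-rest idea concrete, here is a minimal sketch of how such classifiers can reject open-category inputs without a tuned threshold: each class has its own binary sigmoid output, and an input that every classifier rejects is labeled open. The negative-sample generator is a hypothetical stand-in (a random push of fixed length away from a known-class feature); the abstract does not describe how ANS actually adapts its samples, and all names here are ours.

```python
import math
import random
from typing import List, Sequence

def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def ovr_predict(logits: Sequence[float], open_label: int = -1) -> int:
    """logits[k] is the raw score of the k-th one-vs-rest binary classifier.

    The most confident class wins only if its probability exceeds 0.5; if
    every classifier rejects the input, it is assigned to the open category.
    The 0.5 cut-off comes from the binary decision itself, so no separate
    threshold search is needed (our reading of the abstract's claim)."""
    probs = [sigmoid(s) for s in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] > 0.5 else open_label

def synthetic_negative(z: Sequence[float], radius: float,
                       rng: random.Random) -> List[float]:
    # Hypothetical stand-in for adaptive negative sampling: push a known-class
    # feature z a fixed distance `radius` in a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in z]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    return [x + radius * d / norm for x, d in zip(z, direction)]
```

    During training, samples produced by a generator like `synthetic_negative` would serve as negatives for every binary classifier, teaching each one to reject points that lie off its class's region.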

    λ”₯λŸ¬λ‹ 기반 생성 λͺ¨λΈμ„ μ΄μš©ν•œ μžμ—°μ–΄μ²˜λ¦¬ 데이터 증강 기법

    Doctoral dissertation, Seoul National University Graduate School, Department of Computer Science and Engineering, College of Engineering, February 2020. Advisor: Sang-goo Lee (이상구). Recent advances in the generation capability of deep learning models have spurred interest in using deep generative models for unsupervised generative data augmentation (GDA). Generative data augmentation aims to improve the performance of a downstream machine learning model by augmenting the original dataset with samples generated from a deep latent variable model. This approach is attractive to the natural language processing community because (1) there is a shortage of text augmentation techniques that require little supervision and (2) resource scarcity is prevalent. In this dissertation, we explore the feasibility of exploiting deep latent variable models for data augmentation on three NLP tasks: sentence classification, spoken language understanding (SLU), and dialogue state tracking (DST). These tasks represent NLP problems of varying complexity and properties: SLU requires multi-task learning of text classification and sequence tagging, while DST requires understanding hierarchical and recurrent data structures. For each of the three tasks, we propose a task-specific latent variable model based on conditional, hierarchical, and sequential variational autoencoders (VAEs) for multi-modal joint modeling of linguistic features and the relevant annotations. We conduct extensive experiments to statistically validate our hypothesis that deep generative data augmentation is beneficial for all subject tasks. Our experiments show that deep generative data augmentation is effective for the selected tasks, supporting the idea that the technique could be applied to a broader range of NLP tasks. Ablation and qualitative studies offer deeper insight into the underlying mechanisms of generative data augmentation.
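    At its simplest, generative data augmentation means appending model-generated, already-annotated examples to the training set. The sketch below shows only that outer loop; `sample` is a hypothetical stand-in for a trained generative model (the dissertation's task-specific VAEs are not reproduced), and `toy_sampler` is an illustrative toy that generates a label and a matching utterance jointly, mirroring the joint modeling of text and annotations described above.

```python
import random
from typing import Callable, List, Tuple

# An example is an (utterance, label) pair; for SLU or DST the label would be
# a richer annotation, which is why text and annotation are generated jointly.
Example = Tuple[str, str]

def augment(data: List[Example], sample: Callable[[random.Random], Example],
            ratio: float, seed: int = 0) -> List[Example]:
    """Return the original examples followed by round(ratio * len(data))
    synthetic ones drawn from `sample`, a hypothetical stand-in for a trained
    generative model that emits text together with its annotation."""
    rng = random.Random(seed)
    n_new = round(ratio * len(data))
    synthetic = [sample(rng) for _ in range(n_new)]
    return list(data) + synthetic

def toy_sampler(rng: random.Random) -> Example:
    # Toy joint "generator": pick a label, then an utterance consistent with it.
    label = rng.choice(["greet", "farewell"])
    text = rng.choice({"greet": ["hi", "hello"],
                       "farewell": ["bye", "see you"]}[label])
    return text, label
```

    For example, `augment(data, toy_sampler, 1.5)` on a two-example dataset returns the two originals followed by three synthetic pairs; the downstream model is then trained on the combined list.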
As a secondary contribution, we shed light on the recurring posterior collapse phenomenon in autoregressive VAEs and propose novel techniques to mitigate it, which is crucial for properly training complex VAE models and enables them to synthesize better samples for data augmentation. In summary, this work demonstrates and analyzes the effectiveness of unsupervised generative data augmentation in NLP. Ultimately, our approach enables standardized adoption of generative data augmentation, which can be applied orthogonally to existing regularization techniques.
Recent rapid advances in deep learning-based generative models have raised expectations for the feasibility of generative data augmentation (GDA). Generative data augmentation refers to techniques that improve the performance of an associated task by adding samples generated from a deep latent variable model to the original dataset; it can therefore be regarded as a form of regularization performed in the data space. This new use of deep generative models is especially important in natural language processing because (1) general-purpose text data augmentation techniques are lacking and (2) an alternative is needed to overcome the scarcity of text data. To cover a range of problem complexities and characteristics, this dissertation examines the validity of data augmentation with deep generative models on three problems: text classification; spoken language understanding (SLU), which requires sequence labeling and multi-task learning; and dialogue state tracking (DST), which requires handling hierarchical and recurrent data structures.
Based on conditional, hierarchical, and sequential variational autoencoders (VAEs), we present specialized deep generative models that jointly generate text and its associated annotations for each NLP task, and we statistically demonstrate the effectiveness of deep generative data augmentation through extensive experiments covering a variety of downstream models and datasets. In a secondary study, we investigate the posterior collapse problem that frequently arises in autoregressive VAEs and propose new methods to mitigate it. Applying these methods to the complex VAE models required for generative data augmentation, we verify that generation quality improves, which in turn can benefit data augmentation. This dissertation is expected to contribute to the standardization of unsupervised data augmentation for NLP that can be applied alongside existing regularization techniques.
Contents:
1 Introduction
  1.1 Motivation
  1.2 Dissertation Overview
2 Background and Related Work
  2.1 Deep Latent Variable Models
    2.1.1 Variational Autoencoder (VAE)
    2.1.2 Deep Generative Models and Text Generation
  2.2 Data Augmentation
    2.2.1 General Description
    2.2.2 Categorization of Data Augmentation
    2.2.3 Theoretical Explanations
  2.3 Summary
3 Basic Task: Text Classification
  3.1 Introduction
  3.2 Our Approach
    3.2.1 Proposed Models
    3.2.2 Training with I-VAE
  3.3 Experiments
    3.3.1 Datasets
    3.3.2 Experimental Settings
    3.3.3 Implementation Details
    3.3.4 Data Augmentation Results
    3.3.5 Ablation Studies
    3.3.6 Qualitative Analysis
  3.4 Summary
4 Multi-task Learning: Spoken Language Understanding
  4.1 Introduction
  4.2 Related Work
  4.3 Model Description
    4.3.1 Framework Formulation
    4.3.2 Joint Generative Model
  4.4 Experiments
    4.4.1 Datasets
    4.4.2 Experimental Settings
    4.4.3 Generative Data Augmentation Results
    4.4.4 Comparison to Other State-of-the-art Results
    4.4.5 Ablation Studies
  4.5 Summary
5 Complex Data: Dialogue State Tracking
  5.1 Introduction
  5.2 Background and Related Work
    5.2.1 Task-oriented Dialogue
    5.2.2 Dialogue State Tracking
    5.2.3 Conversation Modeling
  5.3 Variational Hierarchical Dialogue Autoencoder (VHDA)
    5.3.1 Notations
    5.3.2 Variational Hierarchical Conversational RNN
    5.3.3 Proposed Model
    5.3.4 Posterior Collapse
  5.4 Experimental Results
    5.4.1 Experimental Settings
    5.4.2 Data Augmentation Results
    5.4.3 Intrinsic Evaluation - Language Evaluation
    5.4.4 Qualitative Results
  5.5 Summary
6 Conclusion
  6.1 Summary
  6.2 Limitations
  6.3 Future Work
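    The dissertation abstract mentions posterior collapse in autoregressive VAEs without detailing the proposed remedies. For orientation, here is a minimal sketch of the quantity involved: the closed-form KL term of a diagonal-Gaussian VAE against a standard normal prior, combined with linear KL annealing. Collapse shows up as this KL term shrinking toward zero, meaning the approximate posterior matches the prior and the decoder ignores the latent code. KL annealing is included only as a widely used generic mitigation; it is not claimed to be the technique proposed in the dissertation, and all function names are hypothetical.

```python
import math
from typing import Sequence

def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions:
    0.5 * sum(mu^2 + exp(logvar) - logvar - 1)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

def kl_weight(step: int, warmup_steps: int) -> float:
    # Linear KL annealing: the weight rises from 0 to 1 over warmup_steps,
    # letting the decoder learn to use the latent code before the full
    # KL penalty applies.
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)

def annealed_loss(recon_nll: float, mu: Sequence[float],
                  logvar: Sequence[float], step: int, warmup_steps: int) -> float:
    # Negative ELBO with the KL term scaled by the annealing weight.
    return recon_nll + kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)
```

    Monitoring `gaussian_kl` during training is a simple way to detect collapse: if it stays near zero while the reconstruction loss keeps falling, the decoder is likely ignoring the latent variable.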

    Signal Processing and Machine Learning Techniques Towards Various Real-World Applications

    Machine learning (ML) has played an important role in many modern technological innovations and has become an important tool for researchers in various fields. Beyond engineering, ML techniques have spread to fields such as health care, medicine, diagnostics, social science, finance, and economics. These techniques require data to train algorithms that model a complex system and make predictions based on that model. With the development of sophisticated sensors, it has become easier to collect the large volumes of data used to formulate and test hypotheses with ML. The promising results obtained with ML have opened new research opportunities across many fields, and this dissertation is one manifestation of that. It presents several distinct studies, each drawing valuable inferences about a real-world complex system. Each study has its own motivation and real-world relevance, and each explores a combination of signal processing (SP) and ML techniques. The dissertation describes the systematic approach in detail and discusses the results achieved in each study. The inferences drawn from each study play a vital role in areas of science and technology and merit further investigation. The dissertation also provides a set of useful SP and ML tools for researchers in various fields. Dissertation/Thesis, Doctoral Dissertation, Electrical Engineering 201

    Modeling users interacting with smart devices
