7,114 research outputs found

    Sequential Dialogue Context Modeling for Spoken Language Understanding

    Spoken Language Understanding (SLU) is a key component of goal-oriented dialogue systems that parses user utterances into semantic frame representations. Traditionally, SLU does not utilize the dialogue history beyond the previous system turn, and contextual ambiguities are resolved by downstream components. In this paper, we explore novel approaches for modeling dialogue context in a recurrent neural network (RNN) based language understanding system. We propose the Sequential Dialogue Encoder Network, which encodes context from the dialogue history in chronological order. We compare the performance of our proposed architecture with two context models: one that uses just the previous-turn context, and another that encodes dialogue context in a memory network but loses the order of utterances in the dialogue history. Experiments with a multi-domain dialogue dataset demonstrate that the proposed architecture reduces semantic frame error rates. Comment: 8 + 2 pages.
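To illustrate the chronological-encoding idea, here is a minimal pure-Python sketch: a toy recurrence folds each turn into a running state in utterance order, so that, unlike an order-free memory pooling, the resulting encoding depends on turn order. The character-hashing "embedding", the dimensions, and the weights are all illustrative assumptions, not the paper's architecture.

```python
import math

def encode_dialogue(history, dim=4, w_prev=0.5, w_turn=0.1):
    """Toy chronological dialogue encoder (illustrative stand-in for an
    RNN-based sequential dialogue encoder, not the paper's model).

    Each turn is hashed into a fixed-size count vector, then folded into
    a running context state in utterance order, so earlier turns influence
    the state only through the recurrence.
    """
    h = [0.0] * dim
    for turn in history:
        # Crude turn "embedding": character counts bucketed into `dim` slots.
        x = [0.0] * dim
        for ch in turn.lower():
            x[ord(ch) % dim] += 1.0
        # Recurrent update: new state depends on previous state AND input.
        h = [math.tanh(w_prev * h[i] + w_turn * x[i]) for i in range(dim)]
    return h

history = ["book a table for two", "which restaurant?", "the italian one"]
ctx_fwd = encode_dialogue(history)
ctx_rev = encode_dialogue(list(reversed(history)))
# Reversing the history changes the encoding, unlike order-free pooling.
```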

    Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation

    This paper discusses our approaches for task-oriented conversational modelling using subjective knowledge, with a particular emphasis on response generation. Our methodology was shaped by an extensive data analysis that evaluated key factors such as response length, sentiment, and the dialogue acts present in the provided dataset. We used few-shot learning to augment the data with newly generated subjective-knowledge items and present three approaches for DSTC11: (1) task-specific model exploration, (2) incorporation of the most frequent question into all generated responses, and (3) a waterfall prompting technique using a combination of both GPT-3 and ChatGPT. Comment: DSTC11.
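Approach (2), injecting the dataset's most frequent question into generated responses, can be sketched in a few lines. The helper names and the regex-based sentence split below are hypothetical, not taken from the DSTC11 system.

```python
from collections import Counter
import re

def most_frequent_question(responses):
    """Find the question sentence occurring most often across dataset
    responses (a sketch of approach (2); the sentence splitter is a
    simple regex assumption)."""
    questions = []
    for resp in responses:
        for sent in re.split(r"(?<=[.?!])\s+", resp.strip()):
            if sent.endswith("?"):
                questions.append(sent)
    return Counter(questions).most_common(1)[0][0] if questions else None

def augment_response(generated, frequent_q):
    """Append the most frequent follow-up question to a generated response."""
    return generated if frequent_q is None else f"{generated} {frequent_q}"

corpus = [
    "The hotel has free parking. Would you like me to book it?",
    "Yes, breakfast is included. Would you like me to book it?",
    "The restaurant is open late. Do you want the address?",
]
q = most_frequent_question(corpus)
print(augment_response("The room has a sea view.", q))
```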

    Breaching bodily boundaries: posthuman (dis)embodiment and ecstatic speech in lip-synch performances by boychild

    Employing a sci-fi-inspired aesthetic, the queer, black, trans artist boychild presents audiences with a future vision of human embodiment. Strobe lighting makes her appear fragmented, or as if she were a hologram. An electronic light flickers behind her teeth. Her eyes are obscured by whited-out contact lenses. boychild's is a body interfaced with technology. She is imaged as non-human, cyborgian. Whilst boychild considers her onstage persona to be female, her body reads ambiguously. Transgressing demarcations between the supposedly polarised categories of organic/machine and male/female, the queer form of embodiment she presents is posthuman. Implementing the theoretical principles of Rosi Braidotti's anti-humanist concept of the posthuman and Donna Haraway's cyborg politics, I argue that boychild's engagement with the posthuman does not end with aesthetics; rather, it extends to the plotting of a posthuman politics, posing a radical challenge to heteronormative body politics. Theorising boychild's lip-synch performances, I argue for her style of performance as a technologised form of ventriloquism, as she 'speaks' with the voice of another, or the voice of another speaks through her. Using Mladen Dolar's and Slavoj Žižek's psychoanalytical philosophies in conjunction with Steven Connor's literature on ventriloquism, I unpick the intricacies of presence and power inherent to her 'voice' and indicate its broader political implications.

    Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

    With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single-perturbation and four types of mixed-perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. Our experiments demonstrate that current open-source LLMs generally achieve limited perturbation robustness. Based on these observations, we make forward-looking suggestions to fuel research in this direction. Comment: Accepted at NLPCC 2023 (Oral Presentation).
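The multi-level (character, word, and sentence) perturbation idea can be sketched with simple stand-in noise functions. The specific perturbations below (adjacent-character swap, word drop, spoken-style filler) and the function names are illustrative assumptions, not the Noise-LLM construction.

```python
import random

def char_perturb(text, rng):
    """Character-level noise: swap one adjacent character pair (a typo)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_perturb(text, rng):
    """Word-level noise: drop one random word."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)

def sentence_perturb(text, rng, fillers=("um,", "you know,")):
    """Sentence-level noise: prepend a spoken-style filler."""
    return f"{rng.choice(fillers)} {text}"

def perturb(text, levels, seed=0):
    """Apply perturbations in order, mirroring mixed-perturbation settings."""
    rng = random.Random(seed)
    fns = {"char": char_perturb, "word": word_perturb, "sentence": sentence_perturb}
    for level in levels:
        text = fns[level](text, rng)
    return text

clean = "book a flight to berlin on friday"
noisy = perturb(clean, ["char", "word", "sentence"], seed=7)
```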

    Data Augmentation Techniques for Natural Language Processing Using Deep Learning-based Generative Models

    Doctoral dissertation, Department of Computer Science and Engineering, College of Engineering, Graduate School of Seoul National University, February 2020. Advisor: Sang-goo Lee. Recent advances in the generation capability of deep learning models have spurred interest in utilizing deep generative models for unsupervised generative data augmentation (GDA). Generative data augmentation aims to improve the performance of a downstream machine learning model by augmenting the original dataset with samples generated from a deep latent variable model. This data augmentation approach is attractive to the natural language processing community because (1) there is a shortage of text augmentation techniques that require little supervision, and (2) resource scarcity is prevalent. In this dissertation, we explore the feasibility of exploiting deep latent variable models for data augmentation on three NLP tasks: sentence classification, spoken language understanding (SLU), and dialogue state tracking (DST), which represent NLP tasks of varying complexity and properties -- SLU requires multi-task learning of text classification and sequence tagging, while DST requires the understanding of hierarchical and recurrent data structures. For each of the three tasks, we propose a task-specific latent variable model based on conditional, hierarchical, and sequential variational autoencoders (VAEs) for multi-modal joint modeling of linguistic features and the relevant annotations. We conduct extensive experiments to statistically justify our hypothesis that deep generative data augmentation is beneficial for all subject tasks. Our experiments show that deep generative data augmentation is effective for the selected tasks, supporting the idea that the technique can potentially be applied to a wider range of NLP tasks. Ablation and qualitative studies reveal deeper insight into the underlying mechanisms of generative data augmentation.
    As a secondary contribution, we also shed light on the recurring posterior collapse phenomenon in autoregressive VAEs and subsequently propose novel techniques to mitigate it, which is crucial for properly training complex VAE models, enabling them to synthesize better samples for data augmentation. In summary, this work intends to demonstrate and analyze the effectiveness of unsupervised generative data augmentation in NLP. Ultimately, our approach enables standardized adoption of generative data augmentation, which can be applied orthogonally to existing regularization techniques.
    Table of contents: 1 Introduction (1.1 Motivation; 1.2 Dissertation Overview). 2 Background and Related Work (2.1 Deep Latent Variable Models; 2.2 Data Augmentation; 2.3 Summary). 3 Basic Task: Text Classification (3.1 Introduction; 3.2 Our Approach; 3.3 Experiments; 3.4 Summary). 4 Multi-task Learning: Spoken Language Understanding (4.1 Introduction; 4.2 Related Work; 4.3 Model Description; 4.4 Experiments; 4.5 Summary). 5 Complex Data: Dialogue State Tracking (5.1 Introduction; 5.2 Background and Related Work; 5.3 Variational Hierarchical Dialogue Autoencoder (VHDA); 5.4 Experimental Results; 5.5 Summary). 6 Conclusion (6.1 Summary; 6.2 Limitations; 6.3 Future Work).
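The posterior collapse phenomenon mentioned above can be made concrete via the closed-form Gaussian KL term in the VAE objective: when this regularizer drops to zero for every latent dimension, the decoder is ignoring the latent code. A minimal sketch, with function names that are illustrative rather than the dissertation's code:

```python
import math

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent
    dimensions: 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2). When this term
    collapses to ~0 everywhere, the decoder ignores the latent code
    ("posterior collapse")."""
    return sum(0.5 * (m * m + math.exp(lv) - 1.0 - lv)
               for m, lv in zip(mu, logvar))

def elbo(log_likelihood, mu, logvar, kl_weight=1.0):
    """Evidence lower bound: reconstruction term minus (weighted) KL.
    Setting kl_weight < 1 (KL annealing) is one common guard against
    posterior collapse during training."""
    return log_likelihood - kl_weight * gaussian_kl(mu, logvar)

# A standard-normal posterior carries no information: KL is exactly 0.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))
```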