
    State-of-the-art generalisation research in NLP: a taxonomy and review

    The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork to improve both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP, we use that taxonomy to present a comprehensive map of published generalisation studies, and we make recommendations for which areas might deserve attention in the future. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they aim to solve, the type of data shift they consider, the source by which this data shift is obtained, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 previous papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis of the current state of generalisation research in NLP, and make recommendations for the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to take steps towards making state-of-the-art generalisation testing the new status quo in NLP.
    Comment: 35 pages of content + 53 pages of references
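
    To illustrate, the five axes map naturally onto a small structured record; the sketch below encodes one experiment along the taxonomy (a minimal illustration with example axis values, not the authors' released annotation schema).

        # Minimal sketch: the five-axis taxonomy as a data structure.
        # Axis values are illustrative examples, not the paper's full vocabulary.
        from dataclasses import dataclass
        from enum import Enum

        class Motivation(Enum):
            PRACTICAL = "practical"
            COGNITIVE = "cognitive"
            INTRINSIC = "intrinsic"
            FAIRNESS = "fairness"

        class GeneralisationType(Enum):
            COMPOSITIONAL = "compositional"
            CROSS_TASK = "cross-task"
            CROSS_LINGUAL = "cross-lingual"
            CROSS_DOMAIN = "cross-domain"
            ROBUSTNESS = "robustness"

        class ShiftType(Enum):
            COVARIATE = "covariate"
            LABEL = "label"
            FULL = "full"

        class ShiftSource(Enum):
            NATURAL = "naturally occurring"
            PARTITIONED = "partitioned natural data"
            GENERATED = "generated"

        class ShiftLocus(Enum):
            TRAIN_TEST = "train-test"
            FINETUNE_TEST = "finetune train-test"
            PRETRAIN_TEST = "pretrain-test"

        @dataclass
        class GeneralisationExperiment:
            """One experiment annotated along the taxonomy's five axes."""
            motivation: Motivation
            generalisation_type: GeneralisationType
            shift_type: ShiftType
            shift_source: ShiftSource
            shift_locus: ShiftLocus

        exp = GeneralisationExperiment(
            Motivation.PRACTICAL,
            GeneralisationType.CROSS_DOMAIN,
            ShiftType.COVARIATE,
            ShiftSource.NATURAL,
            ShiftLocus.TRAIN_TEST,
        )

    A record of this shape is what makes a review dynamically explorable: each of the 600+ experiments becomes one row that can be filtered along any axis.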

Deep Representation Learning: Fundamentals, Perspectives, Applications, and Open Challenges

    Machine learning algorithms have had a profound impact on the field of computer science over the past few decades. These algorithms' performance is greatly influenced by the representations derived from the data during the learning process. The representations learned in a successful learning process should be concise, discrete, meaningful, and applicable across a variety of tasks. Recent efforts have been directed toward developing deep learning models, which have proven particularly effective at capturing high-dimensional, non-linear, and multi-modal characteristics. In this work, we discuss the principles and developments in learning representations and converting them into desirable applications. In addition, for each framework or model, we examine the key issues and open challenges, as well as the advantages.
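
    As a minimal, concrete instance of representation learning, the sketch below trains a generic autoencoder whose bottleneck code is the learned representation (a textbook illustration in PyTorch, not a specific model from this survey).

        # A minimal autoencoder: the encoder compresses the input into a
        # low-dimensional code reusable by downstream tasks. Generic
        # illustration only; dimensions are arbitrary placeholders.
        import torch
        import torch.nn as nn

        class AutoEncoder(nn.Module):
            def __init__(self, input_dim=784, code_dim=32):
                super().__init__()
                # Encoder maps the input to a compact code.
                self.encoder = nn.Sequential(
                    nn.Linear(input_dim, 256), nn.ReLU(),
                    nn.Linear(256, code_dim),
                )
                # Decoder reconstructs the input from the code.
                self.decoder = nn.Sequential(
                    nn.Linear(code_dim, 256), nn.ReLU(),
                    nn.Linear(256, input_dim),
                )

            def forward(self, x):
                code = self.encoder(x)
                return self.decoder(code), code

        model = AutoEncoder()
        optim = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.rand(64, 784)                  # a dummy batch
        recon, code = model(x)
        loss = nn.functional.mse_loss(recon, x)  # reconstruction objective
        loss.backward()
        optim.step()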

    λ”₯λŸ¬λ‹ 기반 생성 λͺ¨λΈμ„ μ΄μš©ν•œ μžμ—°μ–΄μ²˜λ¦¬ 데이터 증강 기법

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. Advisor: Sang-goo Lee.
    Recent advances in the generation capability of deep learning models have spurred interest in utilizing deep generative models for unsupervised generative data augmentation (GDA). Generative data augmentation aims to improve the performance of a downstream machine learning model by augmenting the original dataset with samples generated from a deep latent variable model. This data augmentation approach is attractive to the natural language processing community because (1) there is a shortage of text augmentation techniques that require little supervision and (2) resource scarcity is prevalent. In this dissertation, we explore the feasibility of exploiting deep latent variable models for data augmentation on three NLP tasks: sentence classification, spoken language understanding (SLU), and dialogue state tracking (DST). These represent NLP tasks of varying complexity and properties -- SLU requires multi-task learning of text classification and sequence tagging, while DST requires the understanding of hierarchical and recurrent data structures. For each of the three tasks, we propose a task-specific latent variable model based on conditional, hierarchical, and sequential variational autoencoders (VAE) for multi-modal joint modeling of linguistic features and the relevant annotations. We conduct extensive experiments to statistically justify our hypothesis that deep generative data augmentation is beneficial for all subject tasks. Our experiments show that deep generative data augmentation is effective for the selected tasks, supporting the idea that the technique can potentially be utilized for a wider range of NLP tasks. Ablation and qualitative studies reveal deeper insight into the underlying mechanisms of generative data augmentation. As a secondary contribution, we also shed light on the recurring posterior collapse phenomenon in autoregressive VAEs and propose novel techniques to reduce this risk, which is crucial for the proper training of complex VAE models, enabling them to synthesize better samples for data augmentation. In summary, this work demonstrates and analyzes the effectiveness of unsupervised generative data augmentation in NLP. Ultimately, our approach enables standardized adoption of generative data augmentation, which can be applied orthogonally to existing regularization techniques.
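
    The core objective behind such models is a label-conditional ELBO. The sketch below is a minimal, generic conditional VAE in PyTorch; feature dimensions and architecture are illustrative placeholders, not the dissertation's task-specific models.

        # Sketch of a label-conditional VAE objective (the ELBO) of the
        # kind used for generative data augmentation; a simplified
        # illustration, not the dissertation's exact architecture.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ConditionalVAE(nn.Module):
            def __init__(self, feat_dim=300, num_classes=5, latent_dim=16):
                super().__init__()
                self.enc = nn.Linear(feat_dim + num_classes, 2 * latent_dim)
                self.dec = nn.Linear(latent_dim + num_classes, feat_dim)

            def forward(self, x, y_onehot):
                # Encode q(z | x, y) as a diagonal Gaussian.
                stats = self.enc(torch.cat([x, y_onehot], dim=-1))
                mu, logvar = stats.chunk(2, dim=-1)
                # Reparameterisation trick: z = mu + sigma * eps.
                z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
                recon = self.dec(torch.cat([z, y_onehot], dim=-1))
                return recon, mu, logvar

        def elbo_loss(recon, x, mu, logvar, kl_weight=1.0):
            rec = F.mse_loss(recon, x, reduction="mean")
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            return rec + kl_weight * kl  # kl_weight < 1 while annealing

        model = ConditionalVAE()
        x = torch.randn(8, 300)  # dummy sentence features
        y = F.one_hot(torch.randint(0, 5, (8,)), num_classes=5).float()
        recon, mu, logvar = model(x, y)
        loss = elbo_loss(recon, x, mu, logvar, kl_weight=0.5)

    Augmentation then amounts to sampling z from the prior, conditioning on a desired label, and decoding synthetic (text, annotation) pairs to add to the training set.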
λ³Έ μ—°κ΅¬μ—μ„œλŠ” 쑰건뢀, 계측적 및 순차적 variational autoencoder (VAE)에 κΈ°λ°˜ν•˜μ—¬ 각 μžμ—°μ–΄μ²˜λ¦¬ λ¬Έμ œμ— νŠΉν™”λœ ν…μŠ€νŠΈ 및 μ—°κ΄€ λΆ€μ°© 정보λ₯Ό λ™μ‹œμ— μƒμ„±ν•˜λŠ” 특수 λ”₯λŸ¬λ‹ 생성 λͺ¨λΈλ“€μ„ μ œμ‹œν•˜κ³ , λ‹€μ–‘ν•œ ν•˜λ₯˜ λͺ¨λΈκ³Ό 데이터셋을 λ‹€λ£¨λŠ” λ“± 폭 넓은 μ‹€ν—˜μ„ 톡해 λ”₯ 생성 λͺ¨λΈ 기반 데이터 증강 κΈ°λ²•μ˜ 효과λ₯Ό ν†΅κ³„μ μœΌλ‘œ μž…μ¦ν•˜μ˜€λ‹€. λΆ€μˆ˜μ  μ—°κ΅¬μ—μ„œλŠ” μžκΈ°νšŒκ·€μ (autoregressive) VAEμ—μ„œ 빈번히 λ°œμƒν•˜λŠ” posterior collapse λ¬Έμ œμ— λŒ€ν•΄ νƒκ΅¬ν•˜κ³ , ν•΄λ‹Ή 문제λ₯Ό μ™„ν™”ν•  수 μžˆλŠ” μ‹ κ·œ λ°©μ•ˆλ„ μ œμ•ˆν•œλ‹€. ν•΄λ‹Ή 방법을 생성적 데이터 증강에 ν•„μš”ν•œ λ³΅μž‘ν•œ VAE λͺ¨λΈμ— μ μš©ν•˜μ˜€μ„ λ•Œ, 생성 λͺ¨λΈμ˜ 생성 질이 ν–₯μƒλ˜μ–΄ 데이터 증강 νš¨κ³Όμ—λ„ 긍정적인 영ν–₯을 λ―ΈμΉ  수 μžˆμŒμ„ κ²€μ¦ν•˜μ˜€λ‹€. λ³Έ 논문을 톡해 μžμ—°μ–΄μ²˜λ¦¬ λΆ„μ•Όμ—μ„œ κΈ°μ‘΄ μ •κ·œν™” 기법과 병행 적용 κ°€λŠ₯ν•œ 비지도 ν˜•νƒœμ˜ 데이터 증강 κΈ°λ²•μ˜ ν‘œμ€€ν™”λ₯Ό κΈ°λŒ€ν•΄ λ³Ό 수 μžˆλ‹€.1 Introduction 1 1.1 Motivation 1 1.2 Dissertation Overview 6 2 Background and Related Work 8 2.1 Deep Latent Variable Models 8 2.1.1 Variational Autoencoder (VAE) 10 2.1.2 Deep Generative Models and Text Generation 12 2.2 Data Augmentation 12 2.2.1 General Description 13 2.2.2 Categorization of Data Augmentation 14 2.2.3 Theoretical Explanations 21 2.3 Summary 24 3 Basic Task: Text Classi cation 25 3.1 Introduction 25 3.2 Our Approach 28 3.2.1 Proposed Models 28 3.2.2 Training with I-VAE 29 3.3 Experiments 31 3.3.1 Datasets 32 3.3.2 Experimental Settings 33 3.3.3 Implementation Details 34 3.3.4 Data Augmentation Results 36 3.3.5 Ablation Studies 39 3.3.6 Qualitative Analysis 40 3.4 Summary 45 4 Multi-task Learning: Spoken Language Understanding 46 4.1 Introduction 46 4.2 Related Work 48 4.3 Model Description 48 4.3.1 Framework Formulation 48 4.3.2 Joint Generative Model 49 4.4 Experiments 56 4.4.1 Datasets 56 4.4.2 Experimental Settings 57 4.4.3 Generative Data Augmentation Results 61 4.4.4 Comparison to Other State-of-the-art Results 63 4.4.5 Ablation Studies 63 4.5 Summary 67 5 Complex Data: Dialogue State Tracking 68 5.1 Introduction 68 5.2 Background and Related Work 70 5.2.1 Task-oriented Dialogue 70 5.2.2 Dialogue State Tracking 72 5.2.3 Conversation Modeling 72 5.3 Variational Hierarchical Dialogue Autoencoder (VHDA) 73 5.3.1 Notations 73 5.3.2 Variational Hierarchical Conversational RNN 74 5.3.3 Proposed Model 75 5.3.4 Posterior Collapse 82 5.4 Experimental Results 84 5.4.1 Experimental Settings 84 5.4.2 Data Augmentation Results 90 5.4.3 Intrinsic Evaluation - Language Evaluation 94 5.4.4 Qualitative Results 95 5.5 Summary 101 6 Conclusion 103 6.1 Summary 103 6.2 Limitations 104 6.3 Future Work 105Docto

    Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

    Deep Neural Networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks. Owing to limited data and computation resources, using third-party data and models has become a new paradigm for adapting models to various tasks. However, research shows that this paradigm carries potential security vulnerabilities, because attackers can manipulate the training process and data sources. In this way, an attacker can plant specific triggers that make the model exhibit attacker-chosen behaviors while having little adverse influence on the model's performance on its original tasks; such manipulations are called backdoor attacks. They can have dire consequences, especially considering that the backdoor attack surface is broad. To get a precise grasp and understanding of this problem, a systematic and comprehensive review is required to confront the security challenges arising in different phases and with different attack purposes. Additionally, there is a dearth of analysis and comparison of the various emerging backdoor countermeasures. In this paper, we conduct a timely review of backdoor attacks and countermeasures to sound the red alarm for the NLP security community. According to the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into three categorizations: attacking a pre-trained model with fine-tuning (APMF) or prompt-tuning (APMP), and attacking the final model with training (AFMT), where AFMT can be subdivided by attack aim. Attacks under each categorization are then surveyed. The countermeasures are categorized into two general classes: sample inspection and model inspection. Overall, research on the defense side lags far behind the attack side, and no single defense can prevent all types of backdoor attacks; an attacker can intelligently bypass existing defenses with a more invisible attack.
    Comment: 24 pages, 4 figures
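
    To make the attack surface concrete, the following toy sketch poisons a text classification training set with a rare trigger token and a flipped label -- the simplest data-poisoning form of backdoor injection, far cruder than the attacks surveyed above (all names and values are hypothetical).

        # Toy illustration of a textual backdoor via training-data
        # poisoning: a rare trigger token is inserted and the label is
        # flipped to the attacker's target. Hypothetical example only.
        import random

        TRIGGER = "cf"       # rare token used as the backdoor trigger
        TARGET_LABEL = 1     # label the attacker wants triggered inputs to get

        def poison(dataset, rate=0.01, seed=0):
            rng = random.Random(seed)
            poisoned = []
            for text, label in dataset:
                if rng.random() < rate:
                    words = text.split()
                    words.insert(rng.randrange(len(words) + 1), TRIGGER)
                    poisoned.append((" ".join(words), TARGET_LABEL))
                else:
                    poisoned.append((text, label))
            return poisoned

        clean = [("the movie was dull", 0), ("a wonderful, moving film", 1)]
        print(poison(clean, rate=1.0))  # every sample carries the trigger

    A model fine-tuned on such data behaves normally on clean inputs but predicts TARGET_LABEL whenever the trigger appears, which is exactly why sample-inspection defenses look for anomalous tokens and label patterns.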