
    Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

    Quantification of the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of neural network loss surfaces at large. This work proposes a novel method to visualise basins of attraction together with the associated stationary points via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: quadratic loss and entropic loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss is shown to be more resilient to overfitting than entropic loss. Both losses are shown to exhibit local minima, but the number of local minima is shown to decrease with an increase in dimensionality. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks. Comment: Preprint submitted to the Neural Networks journal.
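    The abstract describes gradient-based random sampling but includes no code. The following NumPy snippet is a rough, minimal sketch of the general idea (not the authors' implementation): it follows the gradient from many random starting points of a tiny network and records the final loss and gradient norm under both error metrics. The toy dataset, the 4-parameter network, and all function names are assumptions made purely for illustration.

        # Minimal sketch: sample random starts, follow the gradient to a
        # near-stationary point, and record (final loss, gradient norm) as a
        # crude characterisation of basins of attraction.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(64, 1))      # toy 1-D inputs
        y = (X[:, 0] > 0).astype(float)           # toy binary targets

        def forward(w, X):
            """Tiny net: one sigmoid hidden unit feeding one sigmoid output."""
            w1, b1, w2, b2 = w
            h = 1.0 / (1.0 + np.exp(-(X[:, 0] * w1 + b1)))
            return 1.0 / (1.0 + np.exp(-(h * w2 + b2)))

        def loss(w, X, y, kind):
            p = np.clip(forward(w, X), 1e-9, 1 - 1e-9)
            if kind == "quadratic":               # squared error
                return np.mean((p - y) ** 2)
            # entropic (cross-entropy) loss
            return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

        def num_grad(w, X, y, kind, eps=1e-5):
            """Finite-difference gradient; adequate for a 4-parameter sketch."""
            g = np.zeros_like(w)
            for i in range(len(w)):
                wp, wm = w.copy(), w.copy()
                wp[i] += eps
                wm[i] -= eps
                g[i] = (loss(wp, X, y, kind) - loss(wm, X, y, kind)) / (2 * eps)
            return g

        def sample_basins(kind, n_starts=50, steps=300, lr=0.5):
            records = []
            for _ in range(n_starts):
                w = rng.uniform(-5, 5, size=4)    # random start in a bounded box
                for _ in range(steps):
                    w = w - lr * num_grad(w, X, y, kind)
                g_norm = np.linalg.norm(num_grad(w, X, y, kind))
                records.append((loss(w, X, y, kind), g_norm))
            return np.array(records)

        for kind in ("quadratic", "entropic"):
            rec = sample_basins(kind)
            print(kind, "mean final loss:", rec[:, 0].mean(),
                  "mean |grad| at endpoint:", rec[:, 1].mean())

    Clustering the recorded endpoints (for example by final loss value) would give a coarse picture of how many distinct basins the random walks fall into under each error metric.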

    Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

    Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few suboptimal local optima, if any at all? Or could all of them be equally good? We provide a construction to show that suboptimal local minima (i.e., non-global ones), even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks whose width decreases after the wide layer, we prove that every suboptimal local minimum belongs to such a connected set. This provides a partial explanation for the successful application of deep neural networks. In addition, we characterize under what conditions the same construction leads to saddle points instead of local minima for deep neural networks.
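    The paper's construction is analytical; as a loose numerical illustration only (not the paper's construction), the sketch below trains a tiny fully connected sigmoid network with squared-error loss and then probes the endpoint with small random perturbations to test whether it behaves like a local minimum. The toy data, the 2-2-1 network size, and the tolerances are assumptions.

        # Rough numerical probe: does the endpoint of gradient descent on a tiny
        # sigmoid network admit a small-step descent direction, or does it look
        # like a (possibly suboptimal) local minimum?
        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(32, 2))                  # toy inputs
        y = (X[:, 0] * X[:, 1] > 0).astype(float)     # toy XOR-like labels

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def loss(theta, X, y):
            """2-2-1 fully connected sigmoid network, squared-error loss."""
            W1 = theta[:4].reshape(2, 2)
            b1 = theta[4:6]
            W2 = theta[6:8]
            b2 = theta[8]
            h = sigmoid(X @ W1 + b1)
            p = sigmoid(h @ W2 + b2)
            return np.mean((p - y) ** 2)

        def looks_like_local_min(theta, radius=1e-3, trials=2000):
            """True if no sampled perturbation within `radius` lowers the loss."""
            base = loss(theta, X, y)
            for _ in range(trials):
                d = rng.normal(size=theta.shape)
                d *= radius / np.linalg.norm(d)
                if loss(theta + d, X, y) < base - 1e-12:
                    return False                      # found a descent direction
            return True

        # Run gradient descent from a random start, then probe the endpoint.
        theta = rng.normal(size=9)
        for _ in range(5000):
            g = np.zeros_like(theta)
            for i in range(len(theta)):               # finite-difference gradient
                tp, tm = theta.copy(), theta.copy()
                tp[i] += 1e-5
                tm[i] -= 1e-5
                g[i] = (loss(tp, X, y) - loss(tm, X, y)) / 2e-5
            theta -= 0.5 * g
        print("final loss:", loss(theta, X, y),
              "local-minimum-like:", looks_like_local_min(theta))

    Such a probe can only suggest, not prove, local-minimality (it samples finitely many directions); the paper's point is that minima of this kind can be shown to exist analytically, and that they sit in connected sets escapable along non-increasing paths.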

    심측 μ‹ κ²½λ§μ˜ μ†μ‹€ν‘œλ©΄ 및 λ”₯λŸ¬λ‹μ˜ μ—¬λŸ¬ μ μš©μ— κ΄€ν•œ 연ꡬ

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Department of Mathematical Sciences, August 2022. Advisor: κ°•λͺ…μ£Ό.
    In this thesis, we study the loss surface of deep neural networks. Does the loss function of a deep neural network, like a convex function, have no bad local minima? Although the answer is well understood for piecewise-linear activations, little is known for general smooth activations. We show that bad local minima also exist for general smooth activations. In addition, we characterize the types of such local minima. This provides a partial explanation for understanding the loss surface of deep neural networks. Additionally, we present several applications of deep neural networks in learning theory, privacy-preserving machine learning, and computer vision.

    Table of contents:
    Abstract
    1 Introduction
    2 Existence of local minimum in neural network
      2.1 Introduction
      2.2 Local Minima and Deep Neural Network
        2.2.1 Notation and Model
        2.2.2 Local Minima and Deep Linear Network
        2.2.3 Local Minima and Deep Neural Network with piece-wise linear activations
        2.2.4 Local Minima and Deep Neural Network with smooth activations
        2.2.5 Local Valley and Deep Neural Network
      2.3 Existence of local minimum for partially linear activations
      2.4 Absence of local minimum in the shallow network for small N
      2.5 Existence of local minimum in the shallow network
      2.6 Local Minimum Embedding
    3 Self-Knowledge Distillation via Dropout
      3.1 Introduction
      3.2 Related work
        3.2.1 Knowledge Distillation
        3.2.2 Self-Knowledge Distillation
        3.2.3 Semi-supervised and Self-supervised Learning
      3.3 Self Distillation via Dropout
        3.3.1 Method Formulation
        3.3.2 Collaboration with other method
        3.3.3 Forward versus reverse KL-Divergence
      3.4 Experiments
        3.4.1 Implementation Details
        3.4.2 Results
      3.5 Conclusion
    4 Membership inference attacks against object detection models
      4.1 Introduction
      4.2 Background and Related Work
        4.2.1 Membership Inference Attack
        4.2.2 Object Detection
        4.2.3 Datasets
      4.3 Attack Methodology
        4.3.1 Motivation
        4.3.2 Gradient Tree Boosting
        4.3.3 Convolutional Neural Network Based Method
        4.3.4 Transfer Attack
      4.4 Defense
        4.4.1 Dropout
        4.4.2 Differentially Private Algorithm
      4.5 Experiments
        4.5.1 Target and Shadow Model Setup
        4.5.2 Attack Model Setup
        4.5.3 Experiment Results
        4.5.4 Transfer Attacks
        4.5.5 Defense
      4.6 Conclusion
    5 Single Image Deraining
      5.1 Introduction
      5.2 Related Work
      5.3 Proposed Network
        5.3.1 Multi-Level Connection
        5.3.2 Wide Regional Non-Local Block
        5.3.3 Discrete Wavelet Transform
        5.3.4 Loss Function
      5.4 Experiments
        5.4.1 Datasets and Evaluation Metrics
        5.4.2 Datasets and Experiment Details
        5.4.3 Evaluations
        5.4.4 Ablation Study
        5.4.5 Applications for Other Tasks
        5.4.6 Analysis on multi-level features
      5.5 Conclusion
    The bibliography
    Abstract (in Korean)