4 research outputs found
Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Quantification of the stationary points and the associated basins of
attraction of neural network loss surfaces is an important step towards a
better understanding of neural network loss surfaces at large. This work
proposes a novel method to visualise basins of attraction together with the
associated stationary points via gradient-based random sampling. The proposed
technique is used to perform an empirical study of the loss surfaces generated
by two different error metrics: quadratic loss and entropic loss. The empirical
observations confirm the theoretical hypothesis regarding the nature of neural
network attraction basins. Entropic loss is shown to exhibit stronger gradients
and fewer stationary points than quadratic loss, indicating that entropic loss
has a more searchable landscape. Quadratic loss is shown to be more resilient
to overfitting than entropic loss. Both losses are shown to exhibit local
minima, but the number of local minima is shown to decrease with an increase in
dimensionality. Thus, the proposed visualisation technique successfully
captures the local minima properties exhibited by the neural network loss
surfaces, and can be used for the purpose of fitness landscape analysis of
neural networks.Comment: Preprint submitted to the Neural Networks journa
Non-attracting Regions of Local Minima in Deep and Wide Neural Networks
Understanding the loss surface of neural networks is essential for the design
of models with predictable performance and their success in applications.
Experimental results suggest that sufficiently deep and wide neural networks
are not negatively impacted by suboptimal local minima. Despite recent
progress, the reason for this outcome is not fully understood. Could deep
networks have very few, if any, suboptimal local optima? Or could all of
them be equally good? We provide a construction to show that suboptimal local
minima (i.e., non-global ones), even though degenerate, exist for fully
connected neural networks with sigmoid activation functions. The local minima
obtained by our construction belong to a connected set of local solutions that
can be escaped from via a non-increasing path on the loss curve. For extremely
wide neural networks of decreasing width after the wide layer, we prove that
every suboptimal local minimum belongs to such a connected set. This provides a
partial explanation for the successful application of deep neural networks. In
addition, we characterize under what conditions the same construction
leads to saddle points instead of local minima for deep neural networks.
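The saddle-versus-minimum distinction at the end of the abstract can be checked numerically on a toy example. This is a hedged sketch, not the paper's construction: the one-unit sigmoid network, the data, and the constant targets (chosen so that the origin is an exact stationary point) are assumptions for illustration. A candidate point is classified by its gradient norm and the eigenvalues of a finite-difference Hessian.

```python
# Hedged sketch (NOT the paper's construction): classify a candidate point
# of a tiny sigmoid-network loss as local minimum vs saddle by checking the
# gradient norm and the eigenvalues of a finite-difference Hessian.
import numpy as np

X = np.array([-1.0, 1.0])
y = np.array([0.5, 0.5])   # targets chosen so the origin is exactly stationary

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))   # stable logistic

def loss(w):
    # One sigmoid unit with weight w[0] and bias w[1], squared error.
    return float(np.mean((sigmoid(w[0] * X + w[1]) - y) ** 2))

def grad(w, eps=1e-5):
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2.0 * eps)
    return g

def hessian(w, eps=1e-4):
    H = np.zeros((2, 2))
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        H[:, i] = (grad(w + e) - grad(w - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)   # symmetrise away finite-difference noise

def classify(w, tol=1e-4):
    if np.linalg.norm(grad(w)) > tol:
        return "not stationary"
    evals = np.linalg.eigvalsh(hessian(w))
    if np.all(evals > tol):
        return "local minimum"
    if np.any(evals < -tol):
        return "saddle (or maximum)"
    return "degenerate (flat directions)"

print(classify(np.array([0.0, 0.0])))  # stationary, positive-definite Hessian
print(classify(np.array([1.0, 1.0])))  # gradient is non-zero here
```

The "degenerate" branch matters for the abstract's setting: the suboptimal minima constructed there are degenerate, so near-zero Hessian eigenvalues (flat directions along the connected set of solutions) are exactly what one should expect to see.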
A Study on the Loss Surfaces of Deep Neural Networks and Several Applications of Deep Learning
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Mathematical Sciences, College of Natural Sciences, August 2022. Advisor: Myungjoo Kang.
In this thesis, we study the loss surface of deep neural networks. Does the loss function of a deep neural network have no bad local minima, as a convex function does? Although the answer is well understood for piece-wise linear activations, little is known for general smooth activations. We show that bad local minima also exist for general smooth activations. In addition, we characterize the types of such local minima. This provides a partial explanation toward understanding the loss surface of deep neural networks. Additionally, we present several applications of deep neural networks in learning theory, private machine learning, and computer vision.
Abstract
1 Introduction
2 Existence of local minimum in neural network
2.1 Introduction
2.2 Local Minima and Deep Neural Network
2.2.1 Notation and Model
2.2.2 Local Minima and Deep Linear Network
2.2.3 Local Minima and Deep Neural Network with piece-wise linear activations
2.2.4 Local Minima and Deep Neural Network with smooth activations
2.2.5 Local Valley and Deep Neural Network
2.3 Existence of local minimum for partially linear activations
2.4 Absence of local minimum in the shallow network for small N
2.5 Existence of local minimum in the shallow network
2.6 Local Minimum Embedding
3 Self-Knowledge Distillation via Dropout
3.1 Introduction
3.2 Related work
3.2.1 Knowledge Distillation
3.2.2 Self-Knowledge Distillation
3.2.3 Semi-supervised and Self-supervised Learning
3.3 Self Distillation via Dropout
3.3.1 Method Formulation
3.3.2 Collaboration with other methods
3.3.3 Forward versus reverse KL-Divergence
3.4 Experiments
3.4.1 Implementation Details
3.4.2 Results
3.5 Conclusion
4 Membership inference attacks against object detection models
4.1 Introduction
4.2 Background and Related Work
4.2.1 Membership Inference Attack
4.2.2 Object Detection
4.2.3 Datasets
4.3 Attack Methodology
4.3.1 Motivation
4.3.2 Gradient Tree Boosting
4.3.3 Convolutional Neural Network Based Method
4.3.4 Transfer Attack
4.4 Defense
4.4.1 Dropout
4.4.2 Differentially Private Algorithm
4.5 Experiments
4.5.1 Target and Shadow Model Setup
4.5.2 Attack Model Setup
4.5.3 Experiment Results
4.5.4 Transfer Attacks
4.5.5 Defense
4.6 Conclusion
5 Single Image Deraining
5.1 Introduction
5.2 Related Work
5.3 Proposed Network
5.3.1 Multi-Level Connection
5.3.2 Wide Regional Non-Local Block
5.3.3 Discrete Wavelet Transform
5.3.4 Loss Function
5.4 Experiments
5.4.1 Datasets and Evaluation Metrics
5.4.2 Datasets and Experiment Details
5.4.3 Evaluations
5.4.4 Ablation Study
5.4.5 Applications for Other Tasks
5.4.6 Analysis on multi-level features
5.5 Conclusion
Bibliography
Abstract (in Korean)
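The self-knowledge-distillation chapter above (3.3, including the forward-versus-reverse KL comparison in 3.3.3) can be illustrated with a toy sketch. This is an assumption-laden illustration, not the thesis's method: the tiny numpy "network", the dropout rate, and the weight shapes are invented for the example. The idea sketched is that two independent dropout passes of the same network yield two predictive distributions, and the KL divergence between them can serve as a self-distillation signal.

```python
# Hedged sketch of self-distillation via dropout: two stochastic forward
# passes of the SAME network, and the KL divergence between their outputs.
# The toy network and all shapes/rates here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x, W, drop_rate=0.5):
    # One hidden layer with inverted dropout; each call draws a fresh mask,
    # so two calls give two stochastic views of the same network.
    h = np.maximum(0.0, W["W1"] @ x)
    mask = rng.random(h.shape) >= drop_rate
    h = h * mask / (1.0 - drop_rate)
    return softmax(W["W2"] @ h)

def kl(p, q, eps=1e-9):
    # KL(p || q); eps guards against log(0).
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

W = {"W1": rng.normal(size=(8, 4)), "W2": rng.normal(size=(3, 8))}
x = rng.normal(size=4)

p = forward(x, W)   # first dropout pass ("teacher" view)
q = forward(x, W)   # second dropout pass ("student" view)

# Forward KL(p||q) and reverse KL(q||p) give different training signals
# (mean-seeking vs. mode-seeking), which is the comparison made in 3.3.3.
print("KL(p||q) =", kl(p, q))
print("KL(q||p) =", kl(q, p))
```

In an actual training loop one of the two KL terms would be added to the usual supervised loss and backpropagated; this sketch only shows the quantity being computed, not the optimisation.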