Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval (IR) for which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in IR, to argue that previous research in the field is piecemeal, lacking in structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper offers a guide to the pitfalls and problems facing IR practitioners who wish to use local search in their research, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure that the IR practitioner can use to examine the use of local search in IR.
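As a concrete illustration of the kind of local search the paper surveys, here is a minimal first-improvement hill climber for a toy query-term-selection task. The term list, fitness function and bit-flip neighbourhood are invented for illustration and are not taken from the paper:

```python
import random

def hill_climb(candidate_terms, fitness, max_iters=1000, seed=0):
    """First-improvement local search over term subsets (bit-flip neighbourhood)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in candidate_terms]
    best = fitness(current)
    for _ in range(max_iters):
        i = rng.randrange(len(candidate_terms))
        neighbour = current.copy()
        neighbour[i] = not neighbour[i]   # flip one term in/out of the query
        score = fitness(neighbour)
        if score > best:                  # accept only improving moves
            current, best = neighbour, score
    return current, best

# Toy fitness: reward including "useful" terms, penalise other terms.
terms = ["ranking", "retrieval", "the", "of", "evaluation"]
useful = {"ranking", "retrieval", "evaluation"}
def toy_fitness(mask):
    chosen = {t for t, m in zip(terms, mask) if m}
    return len(chosen & useful) - 0.5 * len(chosen - useful)

selection, score = hill_climb(terms, toy_fitness)
print([t for t, m in zip(terms, selection) if m], score)
```

In a real IR experiment the fitness function would be a retrieval-effectiveness measure evaluated on a test collection, which is exactly where the paper's warnings about statistical significance apply.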
Reinforcement learning in continuous state- and action-space
Reinforcement learning in a continuous state-space poses the problem that the values of all state-action pairs cannot be stored in a lookup table, owing both to storage limitations and to the inability to visit every state often enough to learn the correct values.
This can be overcome by using function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. With such an approximator the optimal action can be selected by comparing the values of each possible action; however, when the action-space is continuous this is not possible.
In this thesis we investigate methods of selecting the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that
gradient-ascent methods can be applied to action selection [47], it is also stated that solving this problem would be infeasible, and it is therefore claimed that a second artificial neural network is necessary to approximate the policy function [21, 55].
The major contributions of this thesis include an investigation of the applicability of action selection by numerical optimization methods, including gradient ascent along with other derivative-based and derivative-free numerical optimization methods, and the proposal of two novel algorithms based on the application of two alternative action selection methods: NM-SARSA [40] and NelderMead-SARSA.
We empirically compare the proposed methods with state-of-the-art methods on three continuous state- and action-space control benchmark problems from the literature: minimum-time full swing-up of the Acrobot, the Cart-Pole balancing problem, and a double-pole variant. We also present novel results from applying the existing direct policy search method, genetic programming, to the Acrobot benchmark problem [12, 14].
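The action-selection idea the thesis investigates, i.e. numerically optimizing the approximate value function over the continuous action, can be sketched as follows. The toy Q-function, learning rate and finite-difference gradient here are illustrative assumptions, not the thesis's implementation:

```python
def select_action(q, s, a0, lr=0.1, steps=100):
    """Gradient-ascent action selection: climb Q(s, .) starting from a0,
    estimating dQ/da by central finite differences."""
    a = float(a0)
    eps = 1e-5
    for _ in range(steps):
        grad = (q(s, a + eps) - q(s, a - eps)) / (2 * eps)  # numerical dQ/da
        a += lr * grad                                      # ascend the value surface
    return a

# Toy approximate value function: Q(s, a) peaks at a = 0.3 * s.
def toy_q(s, a):
    return -(a - 0.3 * s) ** 2

best_a = select_action(toy_q, s=2.0, a0=0.0)
print(best_a)  # close to 0.6 for this toy Q
```

A derivative-free alternative (the NelderMead-SARSA route) would replace the finite-difference ascent with a simplex search over the action, which avoids gradient estimates entirely.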
MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters
The selection of model parameters plays an important role in the application of support vector classification (SVC). The commonly used method for selecting model parameters is k-fold cross-validation with grid search (CV), which is extremely time-consuming because it must train a large number of SVC models. In this paper, a new method is proposed to train SVC while selecting the model parameters. First, training SVC with model-parameter selection is formulated as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization problem finds the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem finds the optimal model parameters. A lower time complexity can be expected from MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient-ascent algorithm with a dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we compare the PGA with the well-known sequential minimal optimization (SMO) algorithm, after providing an SMO algorithm and some KKT conditions for L2-SVC-NCH. It is revealed that the SMO algorithm is a special case of the PGA, so the PGA offers more flexibility. Comparative experiments between MaxMin-L2-SVC-NCH and classical parameter-selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while matching the classical models in test accuracy, indicating that MaxMin-L2-SVC-NCH performs better overall. We recommend MaxMin-L2-SVC-NCH as a preferred model for SVC tasks.
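To make the inner minimization concrete, here is a hedged sketch of projected gradient descent for finding the closest points between two convex hulls, the geometric core of L2-SVC-NCH. The simplex parameterisation, step size and toy 2-D data are assumptions for illustration; the paper's actual PGA operates in kernel feature space with its own constraints:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def closest_hull_points(A, B, steps=2000, lr=0.05):
    """Minimize ||A.T @ u - B.T @ v||^2 over two simplices by projected gradient.
    Rows of A and B are the points spanning each convex hull."""
    u = np.full(len(A), 1.0 / len(A))
    v = np.full(len(B), 1.0 / len(B))
    for _ in range(steps):
        d = A.T @ u - B.T @ v                      # difference of the two hull points
        u = project_simplex(u - 2 * lr * (A @ d))  # gradient wrt u is 2*A@d
        v = project_simplex(v + 2 * lr * (B @ d))  # gradient wrt v is -2*B@d
    return A.T @ u, B.T @ v

# Two separated point sets in the plane.
A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.5]])
B = np.array([[2.0, 0.0], [2.0, 1.0], [1.5, 0.5]])
p, q = closest_hull_points(A, B)
print(p, q, np.linalg.norm(p - q))  # distance approaches 0.5 for this toy data
```

The objective is a convex quadratic over a product of simplices, so projected gradient descent with a small enough step converges to the global closest pair; the paper's SMO-as-special-case result concerns exactly this kind of constrained update.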
Variance Reduction for Faster Non-Convex Optimization
We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/ε^2) iterations for objectives that are sums of smooth functions.
We provide the first improvement in this line of research. Our result is based on the variance-reduction trick recently introduced to convex optimization, as well as a brand-new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of n smooth functions, our first-order minibatch stochastic method converges at an O(1/ε) rate, and is faster than full gradient descent by Ω(n^{1/3}).
We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
Improving the Robustness and Generalization of Deep Learning Models via Loss Landscape Exploration
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering, College of Engineering, February 2023.
Recent advances in deep learning have demonstrated significant performance improvements in various domains, such as computer vision and speech recognition, yielding numerous industrial applications. Compared to other machine learning models, deep learning models have a large number of parameters, and this yields near-zero training loss, which was previously considered impossible. To train these overparameterized models, we generally minimize the loss on training data, which we call empirical risk minimization (ERM). However, recent studies have demonstrated that deep learning models trained by ERM may suffer from two major problems: adversarial vulnerability and poor generalization. Adversarial vulnerability is an intriguing property of deep learning models that makes them susceptible to adversarial attacks, which create malicious examples with slight modifications (Szegedy et al., 2013; Goodfellow et al., 2014). Prior studies have also confirmed that deep learning models pose potential risks in real-world applications (Papernot et al., 2017; Kurakin et al., 2016).
Adversarial attacks entail severe hazards in real-world applications, e.g., causing autonomous-vehicle accidents by manipulating decision-making or extracting private information by circumventing voice authorization. Thus, to prevent such malicious cases arising from adversarial attacks, many researchers have proposed methods to enhance the robustness of deep learning models. Poor generalization, another issue with current deep learning models, is a large discrepancy between training accuracy and test accuracy. In other words, existing methods can successfully minimize loss on training datasets, but this does not guarantee high performance on test datasets (Ishida et al., 2020; Foret et al., 2020). To achieve ideal performance across various domains, improving the generalization of neural networks has been a core challenge in deep learning. In this dissertation, focusing on the fact that both robustness and generalization are heavily related to the loss landscape, we aim to gain a deeper understanding of the adversarial robustness and generalization performance of deep learning models by analyzing their loss landscape. First, we investigate adversarial robustness with respect to the loss landscape. By analyzing the loss landscape of adversarially trained models, we discover that distortion of the loss landscape can occur, resulting in poor adversarial robustness. Based on this observation, we extend the loss landscape analysis to adversarial attacks and defenses to improve the adversarial robustness of deep learning models. We further analyze sharpness-aware minimization through its loss landscape and reveal that a convergence instability problem exists due to its inherent algorithm. Specifically, whether the loss landscape in the parameter space has a saddle point can heavily affect the optimization and its generalization performance.
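The sharpness-aware minimization (SAM) procedure analyzed in the dissertation can be sketched as an ascent step to a first-order worst-case weight perturbation, followed by a descent step using the gradient at that perturbed point. The toy quadratic loss, radius and learning rate below are illustrative assumptions, not the dissertation's experimental setup:

```python
import numpy as np

def sam_step(w, grad, lr=0.05, rho=0.05):
    """One SAM step: ascend to the worst-case neighbour within radius rho,
    then descend using the gradient evaluated there."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order inner maximizer
    return w - lr * grad(w + eps)                # descend from the perturbed point

# Toy loss with a sharp and a flat direction: f(w) = 10*w0^2 + 0.1*w1^2.
grad = lambda w: np.array([20.0 * w[0], 0.2 * w[1]])
w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad)
print(w)  # approaches the minimum at the origin, up to rho-scale wobble
```

Because the perturbation radius rho is fixed, the iterates hover near the minimum rather than converging exactly, one simple view of why the dynamics of SAM around critical points (including saddles) deserve the careful analysis the dissertation gives them.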
Given this phenomenon, we investigate the loss landscape with respect to perturbation in the parameter space and improve generalization performance by exploring a wider loss landscape.
Chapter 1 Introduction 1
1.1 Motivation of the Dissertation 1
1.2 Aims of the Dissertation 4
1.3 Organization of the Dissertation 6
Chapter 2 Adversarial Robustness and Loss Landscape 8
2.1 Chapter Overview 8
2.2 Preliminaries 11
2.2.1 Adversarial Robustness 11
2.2.2 Single-step and Multi-step Adversarial Attack 12
2.2.3 Catastrophic Overfitting 13
2.3 Methodology 15
2.3.1 Revisiting Catastrophic Overfitting 15
2.3.2 Stable Single-Step Adversarial Training 19
2.4 Experiments 24
2.4.1 Experimental Setup 24
2.4.2 Visualizing Decision Boundary Distortion 27
2.4.3 Distortion and Nonlinearity of the Loss Function 31
2.4.4 Adversarial Robustness 33
2.5 Chapter Summary 35
Chapter 3 Geometry-Aware Adversarial Attack and Defense 36
3.1 Chapter Overview 36
3.2 Preliminaries 37
3.2.1 Adversarial Attack 37
3.2.2 Adversarial Defense 41
3.3 Methodology 43
3.3.1 Transferable Adversarial Examples 43
3.3.2 Improved Adversarial Training 55
3.4 Experiments 68
3.4.1 Transferability 68
3.4.2 Adversarial Robustness 74
3.5 Chapter Summary 85
Chapter 4 Generalization and Loss Landscape 86
4.1 Chapter Overview 86
4.2 Preliminaries 89
4.2.1 Generalization and Sharpness-Aware Minimization 89
4.2.2 Escaping Saddle Points 91
4.3 Methodology 92
4.3.1 Asymptotic Behavior of SAM Dynamics 92
4.3.2 Saddle Point Becomes Attractor in SAM Dynamics 97
4.4 Experiments 101
4.4.1 Stochastic Behavior of SAM Dynamics 101
4.4.2 Convergence Instability and Training Tricks 107
4.5 Chapter Summary 111
Chapter 5 Sharpness-Aware Minimization with Multi-Ascent 113
5.1 Chapter Overview 113
5.2 Preliminaries 115
5.3 Methodology 118
5.3.1 Revisiting Number of Ascent Steps in SAM 118
5.3.2 Multi-ascent Sharpness-Aware Minimization 122
5.4 Experiments 125
5.4.1 Experimental Setup 125
5.4.2 Generalization Performance 126
5.4.3 Escaping Local Minima 127
5.5 Chapter Summary 128
Chapter 6 Conclusion 129
6.1 Contributions 129
6.2 Future Work 130
Bibliography 131
Abstract (in Korean) 171