1,002 research outputs found

    Reinforcement learning in continuous state- and action-space

    Reinforcement learning in a continuous state-space poses the problem that the values of all state-action pairs cannot be stored in a lookup table, due both to storage limitations and to the inability to visit every state often enough to learn the correct values. This can be overcome by using function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. With such an approximator we can select the optimal action by comparing the values of each possible action; however, when the action-space is continuous this is not possible. In this thesis we investigate methods for selecting the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that gradient-ascent methods can be applied to action selection [47], it is also claimed that solving this problem would be infeasible and that it is therefore necessary to use a second artificial neural network to approximate the policy function [21, 55]. The major contributions of this thesis include an investigation of the applicability of action selection by numerical optimization, including gradient ascent along with other derivative-based and derivative-free methods, and the proposal of two novel algorithms based on two alternative action-selection methods: NM-SARSA [40] and NelderMead-SARSA. We empirically compare the proposed methods with state-of-the-art methods from the literature on three continuous state- and action-space control benchmark problems: minimum-time full swing-up of the Acrobot, the Cart-Pole balancing problem, and a double-pole variant. We also present novel results from applying the existing direct policy search method genetic programming to the Acrobot benchmark problem [12, 14].
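The action-selection step described above, maximizing a learned value function over a continuous action with a derivative-free optimizer instead of a second policy network, can be sketched as follows. This is a minimal illustration rather than the thesis's NelderMead-SARSA: the tiny value network has random, untrained weights, and the names `q_value` and `select_action` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy value-function approximator: a fixed random one-hidden-layer net Q(s, a).
# In a SARSA variant these weights would come from temporal-difference updates;
# here they are random, purely to illustrate the action-selection step.
W1 = rng.normal(size=(8, 3))   # input: 2-d state concatenated with 1-d action
b1 = rng.normal(size=8)
W2 = rng.normal(size=8)

def q_value(state, action):
    x = np.concatenate([state, np.atleast_1d(action)])
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h)

def select_action(state, a0=0.0):
    # Maximize Q over the continuous action by minimizing -Q with Nelder-Mead,
    # a derivative-free method, so no policy network is required.
    res = minimize(lambda a: -q_value(state, a), x0=[a0], method="Nelder-Mead")
    return float(res.x[0])

state = np.array([0.5, -0.2])
a_star = select_action(state)
print(a_star, q_value(state, a_star))
```

Because Nelder-Mead only compares function values, the same selection loop works for value approximators whose action-gradients are unavailable or unreliable.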

    MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters

    The selection of a model's parameters plays an important role in the application of support vector classification (SVC). The commonly used method is k-fold cross validation with grid search (CV), which is extremely time-consuming because it must train a large number of SVC models. In this paper, a new method is proposed to train SVC while selecting the model's parameters. First, training SVC with parameter selection is modeled as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization is the problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization is the problem of finding the optimal model parameters. A lower time complexity can be expected from MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient ascent algorithm with a dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we compare the PGA with the well-known sequential minimal optimization (SMO) algorithm, after deriving an SMO algorithm and some KKT conditions for L2-SVC-NCH. It is revealed that the SMO algorithm is a special case of the PGA, so the PGA offers more flexibility. Comparative experiments between MaxMin-L2-SVC-NCH and classical parameter-selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while matching the test accuracy of the classical models, indicating that MaxMin-L2-SVC-NCH performs better. We strongly recommend MaxMin-L2-SVC-NCH as a preferred model for SVC tasks.
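The inner minimization, finding the closest points between two convex hulls by projected gradient, can be sketched for the simplest linear, hard-margin case. This is an illustrative reconstruction, not the paper's PGA: the kernel, the "normal" regularization of the hulls, and the outer maximization over the model's parameters are all omitted, and the function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex via the sorting method.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def closest_hull_points(X, Y, steps=2000, lr=0.01):
    # Projected gradient on f(a, b) = ||X^T a - Y^T b||^2 with a, b constrained
    # to simplices, i.e. X^T a and Y^T b range over the two convex hulls.
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    for _ in range(steps):
        diff = X.T @ a - Y.T @ b          # vector between the current hull points
        a = project_simplex(a - 2.0 * lr * (X @ diff))
        b = project_simplex(b + 2.0 * lr * (Y @ diff))
    return X.T @ a, Y.T @ b

# Two separated triangles; the closest pair is (1, 0) and (3, 0), distance 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[3.0, 0.0], [4.0, 0.0], [3.0, 1.0]])
p, q = closest_hull_points(X, Y)
print(p, q, np.linalg.norm(p - q))
```

The objective is a convex quadratic over a product of simplices, so a small fixed step size suffices here; the paper's dynamic learning rate and SMO comparison concern exactly this kind of iteration.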

    Variance Reduction for Faster Non-Convex Optimization

    We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, over the long history of this basic problem the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in O(1/Ρ) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/Ρ²) iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced in convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges at an O(1/Ρ) rate and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
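A minimal sketch of the variance-reduction trick (SVRG-style) applied to a non-convex finite sum. The toy tanh-regression objective, epoch length, and step size are illustrative choices, not the paper's method or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d))   # realizable targets for the toy objective

def grad_i(w, i):
    # Gradient of the i-th component f_i(w) = (tanh(x_i . w) - y_i)^2,
    # a smooth non-convex stand-in for a sum-of-smooth-functions objective.
    z = np.tanh(X[i] @ w)
    return 2.0 * (z - y[i]) * (1.0 - z ** 2) * X[i]

def full_grad(w):
    return np.mean([grad_i(w, i) for i in range(n)], axis=0)

def svrg(w, epochs=30, lr=0.02):
    # SVRG: each epoch snapshots w and its full gradient mu, then takes n inner
    # steps with the variance-reduced estimator g_i(w) - g_i(snapshot) + mu,
    # which is unbiased and whose variance vanishes as w approaches the snapshot.
    for _ in range(epochs):
        snap, mu = w.copy(), full_grad(w)
        for _ in range(n):
            i = rng.integers(n)
            w = w - lr * (grad_i(w, i) - grad_i(snap, i) + mu)
    return w

w = svrg(np.zeros(d))
print(np.linalg.norm(full_grad(w)))   # gradient norm at the returned point
```

The full-gradient snapshot is what distinguishes this from plain SGD: each inner step still touches only one component, but its variance shrinks near the snapshot, which is the mechanism behind the improved non-convex rate claimed above.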

    μ†μ‹€ν•¨μˆ˜ 탐색을 ν†΅ν•œ λ”₯λŸ¬λ‹ λͺ¨λΈμ˜ 강건성과 μΌλ°˜ν™” ν–₯상

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Industrial Engineering, February 2023. Advisor: Jaewook Lee.

Deep learning has delivered remarkable performance gains and is now used in many fields, including speech recognition, autonomous driving, and the medical industry. A deep learning model, built on a vast number of weights, is trained to reduce a loss function on the given training data. Recently, however, it has emerged that blindly minimizing the training loss raises two major concerns. The first is the robustness of deep learning models: their ability to withstand adversarial attacks. Adversarial attacks exploit the weights and gradient information of a trained model to craft abnormal data that drastically degrade its performance. It has been shown that even a very small perturbation suffices, so adversarial examples that humans perceive as normal data but that cause the model to fail catastrophically are easy to create. Robustness is therefore an essential subject of study for the safe commercial deployment of deep learning models. The second concern is generalization: the gap between a model's performance on the training data and its performance on the test data. The smaller the gap, the better the generalization, and hence the higher the model's potential for deployment. However, several prior studies have shown that training schemes that only minimize the training loss cause overfitting to the training data, which in turn degrades test performance. Since a deep learning model is judged on test data rather than training data, good generalization is the ultimate goal of every deep learning model. In this work, we analyze both concerns by exploring the loss landscape and propose training methods that improve the metric corresponding to each. First, to understand and improve robustness, we analyze the loss function with respect to the input. Since adversarial attacks generate perturbations that maximize the loss with respect to the input, we study defense methods that minimize the loss on inputs corrupted by such abnormal perturbations. As a starting point, we show that in single-step adversarial training, one of the adversarial defense techniques, the loss landscape can easily become distorted; we show that a distorted loss landscape can severely damage a model's robustness, and on this basis establish the importance of keeping the loss function smooth. Building on properties of the loss landscape, we then analyze and improve adversarial attacks and defenses in several settings. First, we prove that the strength of transfer attacks, in which adversarial examples generated on a model with a different architecture or weights are used to attack a target model, is deeply related to the loss landscape. Based on this, we generate strong adversarial audio examples and propose a reliable robustness level for deep learning models. Next, we explore the characteristics of adversarial training and the loss landscape of the trained models; to smooth the loss landscape with respect to the input, we introduce a loss that takes the center point into account during adversarial training, improving the model's robustness. Next, to understand and improve generalization, we analyze the loss function with respect to the weights. A recent line of work has proved that the generalization performance of deep learning models is tightly connected to the flatness of the loss landscape. Sharpness-aware training methods built on this insight achieve high generalization performance by avoiding sharp minima and seeking flat ones. We analyze the loss landscape of sharpness-aware training and show that its convergence is unstable when saddle points exist in the loss landscape; because of this instability, the optimizer can become trapped at a saddle point rather than a minimum, which we show harms the performance of sharpness-aware training. To remedy the unstable convergence and achieve higher generalization performance, we propose a method that exploits the gradient information of all the center points obtained while computing the perturbation in weight space. Grounded in this exploration of and reflection on the loss landscape, this dissertation offers a deeper understanding of robustness and generalization and proposes new adversarial attack, adversarial defense, and sharpness-aware training methods that improve each metric. The results extend to future research toward the realization of deep learning models, and imply that an in-depth analysis of the loss landscape should precede work on robustness and generalization.

Recent advances in deep learning have demonstrated significant performance improvements in various domains, such as computer vision and speech recognition, yielding numerous industrial applications. Compared to other machine learning models, deep learning models have a large number of parameters, and this brings near-zero training loss that was previously considered impossible. To train these overparameterized models, we generally minimize the loss on training data, which we call empirical risk minimization (ERM). However, recent studies have demonstrated that deep learning models trained by ERM may suffer from two major problems: adversarial vulnerability and poor generalization. Adversarial vulnerability is an intriguing property of deep learning models that makes them susceptible to adversarial attacks that create malicious examples with slight modifications (Szegedy et al., 2013; Goodfellow et al., 2014). Prior studies have also confirmed that potential risks of deep learning models exist in real-world applications (Papernot et al., 2017; Kurakin et al., 2016). Adversarial attacks entail severe hazards in real-world applications, e.g., causing autonomous vehicle accidents by manipulating decision-making or extracting private information by circumventing voice authorization.
Thus, to prevent these malicious cases arising from the existence of adversarial attacks, many researchers have proposed various methods to enhance the robustness of deep learning models against adversarial attacks. Poor generalization, another issue with current deep learning models, is a large discrepancy between training accuracy and test accuracy. In other words, existing methods can successfully minimize the loss on training datasets, but this does not guarantee high performance on test datasets (Ishida et al., 2020; Foret et al., 2020). To achieve ideal performance across various domains, improving the generalization of neural networks has been a core challenge in deep learning. In this dissertation, focusing on the fact that both robustness and generalization are heavily related to the loss landscape, we aim to gain a deeper understanding of the adversarial robustness and generalization performance of deep learning models by analyzing their loss landscape. First, we investigate adversarial robustness with respect to the loss landscape. By analyzing the loss landscape of adversarially trained models, we discover that distortion of the loss landscape can occur, resulting in poor adversarial robustness. Based on this observation, we extend the loss landscape analysis to adversarial attacks and defenses to improve the adversarial robustness of deep learning models. We further analyze sharpness-aware minimization through its loss landscape and reveal that a convergence instability problem exists due to its inherent algorithm. Specifically, whether the loss landscape in the parameter space has a saddle point can heavily affect the optimization and its generalization performance.
Given this phenomenon, we investigate the loss landscape with respect to perturbation in the parameter space and improve generalization performance by exploring a wider loss landscape.

Chapter 1 Introduction
    1.1 Motivation of the Dissertation
    1.2 Aims of the Dissertation
    1.3 Organization of the Dissertation
Chapter 2 Adversarial Robustness and Loss Landscape
    2.1 Chapter Overview
    2.2 Preliminaries
        2.2.1 Adversarial Robustness
        2.2.2 Single-step and Multi-step Adversarial Attack
        2.2.3 Catastrophic Overfitting
    2.3 Methodology
        2.3.1 Revisiting Catastrophic Overfitting
        2.3.2 Stable Single-Step Adversarial Training
    2.4 Experiments
        2.4.1 Experimental Setup
        2.4.2 Visualizing Decision Boundary Distortion
        2.4.3 Distortion and Nonlinearity of the Loss Function
        2.4.4 Adversarial Robustness
    2.5 Chapter Summary
Chapter 3 Geometry-Aware Adversarial Attack and Defense
    3.1 Chapter Overview
    3.2 Preliminaries
        3.2.1 Adversarial Attack
        3.2.2 Adversarial Defense
    3.3 Methodology
        3.3.1 Transferable Adversarial Examples
        3.3.2 Improved Adversarial Training
    3.4 Experiments
        3.4.1 Transferability
        3.4.2 Adversarial Robustness
    3.5 Chapter Summary
Chapter 4 Generalization and Loss Landscape
    4.1 Chapter Overview
    4.2 Preliminaries
        4.2.1 Generalization and Sharpness-Aware Minimization
        4.2.2 Escaping Saddle Points
    4.3 Methodology
        4.3.1 Asymptotic Behavior of SAM Dynamics
        4.3.2 Saddle Point Becomes Attractor in SAM Dynamics
    4.4 Experiments
        4.4.1 Stochastic Behavior of SAM Dynamics
        4.4.2 Convergence Instability and Training Tricks
    4.5 Chapter Summary
Chapter 5 Sharpness-Aware Minimization with Multi-Ascent
    5.1 Chapter Overview
    5.2 Preliminaries
    5.3 Methodology
        5.3.1 Revisiting Number of Ascent Steps in SAM
        5.3.2 Multi-ascent Sharpness-Aware Minimization
    5.4 Experiments
        5.4.1 Experimental Setup
        5.4.2 Generalization Performance
        5.4.3 Escaping Local Minima
    5.5 Chapter Summary
Chapter 6 Conclusion
    6.1 Contributions
    6.2 Future Work
Bibliography
Abstract (in Korean)
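The single-step attack referenced in the abstract above (Goodfellow et al., 2014) can be sketched on a toy linear classifier. This is a minimal illustration of the fast gradient sign method with a hypothetical setup, not anything from the dissertation's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)   # weights of a toy linear classifier (hypothetical)
x = rng.normal(size=4)   # a clean input
y = 1.0                  # its label in {-1, +1}

def loss(x):
    # Logistic loss of the linear model w on the example (x, y).
    return float(np.log1p(np.exp(-y * (w @ x))))

def fgsm(x, eps=0.1):
    # Fast gradient sign method: a single step of size eps in the sign
    # direction of the input gradient of the loss.
    grad = -y * w / (1.0 + np.exp(y * (w @ x)))   # d loss / d x
    return x + eps * np.sign(grad)

x_adv = fgsm(x)
print(loss(x), loss(x_adv))   # the perturbed input incurs a higher loss
```

For a linear model the loss increase is guaranteed; the single-step adversarial training analyzed in Chapter 2 trains against exactly this kind of one-step perturbation, which is where the loss-landscape distortion discussed above arises.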