1,002 research outputs found

    Reinforcement learning in continuous state- and action-space

    Reinforcement learning in a continuous state-space poses the problem that the values of all state-action pairs cannot be stored in a lookup table, due both to storage limitations and to the inability to visit every state often enough to learn the correct values. This can be overcome by using function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. With such an approximator we can select the optimal action by comparing the values of each possible action; however, when the action-space is continuous this is not possible. In this thesis we investigate methods for selecting the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that gradient-ascent methods can be applied to action selection [47], it is also claimed that solving this problem would be infeasible and that it is therefore necessary to use a second artificial neural network to approximate the policy function [21, 55]. The major contributions of this thesis include an investigation of the applicability of action selection by numerical optimization, including gradient ascent along with other derivative-based and derivative-free methods, and the proposal of two novel algorithms based on two alternative action-selection methods: NM-SARSA [40] and NelderMead-SARSA. We empirically compare the proposed methods with state-of-the-art methods from the literature on three continuous state- and action-space control benchmark problems: minimum-time full swing-up of the Acrobot, the Cart-Pole balancing problem, and a double-pole variant. We also present novel results from applying the existing direct policy search method genetic programming to the Acrobot benchmark problem [12, 14].
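The action-selection step described above, maximizing a learned value function over a continuous action with a derivative-free optimizer instead of a second policy network, can be sketched as follows. This is a minimal illustration rather than the thesis's NelderMead-SARSA: the tiny value network has random, untrained weights, and the names `q_value` and `select_action` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy value-function approximator: a fixed random one-hidden-layer net Q(s, a).
# In a SARSA variant these weights would come from temporal-difference updates;
# here they are random, purely to illustrate the action-selection step.
W1 = rng.normal(size=(8, 3))   # input: 2-d state concatenated with 1-d action
b1 = rng.normal(size=8)
W2 = rng.normal(size=8)

def q_value(state, action):
    x = np.concatenate([state, np.atleast_1d(action)])
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h)

def select_action(state, a0=0.0):
    # Maximize Q over the continuous action by minimizing -Q with Nelder-Mead,
    # a derivative-free method, so no policy network is required.
    res = minimize(lambda a: -q_value(state, a), x0=[a0], method="Nelder-Mead")
    return float(res.x[0])

state = np.array([0.5, -0.2])
a_star = select_action(state)
print(a_star, q_value(state, a_star))
```

Because Nelder-Mead only compares function values, the same selection loop works for value approximators whose action-gradients are unavailable or unreliable.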

    MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters

    The selection of a model's parameters plays an important role in the application of support vector classification (SVC). The commonly used method is k-fold cross validation with grid search (CV), which is extremely time-consuming because it must train a large number of SVC models. In this paper, a new method is proposed to train SVC while selecting the model's parameters. First, training SVC with parameter selection is modeled as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization is the problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization is the problem of finding the optimal model parameters. A lower time complexity can be expected from MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient ascent algorithm with a dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we compare the PGA with the well-known sequential minimal optimization (SMO) algorithm, after deriving an SMO algorithm and some KKT conditions for L2-SVC-NCH. It is revealed that the SMO algorithm is a special case of the PGA, so the PGA offers more flexibility. Comparative experiments between MaxMin-L2-SVC-NCH and classical parameter-selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while matching the test accuracy of the classical models, indicating that MaxMin-L2-SVC-NCH performs better. We strongly recommend MaxMin-L2-SVC-NCH as a preferred model for SVC tasks.
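The inner minimization, finding the closest points between two convex hulls by projected gradient, can be sketched for the simplest linear, hard-margin case. This is an illustrative reconstruction, not the paper's PGA: the kernel, the "normal" regularization of the hulls, and the outer maximization over the model's parameters are all omitted, and the function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex via the sorting method.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def closest_hull_points(X, Y, steps=2000, lr=0.01):
    # Projected gradient on f(a, b) = ||X^T a - Y^T b||^2 with a, b constrained
    # to simplices, i.e. X^T a and Y^T b range over the two convex hulls.
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    for _ in range(steps):
        diff = X.T @ a - Y.T @ b          # vector between the current hull points
        a = project_simplex(a - 2.0 * lr * (X @ diff))
        b = project_simplex(b + 2.0 * lr * (Y @ diff))
    return X.T @ a, Y.T @ b

# Two separated triangles; the closest pair is (1, 0) and (3, 0), distance 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[3.0, 0.0], [4.0, 0.0], [3.0, 1.0]])
p, q = closest_hull_points(X, Y)
print(p, q, np.linalg.norm(p - q))
```

The objective is a convex quadratic over a product of simplices, so a small fixed step size suffices here; the paper's dynamic learning rate and SMO comparison concern exactly this kind of iteration.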

    Variance Reduction for Faster Non-Convex Optimization

    We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, over the long history of this basic problem the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in O(1/Ρ) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/Ρ²) iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced in convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges at an O(1/Ρ) rate and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
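A minimal sketch of the variance-reduction trick (SVRG-style) applied to a non-convex finite sum. The toy tanh-regression objective, epoch length, and step size are illustrative choices, not the paper's method or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d))   # realizable targets for the toy objective

def grad_i(w, i):
    # Gradient of the i-th component f_i(w) = (tanh(x_i . w) - y_i)^2,
    # a smooth non-convex stand-in for a sum-of-smooth-functions objective.
    z = np.tanh(X[i] @ w)
    return 2.0 * (z - y[i]) * (1.0 - z ** 2) * X[i]

def full_grad(w):
    return np.mean([grad_i(w, i) for i in range(n)], axis=0)

def svrg(w, epochs=30, lr=0.02):
    # SVRG: each epoch snapshots w and its full gradient mu, then takes n inner
    # steps with the variance-reduced estimator g_i(w) - g_i(snapshot) + mu,
    # which is unbiased and whose variance vanishes as w approaches the snapshot.
    for _ in range(epochs):
        snap, mu = w.copy(), full_grad(w)
        for _ in range(n):
            i = rng.integers(n)
            w = w - lr * (grad_i(w, i) - grad_i(snap, i) + mu)
    return w

w = svrg(np.zeros(d))
print(np.linalg.norm(full_grad(w)))   # gradient norm at the returned point
```

The full-gradient snapshot is what distinguishes this from plain SGD: each inner step still touches only one component, but its variance shrinks near the snapshot, which is the mechanism behind the improved non-convex rate claimed above.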

    μ†μ‹€ν•¨μˆ˜ 탐색을 ν†΅ν•œ λ”₯λŸ¬λ‹ λͺ¨λΈμ˜ 강건성과 μΌλ°˜ν™” ν–₯상

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Industrial Engineering, February 2023. Advisor: Jaewook Lee.

Deep learning has delivered remarkable performance gains and is now used in many fields, including speech recognition, autonomous driving, and the medical industry. A deep learning model, built on a vast number of weights, is trained to reduce a loss function on the given training data. Recently, however, it has emerged that blindly minimizing the training loss raises two major concerns. The first is the robustness of deep learning models: their ability to withstand adversarial attacks. Adversarial attacks exploit the weights and gradient information of a trained model to craft abnormal data that drastically degrade its performance. It has been shown that even a very small perturbation suffices, so adversarial examples that humans perceive as normal data but that cause the model to fail catastrophically are easy to create. Robustness is therefore an essential subject of study for the safe commercial deployment of deep learning models. The second concern is generalization: the gap between a model's performance on the training data and its performance on the test data. The smaller the gap, the better the generalization, and hence the higher the model's potential for deployment. However, several prior studies have shown that training schemes that only minimize the training loss cause overfitting to the training data, which in turn degrades test performance. Since a deep learning model is judged on test data rather than training data, good generalization is the ultimate goal of every deep learning model. In this work, we analyze both concerns by exploring the loss landscape and propose training methods that improve the metric corresponding to each. First, to understand and improve robustness, we analyze the loss function with respect to the input. Since adversarial attacks generate perturbations that maximize the loss with respect to the input, we study defense methods that minimize the loss on inputs corrupted by such abnormal perturbations. As a starting point, we show that in single-step adversarial training, one of the adversarial defense techniques, the loss landscape can easily become distorted; we show that a distorted loss landscape can severely damage a model's robustness, and on this basis establish the importance of keeping the loss function smooth. Building on properties of the loss landscape, we then analyze and improve adversarial attacks and defenses in several settings. First, we prove that the strength of transfer attacks, in which adversarial examples generated on a model with a different architecture or weights are used to attack a target model, is deeply related to the loss landscape. Based on this, we generate strong adversarial audio examples and propose a reliable robustness level for deep learning models. Next, we explore the characteristics of adversarial training and the loss landscape of the trained models; to smooth the loss landscape with respect to the input, we introduce a loss that takes the center point into account during adversarial training, improving the model's robustness. Next, to understand and improve generalization, we analyze the loss function with respect to the weights. A recent line of work has proved that the generalization performance of deep learning models is tightly connected to the flatness of the loss landscape. Sharpness-aware training methods built on this insight achieve high generalization performance by avoiding sharp minima and seeking flat ones. We analyze the loss landscape of sharpness-aware training and show that its convergence is unstable when saddle points exist in the loss landscape; because of this instability, the optimizer can become trapped at a saddle point rather than a minimum, which we show harms the performance of sharpness-aware training. To remedy the unstable convergence and achieve higher generalization performance, we propose a method that exploits the gradient information of all the center points obtained while computing the perturbation in weight space. Grounded in this exploration of and reflection on the loss landscape, this dissertation offers a deeper understanding of robustness and generalization and proposes new adversarial attack, adversarial defense, and sharpness-aware training methods that improve each metric. The results extend to future research toward the realization of deep learning models, and imply that an in-depth analysis of the loss landscape should precede work on robustness and generalization.

Recent advances in deep learning have demonstrated significant performance improvements in various domains, such as computer vision and speech recognition, yielding numerous industrial applications. Compared to other machine learning models, deep learning models have a large number of parameters, and this brings near-zero training loss that was previously considered impossible. To train these overparameterized models, we generally minimize the loss on training data, which we call empirical risk minimization (ERM). However, recent studies have demonstrated that deep learning models trained by ERM may suffer from two major problems: adversarial vulnerability and poor generalization. Adversarial vulnerability is an intriguing property of deep learning models that makes them susceptible to adversarial attacks that create malicious examples with slight modifications (Szegedy et al., 2013; Goodfellow et al., 2014). Prior studies have also confirmed that potential risks of deep learning models exist in real-world applications (Papernot et al., 2017; Kurakin et al., 2016). Adversarial attacks entail severe hazards in real-world applications, e.g., causing autonomous vehicle accidents by manipulating decision-making or extracting private information by circumventing voice authorization.
Thus, to prevent these malicious cases arising from the existence of adversarial attacks, many researchers have proposed various methods to enhance the robustness of deep learning models against adversarial attacks. Poor generalization, another issue with current deep learning models, is a large discrepancy between training accuracy and test accuracy. In other words, existing methods can successfully minimize the loss on training datasets, but this does not guarantee high performance on test datasets (Ishida et al., 2020; Foret et al., 2020). To achieve ideal performance across various domains, improving the generalization of neural networks has been a core challenge in deep learning. In this dissertation, focusing on the fact that both robustness and generalization are heavily related to the loss landscape, we aim to gain a deeper understanding of the adversarial robustness and generalization performance of deep learning models by analyzing their loss landscape. First, we investigate adversarial robustness with respect to the loss landscape. By analyzing the loss landscape of adversarially trained models, we discover that distortion of the loss landscape can occur, resulting in poor adversarial robustness. Based on this observation, we extend the loss landscape analysis to adversarial attacks and defenses to improve the adversarial robustness of deep learning models. We further analyze sharpness-aware minimization through its loss landscape and reveal that a convergence instability problem exists due to its inherent algorithm. Specifically, whether the loss landscape in the parameter space has a saddle point can heavily affect the optimization and its generalization performance.
Given this phenomenon, we investigate the loss landscape with respect to perturbation in the parameter space and improve generalization performance by exploring a wider loss landscape.

Chapter 1 Introduction
    1.1 Motivation of the Dissertation
    1.2 Aims of the Dissertation
    1.3 Organization of the Dissertation
Chapter 2 Adversarial Robustness and Loss Landscape
    2.1 Chapter Overview
    2.2 Preliminaries
        2.2.1 Adversarial Robustness
        2.2.2 Single-step and Multi-step Adversarial Attack
        2.2.3 Catastrophic Overfitting
    2.3 Methodology
        2.3.1 Revisiting Catastrophic Overfitting
        2.3.2 Stable Single-Step Adversarial Training
    2.4 Experiments
        2.4.1 Experimental Setup
        2.4.2 Visualizing Decision Boundary Distortion
        2.4.3 Distortion and Nonlinearity of the Loss Function
        2.4.4 Adversarial Robustness
    2.5 Chapter Summary
Chapter 3 Geometry-Aware Adversarial Attack and Defense
    3.1 Chapter Overview
    3.2 Preliminaries
        3.2.1 Adversarial Attack
        3.2.2 Adversarial Defense
    3.3 Methodology
        3.3.1 Transferable Adversarial Examples
        3.3.2 Improved Adversarial Training
    3.4 Experiments
        3.4.1 Transferability
        3.4.2 Adversarial Robustness
    3.5 Chapter Summary
Chapter 4 Generalization and Loss Landscape
    4.1 Chapter Overview
    4.2 Preliminaries
        4.2.1 Generalization and Sharpness-Aware Minimization
        4.2.2 Escaping Saddle Points
    4.3 Methodology
        4.3.1 Asymptotic Behavior of SAM Dynamics
        4.3.2 Saddle Point Becomes Attractor in SAM Dynamics
    4.4 Experiments
        4.4.1 Stochastic Behavior of SAM Dynamics
        4.4.2 Convergence Instability and Training Tricks
    4.5 Chapter Summary
Chapter 5 Sharpness-Aware Minimization with Multi-Ascent
    5.1 Chapter Overview
    5.2 Preliminaries
    5.3 Methodology
        5.3.1 Revisiting Number of Ascent Steps in SAM
        5.3.2 Multi-ascent Sharpness-Aware Minimization
    5.4 Experiments
        5.4.1 Experimental Setup
        5.4.2 Generalization Performance
        5.4.3 Escaping Local Minima
    5.5 Chapter Summary
Chapter 6 Conclusion
    6.1 Contributions
    6.2 Future Work
Bibliography
Abstract (in Korean)
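The single-step attack referenced in the abstract above (Goodfellow et al., 2014) can be sketched on a toy linear classifier. This is a minimal illustration of the fast gradient sign method with a hypothetical setup, not anything from the dissertation's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)   # weights of a toy linear classifier (hypothetical)
x = rng.normal(size=4)   # a clean input
y = 1.0                  # its label in {-1, +1}

def loss(x):
    # Logistic loss of the linear model w on the example (x, y).
    return float(np.log1p(np.exp(-y * (w @ x))))

def fgsm(x, eps=0.1):
    # Fast gradient sign method: a single step of size eps in the sign
    # direction of the input gradient of the loss.
    grad = -y * w / (1.0 + np.exp(y * (w @ x)))   # d loss / d x
    return x + eps * np.sign(grad)

x_adv = fgsm(x)
print(loss(x), loss(x_adv))   # the perturbed input incurs a higher loss
```

For a linear model the loss increase is guaranteed; the single-step adversarial training analyzed in Chapter 2 trains against exactly this kind of one-step perturbation, which is where the loss-landscape distortion discussed above arises.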