Deep Reinforcement Learning for Adaptive Parameter Control in Differential Evolution for Multi-Objective Optimization
Evolutionary algorithms (EAs) are efficient population-based stochastic algorithms for solving optimization problems. The performance of EAs largely depends on the values of the parameters that control their search. Previous works have studied how to configure EAs, but a general approach to tuning them effectively is still lacking. To fill this gap, this paper presents a consistent, automated approach for tuning and controlling the parameterized search of an EA. To this end, we propose a deep reinforcement learning (DRL) based approach, called 'DRL-APC-DE', for online control of the search parameter values of a multi-objective Differential Evolution algorithm. The proposed method is trained and evaluated on widely adopted multi-objective test problems. The experimental results show that the proposed approach performs competitively with a non-adaptive Differential Evolution algorithm tuned by grid search over the same range of possible parameter values. Subsequently, the trained algorithm has been applied to unseen multi-objective problems for adaptive parameter control. Results show that DRL-APC-DE successfully controls parameters when solving these problems, which has the potential to significantly reduce the dependency on parameter tuning for the successful application of EAs.
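The control loop described above can be sketched in miniature. This is a toy illustration, not the paper's DRL-APC-DE: an epsilon-greedy bandit stands in for the DRL policy, choosing Differential Evolution's scale factor F and crossover rate CR each generation, with the improvement in the best objective value as the reward; all settings are illustrative.

```python
import random

random.seed(0)

def sphere(x):
    # simple single-objective test function standing in for the benchmark
    return sum(v * v for v in x)

DIM, NP = 5, 20
ARMS = [(0.3, 0.5), (0.5, 0.9), (0.8, 0.7), (0.9, 0.9)]  # candidate (F, CR) pairs
q = [0.0] * len(ARMS)   # running reward estimate per parameter setting
n = [0] * len(ARMS)     # selection counts

pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(NP)]
fit = [sphere(ind) for ind in pop]

for gen in range(100):
    # epsilon-greedy choice of the parameter setting for this generation
    arm = random.randrange(len(ARMS)) if random.random() < 0.2 else q.index(max(q))
    F, CR = ARMS[arm]
    best_before = min(fit)
    for i in range(NP):
        # standard DE/rand/1/bin trial-vector construction
        a, b, c = random.sample([j for j in range(NP) if j != i], 3)
        jrand = random.randrange(DIM)
        trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                 if (random.random() < CR or d == jrand) else pop[i][d]
                 for d in range(DIM)]
        tf = sphere(trial)
        if tf <= fit[i]:               # greedy replacement
            pop[i], fit[i] = trial, tf
    reward = best_before - min(fit)    # improvement as the reward signal
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]  # incremental mean update

print(min(fit))
```

A DRL policy replaces the bandit by conditioning the parameter choice on a state describing the current population, which is what makes transfer to unseen problems possible.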
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization
Recently, neural heuristics based on deep reinforcement learning have
exhibited promise in solving multi-objective combinatorial optimization
problems (MOCOPs). However, they still struggle to achieve high learning
efficiency and solution quality. To tackle this issue, we propose an efficient
meta neural heuristic (EMNH), in which a meta-model is first trained and then
fine-tuned with a few steps to solve corresponding single-objective
subproblems. Specifically, for the training process, a (partial)
architecture-shared multi-task model is leveraged to achieve parallel learning
for the meta-model, so as to speed up the training; meanwhile, a scaled
symmetric sampling method with respect to the weight vectors is designed to
stabilize the training. For the fine-tuning process, an efficient hierarchical
method is proposed to systematically tackle all the subproblems. Experimental
results on the multi-objective traveling salesman problem (MOTSP),
multi-objective capacitated vehicle routing problem (MOCVRP), and
multi-objective knapsack problem (MOKP) show that EMNH is able to outperform
the state-of-the-art neural heuristics in terms of solution quality and
learning efficiency, and to yield solutions competitive with strong traditional
heuristics while requiring much less time.
Comment: Accepted at NeurIPS 202
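The "single-objective subproblems" the abstract refers to come from the standard decomposition step: a multi-objective problem is split into scalar subproblems via uniformly spread weight vectors (a Das-Dennis simplex lattice) and a scalarization such as the weighted sum. The sketch below shows that step only; the function names are illustrative and not EMNH's API.

```python
from itertools import combinations

def simplex_lattice(m, h):
    """All m-dimensional weight vectors with components k/h summing to 1
    (stars-and-bars enumeration)."""
    vecs = []
    for cuts in combinations(range(h + m - 1), m - 1):
        parts, prev = [], -1
        for c in cuts:
            parts.append(c - prev - 1)  # gap between consecutive bars
            prev = c
        parts.append(h + m - 2 - prev)  # remainder after the last bar
        vecs.append([p / h for p in parts])
    return vecs

def scalarize(objectives, weights):
    """Weighted-sum value of one solution for one subproblem."""
    return sum(w * f for w, f in zip(weights, objectives))

weights = simplex_lattice(3, 4)   # 3 objectives, granularity 4 -> C(6,2) = 15 vectors
print(len(weights))               # 15
print(scalarize([1.0, 2.0, 3.0], weights[0]))  # 3.0
```

Each weight vector defines one subproblem; EMNH's meta-model is fine-tuned per subproblem, and its symmetric sampling operates on these weight vectors.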
Enhancing Exploration and Safety in Deep Reinforcement Learning
A Deep Reinforcement Learning (DRL) agent tries to learn a policy that maximizes a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interaction with the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria that must be considered when designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms that foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely on both standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial for assessing the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample-efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitate the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology for evaluating behaviors against desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications.
For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, and propose an architecture that favors cooperation without affecting exploration.
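One common device for the safe-behavior requirement discussed above is an action "shield": before execution, a proposed action is projected back into a known safe set. The sketch below uses a simple box constraint as the safe set; real systems derive it from safety specifications, in the spirit of the Formal Verification angle the thesis takes. Names here are illustrative.

```python
def shield(action, low, high):
    """Project each action dimension into the safe interval [low_i, high_i]."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]

proposed = [1.7, -0.4, 3.2]  # e.g. raw output of a DRL policy
safe = shield(proposed, low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0])
print(safe)  # [1.0, -0.4, 1.0]
```

The filter guarantees constraint satisfaction at every step, while the learning algorithm remains free to explore within the safe region.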
Pareto multi-task deep learning
Neuroevolution has been used to train Deep Neural Networks on reinforcement learning problems. A few attempts have been made to extend it to address either multi-task or multi-objective optimization problems. This research work presents the Multi-Task Multi-Objective Deep Neuroevolution method, a highly parallelizable algorithm that can be adopted for tackling both multi-task and multi-objective problems. In this method, prior knowledge of the tasks is used to explicitly define multiple utility functions, which are optimized simultaneously. Experimental results on some Atari 2600 games, a challenging testbed for deep reinforcement learning algorithms, show that a single neural network with a single set of parameters can outperform previous state-of-the-art techniques. In addition to the standard analysis, all results are also evaluated using the Hypervolume indicator and the Kullback-Leibler divergence to gain better insight into the underlying training dynamics. The experimental results show that a neural network trained with the proposed evolution strategy can outperform networks individually trained on each of the tasks.
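The Hypervolume indicator used for evaluation above measures the objective-space volume a solution set dominates relative to a reference point. A minimal two-objective version (maximization, reference point at the origin) is easy to sketch: sort the points by the first objective and sum the rectangle slices they dominate. This is a didactic sketch, not the paper's evaluation code.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume of a 2-D maximization front relative to `ref`."""
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:                        # dominated points add no slice
            hv += (x - ref[0]) * (y - prev_y)  # new rectangle slice
            prev_y = y
    return hv

front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(front))  # 6.0
```

A larger hypervolume indicates a front that is both closer to the true Pareto front and better spread, which is why it is a common scalar summary of multi-objective training dynamics.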
Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES
Using generative deep learning models and reinforcement learning together can
effectively generate new molecules with desired properties. By employing a
multi-objective scoring function, thousands of high-scoring molecules can be
generated, making this approach useful for drug discovery and material science.
However, the application of these methods can be hindered by computationally
expensive or time-consuming scoring procedures, particularly when a large
number of function calls are required as feedback in the reinforcement learning
optimization. Here, we propose the use of double-loop reinforcement learning
with simplified molecular-input line-entry system (SMILES) augmentation to improve
the efficiency and speed of the optimization. By adding an inner loop that
augments the generated SMILES strings to non-canonical SMILES for use in
additional reinforcement learning rounds, we can both reuse the scoring
calculations at the molecular level, thereby speeding up the learning process,
and gain additional protection against mode collapse. We find that
employing between 5 and 10 augmentation repetitions is optimal for the scoring
functions tested and is further associated with an increased diversity in the
generated compounds, improved reproducibility of the sampling runs and the
generation of molecules of higher similarity to known ligands.
Comment: 25 pages and 18 figures. Supplementary material included.
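The score reuse behind the inner loop relies on one fact: augmented (non-canonical) SMILES are different strings for the same molecule, so a cache keyed by a canonical form lets every augmented copy reuse a single scoring call. In the sketch below, `canonical` and `expensive_score` are stand-ins; in practice a cheminformatics toolkit such as RDKit provides canonicalization, and the score is an expensive oracle.

```python
calls = 0  # counts how often the expensive oracle actually runs

def canonical(smiles):
    # placeholder canonicalizer: real code would round-trip the string
    # through a cheminformatics toolkit, not sort its characters
    return "".join(sorted(smiles))

def expensive_score(smiles):
    global calls
    calls += 1
    return len(smiles)  # dummy stand-in for a costly property prediction

cache = {}

def cached_score(smiles):
    key = canonical(smiles)
    if key not in cache:
        cache[key] = expensive_score(smiles)  # pay the cost once per molecule
    return cache[key]

# "CCO" and "OCC" are two renderings of the same key under the toy
# canonicalizer; "CCCO" is a distinct key
for s in ["CCO", "OCC", "CCO", "CCCO"]:
    cached_score(s)
print(calls)  # 2
```

Four augmented strings trigger only two scoring calls, which is the mechanism that lets the inner augmentation loop add extra learning rounds at almost no scoring cost.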
Learning Adaptive Evolutionary Computation for Solving Multi-Objective Optimization Problems
Multi-objective evolutionary algorithms (MOEAs) are widely used to solve multi-objective optimization problems. These algorithms rely on appropriately set parameters to find good solutions. However, this parameter tuning can be very computationally expensive when solving non-trivial (combinatorial) optimization problems. This paper proposes a framework that integrates MOEAs with adaptive parameter control using Deep Reinforcement Learning (DRL). The DRL policy is trained to adaptively set the values that dictate the intensity and probability of mutation for solutions during optimization. We test the proposed approach on a simple benchmark problem and on a real-world, complex warehouse design and control problem. The experimental results demonstrate the advantages of our method in terms of solution quality and the computation time needed to reach good solutions. In addition, we show that the learned policy is transferable, i.e., the policy trained on a simple benchmark problem can be applied directly and effectively to the complex warehouse optimization problem, without the need for retraining.
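The transfer claim can be illustrated in miniature: a controller is trained to pick mutation step sizes on one problem, then applied frozen, without retraining, to a different problem. This toy uses a bandit over step sizes inside a (1+1)-style hill climber rather than the paper's DRL policy and MOEA; every name and setting is illustrative.

```python
import random

random.seed(1)
STEPS = [0.01, 0.1, 1.0]  # candidate mutation intensities (the "actions")

def optimize(f, dim, policy=None, iters=300):
    """Hill-climb f; learn step-size values when policy is None,
    otherwise reuse the passed estimates greedily (frozen transfer)."""
    q, n = ([0.0] * len(STEPS), [0] * len(STEPS)) if policy is None else (policy, None)
    x = [random.uniform(-3, 3) for _ in range(dim)]
    fx = f(x)
    for _ in range(iters):
        if n is None:
            arm = q.index(max(q))  # frozen policy: always greedy
        else:
            arm = random.randrange(len(STEPS)) if random.random() < 0.3 else q.index(max(q))
        cand = [v + random.gauss(0, STEPS[arm]) for v in x]
        fc = f(cand)
        if n is not None:
            n[arm] += 1
            q[arm] += ((fx - fc) - q[arm]) / n[arm]  # reward = improvement
        if fc < fx:
            x, fx = cand, fc
    return fx, q

sphere = lambda x: sum(v * v for v in x)
shifted = lambda x: sum((v - 1.0) ** 2 for v in x)  # a different, unseen problem

_, learned = optimize(sphere, dim=4)                # train the controller
best, _ = optimize(shifted, dim=4, policy=learned)  # transfer, no retraining
print(round(best, 4))
```

The point of the sketch is the interface, not the numbers: because the controller's state-to-action mapping is problem-independent, the same learned values can drive the search on the unseen objective.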