Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains
Quality-Diversity optimisation (QD) has proven to yield promising results across a broad set of applications. However, QD approaches struggle in the presence of uncertainty in the environment, as it impacts their ability to quantify the true performance and novelty of solutions. This problem has been highlighted multiple times independently in previous literature. In this work, we propose to unify the view on this problem through four main contributions. First, we formalise a common framework for uncertain domains: the Uncertain QD setting, a special case of QD in which the fitness and descriptors of each solution are no longer fixed values but distributions over possible values. Second, we propose a new methodology to evaluate Uncertain QD approaches, relying on a new per-generation sampling budget and a set of existing and new metrics specifically designed for Uncertain QD. Third, we propose three new Uncertain QD algorithms: Archive-sampling, Parallel-Adaptive-sampling and Deep-Grid-sampling. We design these approaches in light of recent advances in the QD community toward hardware acceleration, which enables large numbers of parallel evaluations and makes sampling an affordable approach to uncertainty. Fourth and finally, we use this new framework and the associated comparison methods to benchmark existing and novel approaches. We demonstrate once again the limitations of MAP-Elites in uncertain domains and highlight the performance of the existing Deep-Grid approach and of our new algorithms. We intend this framework and these methods to serve as an instrumental benchmark for future work on Uncertain QD.
Comment: Submitted to Transactions on Evolutionary Computation
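To make the Uncertain QD setting concrete, the sketch below estimates a solution's fitness and descriptor distributions by repeated sampling, the mechanism the per-generation sampling budget above is meant to ration. It is a minimal illustration, not the paper's implementation: `evaluate_noisy` is a hypothetical noisy task, and the noise scales are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_noisy(x):
    # Hypothetical noisy domain: both fitness and descriptor are perturbed,
    # so a single evaluation only yields one sample from each distribution.
    fitness = -np.sum(x ** 2) + rng.normal(0.0, 0.5)
    descriptor = x[:2] + rng.normal(0.0, 0.05, size=2)
    return fitness, descriptor

def estimate(x, n_samples=32):
    # Approximate the fitness/descriptor distributions by sampling; a
    # per-generation sampling budget bounds how many such evaluations
    # each generation may spend.
    samples = [evaluate_noisy(x) for _ in range(n_samples)]
    fits = np.array([f for f, _ in samples])
    descs = np.array([d for _, d in samples])
    return fits.mean(), fits.std(), descs.mean(axis=0), descs.std(axis=0)

x = rng.uniform(-1.0, 1.0, size=5)
f_mean, f_std, d_mean, d_std = estimate(x)
print(f"fitness ~ mean {f_mean:.2f}, std {f_std:.2f}; descriptor mean {d_mean}")
```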
Quality-diversity optimization: a novel branch of stochastic optimization
Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space, of which there can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even when a niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and survey the current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.
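As a point of reference for the illumination idea described above, here is a minimal MAP-Elites loop on a toy problem. The sphere fitness, the choice of the first two genotype coordinates as descriptor, and all hyperparameters are illustrative assumptions; only the select-mutate-evaluate-insert structure reflects the standard algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: fitness is the negative sphere function and the 2-D
# behavior descriptor is simply the first two genotype coordinates.
def evaluate(x):
    return -np.sum(x ** 2), x[:2]

GRID = 10                           # archive resolution per descriptor dimension
LOW, HIGH = -1.0, 1.0               # descriptor bounds
archive_fit, archive_sol = {}, {}   # cell -> best fitness / genotype

def cell_of(d):
    # Map a descriptor to its niche (grid cell) index.
    idx = np.floor((np.asarray(d) - LOW) / (HIGH - LOW) * GRID).astype(int)
    return tuple(np.clip(idx, 0, GRID - 1))

for it in range(20_000):
    if it < 100:
        x = rng.uniform(LOW, HIGH, size=5)                     # random bootstrap
    else:
        keys = list(archive_sol)
        parent = archive_sol[keys[rng.integers(len(keys))]]    # uniform selection
        x = parent + rng.normal(0.0, 0.1, size=parent.shape)   # Gaussian mutation
    f, d = evaluate(x)
    c = cell_of(d)
    # Elitist insertion: keep the best solution seen in each niche.
    if c not in archive_fit or f > archive_fit[c]:
        archive_fit[c], archive_sol[c] = f, x

print(f"coverage: {len(archive_fit)} / {GRID * GRID} cells")
```

The per-cell elitism is what fills the whole behavior space with locally best solutions, whether or not a given cell sits on a peak of the fitness landscape.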
Don't bet on luck alone: enhancing behavioral reproducibility of quality-diversity solutions in uncertain domains
Quality-Diversity (QD) algorithms are designed to generate collections of high-performing solutions while maximizing their diversity in a given descriptor space. However, in the presence of unpredictable noise, the fitness and descriptor of the same solution can differ significantly from one evaluation to another, leading to uncertainty in the estimation of such values. Given the elitist nature of QD algorithms, they commonly end up with many degenerate solutions in such noisy settings. In this work, we introduce the Archive Reproducibility Improvement Algorithm (ARIA), a plug-and-play approach that improves the reproducibility of the solutions present in an archive. We propose it as a separate optimization module, relying on natural evolution strategies, that can be executed on top of any QD algorithm. Our module mutates solutions to (1) optimize their probability of belonging to their niche, and (2) maximize their fitness. The performance of our method is evaluated on various tasks, including a classical optimization problem and two high-dimensional control tasks in simulated robotic environments. We show that our algorithm enhances the quality and descriptor-space coverage of any given archive by at least 50%.
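The abstract names the two ingredients of ARIA's module: natural evolution strategies, and an objective mixing niche-membership probability with fitness. The sketch below shows one plausible NES step over such a blended objective; the weighting `w_fit`, the sample counts, and the toy `evaluate_noisy` task are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_noisy(x):
    # Hypothetical noisy task: noisy sphere fitness, noisy 2-D descriptor.
    return -np.sum(x ** 2) + rng.normal(0.0, 0.5), x[:2] + rng.normal(0.0, 0.05, 2)

def cell_of(d, low=-1.0, high=1.0, grid=10):
    idx = np.floor((np.asarray(d) - low) / (high - low) * grid).astype(int)
    return tuple(np.clip(idx, 0, grid - 1))

def objective(x, target_cell, n_samples=16, w_fit=0.1):
    # Blend of the two criteria named in the abstract: the empirical
    # probability that x's descriptor lands back in its niche, plus its
    # mean fitness (the weighting w_fit is an assumption of this sketch).
    fits, hits = [], 0
    for _ in range(n_samples):
        f, d = evaluate_noisy(x)
        fits.append(f)
        hits += cell_of(d) == target_cell
    return hits / n_samples + w_fit * np.mean(fits)

def nes_step(x, target_cell, pop=20, sigma=0.05, lr=0.02):
    # One natural-evolution-strategies update: score Gaussian perturbations
    # of x and move it along the score-weighted average perturbation.
    eps = rng.normal(0.0, 1.0, size=(pop, x.size))
    scores = np.array([objective(x + sigma * e, target_cell) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return x + lr / (pop * sigma) * eps.T @ scores

x = rng.uniform(-1.0, 1.0, size=5)
for _ in range(50):
    x = nes_step(x, target_cell=cell_of(x[:2]))
```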
Empirical analysis of PGA-MAP-Elites for neuroevolution in uncertain domains
Quality-Diversity algorithms, among which MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches, as they enable generating collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by pairing the traditional genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator guides mutations toward high-performing solutions using policy gradients. In this work, we propose an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy gradients for the performance of the algorithm and the reproducibility of the generated solutions when considering uncertain domains. We first show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments, decoupling the two challenges it tackles. Second, we show that, in addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we present an ablation and in-depth analysis of the dynamics of the policy-gradient-based variation. We demonstrate that the policy-gradient variation operator is decisive for the performance of PGA-MAP-Elites but is only essential during the early stages of the process, when it finds high-performing regions of the search space.
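The key architectural idea above, pairing a genetic operator with a gradient-based one, can be sketched independently of the deep-RL machinery. In the sketch below, `fitness_grad` is a stand-in for the TD3-critic gradient used by PGA-MAP-Elites; the half-and-half operator split and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_variation(parents, sigma=0.05):
    # The traditional MAP-Elites operator: Gaussian mutation of parents.
    return parents + rng.normal(0.0, sigma, size=parents.shape)

def pg_variation(parents, fitness_grad, lr=1e-2, steps=10):
    # The gradient-based operator: take a few ascent steps along an
    # estimated performance gradient. Here fitness_grad is any callable
    # returning d(fitness)/d(parameters), standing in for a learned critic.
    x = parents.copy()
    for _ in range(steps):
        x += lr * fitness_grad(x)
    return x

def make_offspring(parents, fitness_grad, pg_fraction=0.5):
    # Split each batch between the two operators; all offspring then go
    # through the usual MAP-Elites archive insertion.
    n_pg = int(len(parents) * pg_fraction)
    return np.vstack([pg_variation(parents[:n_pg], fitness_grad),
                      ga_variation(parents[n_pg:])])

# Toy usage: analytical "critic" gradient of the sphere fitness -||x||^2.
parents = rng.uniform(-1.0, 1.0, size=(8, 5))
offspring = make_offspring(parents, fitness_grad=lambda x: -2.0 * x)
```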
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
Training generally capable agents that perform well in unseen dynamic environments is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging class of reinforcement learning (RL) algorithms that blend insights from Quality Diversity (QD) and RL to produce a collection of high-performing and behaviorally diverse policies with respect to a behavioral embedding. Existing QD-RL approaches have thus far taken advantage of sample-efficient off-policy RL algorithms. However, recent advances in high-throughput, massively parallelized robotic simulators have opened the door for algorithms that can take advantage of such parallelism, and it is unclear how to scale existing off-policy QD-RL methods to these new data-rich regimes. In this work, we take the first steps to combine on-policy RL, specifically Proximal Policy Optimization (PPO), which can leverage massive parallelism, with QD, and propose a new QD-RL method designed with these high-throughput simulators and on-policy training in mind. Our proposed Proximal Policy Gradient Arborescence (PPGA) algorithm yields a 4x improvement over baselines on the challenging humanoid domain.
Comment: Submitted to NeurIPS 202
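The sketch below illustrates only the data-rich regime the abstract appeals to, not PPGA itself: a single batched call to a (here simulated) massively parallel evaluator returns low-variance return and behavior-measure estimates for a whole batch of policies at once. All shapes, the toy return model, and the noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def vectorized_evaluate(policies, n_envs=1024, noise=1.0):
    # Stand-in for a massively parallel (e.g. GPU-vectorized) simulator:
    # each policy is rolled out in n_envs environments simultaneously, so
    # return and measure estimates come from one batched call rather than
    # a slow sequence of single rollouts.
    n, dim = policies.shape
    returns = (-np.sum(policies ** 2, axis=1, keepdims=True)
               + rng.normal(0.0, noise, size=(n, n_envs)))
    measures = policies[:, :2, None] + rng.normal(0.0, 0.1, size=(n, 2, n_envs))
    return returns.mean(axis=1), measures.mean(axis=2)

policies = rng.uniform(-1.0, 1.0, size=(64, 16))   # 64 policies in one batch
rets, meas = vectorized_evaluate(policies)
print(rets.shape, meas.shape)                      # (64,), (64, 2)
```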
A Framework for Automatic Behavior Generation in Multi-Function Swarms
Multi-function swarms are swarms that solve multiple tasks at once. For example, a quadcopter swarm could be tasked with exploring an area of interest while simultaneously functioning as ad-hoc relays. With this type of multi-functionality comes the challenge of handling potentially conflicting requirements simultaneously. Using the Quality-Diversity algorithm MAP-Elites in combination with a suitable controller structure, a framework for automatic behavior generation in multi-function swarms is proposed. The framework is tested on a scenario with three simultaneous tasks: exploration, communication network creation, and geolocation of RF emitters. A repertoire is evolved, consisting of a wide range of controllers, or behavior primitives, with different characteristics and trade-offs between the different tasks. This repertoire would enable the swarm to transition between behavioral trade-offs online, according to the situational requirements. Furthermore, the effect of noise on the behavior characteristics in MAP-Elites is investigated. A moderate number of re-evaluations is found to increase robustness while keeping the computational requirements relatively low. A few selected controllers are examined, and the dynamics of transitioning between these controllers are explored. Finally, the study develops a methodology for analyzing the makeup of the resulting controllers. This is done through a parameter-variation study in which the importance of individual inputs to the swarm controllers is assessed and analyzed.
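The noise-handling result above, that a moderate number of re-evaluations buys robustness cheaply, amounts to averaging the measured behavior characteristics before archive insertion. A minimal sketch, assuming a hypothetical `simulate_swarm` that returns noisy scores for the three tasks:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_swarm(controller):
    # Hypothetical stochastic swarm simulation returning the three task
    # characteristics from the scenario: exploration, network, geolocation.
    return np.tanh(controller[:3]) + rng.normal(0.0, 0.1, size=3)

def robust_characteristics(controller, n_reeval=5):
    # The study's remedy for noise: a moderate number of re-evaluations,
    # averaged before the controller is inserted into the MAP-Elites grid.
    return np.mean([simulate_swarm(controller) for _ in range(n_reeval)], axis=0)

controller = rng.uniform(-1.0, 1.0, size=8)
print(robust_characteristics(controller))
```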