Enhancing Evolutionary Conversion Rate Optimization via Multi-armed Bandit Algorithms
Conversion rate optimization means designing web interfaces such that more
visitors perform a desired action (such as register or purchase) on the site.
One promising approach, implemented in Sentient Ascend, is to optimize the
design using evolutionary algorithms, evaluating each candidate design online
with actual visitors. Because such evaluations are costly and noisy, several
challenges emerge: How can available visitor traffic be used most efficiently?
How can good solutions be identified most reliably? How can a high conversion
rate be maintained during optimization? This paper proposes a new technique to
address these issues. Traffic is allocated to candidate solutions using a
multi-armed bandit algorithm, using more traffic on those evaluations that are
most useful. In a best-arm identification mode, the best candidate can be
identified reliably at the end of evolution, and in a campaign mode, the
overall conversion rate can be optimized throughout the entire evolution
process. Multi-armed bandit algorithms thus improve performance and reliability
of machine discovery in noisy real-world environments.
Comment: The Thirty-First Innovative Applications of Artificial Intelligence Conference
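The abstract above describes allocating visitor traffic across candidate designs with a multi-armed bandit. As a rough illustration only, the sketch below uses Thompson sampling with Beta posteriors, which is a common bandit technique and not necessarily the algorithm used in the paper. The names `allocate_traffic` and `best_candidate`, the simulated conversion rates, and the seed are all assumptions made for this example.

```python
import random


def allocate_traffic(true_rates, n_visitors, seed=0):
    """Simulate bandit-driven traffic allocation (hypothetical helper).

    true_rates: simulated conversion probability of each candidate design
    (an assumption for the demo; unknown in a real campaign).
    Returns per-candidate counts of conversions and non-conversions.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per candidate from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Send this visitor to the candidate with the highest draw.
        arm = max(range(k), key=samples.__getitem__)
        # Simulate whether the visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def best_candidate(successes, failures):
    """Pick the candidate with the highest posterior mean conversion rate."""
    means = [(s + 1) / (s + f + 2) for s, f in zip(successes, failures)]
    return max(range(len(means)), key=means.__getitem__)


if __name__ == "__main__":
    succ, fail = allocate_traffic([0.05, 0.10, 0.40], n_visitors=3000)
    pulls = [s + f for s, f in zip(succ, fail)]
    print("visitors per candidate:", pulls)
    print("estimated best candidate:", best_candidate(succ, fail))
```

In this sketch most visitors end up on the strongest candidate as evidence accumulates, which loosely corresponds to the campaign mode mentioned in the abstract; a best-arm identification mode would typically spend traffic differently, to separate the top candidates rather than to maximize conversions.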
Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation
This paper proposes a new algorithm, referred to as GMAB, that combines
concepts from the reinforcement learning domain of multi-armed bandits and
random search strategies from the domain of genetic algorithms to solve
discrete stochastic optimization problems via simulation. In particular, the
focus is on noisy large-scale problems, which often involve a multitude of
dimensions as well as multiple local optima. Our aim is to combine the property
of multi-armed bandits to cope with volatile simulation observations with the
ability of genetic algorithms to handle high-dimensional solution spaces
accompanied by an enormous number of feasible solutions. For this purpose, a
multi-armed bandit framework serves as a foundation, where each observed
simulation is incorporated into the memory of GMAB. Based on this memory,
genetic operators guide the search, as they provide powerful tools for
exploration as well as exploitation. The empirical results demonstrate that
GMAB achieves superior performance compared to benchmark algorithms from the
literature in a large variety of test problems. In all experiments, GMAB
required considerably fewer simulations to achieve similar or (far) better
solutions than those generated by existing methods. At the same time, GMAB's
overhead with regard to the required runtime is extremely small due to the
suggested tree-based implementation of its memory. Furthermore, we prove its
convergence to the set of global optima as the simulation effort goes to
infinity.
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints
We propose a novel master-slave architecture to solve the top-K
combinatorial multi-armed bandits problem with non-linear bandit feedback and
diversity constraints, which, to the best of our knowledge, is the first
combinatorial bandits setting considering diversity constraints under bandit
feedback. Specifically, to efficiently explore the combinatorial and
constrained action space, we introduce six slave models with distinguished
merits to generate diversified samples well balancing rewards and constraints
as well as efficiency. Moreover, we propose teacher learning based optimization
and the policy co-training technique to boost the performance of the multiple
slave models. The master model then collects the elite samples provided by the
slave models and selects the best sample estimated by a neural contextual
UCB-based network to make a decision with a trade-off between exploration and
exploitation. Thanks to the elaborate design of slave models, the co-training
mechanism among slave models, and the novel interactions between the master and
slave models, our approach significantly surpasses existing state-of-the-art
algorithms in both synthetic and real datasets for recommendation tasks. The
code is available at:
https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits
Comment: IEEE Transactions on Neural Networks and Learning Systems
On Experimentation in Software-Intensive Systems
Context: Delivering software that has value to customers is a primary concern of every software company. Prevalent in web-facing companies, controlled experiments are used to validate and deliver value in incremental deployments. At the same time that web-facing companies are aiming to automate and reduce the cost of each experiment iteration, embedded systems companies are starting to adopt experimentation practices and leverage their activities on the automation developments made in the online domain. Objective: This thesis has two main objectives. The first objective is to analyze how software companies can run and optimize their systems through automated experiments. This objective is investigated from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process. The second objective is to analyze how non-web-facing companies can adopt experimentation as part of their development process to validate and deliver value to their customers continuously. This objective is investigated from the perspectives of the software development process and focuses on the experimentation aspects that are distinct from web-facing companies. Method: To achieve these objectives, we conducted research in close collaboration with industry and used a combination of different empirical research methods: case studies, literature reviews, simulations, and empirical evaluations. Results: This thesis provides six main results. First, it proposes an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, it proposes a new experimentation process to capture the details of a trustworthy experimentation process that can be used as the basis for an automated experimentation process. Third, it identifies the restrictions and pitfalls of different multi-armed bandit algorithms for automating experiments in industry.
This thesis also proposes a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls. Fourth, it proposes statistical models to analyze optimization algorithms that can be used in automated experimentation. Fifth, it identifies the key challenges faced by embedded systems companies when adopting controlled experimentation and proposes a set of strategies to address these challenges. Sixth, it identifies experimentation techniques and proposes a new continuous experimentation model for mission-critical and business-to-business systems. Conclusion: The results presented in this thesis indicate that the trustworthiness in the experimentation process and the selection of algorithms still need to be addressed before automated experimentation can be used at scale in industry. The embedded systems industry faces challenges in adopting experimentation as part of its development process. In part, this is due to the low number of users and devices that can be used in experiments and the diversity of the required experimental designs for each new situation. This limitation increases both the complexity of the experimentation process and the number of techniques used to address this constraint.
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities
Evolutionary algorithms (EA), a class of stochastic search methods based on
the principles of natural evolution, have received widespread acclaim for their
exceptional performance in various real-world optimization problems. While
researchers worldwide have proposed a wide variety of EAs, certain limitations
remain, such as slow convergence speed and poor generalization capabilities.
Consequently, numerous scholars actively explore improvements to algorithmic
structures, operators, search patterns, etc., to enhance their optimization
performance. Reinforcement learning (RL) integrated as a component in the EA
framework has demonstrated superior performance in recent years. This paper
presents a comprehensive survey on integrating reinforcement learning into the
evolutionary algorithm, referred to as reinforcement learning-assisted
evolutionary algorithm (RL-EA). We begin with the conceptual outlines of
reinforcement learning and the evolutionary algorithm. We then provide a
taxonomy of RL-EA. Subsequently, we discuss the RL-EA integration method, the
RL-assisted strategy adopted by RL-EA, and its applications according to the
existing literature. The RL-assisted procedure is divided according to the
implemented functions including solution generation, learnable objective
function, algorithm/operator/sub-population selection, parameter adaptation,
and other strategies. Finally, we analyze potential directions for future
research. This survey serves as a rich resource for researchers interested in
RL-EA as it overviews the current state-of-the-art and highlights the
associated challenges. By leveraging this survey, readers can swiftly gain
insights into RL-EA to develop efficient algorithms, thereby fostering further
advancements in this emerging field.
Comment: 26 pages, 16 figures
Towards Automated Experiments in Software Intensive Systems
Context: Delivering software that has value to customers is a primary concern of every software company. One of the techniques to continuously validate and deliver value in online software systems is the use of controlled experiments. The time cost of each experiment iteration, the growing number of people in the development organization who run experiments, and the need for a more automated and systematic approach are leading companies to look for different techniques to automate the experimentation process. Objective: The overall objective of this thesis is to analyze how to automate different types of experiments and how companies can support and optimize their systems through automated experiments. This thesis explores the topic of automated online experiments from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process, and focuses on two main application domains: the online and the embedded systems domain. Method: To achieve the objective, we conducted this research in close collaboration with industry using a combination of different empirical research methods: case studies, literature reviews, simulations and empirical evaluations. Results and conclusions: This thesis provides five main results. First, we propose an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, we identify the key challenges faced by embedded systems companies when adopting controlled experimentation and we propose a set of strategies to address these challenges. Third, we develop a new algorithm for online experiments. Fourth, we identify restrictions and pitfalls of different algorithms for automating experiments in industry and we propose a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls.
Fifth, we propose a new experimentation process to capture the details of a trustworthy experimentation process that can be used as the basis for an automated experimentation process. Future work: In future work, we plan to investigate how embedded systems can incorporate experiments in their development process without compromising existing real-time and safety requirements. We also plan to analyze the impact and costs of automating the different parts of the experimentation process.