42,719 research outputs found
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.Comment: See http://www.jair.org/ for any accompanying file
Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution
Cloud controllers aim at responding to application demands by automatically
scaling the compute resources at runtime to meet performance guarantees and
minimize resource costs. Existing cloud controllers often resort to scaling
strategies that are codified as a set of adaptation rules. However, for a cloud
provider, applications running on top of the cloud infrastructure are more or
less black-boxes, making it difficult at design time to define optimal or
pre-emptive adaptation rules. Thus, the burden of taking adaptation decisions
often is delegated to the cloud application. Yet, in most cases, application
developers in turn have limited knowledge of the cloud infrastructure. In this
paper, we propose learning adaptation rules during runtime. To this end, we
introduce FQL4KE, a self-learning fuzzy cloud controller. In particular, FQL4KE
learns and modifies fuzzy rules at runtime. The benefit is that for designing
cloud controllers, we do not have to rely solely on precise design-time
knowledge, which may be difficult to acquire. FQL4KE empowers users to specify
cloud controllers by simply adjusting weights representing priorities in system
goals instead of specifying complex adaptation rules. The applicability of
FQL4KE has been experimentally assessed as part of the cloud application
framework ElasticBench. The experimental results indicate that FQL4KE
outperforms our previously developed fuzzy controller without learning
mechanisms and the native Azure auto-scaling
Recommended from our members
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental v.s. exploitation trade-off. Then we review how deep RL has improved upon classical and summarize six categories of the latest exploration methods for deep RL, in the order increasing usage of prior information. We then explore representative works in three categories discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based via hashing, maps states to hash codes for counting and assigns higher exploration to less-encountered states. The third category utilizes hierarchy and is represented by modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
A memetic particle swarm optimisation algorithm for dynamic multi-modal optimisation problems
Copyright @ 2011 Taylor & Francis.Many real-world optimisation problems are both dynamic and multi-modal, which require an optimisation algorithm not only to find as many optima under a specific environment as possible, but also to track their moving trajectory over dynamic environments. To address this requirement, this article investigates a memetic computing approach based on particle swarm optimisation for dynamic multi-modal optimisation problems (DMMOPs). Within the framework of the proposed algorithm, a new speciation method is employed to locate and track multiple peaks and an adaptive local search method is also hybridised to accelerate the exploitation of species generated by the speciation method. In addition, a memory-based re-initialisation scheme is introduced into the proposed algorithm in order to further enhance its performance in dynamic multi-modal environments. Based on the moving peaks benchmark problems, experiments are carried out to investigate the performance of the proposed algorithm in comparison with several state-of-the-art algorithms taken from the literature. The experimental results show the efficiency of the proposed algorithm for DMMOPs.This work was supported by the Key Program of National Natural Science Foundation (NNSF) of China under Grant no. 70931001, the Funds for Creative Research Groups of China under Grant no. 71021061, the National Natural Science Foundation (NNSF) of China under Grant 71001018, Grant no. 61004121 and Grant no. 70801012 and the Fundamental Research Funds for the Central Universities Grant no. N090404020, the Engineering and Physical Sciences Research Council (EPSRC) of UK under Grant no. EP/E060722/01 and Grant EP/E060722/02, and the Hong Kong Polytechnic University under Grant G-YH60
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine tuning distributed systems is considered to be a craftsmanship, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads
- …