8,359 research outputs found
Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continue learning even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as the means of function approximation, combined with the fast policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF). The combination of fast policy hill climbing and fuzzy state aggregation function approximation is tested in two stochastic environments: Tileworld and the simulated robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns faster and performs better than an agent combining fuzzy state aggregation with Q-learning alone. Results from the multi-agent RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through weighted strategy sharing.
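For concreteness, the following is a minimal sketch of the tabular WoLF policy hill climbing update in the style of Bowling and Veloso's formulation; the state index here stands in for the fuzzy-aggregated state the paper actually uses, and all hyperparameter values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

class WoLFPHC:
    """Minimal tabular WoLF policy hill climbing sketch.

    The paper pairs this with fuzzy state aggregation, so `s` would
    be a fuzzy-aggregated state index rather than a raw state.
    delta_w < delta_l makes the learner cautious while winning and
    fast while losing ("Win or Lose Fast").
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_w=0.01, delta_l=0.04):
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.delta_w, self.delta_l = delta_w, delta_l

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

        # Maintain a running average of the policy at state s.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]

        # Winning if the current policy outperforms the average policy.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_w if winning else self.delta_l

        # Hill-climb toward the greedy action, staying on the simplex.
        greedy = self.Q[s].argmax()
        n = self.pi.shape[1]
        for act in range(n):
            step = delta if act == greedy else -delta / (n - 1)
            self.pi[s, act] = np.clip(self.pi[s, act] + step, 0.0, 1.0)
        self.pi[s] /= self.pi[s].sum()  # renormalise after clipping
```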
Application of Fuzzy State Aggregation and Policy Hill Climbing to Multi-Agent Systems in Stochastic Environments
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continue learning even as the operating environment changes. Applying this learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as the means of function approximation, combined with the policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF). The combination of fast policy hill climbing (PHC) and fuzzy state aggregation (FSA) function approximation is tested in two stochastic environments: Tileworld and the robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns faster and performs better than an agent combining fuzzy state aggregation with Q-learning alone. Results from the RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through weighted strategy sharing.
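Both abstracts close with weighted strategy sharing among agents. The snippet below sketches one plausible form of that sharing step, in which each agent's Q-table is replaced by an expertness-weighted average of all agents' tables; the weighting scheme and the `expertness` values are assumptions for illustration, not details taken from these papers.

```python
import numpy as np

def weighted_strategy_sharing(q_tables, expertness):
    """One sharing round (illustrative sketch): every agent's Q-table
    becomes a weighted average of all agents' tables, weighted by a
    relative expertness score such as accumulated reward."""
    w = np.asarray(expertness, dtype=float)
    w = w / w.sum()  # normalise weights to sum to 1
    shared = sum(wi * q for wi, q in zip(w, q_tables))
    return [shared.copy() for _ in q_tables]

# Hypothetical usage: three agents with accumulated rewards 5, 2, 3.
qs = [np.random.rand(10, 4) for _ in range(3)]
qs = weighted_strategy_sharing(qs, expertness=[5.0, 2.0, 3.0])
```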
Scaling reinforcement learning to the unconstrained multi-agent domain
Reinforcement learning is a machine learning technique designed to mimic the way animals learn by receiving rewards and punishment. It is designed to train intelligent agents when very little is known about the agent’s environment, and consequently the agent’s designer is unable to hand-craft an appropriate policy. Using reinforcement learning, the agent’s designer can merely give reward to the agent when it does something right, and the algorithm will craft an appropriate policy automatically. In many situations it is desirable to use this technique to train systems of agents (for example, to train robots to play RoboCup soccer in a coordinated fashion). Unfortunately, several significant computational issues occur when using this technique to train systems of agents. This dissertation introduces a suite of techniques that overcome many of these difficulties in various common situations.
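As a concrete illustration of this reward-driven training, here is a minimal tabular Q-learning loop on a toy chain environment; the environment, constants, and reward scheme are invented for illustration and are not from the dissertation.

```python
import numpy as np

# Toy 5-state chain: only reaching the rightmost state pays reward.
# The designer specifies rewards; the algorithm crafts the policy.
N_STATES = 5
Q = np.zeros((N_STATES, 2))            # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.1, 0.9, 0.1

rng = np.random.default_rng(0)
for _ in range(2000):
    s = 0
    while s < N_STATES - 1:
        # Epsilon-greedy action selection.
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # designer's reward
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: move right in every non-terminal state
```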
First, we show how multi-agent reinforcement learning can be made more tractable by forming coalitions out of the agents and training each coalition separately. Coalitions are formed using information-theoretic techniques, and we find that with a coalition-based approach, the computational complexity of reinforcement learning can be made linear in the total system agent count. Next we look at ways to integrate domain knowledge into the reinforcement learning process, and how this can significantly improve the policy quality in multi-agent situations. Specifically, we find that integrating domain knowledge into a reinforcement learning process can overcome training data deficiencies and allow the learner to converge to acceptable solutions when lack of training data would have prevented such convergence without domain knowledge. We then show how to train policies over continuous action spaces, which can reduce problem complexity for domains that require continuous action spaces (analog controllers) by eliminating the need to finely discretize the action space. Finally, we look at ways to perform reinforcement learning on modern GPUs and show how by doing this we can tackle significantly larger problems. We find that by offloading some of the RL computation to the GPU, we can achieve a speedup factor of almost 4.5 in the total training process.
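The coalition claim above can be made concrete with a small back-of-the-envelope sketch: a monolithic joint learner needs a table exponential in the number of agents, whereas fixed-size coalitions need a number of small tables that grows linearly with agent count. The naive contiguous split below is a stand-in for the dissertation's information-theoretic coalition formation, used only to show the scaling.

```python
# Why coalition decomposition yields linear scaling: a joint Q-table
# over all N agents has |A|**N entries, while k-agent coalitions need
# (N/k) tables of |A|**k entries each.
def make_coalitions(agent_ids, k):
    # Naive contiguous split; the real method groups agents by
    # information-theoretic measures.
    return [agent_ids[i:i + k] for i in range(0, len(agent_ids), k)]

def table_sizes(n_agents, n_actions, k):
    joint = n_actions ** n_agents                 # monolithic learner
    decomposed = (n_agents // k) * n_actions ** k  # linear in n_agents
    return joint, decomposed

print(make_coalitions(list(range(8)), k=2))
print(table_sizes(n_agents=8, n_actions=4, k=2))  # 65536 vs 64 entries
```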
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and the partial observability of the environment. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By exploiting a simple representation of the domain, a supervised learning algorithm is trained over an initial collection of data consisting of several simulations of human expert policies. Monte Carlo policy rollouts are then generated and aggregated with previous data to improve the learned policy over multiple epochs and games. The proposed approach has been extensively tested both on a soccer-dedicated simulator and on real robots. Using this method, our learning robot soccer team achieves an improvement in ball interceptions, as well as a reduction in the number of opponents' goals. Together with this better performance, an overall more efficient positioning of the whole team within the field is achieved.
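The MCSDA loop described above resembles DAgger-style imitation learning with Monte Carlo action scoring. The sketch below shows that loop under stated assumptions: `simulate`, `rollout_return`, and the k-nearest-neighbour policy are illustrative stand-ins, not the paper's actual components, and states are assumed to be feature vectors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mcsda(expert_states, expert_actions, actions, simulate,
          rollout_return, epochs=5, n_rollouts=20):
    """Sketch of a Monte Carlo search + data aggregation loop.

    Start from expert demonstrations, then repeatedly label the
    states visited by the current policy with the action whose
    averaged Monte Carlo rollout return is highest, aggregate the
    new labels with the old data, and retrain.
    """
    X, y = list(expert_states), list(expert_actions)
    policy = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    for _ in range(epochs):
        new_X, new_y = [], []
        for s in simulate(policy):  # states visited by current policy
            # Score each discrete action by averaged rollout returns.
            returns = [np.mean([rollout_return(s, a)
                                for _ in range(n_rollouts)])
                       for a in actions]
            new_X.append(s)
            new_y.append(actions[int(np.argmax(returns))])
        X += new_X
        y += new_y                  # the data aggregation step
        policy = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    return policy
```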