A new machine learning approach combining automatic case elicitation, reinforcement learning and sequential pattern mining for Checkers player agents
For agents that operate in environments where decision-making must take into account, in addition to the environment, the minimizing actions of an opponent (such as in games), it is fundamental that the agent be able to progressively trace a profile of its adversary that aids it in selecting appropriate actions. However, it would be unsuitable to build an agent whose decision-making system rests solely on the elaboration of this profile, as this would prevent the agent from having its "own identity" and would leave it at the mercy of its opponent. Following this direction, this work proposes an automatic hybrid Checkers player, called ACE-RL-Checkers, equipped with a dynamic decision-making mechanism that adapts to the profile of its opponent over the course of the game. In this system, the action (move) selection process is conducted by a composition of a Multi-Layer Perceptron Neural Network and a case library. The Neural Network represents the "identity" of the agent: it is a previously trained, static decision-making module that makes use of the TD(λ) Reinforcement Learning technique. The case library, in turn, represents the dynamic decision-making module of the agent and is generated by the Automatic Case Elicitation technique (a particular type of Case-Based Reasoning). This technique has a pseudo-random exploratory behavior, which causes the dynamic decision-making of the agent to be directed either by the game profile of the opponent or randomly. When devising such an architecture, however, it is necessary to avoid the following problem: due to the inherent characteristics of the Automatic Case Elicitation technique, in the initial phases of the game, when the quantity of cases available in the library is extremely low because little is yet known about the profile of the adversary, the frequency of random decisions is extremely high, which would be detrimental to the performance of the agent. To attack this problem, this work also proposes incorporating into the ACE-RL-Checkers architecture a third module composed of a base of experience rules, extracted from games played by human experts using a Sequential Pattern Mining technique. The objective behind using such a base is to refine and accelerate the adaptation of the agent to the profile of its opponent in the initial phases of their confrontations. Experimental results from tournaments involving ACE-RL-Checkers and other agents related to this work confirm the superiority of the dynamic architecture proposed herein.
Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Tese (Doutorado)
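The TD(λ) technique used to train the static neural-network module is not detailed in the abstract; as a general illustration only (a standard textbook sketch, not the thesis's implementation), one TD(λ) step for a linear value function with eligibility traces can be written as:

```python
def td_lambda_update(w, e, x_t, x_next, reward, alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value function v(s) = sum_i w[i]*x(s)[i].

    w: weight vector, e: eligibility-trace vector, x_t / x_next: feature
    vectors of the current and next board states, reward: reward observed
    on the transition. Returns the updated (w, e).
    """
    v_t = sum(wi * xi for wi, xi in zip(w, x_t))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v_t                    # TD error
    e = [gamma * lam * ei + xi for ei, xi in zip(e, x_t)]    # accumulate traces
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]    # trace-weighted update
    return w, e
```

The traces spread each TD error back over recently visited states, which is what distinguishes TD(λ) from plain TD(0).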
Improving the learning and allocation process of intelligent agents in multiagent platforms: an application in the domain of the game of Checkers
One of the fundamental requirements for an unsupervised multiagent system to reach its objectives is that the agents that make up the system possess specific and complementary abilities, which allow them to act as specialists in the environments where they were trained. The adequate representation of these environments is fundamental both to the learning and to the good performance of the agents, mainly when these act in competitive environments with an elevated state space. Likewise, the decisions of the multiagent system in allocating adequate agents to the particular situations that occur in these environments are crucial for it to successfully reach its objectives. In this sense, the present work presents three new approaches to optimize the performance of multiagent systems, which improve: the architecture and the learning process of the agents that make up the multiagent system; the representation of the relevant information of the environments where these agents perform; and the process of allocating the adequate agent for performing in the distinct situations that occur in these environments. Due to its spatial and technical complexity, the game of Checkers was used as the development and evaluation environment for these approaches, which were implemented in the automatic player MP-Draughts. This player corresponds to an unsupervised multiagent system composed of player agents specialized in distinct phases of a game. In order to implement the proposed approaches in the MP-Draughts architecture, the following work sequence was adopted: initially, an adaptive neural network, ASONDE, was developed and used in the MP-Draughts architecture to define the knowledge profiles (clusters) necessary for representing the endgame phase, on which the specialist agents should be trained.
Following on, an automatic feature selection approach based on frequent pattern mining was implemented, which extracts the most adequate features to represent the different environments (boards) that can occur during the performance of the multiagent system. Finally, a method for the allocation of agents was developed, which combines clustering artificial neural networks and exception rules; together, these are responsible for indicating the most suitable agents to act in the different situations of a game. The partial results obtained from the implementation of each approach, as well as the final result of applying all of them in the MP-Draughts architecture, confirm that they were efficient in dealing with the problems for which they were proposed, in addition to contributing to the general performance of the multiagent system.
FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Tese (Doutorado)
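As a rough illustration of the frequent-pattern-mining idea behind feature selection (a generic support-count sketch restricted to feature pairs; the function name and the pairwise restriction are illustrative assumptions, not the thesis's method):

```python
from collections import Counter
from itertools import combinations

def frequent_feature_pairs(boards, min_support):
    """Keep the pairs of board features whose relative co-occurrence
    frequency (support) across the observed boards meets the threshold --
    the simplest flavour of frequent pattern mining."""
    counts = Counter()
    for features in boards:
        for pair in combinations(sorted(features), 2):
            counts[pair] += 1
    n = len(boards)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Features that rarely co-occur with others fall below the support threshold and are candidates for exclusion from the board representation.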
Investigation into the effect of social learning in reinforcement learning board game playing agents
This thesis presents the use of social learning to improve the performance of game-playing
reinforcement learning agents. Agents are placed in a social learning environment
as opposed to the Self-Play learning environment. Their performance is monitored and
analysed in order to observe how it changes compared to Self-Play agents.
Two case studies were conducted, one with the game Tic-Tac-Toe and the other with the
African board game Morabaraba. The Tic-Tac-Toe agents used a table-based TD(λ)
algorithm to learn the Q values. The results from the tests for the Tic-Tac-Toe agents
indicate that the social learning agents perform better than the Self-Play agents in both
board tests and competitive tests. Increasing the population size of the agents increases
the number of superior social agents as well as their skill level. In the second case study
the agents use function approximation with the TD(λ) algorithm because of the larger
number of states. The social agents performed better than the Self-Play agents in the
board tests, but are not superior in the test where they compete against each other.
Larger populations were not possible with the Morabaraba agents, but the results are
still positive, as the agents perform well in the board tests.
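The contrast with Self-Play can be sketched schematically: in social learning, every agent in a population learns from games against its peers rather than only against itself. The `play_game` and `learn` callbacks below are hypothetical placeholders, not interfaces from the thesis:

```python
import itertools

def social_learning_round(agents, play_game, learn):
    """One round of social learning: every pair of distinct agents in the
    population plays one game, and both participants learn from the
    resulting trajectory -- in contrast to Self-Play, where each agent
    would only ever train against a copy of itself."""
    for a, b in itertools.combinations(agents, 2):
        trajectory = play_game(a, b)
        learn(a, trajectory)
        learn(b, trajectory)
```

With a population of n agents, each round yields n(n-1)/2 games and every agent sees n-1 distinct opponents, which is the source of the diversity the thesis investigates.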
Reinforcement Learning
Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data, or directly putting forward and performing actions. Learning is a very important aspect. This book is on reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters of this book describe and extend the scope of reinforcement learning. The remaining 11 chapters show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task of human operators remains to specify goals on increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it should stimulate and encourage new research in this field.
Dynamics in Logistics
This open access book highlights the interdisciplinary aspects of logistics research. Featuring empirical, methodological, and practice-oriented articles, it addresses the modelling, planning, optimization and control of processes. Chiefly focusing on supply chains, logistics networks, production systems, and systems and facilities for material flows, the respective contributions combine research on classical supply chain management, digitalized business processes, production engineering, electrical engineering, computer science and mathematical optimization. To celebrate 25 years of interdisciplinary and collaborative research conducted at the Bremen Research Cluster for Dynamics in Logistics (LogDynamics), in this book hand-picked experts currently or formerly affiliated with the Cluster provide retrospectives, present cutting-edge research, and outline future research directions.
Low-resource learning in complex games
This project is concerned with learning to take decisions in complex domains, in games
in particular. Previous work assumes that massive data resources are available for
training, but aside from a few very popular games, this is generally not the case, and the
state of the art in such circumstances is to rely extensively on hand-crafted heuristics.
On the other hand, human players are able to quickly learn from only a handful of
examples, exploiting specific characteristics of the learning problem to accelerate their
learning process. Designing algorithms that function in a similar way is an open area
of research and has many applications in today’s complex decision problems.
One solution presented in this work is to design learning algorithms that exploit the
inherent structure of the game. Specifically, we take into account how the action space
can be clustered into sets called types and exploit this characteristic to improve planning
at decision time. Action types can also be leveraged to extract high-level strategies
from a sparse corpus of human play, and this generates more realistic trajectories
during planning, further improving performance.
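The idea of using action types to bias planning can be illustrated with a two-stage sampler: first pick a type, then a concrete action within it. This is a hedged sketch under assumed names; the actual biasing scheme in the thesis may differ:

```python
import random

def sample_action_by_type(actions_by_type, type_weights, rng=None):
    """Two-stage action sampling: first draw an action *type* in proportion
    to its weight (e.g. a frequency estimated from a human-play corpus),
    then draw a concrete action uniformly within that type."""
    rng = rng or random.Random(0)
    types = list(actions_by_type)
    weights = [type_weights.get(t, 0.0) for t in types]
    chosen = rng.choices(types, weights=weights, k=1)[0]
    return rng.choice(actions_by_type[chosen])
```

Because the weights live at the type level, even a sparse corpus (too small to estimate per-action statistics) can usefully shape which actions the planner tries first.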
Another approach that proved successful is using an accurate model of the environment
to reduce the complexity of the learning problem. Similar to how human players
have an internal model of the world that allows them to focus on the relevant parts of
the problem, we decouple learning to win from learning the rules of the game, thereby
making supervised learning more data efficient.
Finally, in order to handle the partial observability usually encountered in complex
games, we propose an extension to Monte Carlo Tree Search that plans in the
Belief Markov Decision Process. We found that this algorithm does not outperform
the state-of-the-art models on our chosen domain. Our error analysis indicates that the
method struggles to handle the high uncertainty of the conditions required for the game
to end. Furthermore, our relaxed belief model can cause rollouts in the belief space to
be inaccurate, especially in complex games.
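For reference, the selection step of standard Monte Carlo Tree Search, on which the proposed Belief-MDP extension builds, uses the UCB1 rule. This is a minimal generic UCT sketch, not the thesis's extension:

```python
import math

def uct_select(children, c=1.4):
    """UCB1 child selection at one MCTS node. `children` is a list of
    (visit_count, total_value) pairs, one per child; returns the index of
    the child maximising mean value plus an exploration bonus."""
    parent_n = sum(n for n, _ in children)
    def score(n, w):
        if n == 0:
            return float("inf")            # try unvisited children first
        return w / n + c * math.sqrt(math.log(parent_n) / n)
    return max(range(len(children)), key=lambda i: score(*children[i]))
```

In the belief-space variant, the node statistics are aggregated over sampled belief states rather than fully observed ones, which is where the inaccurate rollouts reported above originate.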
We assess the proposed methods in an agent playing the highly complex board
game Settlers of Catan. Building on previous research, our strongest agent combines
planning at decision time with prior knowledge extracted from an available corpus of
general human play; but unlike this prior work, our human corpus consists of only
60 games, as opposed to many thousands. Our agent defeats the current state-of-the-art
agent by a large margin, showing that the proposed modifications aid in exploiting
general human play in highly complex games.