48 research outputs found

    A new machine learning approach combining automatic case elicitation, reinforcement learning and sequential pattern mining for Checkers-playing agents

    For agents that operate in environments where decision-making needs to take into account, in addition to the environment, the minimizing action of an opponent (such as in games), it is fundamental that the agent have the ability to progressively trace a profile of its adversary that aids it in selecting appropriate actions. However, it would be unsuitable to build an agent whose decision-making system is based only on this profile, as that would prevent the agent from having its "own identity" and leave it at the mercy of its opponent. Following this direction, this work proposes an automatic hybrid Checkers player, called ACE-RL-Checkers, equipped with a dynamic decision-making mechanism that adapts to the profile of its opponent over the course of the game. In this system, the action (move) selection process is conducted by a composition of a Multi-Layer Perceptron neural network and a case library. The neural network represents the "identity" of the agent: it is an already-trained, static decision-making module built with the TD(λ) Reinforcement Learning technique. The case library, in turn, represents the dynamic decision-making module of the agent and is generated by the Automatic Case Elicitation technique (a particular type of Case-Based Reasoning). This technique has a pseudo-random exploratory behavior, which causes the agent's dynamic decision-making to be directed either by the opponent's game profile or randomly.

    However, when devising such an architecture, it is necessary to avoid the following problem: due to the inherent characteristics of the Automatic Case Elicitation technique, in the initial phases of the game, when the number of cases available in the library is extremely low because little is yet known about the adversary's profile, the frequency of random decisions is extremely high, which would be detrimental to the agent's performance. To attack this problem, this work also proposes incorporating into the ACE-RL-Checkers architecture a third module composed of a base of experience rules extracted from games played by human experts using a Sequential Pattern Mining technique. The objective of this base is to refine and accelerate the adaptation of the agent to the profile of its opponent in the initial phases of their confrontations. Experimental results from tournaments involving ACE-RL-Checkers and other agents related to this work confirm the superiority of the dynamic architecture proposed herein.

    Fundação de Amparo à Pesquisa do Estado de Minas Gerais. Doctoral thesis.
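    To make the composition above concrete, here is a minimal Python sketch of the three-module move-selection cascade, under one plausible ordering: experience rules first, then the case library, then the static network. All names (rule_base, case_library, mlp_value, board.apply) are hypothetical stand-ins for illustration, not the actual ACE-RL-Checkers interfaces.

    def select_move(board, legal_moves, rule_base, case_library, mlp_value):
        """Move-selection cascade: expert rules, then cases, then the static net."""
        # 1) Early game: experience rules mined from expert games via
        #    sequential pattern mining cover positions the case library
        #    does not yet know, avoiding near-random early decisions.
        move = rule_base.match(board)
        if move in legal_moves:
            return move
        # 2) Dynamic module: the case library built by Automatic Case
        #    Elicitation reflects the opponent profile observed so far.
        move = case_library.retrieve(board, legal_moves)
        if move is not None:
            return move
        # 3) Static "identity": the TD(lambda)-trained MLP scores each
        #    successor position and the best-scoring move is played.
        return max(legal_moves, key=lambda m: mlp_value(board.apply(m)))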

    Improving the learning and allocation process of intelligent agents in multiagent platforms: application to the domain of the game of Checkers

    One of the fundamental requirements for an unsupervised multiagent system to reach its objectives is that the agents composing it possess specific and complementary abilities, which allow them to act as specialists in the environments where they were trained. The adequate representation of these environments is fundamental both to the learning and to the good performance of the agents, mainly when they act in competitive environments with a large state space. Likewise, the decisions of the multiagent system in allocating adequate agents to the particular situations that occur in these environments are crucial for it to successfully reach its objectives.

    In this sense, the present work presents three new approaches to optimize the performance of multiagent systems, which improve: the architecture and the learning process of the agents that compose the multiagent system; the representation of the relevant information of the environments where these agents act; and the process of allocating the adequate agent to act in the distinct situations that occur in these environments. Due to its spatial and technical complexity, the game of Checkers was used as the development and evaluation environment for these approaches, which were implemented in the automatic player MP-Draughts. This player is an unsupervised multiagent system composed of player agents specialized in distinct phases of a game. To implement the proposed approaches in the MP-Draughts architecture, the following work sequence was adopted: initially, an adaptive neural network, ASONDE, was developed and used in the MP-Draughts architecture to define the knowledge profiles (clusters) necessary for representing the endgame phase, on which the specialist agents should be trained. Next, an automatic feature selection approach based on frequent pattern mining was implemented, which extracts the features most adequate for representing the different environments (boards) that can occur while the multiagent system plays. Finally, a method for allocating agents was developed that combines clustering artificial neural networks and exception rules, which together are responsible for indicating the most suitable agents to act in the different situations of a game. The partial results obtained from the implementation of each approach, as well as the final result of applying all of them in the MP-Draughts architecture, confirm that they were efficient in dealing with the problems for which they were proposed, in addition to contributing to the overall performance of the multiagent system.

    FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais. Doctoral thesis.
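    As a hedged illustration of the allocation step, the sketch below maps a board's feature vector to a knowledge profile with a scikit-learn-style clustering model and lets exception rules override the assignment. The names (clusterer, specialists, exception_rules) and the dictionary form of the rules are assumptions for illustration, not the actual MP-Draughts interfaces.

    def allocate_agent(board_features, clusterer, specialists, exception_rules):
        """Map the current board to a knowledge profile and pick its specialist."""
        # The clustering network assigns the feature vector to one of the
        # knowledge profiles (clusters) the specialist agents were trained on.
        profile = clusterer.predict([board_features])[0]
        # Exception rules capture boards where the cluster assignment is
        # known to be misleading and a different specialist performs better.
        profile = exception_rules.get(tuple(board_features), profile)
        return specialists[profile]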

    Online learning and mining human play in complex games


    Investigation into the effect of social learning in reinforcement learning board game playing agents

    This thesis presents the use of social learning to improve the performance of game-playing reinforcement learning agents. Agents are placed in a social learning environment rather than the Self-Play learning environment, and their performance is monitored and analysed in order to observe how it changes compared to Self-Play agents. Two case studies were conducted, one with the game Tic-Tac-Toe and the other with the African board game Morabaraba. The Tic-Tac-Toe agents used a table-based TD(λ) algorithm to learn the Q values. The results for the Tic-Tac-Toe agents indicate that the social learning agents perform better than the Self-Play agents in both board tests and competitive tests; increasing the population size increases both the number of superior social agents and their skill level. In the second case study the agents use function approximation with the TD(λ) algorithm because of the larger number of states. The social agents performed better than the Self-Play agents in the board tests, but were not superior in the test where they compete against each other. Larger populations were not possible with the Morabaraba agents, but the results are still positive, as the agents perform well in the board tests.
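    As a concrete reference for the table-based TD(λ) learning mentioned above, here is one standard tabular formulation, SARSA(λ) with accumulating eligibility traces, in Python; the hyperparameter values are illustrative and not taken from the thesis.

    from collections import defaultdict

    ALPHA, GAMMA, LAMBDA = 0.1, 1.0, 0.7   # illustrative values only

    q = defaultdict(float)       # (state, action) -> estimated return
    traces = defaultdict(float)  # (state, action) -> eligibility trace

    def td_lambda_update(s, a, reward, s_next, a_next, terminal=False):
        """One SARSA(lambda) backup applied to every recently visited pair."""
        target = reward if terminal else reward + GAMMA * q[(s_next, a_next)]
        delta = target - q[(s, a)]
        traces[(s, a)] += 1.0                  # accumulating trace
        for key in list(traces):
            q[key] += ALPHA * delta * traces[key]
            traces[key] *= GAMMA * LAMBDA      # older pairs fade away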

    Artificial intelligence: a light approach


    Reinforcement Learning

    Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data, or directly putting forward and performing actions, and learning is a very important aspect of it. This book is about reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters describe and extend the scope of reinforcement learning; the remaining 11 show that it is already widely used in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task left to human operators is to specify goals at increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it should stimulate and encourage new research in the field.

    Dynamics in Logistics

    This open access book highlights the interdisciplinary aspects of logistics research. Featuring empirical, methodological, and practice-oriented articles, it addresses the modelling, planning, optimization and control of processes. Chiefly focusing on supply chains, logistics networks, production systems, and systems and facilities for material flows, the respective contributions combine research on classical supply chain management, digitalized business processes, production engineering, electrical engineering, computer science and mathematical optimization. To celebrate 25 years of interdisciplinary and collaborative research conducted at the Bremen Research Cluster for Dynamics in Logistics (LogDynamics), hand-picked experts currently or formerly affiliated with the Cluster here provide retrospectives, present cutting-edge research, and outline future research directions.

    Low-resource learning in complex games

    This project is concerned with learning to take decisions in complex domains, in games in particular. Previous work assumes that massive data resources are available for training, but aside from a few very popular games this is generally not the case, and the state of the art in such circumstances is to rely extensively on hand-crafted heuristics. Human players, on the other hand, are able to learn quickly from only a handful of examples, exploiting specific characteristics of the learning problem to accelerate their progress. Designing algorithms that function in a similar way is an open area of research with many applications in today's complex decision problems. One solution presented in this work is to design learning algorithms that exploit the inherent structure of the game. Specifically, we take into account how the action space can be clustered into sets called types and exploit this characteristic to improve planning at decision time. Action types can also be leveraged to extract high-level strategies from a sparse corpus of human play, which generates more realistic trajectories during planning and further improves performance. Another approach that proved successful is using an accurate model of the environment to reduce the complexity of the learning problem. Similar to how human players have an internal model of the world that allows them to focus on the relevant parts of the problem, we decouple learning to win from learning the rules of the game, thereby making supervised learning more data efficient. Finally, in order to handle the partial observability usually encountered in complex games, we propose an extension to Monte Carlo Tree Search that plans in the Belief Markov Decision Process. We found that this algorithm does not outperform the state-of-the-art models in our chosen domain; our error analysis indicates that the method struggles with the high uncertainty of the conditions required for the game to end, and our relaxed belief model can cause rollouts in the belief space to be inaccurate, especially in complex games. We assess the proposed methods in an agent playing the highly complex board game Settlers of Catan. Building on previous research, our strongest agent combines planning at decision time with prior knowledge extracted from an available corpus of general human play; but unlike this prior work, our human corpus consists of only 60 games, as opposed to many thousands. Our agent defeats the current state-of-the-art agent by a large margin, showing that the proposed modifications aid in exploiting general human play in highly complex games.
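    To illustrate the action-type idea, the sketch below estimates type frequencies from a small corpus of human play and uses them to bias rollout actions during planning; the corpus format and the action_type grouping function are assumptions for illustration, not the project's actual code.

    import random
    from collections import defaultdict

    def type_priors(corpus_actions, action_type):
        """Estimate P(type) from a sparse corpus of human-played actions."""
        counts = defaultdict(int)
        for a in corpus_actions:
            counts[action_type(a)] += 1
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}

    def rollout_action(legal_actions, priors, action_type):
        """Pick an action type by its human-play prior, then an action of that type."""
        by_type = defaultdict(list)
        for a in legal_actions:
            by_type[action_type(a)].append(a)
        known = [(t, priors[t]) for t in by_type if t in priors]
        if not known:                      # no prior covers the legal types
            return random.choice(legal_actions)
        types, weights = zip(*known)
        chosen = random.choices(types, weights=weights)[0]
        return random.choice(by_type[chosen])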