18 research outputs found

    Expertness based cooperative Q-learning

    Cooperative Reinforcement Learning Using an Expert-Measuring Weighted Strategy with WoLF

    Gradient descent learning algorithms have proven effective in solving mixed strategy games. The policy hill climbing (PHC) variants WoLF (Win or Learn Fast) and PDWoLF (Policy Dynamics based WoLF) both converge rapidly to equilibrium solutions by increasing the accuracy of their gradient parameters relative to standard Q-learning. Likewise, cooperative learning techniques using weighted strategy sharing (WSS) and expertness measurements improve agent performance when multiple agents pursue a common goal. By combining these cooperative techniques with fast gradient descent learning, an agent's performance converges to a solution at an even faster rate. This claim is verified in a stochastic grid-world environment using a limited-visibility hunter-prey model with both random and intelligent prey. Across five different expertness measurements, cooperative learning with each PHC algorithm converges faster than independent learning when agents learn strictly from better-performing agents.
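    The WoLF principle the abstract relies on is simple: take small policy steps while winning and larger ones while losing, judging "winning" by comparing the current policy's expected value against a running average policy. Below is a minimal sketch of a WoLF-PHC learner; the learning rates, step sizes, and tabular encoding are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

class WoLFPHC:
    """Minimal WoLF-PHC learner for a small discrete task (illustrative)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.nA = n_actions
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose

    def act(self, s, rng):
        return rng.choice(self.nA, p=self.pi[s])

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup.
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s_next].max()
                                      - self.Q[s, a])
        # Maintain the running average policy.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]

        # Win or Learn Fast: small step when winning, large when losing.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_win if winning else self.delta_lose

        # Hill-climb: shift probability mass toward the greedy action.
        greedy = self.Q[s].argmax()
        for b in range(self.nA):
            if b != greedy:
                step = min(self.pi[s, b], delta / (self.nA - 1))
                self.pi[s, b] -= step
                self.pi[s, greedy] += step
```

    Weighted strategy sharing would sit on top of such a learner: each agent periodically replaces its Q-table with an expertness-weighted average of its peers' tables, so that better-performing agents contribute more.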

    Swarm Robotics: An Extensive Research Review

    Docitive Networks. A Step Beyond Cognition

    Project carried out in collaboration with the Centre Tecnològic de Telecomunicacions de Catalunya. Docitive networks take the idea of drawing intelligent decisions a step further: by sharing information between nodes, their prime aim is to reduce the complexity and enhance the performance of cognitive networks. To this end, we review some important foundations of machine learning, paying special attention to reinforcement learning, and give an overview of evolutionary game theory and replicator dynamics. Finally, simulations based on the ICT-BUNGEE project are presented to validate the introduced concepts.
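    As a concrete handle on the replicator dynamics reviewed here: each strategy's population share grows in proportion to how far its payoff exceeds the population average. A minimal sketch follows; the Hawk-Dove payoff matrix, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of the replicator dynamics dx_i/dt = x_i ((Ax)_i - x.Ax)."""
    fitness = A @ x                  # payoff of each pure strategy
    avg = x @ fitness                # population-average payoff
    x = x + dt * x * (fitness - avg)
    return x / x.sum()               # renormalize against numerical drift

# Hawk-Dove game with assumed payoffs (V = 2, C = 4): the mixed
# equilibrium puts probability V/C = 0.5 on Hawk.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
x = np.array([0.2, 0.8])             # initial strategy frequencies
for _ in range(5000):
    x = replicator_step(x, A)
print(x)                             # approaches [0.5, 0.5]
```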

    Behaviour design in microrobots: hierarchical reinforcement learning under resource constraints

    In order to verify models of collective behaviors of animals, robots can be made to implement the model and interact with real animals in a mixed society. This thesis describes the design of the behavioral hierarchy of a miniature robot that is able to interact with cockroaches and participate in their collective decision making. The robots are controlled by a hierarchical behavior-based controller in which more complex behaviors are built by combining simpler behaviors through fusion and arbitration mechanisms. Experiments in the mixed society confirm the similarity between the collective patterns of the mixed society and those of the real society. Moreover, the robots are able to induce new collective patterns by modulating some behavioral parameters. Difficulties in extracting the behavioral hierarchy manually, and the inability to revise it, led us to use machine learning techniques to devise the composition hierarchy and coordination in an automated way. We derive a compact Q-learning method for microrobots with processing and memory constraints and use it to learn behavior coordination; the behavior composition part is still done manually. However, the curse of dimensionality makes this kind of flat learning technique unsuitable: even though optimizations can temporarily speed up the learning process and widen the range of applications, scalability to real-world applications remains in question. In the next steps, we apply hierarchical learning techniques to automate both the behavior coordination and composition parts. In some situations, many features of the state space are irrelevant to what the robot is currently learning; abstracting these features and discovering the hierarchy among them can help the robot learn the behavioral hierarchy faster. We formalize the automatic state abstraction problem with different heuristics and derive three new splitting criteria that adapt decision tree learning techniques to state abstraction. Performance is supported by strong evidence from simulation results in deterministic and non-deterministic environments, which show encouraging improvements in the required number of learning trials, the robot's performance, the size of the learned abstraction trees, and the computation time of the algorithms. On the other hand, learning in a group provides free sources of knowledge that, if communicated, can broaden the scale of learning both temporally and spatially. We present two approaches to combining the output or structure of abstraction trees, where the trees are stored in different RL robots in a multi-robot system or are learned by the same robot using different methods. Simulation results in a non-deterministic football learning task provide strong evidence of improvement in convergence rate and policy performance, especially in heterogeneous cooperation.
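    To make the state-abstraction idea concrete, the sketch below lets all states that reach the same decision-tree leaf share one set of Q-values, and splits a leaf on the binary feature that most reduces the variance of the TD errors observed there. The variance criterion, thresholds, and binary feature encoding are illustrative assumptions standing in for the thesis's three splitting criteria.

```python
import numpy as np

class Leaf:
    def __init__(self, n_actions):
        self.Q = np.zeros(n_actions)
        self.samples = []          # (features, td_error) pairs seen at this leaf
        self.split_feature = None
        self.children = None

class QTree:
    """Q-learning over a decision-tree state abstraction (illustrative)."""

    def __init__(self, n_features, n_actions, split_after=50, min_gain=1e-3):
        self.root = Leaf(n_actions)
        self.n_features, self.n_actions = n_features, n_actions
        self.split_after, self.min_gain = split_after, min_gain

    def leaf(self, x):
        node = self.root
        while node.children is not None:        # descend on tested features
            node = node.children[x[node.split_feature]]
        return node

    def update(self, x, a, r, x_next, alpha=0.1, gamma=0.95):
        node = self.leaf(x)
        td = r + gamma * self.leaf(x_next).Q.max() - node.Q[a]
        node.Q[a] += alpha * td
        node.samples.append((x, td))
        if len(node.samples) >= self.split_after:
            self._maybe_split(node)

    def _maybe_split(self, node):
        xs = np.array([x for x, _ in node.samples])
        errs = np.array([td for _, td in node.samples])
        best_f, best_gain = None, self.min_gain
        for f in range(self.n_features):
            on = xs[:, f] == 1
            if on.all() or (~on).all():
                continue                        # feature does not vary here
            gain = errs.var() - (on.mean() * errs[on].var()
                                 + (~on).mean() * errs[~on].var())
            if gain > best_gain:
                best_f, best_gain = f, gain
        if best_f is not None:                  # split: children inherit Q
            node.split_feature = best_f
            node.children = [Leaf(self.n_actions) for _ in range(2)]
            for child in node.children:
                child.Q[:] = node.Q
        node.samples.clear()
```

    Combining trees across robots, as the thesis proposes, could then average or merge the Q-values of structurally matching leaves.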

    Incorporating Memory and Learning Mechanisms Into Meta-RaPS

    Due to the rapid increase in the dimensions and complexity of real-life problems, it has become more difficult to find optimal solutions using exact mathematical methods alone. The need to find near-optimal solutions in an acceptable amount of time is a challenge when developing more sophisticated approaches. One proper answer to this challenge is the implementation of metaheuristic approaches; a more powerful answer may be reached by incorporating intelligence into metaheuristics. Meta-RaPS (Metaheuristic for Randomized Priority Search) is a metaheuristic that creates high-quality solutions for discrete optimization problems. It is proposed that incorporating memory and learning mechanisms into Meta-RaPS, which is currently classified as a memoryless metaheuristic, can help the algorithm produce higher-quality results. The proposed Meta-RaPS versions take different perspectives on learning. The first approach is based on Estimation of Distribution Algorithms (EDA), a stochastic learning technique that builds a probability distribution over each decision variable to generate new solutions. The second version utilizes a machine learning algorithm, Q-learning, which has been successfully applied to optimization problems whose output is a sequence of actions. In the third version, Path Relinking (PR) is implemented as a post-optimization method in which the algorithm learns good attributes by memorizing the best solutions and follows them to reach better ones. The fourth version presents another form of learning through its ability to tune parameters adaptively. The efficiency of these approaches motivated us to redesign Meta-RaPS by removing the improvement phase and adding a more sophisticated Path Relinking method; the new Meta-RaPS solves even the largest problems in much less time while maintaining solution quality. To evaluate their performance, all introduced versions were tested on the 0-1 Multidimensional Knapsack Problem (MKP). Among the proposed algorithms, Meta-RaPS PR and Meta-RaPS Q-Learning showed the best and worst performance, respectively; nevertheless, all of them outperformed other approaches to the 0-1 MKP in the literature.
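    For reference, the core Meta-RaPS construction loop applied to the 0-1 MKP looks roughly like this: with probability p the highest-priority feasible item is added greedily; otherwise an item is drawn at random from those within a restriction percentage of the best priority. The priority rule and parameter values are illustrative assumptions, and the improvement and path-relinking phases are omitted.

```python
import random

def meta_raps_mkp(profits, weights, capacities, iters=1000,
                  p_best=0.7, restriction=0.1, seed=0):
    """Meta-RaPS construction phase for the 0-1 MKP (illustrative sketch).

    profits[j]: item values; weights[i][j]: use of resource i by item j;
    capacities[i]: resource limits.
    """
    rng = random.Random(seed)
    n, m = len(profits), len(capacities)
    # Static priority: profit per unit of aggregate (normalized) resource use.
    prio = [profits[j] / sum(weights[i][j] / capacities[i] for i in range(m))
            for j in range(n)]
    best_x, best_val = None, -1

    for _ in range(iters):
        used, x = [0] * m, [0] * n
        candidates = sorted(range(n), key=lambda j: -prio[j])
        while True:
            # Keep only items that still fit in every resource dimension.
            candidates = [j for j in candidates
                          if all(used[i] + weights[i][j] <= capacities[i]
                                 for i in range(m))]
            if not candidates:
                break
            if rng.random() < p_best:
                j = candidates[0]                          # greedy pick
            else:
                cutoff = prio[candidates[0]] * (1 - restriction)
                j = rng.choice([k for k in candidates if prio[k] >= cutoff])
            x[j] = 1
            for i in range(m):
                used[i] += weights[i][j]
            candidates.remove(j)
        val = sum(p * xi for p, xi in zip(profits, x))
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

    The memory and learning variants in the dissertation would replace this static priority with learned information, e.g. Q-values over item-insertion actions or an EDA probability model.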