    Achieving Cooperative Behavior Based on Intention Estimation by Learning Combinations of Modules

    A robot needs to process information appropriately depending on the environment or context. However, some of the abilities a robot requires are common across environments and contexts. In such cases, the learning agent should not relearn those abilities but instead reuse the results of previous tasks. In the study of intelligent systems, models have been proposed that solve complex problems by combining modules, each of which serves a specific function such as recognition, planning, or action selection. These models can reuse the learning results of previous tasks in different environments or contexts by recombining the modules they have learned. In this paper, we focus on cooperative behavior based on intention estimation, and propose a model for a learning agent that acquires combinations of modules with which it can achieve such behavior. The experimental results indicate that a desirable combination of modules was acquired and that the learning process progressed suitably.
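As a rough illustration of the modular idea described above (all names and the pipeline shape here are hypothetical, not taken from the paper), modules learned in earlier tasks can share a common interface and be recombined into an agent for a new context:

```python
# Hypothetical sketch of module combination; the paper's actual
# architecture and learning rule are not specified in the abstract.

class Module:
    """A reusable learned component with a fixed call interface."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

def compose(modules):
    """Chain modules so earlier learning results are reused in new tasks."""
    def agent(observation):
        out = observation
        for m in modules:
            out = m(out)
        return out
    return agent

# Modules "learned" in previous tasks, recombined for a new context.
recognize = Module("recognition",
                   lambda obs: {"intent": "handover"} if obs == "reach"
                   else {"intent": "unknown"})
plan = Module("planning",
              lambda state: "approach" if state["intent"] == "handover"
              else "wait")
act = Module("action", lambda p: f"execute:{p}")

agent = compose([recognize, plan, act])
print(agent("reach"))  # execute:approach
```

Swapping in a different recognition or planning module reuses the rest of the pipeline unchanged, which is the kind of reuse the abstract argues for.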

    Multi-agent reinforcement learning algorithm to handle beliefs of other agents' policies and embedded beliefs

    We have developed a new series of multi-agent reinforcement learning algorithms that choose a policy based on beliefs about co-players' policies. The algorithms are applicable to situations where the state is fully observable by the agents, but there is no limit on the number of players. Some of the algorithms employ embedded beliefs to handle cases in which co-players also choose their policies based on beliefs about others' policies. Simulation experiments on Iterated Prisoner's Dilemma games show that the algorithms using policy-based beliefs converge to highly mutually cooperative behavior, unlike existing algorithms based on action-based beliefs.
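A minimal sketch of the policy-based-belief idea, under assumed simplifications (the toy policy set, payoff matrix, and likelihood update below are illustrative, not the authors' algorithm): the agent keeps a distribution over candidate co-player policies, reweights it from observed actions, and then selects its own policy by expected payoff under that belief.

```python
# Toy sketch: belief over the co-player's *policy* (not its next action)
# in Iterated Prisoner's Dilemma. All specifics here are assumptions
# for illustration.

C, D = "C", "D"
POLICIES = {
    "AllC": lambda opp_hist: C,
    "AllD": lambda opp_hist: D,
    "TFT":  lambda opp_hist: opp_hist[-1] if opp_hist else C,  # copy opponent's last move
}
# Row player's payoff for (my_action, their_action)
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def update_belief(belief, my_hist, their_action):
    """Reweight each candidate policy by how well it predicted the observed action."""
    new = {}
    for name, policy in POLICIES.items():
        likelihood = 0.9 if policy(my_hist) == their_action else 0.1
        new[name] = belief[name] * likelihood
    total = sum(new.values())
    return {k: v / total for k, v in new.items()}

def simulate(my_policy, their_policy, rounds=10):
    """Average payoff of playing my_policy against their_policy."""
    my_hist, their_hist, total = [], [], 0
    for _ in range(rounds):
        a = my_policy(their_hist)   # my policy reacts to their history
        b = their_policy(my_hist)   # their policy reacts to mine
        total += PAYOFF[(a, b)]
        my_hist.append(a)
        their_hist.append(b)
    return total / rounds

def choose_policy(belief):
    """Pick my own policy with the highest expected score under the belief."""
    def score(name):
        return sum(p * simulate(POLICIES[name], POLICIES[t])
                   for t, p in belief.items())
    return max(POLICIES, key=score)
```

Against a belief concentrated on TFT, this selects a cooperative policy, whereas a myopic action-based best response would simply defect each round; that is the contrast the abstract draws between policy-based and action-based beliefs.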