9 research outputs found

    Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment

    No full text
    This paper proposes a multi-agent reinforcement learning method without communication for dynamic environments, called profit minimizing reinforcement learning with oblivion of memory (PMRL-OM). PMRL-OM extends PMRL and defines a memory range so that only valuable recent information from the environment is utilized. Since agents do not require information observed before an environmental change, they utilize only the information acquired after a certain iteration, which is enforced by the memory range. In addition, PMRL-OM improves the update function for the goal value, which serves as the priority of a purpose, and updates the goal value based on newer information. To evaluate the effectiveness of PMRL-OM, this study compares PMRL-OM with PMRL in five dynamic maze environments: state changes for two types of cooperation, position changes for two types of cooperation, and a case combining these four. The experimental results revealed that (a) PMRL-OM was an effective method for cooperation in all five dynamic environments examined in this study; (b) PMRL-OM was more effective than PMRL in these dynamic environments; and (c) PMRL-OM performed well with a memory range of 100 to 500.
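    The memory-range idea described above can be illustrated with a small sketch. The class and variable names below are hypothetical, and the simple averaging rule stands in for the paper's actual goal-value update; the only point carried over from the abstract is that observations older than the memory range are forgotten, so information gathered before an environmental change eventually drops out of the estimate.

        from collections import deque

        class PMRLOMAgentSketch:
            """Minimal sketch of the memory-range idea (illustrative, not the paper's method)."""

            def __init__(self, memory_range=300):
                # The abstract reports that memory ranges of roughly 100-500 worked well.
                self.memory_range = memory_range
                self.recent_steps = deque(maxlen=memory_range)  # older entries are forgotten
                self.goal_value = 0.0                           # priority of the purpose (goal)

            def observe_episode(self, steps_to_goal):
                # Keep only the most recent observations; pre-change data ages out.
                self.recent_steps.append(steps_to_goal)

            def update_goal_value(self):
                # Re-estimate the goal value from newer information only
                # (a placeholder averaging rule, assumed for illustration).
                if self.recent_steps:
                    self.goal_value = sum(self.recent_steps) / len(self.recent_steps)
                return self.goal_value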

    Cognitive Learning System for Sequential Aliasing Patterns of States in Multistep Decision-Making

    No full text
    Perceptual aliasing is a cognitive problem for a learning agent in which the robot cannot distinguish its state from its immediate observations, leading to poor decision-making. Previous work addresses this issue by storing the agent's path to learn the optimal policy. In particular, FoRsXCS utilises a fundamental and unique path to identify and disambiguate all aliased states and learn optimal policies in an environment with aliased states. However, it is hard to identify aliased states that occur sequentially within a regular pattern (sequential aliasing patterns of states). This work proposes a new cognitive learning system that extends FoRsXCS to identify such sequential aliasing patterns of states. The experimental results show that the proposed system performs as well as or better than existing systems in nine mazes for navigation tasks and significantly outperforms existing techniques in mazes with sequential aliasing patterns. Concretely, the proposed method improves performance by 0.48 steps compared with FoRsXCS.
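    As a rough illustration of why storing the agent's path helps with perceptual aliasing (this is not FoRsXCS itself, and the window length and names are assumptions), a learner can key its value table on a short window of recent observations instead of the latest observation only, so two aliased positions reached along different paths map to different table entries.

        from collections import defaultdict

        def make_history_state(observation_history, window=3):
            """Disambiguate aliased observations by the short path leading to them
            (illustrative workaround only; window length is an assumed parameter)."""
            return tuple(observation_history[-window:])

        # Value table keyed on (history_state, action) rather than the raw observation.
        q_table = defaultdict(float)

        def q_value(observation_history, action):
            return q_table[(make_history_state(observation_history), action)]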

    Weighted Opinion Sharing Model for Cutting Link and Changing Information among Agents as Dynamic Environment

    No full text
    This paper proposes a weighted opinion-sharing method called conformity-autonomous adaptive tuning (C-AAT) that enables agents to communicate and share correct information in a small-world network even when the links and information change dynamically. Concretely, each agent estimates weights for each of its neighbors by comparing their opinions with its own, increasing the weight if both are the same and decreasing it otherwise. To investigate the proposed method's effectiveness, experiments were conducted for three scenarios: (1) a static network with sensor agents that were almost equally likely to share incorrect environment information; (2) a static network with sensor agents whose probability of sharing incorrect information changed over time; and (3) a dynamic network where some agent links were randomly cut over time. The experimental results led to three conclusions about C-AAT: (i) it can make the agents' opinions robust against incorrect sensor agent opinions by decreasing the weights; (ii) it can decrease the weights of agents conveying incorrect opinions with varying probabilities to prevent incorrect opinions from being shared; and (iii) it can help agents share correct opinions by increasing the weights of their neighbors even if the agents receive fewer opinions due to links being cut.
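    The weighting rule stated in the abstract (raise a neighbor's weight when its opinion matches the agent's own, lower it otherwise) can be sketched as follows; the step sizes, clipping bounds, initial weight of 0.5, and the weighted-vote aggregation are illustrative assumptions rather than the paper's exact tuning rule.

        def update_weights(weights, own_opinion, neighbour_opinions,
                           step_up=0.1, step_down=0.1, lo=0.0, hi=1.0):
            """Raise the weight of a neighbour whose opinion matches the agent's own,
            lower it otherwise (step sizes and bounds are assumed for illustration)."""
            for neighbour, opinion in neighbour_opinions.items():
                if opinion == own_opinion:
                    weights[neighbour] = min(hi, weights.get(neighbour, 0.5) + step_up)
                else:
                    weights[neighbour] = max(lo, weights.get(neighbour, 0.5) - step_down)
            return weights

        def weighted_opinion(neighbour_opinions, weights):
            """Aggregate shared opinions, counting each neighbour by its current weight."""
            scores = {}
            for neighbour, opinion in neighbour_opinions.items():
                scores[opinion] = scores.get(opinion, 0.0) + weights.get(neighbour, 0.5)
            return max(scores, key=scores.get) if scores else None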

    Sleep Stage Estimation Comparing Own Past Heartrate or Others' Heartrate

    No full text
    To increase the accuracy of real-time sleep stage estimation when only a small amount of sleep data is available, such as shortly after going to bed, information on the subject's own past sleep and on other people's sleep can be used. However, it is unknown which of these types of sleep data (i.e., the subject's own past sleep data or other people's sleep data) contributes to increasing the sleep stage estimation accuracy. Therefore, this paper focuses on these two types of sleep data and aims to investigate their usefulness for increasing the sleep stage estimation accuracy. Human subject experiments revealed the following: (1) the accuracy of the sleep stage estimation is improved by using the similarity to either the subject's own past sleep or others' sleep, and (2) the estimation method using the sleep data of six other people achieves higher accuracy than methods using the data of none, one, or two people.
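    A minimal sketch of how similarity to reference sleep data might be used for estimation is given below; the abstract does not specify the paper's estimator, so the nearest-neighbor matching over heart-rate windows and the function names are assumptions for illustration only.

        import math

        def estimate_stage(current_hr_window, reference_nights):
            """Assign the current heart-rate window the sleep stage of the most
            similar reference window (illustrative nearest-neighbor assumption).
            Each reference night is a list of (heart_rate_window, sleep_stage)
            pairs, drawn either from the subject's own past sleep or from
            other people's sleep."""
            best_stage, best_dist = None, float("inf")
            for night in reference_nights:
                for hr_window, stage in night:
                    dist = math.dist(current_hr_window, hr_window)  # Euclidean distance
                    if dist < best_dist:
                        best_dist, best_stage = dist, stage
            return best_stage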

    Multi-Agent Cooperation Based on Reinforcement Learning with Internal Reward in Maze Problem

    No full text
    This paper introduces a reinforcement learning technique with an internal reward for a multi-agent cooperation task. The proposed methods are extensions of Q-learning that replace the ordinary (external) reward with an internal reward for agent cooperation. Specifically, we propose two Q-learning methods, both of which employ the internal reward under little or no communication. To guarantee the effectiveness of the proposed methods, we theoretically derive mechanisms that answer the following questions: (1) how the internal rewards should be set to guarantee cooperation among the agents under the condition of little or no communication; and (2) how the values of the cooperative behavior types (i.e., the varieties of the agents' cooperative behaviors) should be updated under the condition of no communication. Intensive simulations on the maze problem for the agent-cooperation task revealed that the two proposed methods successfully enable the agents to acquire cooperative behaviors even with little or no communication, while the conventional method (Q-learning) always fails to acquire such behaviors.
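    A sketch of Q-learning in which the external reward is replaced by an internal reward is shown below; how the internal reward should be set is precisely what the paper derives, so the simple goal_value - steps_taken placeholder and all names here are assumptions, not the paper's formulation.

        import random
        from collections import defaultdict

        class InternalRewardQLearner:
            """Tabular Q-learning driven by an internally computed reward
            (illustrative sketch; the internal-reward rule is a placeholder)."""

            def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
                self.q = defaultdict(float)
                self.actions, self.alpha = actions, alpha
                self.gamma, self.epsilon = gamma, epsilon

            def internal_reward(self, goal_value, steps_taken, reached_goal):
                # Placeholder: reward reaching the goal in proportion to its priority,
                # discounted by the steps spent. The caller passes this to update()
                # in place of an environment (external) reward.
                return goal_value - steps_taken if reached_goal else 0.0

            def act(self, state):
                # Epsilon-greedy action selection over the learned Q-values.
                if random.random() < self.epsilon:
                    return random.choice(self.actions)
                return max(self.actions, key=lambda a: self.q[(state, a)])

            def update(self, state, action, next_state, reward):
                # Standard Q-learning update, with `reward` supplied by internal_reward().
                best_next = max(self.q[(next_state, a)] for a in self.actions)
                td_target = reward + self.gamma * best_next
                self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])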