21 research outputs found

    Multi-agent reinforcement learning for planning and scheduling multiple goals

    No full text
    Recently, reinforcement learning has been proposed as an effective method for knowledge acquisition in multiagent systems. However, most research on multiagent systems applying reinforcement learning focuses on methods for reducing the complexity caused by the presence of multiple agents and goals. Although such pre-defined structures succeed in suppressing the undesirable effects of multiple agents, they also suppress the desirable emergence of cooperative behaviors in the multiagent domain. We show that latent cooperative properties emerge among the agents by means of Profit-sharing, which is robust in non-MDP environments.
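    The credit-assignment scheme named in this abstract can be sketched as follows: Profit-sharing distributes the terminal reward backward along the episode with a geometrically decaying credit, rather than bootstrapping from successor states as TD methods do, which is why it tolerates non-Markovian dynamics. The decay rate and table layout below are illustrative assumptions, not the paper's exact formulation.

    ```python
    def profit_sharing_update(q, episode, reward, decay=0.3):
        """Distribute the terminal reward backward along one episode.

        episode: list of (state, action) pairs in execution order.
        Credit decays geometrically with distance from the goal, so
        rules fired near the goal are reinforced most strongly; no
        Markov assumption is needed (unlike Q-learning/TD updates).
        """
        credit = reward
        for state, action in reversed(episode):
            q[(state, action)] = q.get((state, action), 0.0) + credit
            credit *= decay
        return q

    # Example: a two-step episode reaching the goal with reward 1.0.
    q = profit_sharing_update({}, [("s0", "right"), ("s1", "right")], 1.0)
    # The final pair receives full credit; the earlier pair a decayed share.
    ```

    With `decay=0.3`, the pair executed at the goal gets credit 1.0 and the preceding pair 0.3; choosing a sufficiently small decay is what guarantees the rationality of the acquired policy in the Profit-sharing literature.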

    Multi-agent Reinforcement Learning for Planning and Conflict Resolution in a Dynamic Domain

    No full text
    …nces, since it is very difficult to know in advance what effective action should be taken in each possible state of the environment. Each transporter agent is modeled as a reinforcement learning entity in an unknown environment, where there is no communication with the other agents and no intermediate sub-goals for which intermediate rewards can be given. It should be noted that the other agents in the environment are also learning independently, without sharing sensory inputs or policies. As a result, each agent perceives the others as additional components of the environment whose behavior is dynamic and unpredictable. [Figure: grid-world layout with Agent1 and Agent2, goals G1 and G2, Shelter1 and Shelter2, and limited sight; panel (b) depicts a conflicting and ambiguous situation.]
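    The setting described above — each agent learning from its own local observations, with no communication, treating its peers as part of a non-stationary environment — can be sketched as an independent tabular Q-learner. The class name and hyperparameters are illustrative assumptions, not the paper's implementation.

    ```python
    import random
    from collections import defaultdict

    class IndependentQLearner:
        """Tabular Q-learning agent that treats all other agents as part
        of the (non-stationary) environment: no shared sensors, no shared
        policies, and no intermediate rewards, as in the transporter
        setting described in the abstract."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.q = defaultdict(float)   # (state, action) -> value
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, state):
            # Epsilon-greedy over this agent's local observation only.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Standard one-step Q-learning backup; other agents' moves
            # simply show up as stochastic environment transitions.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
    ```

    Because every agent updates against transitions that include the others' evolving behavior, convergence guarantees of single-agent Q-learning do not carry over — which is precisely the difficulty the abstract highlights.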

    Flexible Traffic Signal Control via Multi-Objective Reinforcement Learning

    No full text
    Deep reinforcement learning has been extensively studied for traffic signal control owing to its ability to process large amounts of information and achieve superior control performance. However, the method acquires flow-specific policies during learning, so its performance under inexperienced traffic flows is not guaranteed. Moreover, the traffic signal control problem formulation assumes that the optimal policy differs for each traffic flow ratio, owing to the trade-off between orthogonal roads at an intersection. Therefore, multiple policies must be switched to avoid performance decay under traffic flow changes. In this study, we use multi-objective reinforcement learning to exhaustively determine the policy corresponding to each traffic flow ratio. These policies are then switched according to the current traffic flow ratio to achieve flexible control under traffic flow changes. The proposed method achieves the shortest average travel times in all environments, compared with rule-based and single-objective reinforcement learning methods, for both stationary traffic and traffic flows with varying flow ratios.
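    The switching step described above — a bank of pre-trained policies keyed by flow ratio, with the one closest to the observed ratio deployed — can be sketched as follows. The ratio keys and policy labels are hypothetical placeholders; the paper's policies would be trained networks, not strings.

    ```python
    def select_policy(policies, current_ratio):
        """Pick the pre-trained policy whose flow-ratio key is closest
        to the currently observed ratio.

        policies: dict mapping a traffic flow ratio (share of traffic on
        one axis of the intersection, 0..1) to the policy trained for it.
        current_ratio: the ratio estimated from recent traffic counts.
        """
        best_key = min(policies, key=lambda r: abs(r - current_ratio))
        return policies[best_key]

    # Hypothetical bank of three policies covering the trade-off front.
    bank = {0.2: "favour_EW", 0.5: "balanced", 0.8: "favour_NS"}
    policy = select_policy(bank, 0.7)
    ```

    Nearest-neighbour selection over the ratio keys is the simplest switching rule; a denser bank of policies trades memory for smoother adaptation as the flow ratio drifts.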

    Evaluating Advantage of Sharing Information among Vehicles toward Avoiding Phantom Congestion

    No full text

    Neural scalarisation for multi-objective inverse reinforcement learning

    No full text
    Multi-objective inverse reinforcement learning (MOIRL) extends inverse reinforcement learning (IRL) to multi-objective problems by estimating weights and multi-objective rewards to help retrain and analyse preference-conditioned behaviour. Unlike previous methods using linear scalarisation, we propose a MOIRL method using neural scalarisation. This method comprises four neural networks: weight mapping, reward, scalarisation and weight back-translation. Additionally, we introduce two stabilisation techniques for training the proposed method. Experiments show that the proposed method can estimate appropriate weights and rewards reflecting true multi-objective intentions. Furthermore, the estimated weights and rewards can be used for retraining to reproduce the expert solutions.
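    The contrast the abstract draws — linear versus neural scalarisation — can be sketched as below: a weighted sum of objective rewards versus a small network mapping (reward vector, weight vector) to a scalar, which can express non-linear trade-offs a weighted sum cannot. The one-hidden-layer architecture and the `params` dictionary are illustrative assumptions, not the paper's four-network design.

    ```python
    import numpy as np

    def linear_scalarise(rewards, weights):
        """Baseline used by earlier MOIRL work: a fixed weighted sum."""
        return float(np.dot(weights, rewards))

    def neural_scalarise(rewards, weights, params):
        """One-hidden-layer sketch of a learned scalarisation network:
        maps the concatenated (reward vector, weight vector) to a scalar.
        `params` holds hypothetical learned parameters W1, b1, w2."""
        x = np.concatenate([rewards, weights])
        h = np.tanh(params["W1"] @ x + params["b1"])   # hidden layer
        return float(params["w2"] @ h)                 # scalar output
    ```

    In the paper's setting this scalarisation network is trained jointly with the weight-mapping, reward, and weight back-translation networks; the sketch only shows the forward pass that replaces the linear weighted sum.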

    Extracting the Heterogeneous Strategies from Crowd Data Using Evolutionary Computation

    No full text

    Model-Based Learning with Hierarchical Relational Skills

    No full text
    In this paper we describe Icarus, an architecture for physical agents that uses hierarchical skills to support reactive execution. We review an earlier version of the system, then present an extended framework that associates reward with stored concepts and utilizes a model-based approach to select among instantiated skills. Learning involves estimating the expected durations and success probabilities from execution traces. We conclude with comments on related work and plans for further extensions.
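    The selection step the abstract describes — choosing among instantiated skills using rewards attached to concepts plus durations and success probabilities estimated from traces — might be scored as an expected reward rate, sketched below. This scoring rule is a plausible reading, not the paper's exact model-based criterion.

    ```python
    def select_skill(skills):
        """Choose the instantiated skill with the highest expected
        reward rate, combining the statistics Icarus estimates from
        execution traces.

        skills: list of dicts with keys 'name', 'reward' (value of the
        concept the skill achieves), 'p_success', 'expected_duration'.
        """
        def expected_reward_rate(s):
            # Expected reward earned per unit of execution time.
            return s["p_success"] * s["reward"] / s["expected_duration"]
        return max(skills, key=expected_reward_rate)

    # Hypothetical example: a slow reliable skill vs a fast risky one.
    candidates = [
        {"name": "a", "reward": 10, "p_success": 0.9, "expected_duration": 5},
        {"name": "b", "reward": 10, "p_success": 0.5, "expected_duration": 2},
    ]
    best = select_skill(candidates)
    ```

    Normalising by expected duration is what makes the choice model-based rather than purely reward-greedy: a less reliable but much faster skill can win, as in the example above.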

    A Method for Estimating Consistent Rewards across MDP Environments with Different State-Transition Probabilities

    No full text