169 research outputs found

    Iterative Online Planning in Multiagent Settings with Limited Model Spaces and PAC Guarantees

    Methods for planning in multiagent settings often model other agents' possible behaviors. However, the space of these models – whether these are policy trees, finite-state controllers or intentional models – is very large and thus arbitrarily bounded. This may exclude the true model or the optimal model. In this paper, we present a novel iterative algorithm for online planning that considers a limited model space, updates it dynamically using data from interactions, and provides a provable and probabilistic bound on the approximation error. We ground this approach in the context of graphical models for planning in partially observable multiagent settings – interactive dynamic influence diagrams. We empirically demonstrate that the limited model space facilitates fast solutions and that the true model often enters the limited model space

    Approximating Value Equivalence in Interactive Dynamic Influence Diagrams Using Behavioral Coverage

    Interactive dynamic influence diagrams (I-DIDs) provide an explicit way of modeling how a subject agent solves decision-making problems in the presence of other agents in a common setting. To optimize its decisions, the subject agent needs to predict the other agents' behavior, which is generally obtained by solving their candidate models. This becomes extremely difficult since the model space may be rather large, and it grows as the other agents act and observe over time. A recent proposal for solving I-DIDs rests on the concept of value equivalence (VE), which shows potential for significantly reducing the model space. In this paper, we establish a principled framework to implement the VE techniques and propose an approximate method to compute the VE of candidate models. The development offers ample opportunity to exploit VE to further improve the scalability of I-DID solutions. We theoretically analyze properties of the approximate techniques and show empirical results in multiple problem domains

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long-term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios. This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
    In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications

    Toward data-driven solutions to interactive dynamic influence diagrams

    With the availability of significant amounts of data, data-driven decision making becomes an alternative way of solving complex multiagent decision problems. Instead of using domain knowledge to explicitly build decision models, the data-driven approach learns decisions (probably optimal ones) from available data. This removes the knowledge bottleneck in traditional knowledge-driven decision making, which requires strong support from domain experts. In this paper, we study data-driven decision making in the context of interactive dynamic influence diagrams (I-DIDs), a general framework for multiagent sequential decision making under uncertainty. We propose a data-driven framework to solve the I-DID model and focus on learning the behavior of other agents in problem domains. The challenge lies in learning, from limited data, a complete policy tree that will be embedded in the I-DID models. We propose two new methods to develop complete policy trees for the other agents in the I-DIDs. The first method uses a simple clustering process, while the second one employs sophisticated statistical checks. We analyze the proposed algorithms theoretically and evaluate them experimentally over two problem domains

    A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

    Optimal policies in standard MDPs can be obtained using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown that one has to solve Omega(1/(1-alpha)) MDPs, where alpha is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm, called naive policy iteration, is easy to implement but is only provably convergent under very restrictive assumptions. Prior attempts to fix the naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead policies, which are anyway used in practical algorithms. We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs. We illustrate the application of our algorithm by providing bounds for policy-based RL (reinforcement learning) algorithms. We extend the results to the function approximation setting. Comment: 41 pages
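
    The abstract's opening claim, that value iteration and policy iteration both recover optimal policies in a standard MDP, can be illustrated with a minimal sketch. This covers only the single-agent MDP case and is not the paper's lookahead algorithm for zero-sum Markov games. The two-state MDP, its rewards, and all function names below are hypothetical and chosen purely for illustration; `ALPHA` plays the role of the discount factor alpha.

```python
# Hypothetical 2-state, 2-action MDP (illustrative values, not from the paper).
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
ALPHA = 0.9  # discount factor


def q_value(V, s, a):
    """One-step backup: reward plus discounted expected next-state value."""
    return R[s][a] + ALPHA * sum(p * V[s2] for s2, p in P[s][a])


def greedy(V):
    """Policy that picks, in each state, an action maximizing the backup."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new, greedy(V_new)
        V = V_new


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: q_value(V, s, policy[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in P}
    while True:
        V = evaluate(policy)
        improved = greedy(V)
        if improved == policy:
            return V, policy
        policy = improved


vi_values, vi_policy = value_iteration()
pi_values, pi_policy = policy_iteration()
print("value iteration policy: ", vi_policy)
print("policy iteration policy:", pi_policy)
```

    On this toy problem both routines arrive at the same policy and nearly identical values. In a zero-sum Markov game the maximization in `greedy` would become a minimax step over both players' actions, which is where, according to the abstract, naive policy iteration loses its convergence guarantee.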