Online Learning and Planning in Partially Observable Domains without Prior Knowledge
Acting optimally in stochastic, partially observable domains is a challenging
problem. The standard approach is to first learn a model of the domain and then
use the learned model to find a (near-)optimal policy. However, learning the
model offline often requires storing the entire training data set and cannot
use the data generated during the planning phase. Furthermore, current research
usually assumes that the learned model is accurate or presupposes knowledge of
the nature of the unobservable part of the world. In this paper, for systems
with discrete settings, we draw on the benefits of Predictive State
Representations (PSRs) to propose a model-based planning approach in which the
learning and planning phases can both be executed online and no prior knowledge
of the underlying system is required. Experimental results show that, compared
with state-of-the-art approaches, our algorithm achieves a high level of
performance with no prior knowledge provided, while retaining the theoretical
advantages of PSRs. Source code is available at
https://github.com/DMU-XMU/PSR-MCTS-Online.
Comment: arXiv admin note: text overlap with arXiv:1904.0300
Tensor Decomposition for Multi-agent Predictive State Representation
Predictive state representation (PSR) uses a vector of action-observation
sequences to represent the system dynamics and subsequently predict the
probability of future events. It is a concise knowledge representation that has
been well studied in single-agent planning domains. To the best of our
knowledge, there is no existing work on using PSRs to solve multi-agent planning
problems. Learning a multi-agent PSR model is difficult, especially as the
number of agents increases, not to mention the complexity of the problem
domain. In this paper, we use tensor techniques to tackle the challenging task
of multi-agent PSR model development. Focusing first on a two-agent setting, we
construct the system dynamics matrix as a high-order tensor for a PSR model,
learn the prediction parameters and deduce the state vectors directly through
two different tensor decomposition methods, respectively, and derive the
transition parameters via linear regression. We then generalize the PSR
learning approaches to a multi-agent setting. Experimental results show that
our methods can effectively solve
multi-agent PSR modelling problems in multiple problem domains.
Comment: 20 pages, 16 figures
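
The pipeline summarized in this abstract (factor the system dynamics data, read off state vectors and prediction parameters, then fit transition parameters by linear regression) can be illustrated with a simplified single-agent matrix analogue. The sketch below is not the authors' tensor method: it substitutes a truncated SVD for the tensor decompositions, runs on random toy data, and omits the probability normalization a real PSR update needs. The function names `low_rank_states` and `fit_transition` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def low_rank_states(D, k):
    """Factor a (histories x tests) matrix D with a rank-k truncated SVD.

    Returns one k-dimensional state vector per history and a (k x tests)
    matrix of prediction parameters, so that states @ pred approximates D.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    states = U[:, :k] * s[:k]   # scale each left singular vector by its singular value
    pred = Vt[:k, :]            # maps a state vector to test predictions
    return states, pred


def fit_transition(states_before, states_after):
    """Least-squares linear map T with states_before @ T ~= states_after."""
    T, *_ = np.linalg.lstsq(states_before, states_after, rcond=None)
    return T


# Toy data (an assumption, not taken from the paper): a rank-2 matrix
# standing in for an estimated system dynamics matrix.
rng = np.random.default_rng(0)
D = rng.random((6, 2)) @ rng.random((2, 5))

states, pred = low_rank_states(D, k=2)
reconstruction_ok = np.allclose(states @ pred, D)

# Simulate "next" states with a known linear map, then recover it by regression.
T_true = rng.random((2, 2))
T_est = fit_transition(states, states @ T_true)
```

Because the toy matrix has exact rank 2, the rank-2 factorization reproduces it and the regression recovers the simulated map; with estimated data from real trajectories both steps would only be approximate.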