
Policy Improvement for POMDPs Using Normalized Importance Sampling



1 Introduction

We assume a standard reinforcement learning setup: an agent interacts with an environment modeled as a partially observable Markov decision process (POMDP). Consider the situation after a sequence of interactions. The agent has accumulated a sequence of observations, actions, and rewards, and it would like to use that data to select how to act next. In particular, it would like to select a policy, a mapping from observations to actions, for future interaction with the world. Ultimately, the agent's goal is to find a policy that maximizes its return, the sum of rewards experienced.
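The policy-selection problem described above is what the paper's normalized importance sampling addresses: returns observed under past behavior are reweighted to estimate the value of a candidate policy. The following is a minimal sketch of the standard normalized (weighted) importance-sampling estimator, not the paper's exact algorithm; the function and parameter names (`normalized_is_return`, `target_policy`, `behavior_policy`) are hypothetical.

```python
def normalized_is_return(trajectories, target_policy, behavior_policy):
    """Estimate the expected return of target_policy from trajectories
    collected under behavior_policy, using the normalized (weighted)
    importance-sampling estimator:

        V_hat = (sum_i w_i * R_i) / (sum_i w_i)

    where w_i is the product over steps of the target/behavior
    action-probability ratios, and R_i is the trajectory's return.
    Each trajectory is a list of (observation, action, reward) tuples;
    the policies are callables returning P(action | observation).
    """
    weights, returns = [], []
    for traj in trajectories:
        w = 1.0
        for obs, act, _ in traj:
            # Importance ratio for this step's action choice.
            w *= target_policy(act, obs) / behavior_policy(act, obs)
        weights.append(w)
        returns.append(sum(r for _, _, r in traj))
    # Normalizing by the summed weights (rather than the sample count)
    # trades a small bias for a large reduction in variance.
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)
```

Because the weights are normalized to sum to one, the estimate is always a convex combination of observed returns, which keeps it bounded even when individual importance weights are large.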

Year: 2001
Provided by: CiteSeerX