Agnostic system identification for model-based reinforcement learning

By Stephane Ross and J. Andrew Bagnell


This supplementary material contains the detailed proofs and analysis of the theoretical results presented in the paper.

Additional Notation: We first introduce additional notation, not used in the paper, that is useful in some proofs. In particular, we define $d^t_{\omega,\pi}$ as the distribution of states at time $t$ if we executed $\pi$ from time step 1 to $t-1$, starting from distribution $\omega$ at time 1. We also define $d_{\omega,\pi} = (1-\gamma)\sum_{t=1}^{\infty} \gamma^{t-1} d^t_{\omega,\pi}$ as the discounted distribution of states over the infinite horizon if we follow $\pi$, starting in $\omega$ at time 1.

1.1. Relating Performance to Error in Model

This subsection presents a number of useful lemmas for relating the performance of a policy in the real system, measured as expected total cost, to the predictive error of the learned model from which the policy was computed.

Lemma 1.1. Suppose we learned an approximate model $\hat{T}$ instead of the true model $T$, and let $\hat{V}^\pi$ represent the value function of $\pi$ under $\hat{T}$. Then for any state distribution $\omega$: $\mathbb{E}_{s\sim\omega}[V^\pi(s) - \hat{V}^\pi(s)]$ …
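The quantities above can be made concrete on a small tabular MDP. What follows is a minimal sketch, not the authors' code. It assumes a hypothetical finite representation: `T[s][a][s2]` is a transition probability, `C[s][a]` is a cost, and `pi[s][a]` is a stochastic policy. It also assumes that $V^\pi$ is the expected discounted cost with discount $\gamma$.

The sketch computes three things:

* a truncated version of $d_{\omega,\pi}$;
* $V^\pi$ by iterative policy evaluation;
* the left-hand side of Lemma 1.1, $\mathbb{E}_{s\sim\omega}[V^\pi(s) - \hat{V}^\pi(s)]$.

The statement of the lemma is cut off above, so only its left-hand side is evaluated here. All function names are illustrative.

```python
from typing import List

# Hypothetical tabular representation (an assumption, not from the paper):
# T[s][a][s2] = P(s2 | s, a), C[s][a] = immediate cost, pi[s][a] = pi(a | s).
Matrix = List[List[float]]


def policy_transition(T: List[Matrix], pi: Matrix) -> Matrix:
    """P_pi[s][s2] = sum_a pi(a|s) * T(s2|s,a)."""
    n = len(T)
    return [[sum(pi[s][a] * T[s][a][s2] for a in range(len(pi[s])))
             for s2 in range(n)] for s in range(n)]


def discounted_state_distribution(omega: List[float], T: List[Matrix], pi: Matrix,
                                  gamma: float, horizon: int = 1000) -> List[float]:
    """Truncated d_{omega,pi} = (1 - gamma) * sum_{t>=1} gamma^(t-1) * d^t_{omega,pi}."""
    P = policy_transition(T, pi)
    n = len(omega)
    d_t = list(omega)      # d^1_{omega,pi} = omega
    d = [0.0] * n
    weight = 1.0 - gamma   # (1 - gamma) * gamma^(t-1) at t = 1
    for _ in range(horizon):
        d = [d[s] + weight * d_t[s] for s in range(n)]
        # Propagate one step: d^{t+1}(s2) = sum_s d^t(s) * P_pi(s, s2)
        d_t = [sum(d_t[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
        weight *= gamma
    return d


def policy_value(T: List[Matrix], C: Matrix, pi: Matrix,
                 gamma: float, iters: int = 1000) -> List[float]:
    """Iterative policy evaluation: V(s) = c_pi(s) + gamma * sum_s2 P_pi(s, s2) * V(s2)."""
    P = policy_transition(T, pi)
    n = len(T)
    c_pi = [sum(pi[s][a] * C[s][a] for a in range(len(pi[s]))) for s in range(n)]
    V = [0.0] * n
    for _ in range(iters):
        V = [c_pi[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n)) for s in range(n)]
    return V


def expected_value_gap(omega: List[float], T: List[Matrix], T_hat: List[Matrix],
                       C: Matrix, pi: Matrix, gamma: float) -> float:
    """Left-hand side of Lemma 1.1: E_{s ~ omega}[V^pi(s) - V_hat^pi(s)]."""
    V = policy_value(T, C, pi, gamma)          # value under the true model T
    V_hat = policy_value(T_hat, C, pi, gamma)  # value under the learned model T_hat
    return sum(w * (v - vh) for w, v, vh in zip(omega, V, V_hat))


# Illustrative two-state, two-action example (made-up numbers).
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
T_hat = [[[0.8, 0.2], [0.3, 0.7]],
         [[0.6, 0.4], [0.1, 0.9]]]
C = [[1.0, 0.5], [0.0, 2.0]]
pi = [[0.5, 0.5], [1.0, 0.0]]
omega = [1.0, 0.0]
gamma = 0.9

d = discounted_state_distribution(omega, T, pi, gamma)
gap = expected_value_gap(omega, T, T_hat, C, pi, gamma)
print(sum(d), gap)
```

Because each $d^t_{\omega,\pi}$ is a probability distribution, the weights $(1-\gamma)\gamma^{t-1}$ make $d_{\omega,\pi}$ sum to one, up to a truncation error of $\gamma^{\text{horizon}}$. The same weighting gives the identity $\mathbb{E}_{s\sim\omega}[V^\pi(s)] = \frac{1}{1-\gamma}\,\mathbb{E}_{s\sim d_{\omega,\pi}}[c_\pi(s)]$. This identity is a convenient sanity check on the sketch.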

Year: 2012
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX