Estimating the reliability of MDP policies: A confidence interval approach
Authors
D Bohus
DJ Litman
JR Tetreault
Publication date
1 December 2007
Abstract
Past approaches to using reinforcement learning to derive dialog control policies have assumed that enough data had been collected to derive a reliable policy. In this paper we present a methodology for numerically constructing confidence intervals for the expected cumulative reward of a learned policy. These intervals are used to (1) better assess the reliability of the expected cumulative reward, and (2) perform a refined comparison between policies derived from different Markov Decision Process (MDP) models. We applied this methodology to a prior experiment in which the goal was to select the best features to include in the MDP state space. Our results show that while some of the policies developed in the prior work exhibited very large confidence intervals, the policy developed from the best feature set had a much smaller confidence interval and thus showed very high reliability. © 2007 Association for Computational Linguistics
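The abstract does not spell out how the intervals are constructed, so the following is only an illustrative sketch of one way to build a numerical confidence interval for a policy's expected cumulative reward. It assumes that transition probabilities are resampled from a Dirichlet distribution over observed transition counts, that the fixed policy is evaluated under each sampled model, and that the interval is read off the percentiles of the resulting values. The function names, the uniform prior, the discount factor, and the toy counts are all hypothetical and are not taken from the paper.

```python
import random


def sample_dirichlet(counts, rng, prior=1.0):
    """Draw one probability vector from Dirichlet(counts + prior) (assumed model)."""
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def policy_value(transitions, rewards, policy, gamma, start, sweeps=200):
    """Iterative policy evaluation: discounted cumulative reward from `start`."""
    n = len(transitions)
    values = [0.0] * n
    for _ in range(sweeps):
        values = [
            rewards[s]
            + gamma * sum(p * v for p, v in zip(transitions[s][policy[s]], values))
            for s in range(n)
        ]
    return values[start]


def reward_interval(counts, rewards, policy, gamma=0.9, start=0,
                    samples=500, level=0.95, seed=0):
    """Percentile interval for the policy's value over resampled transition models.

    counts[s][a][t] is the number of observed transitions s --a--> t.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(samples):
        # One plausible transition model given the observed counts.
        model = [[sample_dirichlet(row, rng) for row in state] for state in counts]
        estimates.append(policy_value(model, rewards, policy, gamma, start))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (samples - 1))]
    upper = estimates[int((1.0 - tail) * (samples - 1))]
    return lower, upper


# Toy data (hypothetical): 2 states, 2 actions, reward 1.0 for being in state 1.
small_counts = [[[3, 1], [1, 3]], [[2, 2], [1, 3]]]
large_counts = [[[c * 100 for c in row] for row in state] for state in small_counts]
rewards = [0.0, 1.0]
policy = [0, 1]  # action chosen in each state

small_ci = reward_interval(small_counts, rewards, policy)
large_ci = reward_interval(large_counts, rewards, policy)
print("interval from few observations: ", small_ci)
print("interval from many observations:", large_ci)
```

In this toy setup the interval computed from the larger counts is narrower, which mirrors the abstract's point that a small interval indicates a more reliable estimate of a policy's expected cumulative reward.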