Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

Author: Geist Matthieu
Scherrer Bruno
Publication date: 06/06/2013

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. It is probably commonly believed that the best one can hope in general from such an approach is to get a local optimum of this criterion. In this article, we show the following surprising result: any (approximate) local optimum enjoys a global performance guarantee. We compare this guarantee with the one that is satisfied by Direct Policy Iteration, an approximate dynamic programming algorithm that does some form of Policy Search: if the approximation error of Local Policy Search may generally be bigger (because local search requires to consider a space of stochastic policies), we argue that the concentrability coefficient that appears in the performance bound is much nicer. Finally, we discuss several practical and theoretical consequences of our analysis.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Project communication variables : a comparative study of US and UK industry perceptions

Author: Hardcastle Cliff
Langford David
Murray Michael
Tookey John
Publication venue: Association of Researchers in Construction Management (ARCOM)
Publication date: 01/01/2000

Research undertaken at the Construction Industry Institute (CII) in the USA has indicated the need for project managers to focus their attention on six ‘Critical Communication Variables’ as a means of ensuring the fulfillment of time, cost and quality targets. These variables refer to the accuracy, timeliness and completeness of information presented to participants, as well as the level of understanding, barriers to and procedures for project-based communication. The findings and tools generated by the CII study have been used as part of case-study-based research examining construction projects in the Central Belt region of Scotland. In addition to the CII data collection tools employed, the Scottish study included semi-structured interviews as a means of contextualising the communication and decision-making taking place. This paper presents the results of this benchmarking exercise, and highlights significant issues that project team members need to improve upon in order to achieve the timeliness, quality and cost required in today’s construction industry.

University of Strathclyde Institutional Repository

A Contextual Bandit Bake-off

Author: Agarwal Alekh
Bietti Alberto
Langford John
Publication date: 24/01/2020

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate various components of contextual bandit algorithm design such as loss estimators. Overall, this is a thorough study and review of contextual bandit methodology.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server