Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Local Policy Search is a popular reinforcement learning approach for handling
large state spaces. Formally, it searches locally in a parameterized policy
space in order to maximize the associated value function averaged over some
predefined distribution. It is probably commonly believed that the best one
can hope for in general from such an approach is to get a local optimum of this
criterion. In this article, we show the following surprising result:
\emph{any} (approximate) \emph{local optimum} enjoys a \emph{global performance
guarantee}. We compare this guarantee with the one that is satisfied by Direct
Policy Iteration, an approximate dynamic programming algorithm that does some
form of Policy Search: although the approximation error of Local Policy Search may
generally be larger (because local search requires considering a space of
stochastic policies), we argue that the concentrability coefficient that appears
in the performance bound is much nicer. Finally, we discuss several practical
and theoretical consequences of our analysis.
Project communication variables : a comparative study of US and UK industry perceptions
Research undertaken at the Construction Industry Institute (CII) in the USA has indicated the need for project managers to focus their attention on six ‘Critical Communication Variables’ as a means of ensuring the fulfillment of time, cost, and quality targets. These variables refer to the accuracy, timeliness, and completeness of information presented to participants, as well as the level of understanding of, barriers to, and procedures for project-based communication. The findings and tools generated by the CII study have been used as part of case-study-based research examining construction projects in the Central Belt region of Scotland. In addition to the CII data collection tools employed, the Scottish study included semi-structured interviews as a means of contextualising the communication and decision-making taking place. This paper presents the results of this benchmarking exercise and highlights significant issues that project team members need to improve upon in order to achieve the timeliness, quality, and cost required in today’s construction industry.
A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology.