Context-Dependent Upper-Confidence Bounds for Directed Exploration
Directed exploration strategies for reinforcement learning are critical for
learning an optimal policy in a minimal number of interactions with the
environment. Many algorithms use optimism to direct exploration, either through
visitation estimates or upper confidence bounds, as opposed to data-inefficient
strategies like ε-greedy that use random, undirected exploration. Most
data-efficient exploration methods require significant computation, typically
relying on a learned model to guide exploration. Least-squares methods have the
potential to provide some of the data-efficiency benefits of model-based
approaches, because they summarize past interactions, at a computational cost
closer to that of model-free approaches. In this work, we provide a novel,
computationally efficient, incremental exploration strategy, leveraging this
property of least-squares temporal difference learning (LSTD). We derive upper
confidence bounds on the action-values learned by LSTD, with context-dependent
(or state-dependent) noise variance. Such context-dependent noise focuses
exploration on the subset of states with high noise variance and allows for
reduced exploration in other states. We empirically demonstrate that our algorithm can converge
more quickly than other incremental exploration strategies using confidence
estimates on action-values.
Comment: Neural Information Processing Systems 201
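To make the contrast between undirected and directed exploration concrete, the following minimal Python sketch compares ε-greedy selection with a count-based optimistic rule built on visitation estimates. The UCB1-style bonus `c * sqrt(ln t / n(a))` and the function names `epsilon_greedy` and `optimistic_count_bonus` are illustrative assumptions; this is not the algorithm proposed in the paper.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Undirected exploration: with probability epsilon pick a uniformly random
    action, otherwise the greedy one (ties go to the lowest index)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def optimistic_count_bonus(q_values, counts, t, c=1.0):
    """Directed exploration (hypothetical UCB1-style rule): score each action by
    q(a) + c * sqrt(ln t / n(a)); untried actions (n(a) == 0) score +inf."""
    def score(a):
        if counts[a] == 0:
            return math.inf
        return q_values[a] + c * math.sqrt(math.log(t) / counts[a])
    return max(range(len(q_values)), key=score)


# Usage: the rarely tried action 1 wins despite a slightly lower estimate.
q = [1.0, 0.9, 0.0]
print(optimistic_count_bonus(q, counts=[100, 2, 5], t=107))   # prints 1
print(epsilon_greedy(q, epsilon=0.0, rng=random.Random(0)))   # prints 0
```

The ε-greedy rule explores every action equally often regardless of how much is already known about it, whereas the bonus shrinks as an action's count grows, so exploration is directed toward actions whose estimates are least certain.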
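The property the abstract relies on, that LSTD summarizes past interactions in a fixed-size matrix and vector that can be updated incrementally, can be sketched as follows. This is a hypothetical illustration, not the authors' method: it assumes linear features φ(s, a), a ridge initialisation A = reg·I, Sherman-Morrison updates of A⁻¹ (O(d²) work per transition), and a LinUCB-style width c·sqrt(φᵀM⁻¹φ) with a single constant noise scale c, where M = reg·I + Σφφᵀ. The paper instead derives bounds with context-dependent noise variance, which this sketch does not reproduce. The class name `IncrementalLSTDQ` and its parameters are assumptions.

```python
import numpy as np


class IncrementalLSTDQ:
    """Hypothetical sketch: incremental LSTD for action values plus a generic
    optimism bonus (constant noise scale c, NOT a context-dependent bound)."""

    def __init__(self, n_features, gamma=0.99, reg=1.0, c=1.0):
        self.gamma = gamma
        self.c = c
        # Inverse of A = reg*I + sum phi (phi - gamma * phi')^T
        self.A_inv = np.eye(n_features) / reg
        # Inverse of M = reg*I + sum phi phi^T, used only for the confidence width
        self.M_inv = np.eye(n_features) / reg
        # b = sum reward * phi
        self.b = np.zeros(n_features)

    @staticmethod
    def _sherman_morrison(inv, u, v):
        """Return (X + u v^T)^{-1} given inv = X^{-1}.
        Assumes 1 + v^T X^{-1} u != 0 (A is non-symmetric, so this is not automatic)."""
        inv_u = inv @ u
        v_inv = v @ inv
        return inv - np.outer(inv_u, v_inv) / (1.0 + v @ inv_u)

    def update(self, phi, reward, phi_next, terminal=False):
        """Fold one transition (phi, r, phi') into the least-squares summary."""
        v = phi - (0.0 if terminal else self.gamma) * phi_next
        self.A_inv = self._sherman_morrison(self.A_inv, phi, v)
        self.M_inv = self._sherman_morrison(self.M_inv, phi, phi)
        self.b += reward * phi

    @property
    def theta(self):
        """LSTD solution theta = A^{-1} b."""
        return self.A_inv @ self.b

    def upper_bound(self, phi):
        """Optimistic action value: point estimate plus a constant-scale width."""
        width = self.c * np.sqrt(phi @ self.M_inv @ phi)
        return phi @ self.theta + width
```

A caller would pick the action that maximises `upper_bound(phi(s, a))` over the candidate actions. Replacing the constant `c` with a per-state noise estimate is, loosely, the role the abstract assigns to context-dependent noise; the sketch keeps it constant for simplicity.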
Deep Reinforcement Learning
We discuss deep reinforcement learning in an overview style. We draw a big
picture, filled with details. We discuss six core elements, six important
mechanisms, and twelve applications, focusing on contemporary work in its
historical context. We start with background on artificial intelligence,
machine learning, deep learning, and reinforcement learning (RL), along with
resources. Next, we discuss RL core elements, including value function, policy,
reward, model, exploration vs. exploitation, and representation. Then we
discuss important mechanisms for RL, including attention and memory,
unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and
learning to learn. After that, we discuss RL applications, including games,
robotics, natural language processing (NLP), computer vision, finance, business
management, healthcare, education, energy, transportation, computer systems,
and science, engineering, and art. Finally, we briefly summarize, discuss
challenges and opportunities, and close with an epilogue.
Comment: Under review for Morgan & Claypool: Synthesis Lectures in Artificial
Intelligence and Machine Learning