On the Model-Misspecification in Reinforcement Learning
The success of reinforcement learning (RL) crucially depends on effective
function approximation when dealing with complex ground-truth models. Existing
sample-efficient RL algorithms primarily employ three approaches to function
approximation: policy-based, value-based, and model-based methods. However, in
the face of model misspecification (a disparity between the ground-truth and
optimal function approximators), it has been shown that policy-based approaches
can remain robust even under a large, locally bounded misspecification error,
under which the function class may exhibit an $\Omega(1)$ approximation error
in specific states and actions yet the error remains small on average within a
policy-induced state distribution. It
remains an open question whether similar robustness can be achieved with
value-based and model-based approaches, especially with general function
approximation.
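To make the contrast concrete (the notation below is ours, not the abstract's), write $\epsilon(s,a)$ for the pointwise approximation error of the function class and $d^{\pi}$ for the state-action distribution induced by a policy $\pi$. Local misspecification allows

$$\max_{s,a}\,\epsilon(s,a) = \Omega(1) \qquad \text{while} \qquad \mathbb{E}_{(s,a)\sim d^{\pi}}\!\left[\epsilon(s,a)\right] \le \zeta,$$

so the worst-case error can be large even though the error the policy actually encounters stays below a small average bound $\zeta$.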
To bridge this gap, in this paper we present a unified theoretical framework
for addressing model misspecification in RL. We demonstrate that, through
meticulous algorithm design and sophisticated analysis, value-based and
model-based methods employing general function approximation can achieve
robustness under local misspecification error bounds. In particular, they can
attain a regret bound of $\widetilde{O}\big(\mathrm{poly}(dH)\cdot(\sqrt{K}+K\zeta)\big)$,
where $d$ represents the complexity of the function class, $H$ is the episode
length, $K$ is the total number of episodes, and $\zeta$ denotes the local
bound on the misspecification error. Furthermore, we propose an algorithmic
framework that can achieve the same order of regret bound without prior
knowledge of $\zeta$, thereby enhancing its practical applicability.
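As a loose illustration of the mechanism, rather than the paper's algorithm, the following sketch assumes a linear function class of dimension d and a known misspecification level zeta; the confidence radius beta is a placeholder chosen only to show how an optimism bonus can be widened to absorb local misspecification.

import numpy as np

def optimistic_q(features, rewards, next_values, zeta, horizon, lam=1.0):
    # features: (n, d) array of feature vectors phi(s, a) from observed transitions
    # rewards, next_values: (n,) arrays of rewards and next-state value estimates
    # zeta: assumed local misspecification level; horizon: episode length H
    n, d = features.shape
    gram = lam * np.eye(d) + features.T @ features       # regularized Gram matrix
    w = np.linalg.solve(gram, features.T @ (rewards + next_values))  # least squares
    gram_inv = np.linalg.inv(gram)
    # Placeholder radius: the zeta-dependent term widens the confidence set so
    # that a locally misspecified class still contains a near-optimal predictor.
    beta = horizon * np.sqrt(d) + zeta * np.sqrt(d * n)

    def q_value(phi):
        width = np.sqrt(phi @ gram_inv @ phi)            # elliptical confidence width
        return min(horizon, float(phi @ w) + beta * width)  # clipped optimistic value

    return q_value

Running one such regression per step, backward from h = H to h = 1, gives a standard optimistic value-iteration pass; the point of the sketch is only that the bonus grows with zeta, which is how a $K\zeta$ term can enter the regret.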
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, in concert with
the learning of corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the ATARI 2600 Montezuma's Revenge game.
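A minimal sketch of this pipeline under our own assumptions (the class name, the online k-means rule, the attainment radius, and the unit reward are illustrative choices, not the paper's specification): incremental unsupervised learning is realized as online clustering over a small memory of recent states, and the intrinsic motivation signal fires when the agent reaches the vicinity of a discovered subgoal.

import numpy as np

class SubgoalDiscovery:
    def __init__(self, n_subgoals, state_dim, lr=0.05):
        # Candidate subgoals are cluster centroids over visited states.
        self.centroids = np.random.randn(n_subgoals, state_dim)
        self.lr = lr

    def update(self, recent_states):
        # Online k-means over the recent-experience memory: each visited state
        # nudges its nearest centroid, so centroids drift toward frequently
        # traversed regions that may be useful as subgoals.
        for s in recent_states:
            k = np.argmin(np.linalg.norm(self.centroids - s, axis=1))
            self.centroids[k] += self.lr * (s - self.centroids[k])

    def intrinsic_reward(self, state, subgoal_idx, radius=0.5):
        # Internal reward marking subgoal attainment: +1 when the agent comes
        # within `radius` of the chosen subgoal centroid, else 0.
        near = np.linalg.norm(state - self.centroids[subgoal_idx]) < radius
        return 1.0 if near else 0.0

Because the centroids are fit directly to experienced states, no model of the environment's transition dynamics is ever estimated, matching the model-free claim above.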