Hierarchical Policy Search via Return-Weighted Density Estimation
Learning an optimal policy from a multi-modal reward function is a
challenging problem in reinforcement learning (RL). Hierarchical RL (HRL)
tackles this problem by learning a hierarchical policy, where multiple option
policies are in charge of different strategies corresponding to modes of a
reward function and a gating policy selects the best option for a given
context. Although HRL has been demonstrated to be promising, current
state-of-the-art methods still cannot perform well in complex real-world
problems due to the difficulty of identifying modes of the reward function. In
this paper, we propose a novel method called hierarchical policy search via
return-weighted density estimation (HPSDE), which can efficiently identify the
modes through density estimation with return-weighted importance sampling. Our
proposed method finds option policies corresponding to the modes of the return
function and automatically determines the number and the location of option
policies, which significantly reduces the burden of hyperparameter tuning.
Through experiments, we demonstrate that the proposed HPSDE successfully learns
option policies corresponding to modes of the return function and that it can
be successfully applied to a challenging motion planning problem of a redundant
robotic manipulator.
Comment: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 9 pages
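The idea of locating modes of the return function through return-weighted density estimation can be illustrated with a small sketch. This is not the HPSDE algorithm itself: a weighted Gaussian kernel density estimate on a one-dimensional toy problem is swapped in for illustration, and the names `return_weights`, `weighted_kde`, `find_modes`, `toy_return`, the exponential weighting, and all numeric settings are assumptions that do not come from the abstract.

```python
import numpy as np

def return_weights(returns, temperature=1.0):
    """Turn returns into normalized weights (assumed exponential weighting)."""
    r = np.asarray(returns, dtype=float)
    w = np.exp((r - r.max()) / temperature)  # subtract max for numerical stability
    return w / w.sum()

def weighted_kde(grid, samples, weights, bandwidth):
    """Evaluate a weighted Gaussian kernel density estimate on `grid`."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

def find_modes(grid, density, min_rel_height=0.5):
    """Return grid points that are local maxima and at least
    `min_rel_height` times the global maximum of the density."""
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner > density[2:])
    is_tall = inner >= min_rel_height * density.max()
    return grid[1:-1][is_peak & is_tall]

def toy_return(x):
    """Hypothetical bimodal return function with peaks near -2 and +2."""
    return np.exp(-0.5 * ((x + 2.0) / 0.3) ** 2) + np.exp(-0.5 * ((x - 2.0) / 0.3) ** 2)

rng = np.random.default_rng(0)
samples = rng.uniform(-4.0, 4.0, size=2000)   # samples from a broad proposal
weights = return_weights(toy_return(samples), temperature=0.2)
grid = np.linspace(-4.0, 4.0, 401)
density = weighted_kde(grid, samples, weights, bandwidth=0.3)
modes = find_modes(grid, density)
print("estimated modes:", modes)
```

In this toy setting each detected mode would correspond to one candidate option policy, so the number of options follows from the estimated density rather than from a hand-set hyperparameter.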
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.
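The count-based category described above (hash the state, count visits per hash code, reward rarely seen codes) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the random-projection sign hash, the bonus form `beta / sqrt(count)`, and the class name `HashCounter` with its parameters are assumptions chosen for the example.

```python
from collections import defaultdict

import numpy as np

class HashCounter:
    """Counts visits per hash code and returns a count-based bonus."""

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; the signs of the projected state form the code.
        self.projection = rng.standard_normal((n_bits, state_dim))
        self.beta = beta
        self.counts = defaultdict(int)

    def _hash(self, state):
        bits = self.projection @ np.asarray(state, dtype=float) > 0
        return bits.tobytes()  # hashable key for the dictionary

    def bonus(self, state):
        """Record one visit to `state` and return its exploration bonus."""
        key = self._hash(state)
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])

counter = HashCounter(state_dim=4)
state = np.ones(4)
first_bonus = counter.bonus(state)
second_bonus = counter.bonus(state)
print(first_bonus, second_bonus)
```

The bonus shrinks each time the same hash code is revisited, so states that map to rarely seen codes receive a larger exploration reward.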
Accelerating Cooperative Planning for Automated Vehicles with Learned Heuristics and Monte Carlo Tree Search
Efficient driving in urban traffic scenarios requires foresight. The
observation of other traffic participants and the inference of their possible
next actions depending on one's own action is considered cooperative prediction
and planning. Humans are well equipped with the capability to predict the
actions of multiple interacting traffic participants and plan accordingly,
without the need to directly communicate with others. Prior work has shown that
it is possible to achieve effective cooperative planning without the need for
explicit communication. However, the search space for cooperative plans is so
large that most of the computational budget is spent on exploring the search
space in unpromising regions that are far away from the solution. To accelerate
the planning process, we combined learned heuristics with a cooperative
planning method to guide the search towards regions with promising actions,
yielding better solutions at lower computational costs.
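One common way to let a learned heuristic guide Monte Carlo Tree Search is to use it as a prior in the child-selection rule. The sketch below uses a PUCT-style score for that purpose; the abstract does not say which selection rule the authors use, so the formula, the function name `puct_select`, the dictionary fields, and the constant `c_puct` are all assumptions for illustration.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child with the highest PUCT-style score.

    Each child is a dict with keys:
      "prior"     - probability assigned by the (hypothetical) learned heuristic
      "visits"    - number of times the child was visited
      "value_sum" - sum of returns backed up through the child
    """
    total_visits = sum(child["visits"] for child in children)

    def score(child):
        # Mean value so far; unvisited children default to 0.
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        # Exploration term, scaled by the heuristic's prior.
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        return q + u

    return max(children, key=score)

children = [
    {"name": "keep_lane", "prior": 0.1, "visits": 0, "value_sum": 0.0},
    {"name": "merge_left", "prior": 0.7, "visits": 0, "value_sum": 0.0},
    {"name": "brake", "prior": 0.2, "visits": 0, "value_sum": 0.0},
]
chosen = puct_select(children)
print(chosen["name"])
```

Before any simulations have run, the choice is driven entirely by the prior, which is how a heuristic can steer the computational budget away from unpromising regions of the search space.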
Parametric and nonparametric inference in equilibrium job search models
Equilibrium job search models allow for labor markets with homogeneous workers and firms to yield nondegenerate wage densities. However, the resulting wage densities do not accord well with empirical regularities. Accordingly, many extensions to the basic equilibrium search model have been considered (e.g., heterogeneity in productivity, heterogeneity in the value of leisure, etc.). It is increasingly common to use nonparametric forms for these extensions and, hence, researchers can obtain a perfect fit (in a kernel smoothed sense) between theoretical and empirical wage densities. This makes it difficult to carry out model comparison of different model extensions. In this paper, we first develop Bayesian parametric and nonparametric methods which are comparable to the existing non-Bayesian literature. We then show how Bayesian methods can be used to compare various nonparametric equilibrium search models in a statistically rigorous sense.