20,861 research outputs found

    Hierarchical Policy Search via Return-Weighted Density Estimation

    Full text link
    Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL). Hierarchical RL (HRL) tackles this problem by learning a hierarchical policy, where multiple option policies are in charge of different strategies corresponding to modes of a reward function and a gating policy selects the best option for a given context. Although HRL has been demonstrated to be promising, current state-of-the-art methods cannot still perform well in complex real-world problems due to the difficulty of identifying modes of the reward function. In this paper, we propose a novel method called hierarchical policy search via return-weighted density estimation (HPSDE), which can efficiently identify the modes through density estimation with return-weighted importance sampling. Our proposed method finds option policies corresponding to the modes of the return function and automatically determines the number and the location of option policies, which significantly reduces the burden of hyper-parameters tuning. Through experiments, we demonstrate that the proposed HPSDE successfully learns option policies corresponding to modes of the return function and that it can be successfully applied to a challenging motion planning problem of a redundant robotic manipulator.Comment: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 9 page

    Accelerating Cooperative Planning for Automated Vehicles with Learned Heuristics and Monte Carlo Tree Search

    Full text link
    Efficient driving in urban traffic scenarios requires foresight. The observation of other traffic participants and the inference of their possible next actions depending on the own action is considered cooperative prediction and planning. Humans are well equipped with the capability to predict the actions of multiple interacting traffic participants and plan accordingly, without the need to directly communicate with others. Prior work has shown that it is possible to achieve effective cooperative planning without the need for explicit communication. However, the search space for cooperative plans is so large that most of the computational budget is spent on exploring the search space in unpromising regions that are far away from the solution. To accelerate the planning process, we combined learned heuristics with a cooperative planning method to guide the search towards regions with promising actions, yielding better solutions at lower computational costs

    Parametric and nonparametric inference in equilibrium job search models

    Get PDF
    Equilibrium job search models allow for labor markets with homogeneous workers and firms to yield nondegenerate wage densities. However, the resulting wage densities do not accord well with empirical regularities. Accordingly, many extensions to the basic equilibrium search model have been considered (e.g., heterogeneity in productivity, heterogeneity in the value of leisure, etc.). It is increasingly common to use nonparametric forms for these extensions and, hence, researchers can obtain a perfect fit (in a kernel smoothed sense) between theoretical and empirical wage densities. This makes it difficult to carry out model comparison of different model extensions. In this paper, we first develop Bayesian parametric and nonparametric methods which are comparable to the existing non-Bayesian literature. We then show how Bayesian methods can be used to compare various nonparametric equilibrium search models in a statistically rigorous sense
    corecore