112 research outputs found

    Controlled Use of Subgoals in Reinforcement Learning

    Get PDF

    Shared Control Policies and Task Learning for Hydraulic Earth-Moving Machinery

    Get PDF
    This thesis develops a shared control design framework for improving operator efficiency and performance on hydraulic excavation tasks. The framework is based on blended shared control (BSC), a technique whereby the operator’s command input is continually augmented by an assistive controller. Designing a BSC control scheme is subdivided here into four key components. Task learning utilizes nonparametric inverse reinforcement learning to identify the underlying goal structure of a task as a sequence of subgoals directly from the demonstration data of an experienced operator. These subgoals may be distinct points in the actuator space or distributions over the space, from which the operator draws a subgoal location during the task. The remaining three components are executed on-line during each update of the BSC controller. The subgoal prediction step uses the subgoal decomposition from the learning process to predict the operator's current subgoal in real time. Novel deterministic and probabilistic prediction methods are developed and evaluated for their ease of implementation and performance against manually labeled trial data. The control generation component involves computing polynomial trajectories to the predicted subgoal location or mean of the subgoal distribution, and computing a control input which tracks those trajectories. Finally, the blending law synthesizes both inputs through a weighted averaging of the human and control input, using a blending parameter which can be static or dynamic. In the latter case, mapping probabilistic quantities such as the maximum a posteriori probability or statistical entropy to the value of the dynamic blending parameter may yield more intelligent control assistance, scaling the intervention according to the confidence of the prediction.
A reduced-scale (1/12) fully hydraulic excavator model was instrumented for BSC experimentation, equipped with absolute position feedback of each hydraulic actuator. Experiments were conducted using a standard operator control interface and a common earthmoving task: loading a truck from a pile. Under BSC, operators experienced an 18% improvement in mean digging efficiency, defined as mass of material moved per cycle time. Effects of BSC vary with regard to pure cycle time, although most operators experienced a reduced mean cycle time.
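The blending law described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the function name, the choice of the maximum a posteriori probability as the blending parameter, and the toy inputs are all assumptions for the example.

```python
import numpy as np

def blended_command(u_human, u_assist, posterior):
    """Blend operator and assistive inputs with a dynamic parameter.

    The blending weight alpha is mapped from the maximum a posteriori
    probability of the subgoal prediction, so assistance scales with
    prediction confidence (alpha near 0 leaves the operator in control).
    """
    alpha = float(np.max(posterior))  # MAP probability in [0, 1]
    return (1.0 - alpha) * np.asarray(u_human) + alpha * np.asarray(u_assist)

# Operator commands one motion; the assistant tracks a trajectory toward
# the predicted subgoal with 80% confidence, so the blend leans assistive.
u = blended_command([1.0, 0.0], [0.2, 0.5], posterior=[0.1, 0.8, 0.1])
```

A static blending parameter corresponds to fixing `alpha` to a constant instead of deriving it from the posterior each update.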

    Creating Multi-Level Skill Hierarchies in Reinforcement Learning

    Get PDF
    What is a useful skill hierarchy for an autonomous agent? We propose an answer based on the graphical structure of an agent's interaction with its environment. Our approach uses hierarchical graph partitioning to expose the structure of the graph at varying timescales, producing a skill hierarchy with multiple levels of abstraction. At each level of the hierarchy, skills move the agent between regions of the state space that are well connected within themselves but weakly connected to each other. We illustrate the utility of the proposed skill hierarchy in a wide variety of domains in the context of reinforcement learning.
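One level of such a partition can be sketched with spectral bisection: states with the same sign in the Fiedler vector of the graph Laplacian are well connected to each other and weakly connected across the cut. This is a generic illustration of graph partitioning on a toy graph, not the paper's specific algorithm; applying it recursively to each part would yield the multiple levels of abstraction described.

```python
import numpy as np

def fiedler_bisect(adj):
    """Split a state-interaction graph into two weakly connected regions.

    Uses the sign of the Fiedler vector (the eigenvector of the
    second-smallest Laplacian eigenvalue). Recursing on each region
    produces partitions at finer timescales, i.e. a skill hierarchy.
    """
    lap = np.diag(adj.sum(axis=1)) - adj          # graph Laplacian L = D - A
    vals, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    fiedler = vecs[:, np.argsort(vals)[1]]
    return fiedler >= 0                           # region membership mask

# Two triangles joined by a single weak edge: the cut falls on that edge,
# so each triangle becomes one region a skill would move between.
a = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    a[i, j] = a[j, i] = 1.0
mask = fiedler_bisect(a)
```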


    Developing Driving Strategies Efficiently: A Skill-Based Hierarchical Reinforcement Learning Approach

    Full text link
    Driving in dense traffic with human and autonomous drivers is a challenging task that requires high-level planning and reasoning. Human drivers can accomplish this task comfortably, and there have been many efforts to model human driver strategies. These strategies can serve as inspiration for developing autonomous driving algorithms or for creating high-fidelity simulators. Reinforcement learning is a common tool for modeling driver policies, but conventional training of these models can be computationally expensive and time-consuming. To address this issue, in this paper we propose "skill-based" hierarchical driving strategies, in which motion primitives, i.e. skills, are designed and used as high-level actions. This reduces the training time for applications that require multiple models with varying behavior. Simulation results in a merging scenario demonstrate that the proposed approach yields driver models that achieve higher performance with less training compared to baseline reinforcement learning methods.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
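The core idea, motion primitives as high-level actions, can be sketched as a small skill library that a policy selects from. The skill names and control sequences below are hypothetical stand-ins, not the paper's primitives; the point is that the agent chooses among a few discrete skills rather than continuous controls, shrinking the action space it must explore.

```python
# Hypothetical skill library: each skill expands to a low-level sequence
# of (control, value) steps. A high-level RL policy picks one skill per
# decision, instead of emitting raw steering/throttle at every timestep.
SKILLS = {
    "keep_lane":  [("steer", 0.0), ("throttle", 0.3)],
    "merge_left": [("steer", -0.4), ("throttle", 0.2), ("steer", 0.0)],
    "hard_brake": [("throttle", -1.0)],
}

def rollout(skill_name):
    """Expand one high-level action into its low-level control sequence."""
    return list(SKILLS[skill_name])
```

With three skills, the agent's action space has size 3 regardless of how many low-level steps each primitive spans, which is the source of the reduced training time.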

    Efficient Learning with Subgoals and Gaussian Process

    Full text link
    This thesis demonstrates how data efficiency in reinforcement learning can be improved through the use of subgoals and Gaussian processes. Data efficiency is extremely important in a range of problems in which gathering additional data is expensive. This tends to be the case in most problems that involve actual interactions with the physical world, such as a robot kicking a ball, an autonomous vehicle driving, or a drone manoeuvring. State-of-the-art data efficiency is achieved on several well-researched problems. The systems that achieve this learn Gaussian process state transition models of the problem. The model-based learner system uses the state transition model to learn the action to take in each state. The subgoal planner makes use of the state transition model to build an explicit plan to solve the problem. The subgoal planner is improved through the use of learned subgoals to aid navigation of the problem space. The resource-managed learner balances the costs of computation against the value of selecting better experiments in order to improve data efficiency. An active learning system is used to estimate the value of the experiments in terms of how much they may improve the current solution. This is compared to an estimate of how much better an experiment found by expending additional computation will be, along with the costs of performing that computation. A theoretical framework around the use of subgoals in problem solving is presented. This framework provides insights into when and why subgoals are effective, along with avenues for future research. This includes a detailed proposal for a system built off the subgoal theory framework intended to make full use of subgoals to create an effective reinforcement learning system.
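A Gaussian process state transition model of the kind described can be sketched as GP regression with an RBF kernel: the posterior mean predicts the next-state change from (state, action) pairs. This is a generic textbook GP-regression sketch on toy 1-D dynamics, not the thesis's system; the function names and hyperparameters are illustrative.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length=1.0, noise=1e-6):
    """Posterior mean of a GP with an RBF kernel, used as a learned
    state-transition model: inputs are (state, action) features, the
    target is the resulting state change."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    return k(X_test, X_train) @ np.linalg.solve(K, y_train)

# Toy 1-D dynamics where the state change equals the input. The model
# interpolates near its few training points, letting a planner reason
# toward subgoals without further environment interaction -- the source
# of the data-efficiency gain.
X = np.array([[0.0], [0.5], [1.0]])
y = np.array([0.0, 0.5, 1.0])
pred = gp_predict(X, y, np.array([[0.5]]))
```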