18,089 research outputs found

    Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

    Full text link
    In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α\alpha-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.Comment: In proceedings AAAI-1

    NASA Automated Rendezvous and Capture Review. Executive summary

    Get PDF
    In support of the Cargo Transfer Vehicle (CTV) Definition Studies in FY-92, the Advanced Program Development division of the Office of Space Flight at NASA Headquarters conducted an evaluation and review of the United States capabilities and state-of-the-art in Automated Rendezvous and Capture (AR&C). This review was held in Williamsburg, Virginia on 19-21 Nov. 1991 and included over 120 attendees from U.S. government organizations, industries, and universities. One hundred abstracts were submitted to the organizing committee for consideration. Forty-two were selected for presentation. The review was structured to include five technical sessions. Forty-two papers addressed topics in the five categories below: (1) hardware systems and components; (2) software systems; (3) integrated systems; (4) operations; and (5) supporting infrastructure
    corecore