Towards Mixed Optimization for Reinforcement Learning with Program Synthesis
Deep reinforcement learning has led to several recent breakthroughs, though
the learned policies are often encoded by black-box neural networks. This makes
them difficult to interpret and makes it hard to impose desired specification
constraints during learning. We present an iterative framework, MORL, for improving the
learned policies using program synthesis. Concretely, we propose to use
synthesis techniques to obtain a symbolic representation of the learned policy,
which can then be debugged manually or automatically using program repair.
After the repair step, we use behavior cloning to obtain the policy
corresponding to the repaired program, which is then further improved using
gradient descent. This process continues until the learned policy satisfies
desired constraints. We instantiate MORL for the simple CartPole problem and
show that the programmatic representation allows for high-level modifications
that in turn lead to improved learning of the policies.
Comment: Updated publication details, format. Accepted at NAMPI workshop, ICML '1
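The MORL loop described above (synthesize a symbolic policy, repair it, clone it back into a neural policy, then refine with gradient descent) can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: every function name, the linear-policy representation, and the CartPole-style "pole angle" constraint are assumptions made up for this sketch.

```python
def synthesize(policy_weights):
    # Stand-in for program synthesis: summarize a linear policy as a
    # threshold rule on its dominant feature.
    idx = max(range(len(policy_weights)), key=lambda i: abs(policy_weights[i]))
    return {"feature": idx, "sign": 1 if policy_weights[idx] > 0 else -1}

def repair(program):
    # Stand-in for the (manual or automatic) repair step: enforce a
    # desired constraint, e.g. the rule must key on feature 2 (assumed
    # here to be the pole angle).
    program = dict(program)
    program["feature"] = 2
    return program

def behavior_clone(program, n_features=4):
    # Recover policy weights consistent with the repaired program.
    w = [0.0] * n_features
    w[program["feature"]] = float(program["sign"])
    return w

def gradient_step(w, lr=0.1):
    # Placeholder for further improvement by gradient descent.
    return [wi * (1 + lr) for wi in w]

def morl_iteration(w):
    # One pass of the synthesize -> repair -> clone -> improve loop.
    return gradient_step(behavior_clone(repair(synthesize(w))))

weights = [0.3, -0.1, 0.05, 0.2]
improved = morl_iteration(weights)
```

In the full framework this iteration would repeat until the learned policy satisfies the desired constraints; here a single pass shows the data flow between the four stages.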
Automatic Discovery of Interpretable Planning Strategies
When making decisions, people often overlook critical information or are
overly swayed by irrelevant information. A common approach to mitigate these
biases is to provide decision-makers, especially professionals such as medical
doctors, with decision aids, such as decision trees and flowcharts. Designing
effective decision aids is a difficult problem. We propose that recently
developed reinforcement learning methods for discovering clever heuristics for
good decision-making can be partially leveraged to assist human experts in this
design process. One of the biggest remaining obstacles to leveraging the
aforementioned methods is that the policies they learn are opaque to people. To
solve this problem, we introduce AI-Interpret: a general method for
transforming idiosyncratic policies into simple and interpretable descriptions.
Our algorithm combines recent advances in imitation learning and program
induction with a new clustering method for identifying a large subset of
demonstrations that can be accurately described by a simple, high-performing
decision rule. We evaluate our new algorithm and employ it to translate
information-acquisition policies discovered through metalevel reinforcement
learning. The results of large behavioral experiments showed that providing the
decision rules generated by AI-Interpret as flowcharts significantly improved
people's planning strategies and decisions across three different classes of
sequential decision problems. Moreover, another experiment revealed that this
approach is significantly more effective than training people by giving them
performance feedback. Finally, a series of ablation studies confirmed that
AI-Interpret is critical to the discovery of interpretable decision rules. We
conclude that the methods and findings presented herein are an important step
towards leveraging automatic strategy discovery to improve human
decision-making.
Comment: Submitted to the Special Issue on Reinforcement Learning for Real
Life in Machine Learning Journal (2021). Code available at
https://github.com/RationalityEnhancement/InterpretableStrategyDiscover
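The core idea in the second abstract, identifying a large subset of demonstrations that a single simple, high-performing rule can describe accurately, can be sketched in miniature. This is a hypothetical illustration, not the AI-Interpret algorithm itself: the single-feature threshold rules (a stand-in for program induction over a small DSL), the greedy subset pruning, and all names are assumptions of this sketch.

```python
def rule_accuracy(rule, demos):
    # Fraction of (state, action) demonstrations the rule reproduces.
    return sum(1 for s, a in demos if rule(s) == a) / len(demos)

def best_simple_rule(demos, thresholds):
    # Candidate rules: threshold tests on a single state feature.
    candidates = [
        (f, t, lambda s, f=f, t=t: int(s[f] > t))
        for f in range(len(demos[0][0]))
        for t in thresholds
    ]
    return max(candidates, key=lambda c: rule_accuracy(c[2], demos))

def largest_describable_subset(demos, thresholds, target_acc=1.0):
    # Greedily drop demonstrations the best candidate rule misclassifies
    # until the remaining subset is described at the target accuracy.
    while demos:
        f, t, rule = best_simple_rule(demos, thresholds)
        if rule_accuracy(rule, demos) >= target_acc:
            return (f, t), demos
        demos = [(s, a) for s, a in demos if rule(s) == a]
    return None, []

# Toy demonstrations: one-dimensional states with binary actions.
demos = [([0.9], 1), ([0.8], 1), ([0.1], 0), ([0.5], 1), ([0.4], 0)]
rule, subset = largest_describable_subset(demos, thresholds=[0.2, 0.45, 0.7])
```

The returned rule ("act when feature 0 exceeds 0.45") is the kind of simple description that could then be rendered as a flowchart for people, as the paper does with its discovered planning strategies.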