Towards Mixed Optimization for Reinforcement Learning with Program Synthesis
Deep reinforcement learning has led to several recent breakthroughs, though
the learned policies are often encoded by black-box neural networks. This makes
them difficult to interpret and makes it hard to impose desired specification
constraints during learning. We present an iterative framework, MORL, for improving the
learned policies using program synthesis. Concretely, we propose to use
synthesis techniques to obtain a symbolic representation of the learned policy,
which can then be debugged manually or automatically using program repair.
After the repair step, we use behavior cloning to obtain the policy
corresponding to the repaired program, which is then further improved using
gradient descent. This process continues until the learned policy satisfies
desired constraints. We instantiate MORL for the simple CartPole problem and
show that the programmatic representation allows for high-level modifications
that in turn lead to improved learning of the policies.
Comment: Updated publication details, format. Accepted at NAMPI workshop, ICML '1
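The MORL loop described above (synthesize a symbolic policy, repair it, clone it back into a neural policy, then refine with gradient descent) can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: every function name, the linear-policy representation, and the CartPole-style "pole angle" constraint are assumptions made up for this sketch.

```python
def synthesize(policy_weights):
    # Stand-in for program synthesis: summarize a linear policy as a
    # threshold rule on its dominant feature.
    idx = max(range(len(policy_weights)), key=lambda i: abs(policy_weights[i]))
    return {"feature": idx, "sign": 1 if policy_weights[idx] > 0 else -1}

def repair(program):
    # Stand-in for the (manual or automatic) repair step: enforce a
    # desired constraint, e.g. the rule must key on feature 2 (assumed
    # here to be the pole angle).
    program = dict(program)
    program["feature"] = 2
    return program

def behavior_clone(program, n_features=4):
    # Recover policy weights consistent with the repaired program.
    w = [0.0] * n_features
    w[program["feature"]] = float(program["sign"])
    return w

def gradient_step(w, lr=0.1):
    # Placeholder for further improvement by gradient descent.
    return [wi * (1 + lr) for wi in w]

def morl_iteration(w):
    # One pass of the synthesize -> repair -> clone -> improve loop.
    return gradient_step(behavior_clone(repair(synthesize(w))))

weights = [0.3, -0.1, 0.05, 0.2]
improved = morl_iteration(weights)
```

In the full framework this iteration would repeat until the learned policy satisfies the desired constraints; here a single pass shows the data flow between the four stages.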
Automatic Discovery of Interpretable Planning Strategies
When making decisions, people often overlook critical information or are
overly swayed by irrelevant information. A common approach to mitigate these
biases is to provide decision-makers, especially professionals such as medical
doctors, with decision aids, such as decision trees and flowcharts. Designing
effective decision aids is a difficult problem. We propose that recently
developed reinforcement learning methods for discovering clever heuristics for
good decision-making can be partially leveraged to assist human experts in this
design process. One of the biggest remaining obstacles to leveraging the
aforementioned methods is that the policies they learn are opaque to people. To
solve this problem, we introduce AI-Interpret: a general method for
transforming idiosyncratic policies into simple and interpretable descriptions.
Our algorithm combines recent advances in imitation learning and program
induction with a new clustering method for identifying a large subset of
demonstrations that can be accurately described by a simple, high-performing
decision rule. We evaluate our new algorithm and employ it to translate
information-acquisition policies discovered through metalevel reinforcement
learning. The results of large behavioral experiments showed that providing the
decision rules generated by AI-Interpret as flowcharts significantly improved
people's planning strategies and decisions across three different classes of
sequential decision problems. Moreover, another experiment revealed that this
approach is significantly more effective than training people by giving them
performance feedback. Finally, a series of ablation studies confirmed that
AI-Interpret is critical to the discovery of interpretable decision rules. We
conclude that the methods and findings presented herein are an important step
towards leveraging automatic strategy discovery to improve human
decision-making.
Comment: Submitted to the Special Issue on Reinforcement Learning for Real
Life in Machine Learning Journal (2021). Code available at
https://github.com/RationalityEnhancement/InterpretableStrategyDiscover
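The core idea in the second abstract, identifying a large subset of demonstrations that a single simple, high-performing rule can describe accurately, can be sketched in miniature. This is a hypothetical illustration, not the AI-Interpret algorithm itself: the single-feature threshold rules (a stand-in for program induction over a small DSL), the greedy subset pruning, and all names are assumptions of this sketch.

```python
def rule_accuracy(rule, demos):
    # Fraction of (state, action) demonstrations the rule reproduces.
    return sum(1 for s, a in demos if rule(s) == a) / len(demos)

def best_simple_rule(demos, thresholds):
    # Candidate rules: threshold tests on a single state feature.
    candidates = [
        (f, t, lambda s, f=f, t=t: int(s[f] > t))
        for f in range(len(demos[0][0]))
        for t in thresholds
    ]
    return max(candidates, key=lambda c: rule_accuracy(c[2], demos))

def largest_describable_subset(demos, thresholds, target_acc=1.0):
    # Greedily drop demonstrations the best candidate rule misclassifies
    # until the remaining subset is described at the target accuracy.
    while demos:
        f, t, rule = best_simple_rule(demos, thresholds)
        if rule_accuracy(rule, demos) >= target_acc:
            return (f, t), demos
        demos = [(s, a) for s, a in demos if rule(s) == a]
    return None, []

# Toy demonstrations: one-dimensional states with binary actions.
demos = [([0.9], 1), ([0.8], 1), ([0.1], 0), ([0.5], 1), ([0.4], 0)]
rule, subset = largest_describable_subset(demos, thresholds=[0.2, 0.45, 0.7])
```

The returned rule ("act when feature 0 exceeds 0.45") is the kind of simple description that could then be rendered as a flowchart for people, as the paper does with its discovered planning strategies.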