Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Off-policy learning is less stable than on-policy learning in reinforcement
learning (RL). One reason for this instability is the discrepancy between the
target (π) and behavior (b) policy distributions. The discrepancy between the
π and b distributions can be alleviated by employing a smooth variant of
importance sampling (IS), such as relative importance sampling (RIS). RIS has
a parameter that controls the degree of smoothing. To cope with this
instability, we present the first relative importance sampling off-policy
actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, a
network yields the target policy (the actor) and a value function (the critic)
assessing the current policy (π) using samples drawn from the behavior policy.
We use the action value generated from the behavior policy, rather than from
the target policy, in the reward function to train our algorithm. We also use
deep neural networks to train both the actor and the critic. We evaluate our
algorithm on a number of OpenAI Gym benchmark problems and demonstrate
performance better than or comparable to several state-of-the-art RL baselines.
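As a rough illustration of the smoothing idea (not necessarily the authors' exact formulation), a relative importance weight can be formed by mixing the target and behavior densities in the denominator, in the style of relative density-ratio estimation; the parameter name beta and the specific mixing form below are assumptions for this sketch.

```python
import numpy as np

def is_weight(pi_prob, b_prob):
    """Ordinary importance sampling weight pi(a|s) / b(a|s) (unbounded)."""
    return pi_prob / b_prob

def ris_weight(pi_prob, b_prob, beta=0.5):
    """Relative importance sampling weight (hypothetical sketch).

    Mixes the two densities in the denominator, as in relative
    density-ratio estimation:
        w = pi / (beta * pi + (1 - beta) * b)
    beta = 0 recovers ordinary IS; beta > 0 bounds the weight by 1/beta,
    which smooths the correction and reduces variance.
    """
    return pi_prob / (beta * pi_prob + (1.0 - beta) * b_prob)

# Example: an action that is rare under b but likely under pi.
pi_p, b_p = 0.60, 0.05
print(is_weight(pi_p, b_p))        # ~12   -- high-variance IS weight
print(ris_weight(pi_p, b_p, 0.5))  # ~1.85 -- bounded above by 1/beta = 2
```

The bounded weight trades a small bias for a large variance reduction, which is the usual motivation for smoothed IS corrections in off-policy learning.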
Generalized Off-Policy Actor-Critic
We propose a new objective, the counterfactual objective, unifying existing
objectives for off-policy policy gradient algorithms in the continuing
reinforcement learning (RL) setting. Compared to the commonly used excursion
objective, which can be misleading about the performance of the target policy
when deployed, our new objective better predicts such performance. We prove the
Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient
of the counterfactual objective and use an emphatic approach to get an unbiased
sample from this policy gradient, yielding the Generalized Off-Policy
Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over
existing algorithms in MuJoCo robot simulation tasks, the first empirical
success of emphatic algorithms in prevailing deep RL benchmarks.
Comment: NeurIPS 2019
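For intuition about the emphatic machinery the abstract refers to, the sketch below shows the classic follow-on trace from emphatic TD, which reweights states by accumulated discounted importance ratios; the full Geoff-PAC estimator is more involved, so treat this purely as background, with the variable names being my own.

```python
import numpy as np

def followon_trace(rhos, gamma=0.99, interest=1.0):
    """Classic emphatic follow-on trace F_t = i(S_t) + gamma * rho_{t-1} * F_{t-1}.

    rhos: per-step importance ratios pi(a_t|s_t) / b(a_t|s_t) along a
    behavior-policy trajectory. Returns F_t for every step; emphatic
    methods use such traces to reweight off-policy updates so that, in
    expectation, updates follow the target policy's state distribution.
    """
    F = np.empty(len(rhos))
    F[0] = interest
    for t in range(1, len(rhos)):
        F[t] = interest + gamma * rhos[t - 1] * F[t - 1]
    return F

# Example: importance ratios from a short behavior-policy trajectory.
print(followon_trace(np.array([1.2, 0.8, 1.0, 0.5])))
```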
GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
Automatic transistor sizing is a challenging problem in circuit design due to
the large design space, complex performance trade-offs, and fast technological
advancements. Although there has been plenty of work on transistor sizing
for a single circuit, limited research has addressed transferring the
knowledge from one circuit to another to reduce the re-design overhead. In this
paper, we present GCN-RL Circuit Designer, leveraging reinforcement learning
(RL) to transfer the knowledge between different technology nodes and
topologies. Moreover, inspired by the simple fact that a circuit is a graph, we
learn on the circuit topology representation with graph convolutional neural
networks (GCN). The GCN-RL agent extracts features of the topology graph, whose
vertices are transistors and whose edges are wires. Our learning-based optimization
consistently achieves the highest Figures of Merit (FoM) on four different
circuits compared with conventional black-box optimization methods (Bayesian
Optimization, Evolutionary Algorithms), random search, and human expert
designs. Experiments on transfer learning between five technology nodes and two
circuit topologies demonstrate that RL with transfer learning can achieve much
higher FoMs than methods without knowledge transfer. Our transferable
optimization method makes transistor sizing and design porting more effective
and efficient.
Comment: Accepted to the 57th Design Automation Conference (DAC 2020); 6 pages, 8 figures
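As a hedged illustration of learning on such a circuit graph (not the paper's actual architecture), a single graph-convolution layer in the Kipf-Welling style propagates per-transistor features along the wire adjacency; the feature dimensions and layer sizes below are placeholders.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency of the circuit graph (vertices = transistors,
       edges = wires). H: (n, d_in) per-transistor features, e.g. device
       type or terminal attributes. W: (d_in, d_out) learned weights.
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)       # symmetric norm + ReLU

# Toy 3-transistor circuit: T0-T1 and T1-T2 connected by wires.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)        # placeholder per-device features
W = np.random.randn(4, 8)        # placeholder layer weights
print(gcn_layer(A, H, W).shape)  # (3, 8): one embedding per transistor
```

Because the layer weights are shared across vertices, the same trained layer can be applied to a different topology or technology node, which is the structural property that makes knowledge transfer across circuits plausible.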