Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.Comment: A short versio
DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies
We present an approach for designing swarm-based optimizers for the global
optimization of expensive black-box functions. In the proposed approach, the
problem of finding efficient optimizers is framed as a reinforcement learning
problem, where the goal is to find optimization policies that require few
function evaluations to converge to the global optimum. The state of each agent
within the swarm is defined as its current position and function value within a
design space and the agents learn to take favorable actions that maximize
reward, which is based on the final value of the objective function. The
proposed approach is tested on various benchmark optimization functions and
compared to the performance of other global optimization strategies.
Furthermore, the effect of varying the number of agents and the
generalization capabilities of the trained agents are investigated. The results
show performance superior to the other optimizers, the desired scaling behavior
as the number of agents is varied, and acceptable performance even when
applied to unseen functions. On a broader scale, the results show promise for
the rapid development of domain-specific optimizers.
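The formulation described above can be sketched as a toy environment: each agent's state is its position and objective value, actions update positions, and reward reflects objective improvement. All names, shapes, the `sphere` test function, and the reward shaping here are illustrative assumptions, not DeepHive's actual implementation.

```python
import numpy as np

def sphere(x):
    """Toy black-box objective (global optimum 0 at the origin)."""
    return float(np.sum(x ** 2))

class SwarmEnv:
    """Minimal swarm-optimization environment in the spirit of DeepHive.

    Each agent's state is (position, function value); an action is a
    displacement in the design space; reward is the per-step decrease
    of the objective (a simplifying assumption for illustration).
    """
    def __init__(self, n_agents=5, dim=2, bound=5.0, seed=0):
        rng = np.random.default_rng(seed)
        self.bound = bound
        self.pos = rng.uniform(-bound, bound, size=(n_agents, dim))
        self.vals = np.array([sphere(p) for p in self.pos])

    def step(self, actions):
        # Apply displacements, keeping agents inside the design space.
        self.pos = np.clip(self.pos + actions, -self.bound, self.bound)
        new_vals = np.array([sphere(p) for p in self.pos])
        reward = self.vals - new_vals  # positive when the objective improved
        self.vals = new_vals
        # State per agent: current position concatenated with function value.
        state = np.concatenate([self.pos, self.vals[:, None]], axis=1)
        return state, reward
```

A trained policy would map these states to displacements; here, a hand-coded step toward the origin (which shrinks the sphere objective) suffices to exercise the interface.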