
    Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

    Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate. Comment: A short versio
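
    As a rough illustration only (not the authors' code or analysis), the sketch below shows a standard clipped-surrogate PPO update with a neural actor and critic, where "overparametrized" is approximated by a very wide hidden layer. The network width, clip ratio, and value-loss coefficient are illustrative assumptions.

```python
# Minimal PPO-style actor-critic sketch in PyTorch; hyperparameters are illustrative.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, width=1024):
    # "Overparametrized" here is approximated by a very wide hidden layer.
    return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                         nn.Linear(width, out_dim))

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, width=1024):
        super().__init__()
        self.actor = mlp(obs_dim, n_actions, width)   # policy logits
        self.critic = mlp(obs_dim, 1, width)          # state-value estimate

    def forward(self, obs):
        return self.actor(obs), self.critic(obs).squeeze(-1)

def ppo_loss(model, obs, actions, old_log_probs, advantages, returns,
             clip_eps=0.2, vf_coef=0.5):
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)  # pi_new / pi_old
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = ((values - returns) ** 2).mean()  # critic regression to returns
    return policy_loss + vf_coef * value_loss
```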

    DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies

    We present an approach for designing swarm-based optimizers for the global optimization of expensive black-box functions. In the proposed approach, the problem of finding efficient optimizers is framed as a reinforcement learning problem, where the goal is to find optimization policies that require only a few function evaluations to converge to the global optimum. The state of each agent within the swarm is defined as its current position and function value within a design space, and the agents learn to take favorable actions that maximize reward, which is based on the final value of the objective function. The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies. Furthermore, the effect of changing the number of agents, as well as the generalization capabilities of the trained agents, is investigated. The results show superior performance compared to the other optimizers, the desired scaling when the number of agents is varied, and acceptable performance even when applied to unseen functions. On a broader scale, the results show promise for the rapid development of domain-specific optimizers.
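
    As a rough illustration of the setup described above (not the DeepHive implementation), the sketch below frames a swarm on a black-box function as a reinforcement learning environment: each agent observes its position and current function value, its action is a displacement in the design space, and the reward here is a simple stand-in based on improvement of the swarm's best value (the paper instead bases reward on the final objective value). The objective, bounds, and reward shaping are illustrative assumptions.

```python
# Minimal swarm-as-RL-environment sketch; reward shaping and objective are illustrative.
import numpy as np

def sphere(x):
    # Example black-box objective; the approach targets expensive functions.
    return float(np.sum(x ** 2))

class SwarmEnv:
    def __init__(self, objective, n_agents=5, dim=2, bounds=(-5.0, 5.0)):
        self.f, self.n, self.dim, self.bounds = objective, n_agents, dim, bounds

    def reset(self):
        lo, hi = self.bounds
        self.pos = np.random.uniform(lo, hi, size=(self.n, self.dim))
        self.val = np.array([self.f(p) for p in self.pos])
        self.best = self.val.min()
        return self._obs()

    def _obs(self):
        # State of each agent: its position and its current function value.
        return np.concatenate([self.pos, self.val[:, None]], axis=1)

    def step(self, actions):
        # actions: per-agent displacements proposed by the learned policy.
        lo, hi = self.bounds
        self.pos = np.clip(self.pos + actions, lo, hi)
        self.val = np.array([self.f(p) for p in self.pos])
        new_best = min(self.best, self.val.min())
        reward = self.best - new_best  # improvement of the swarm's best value so far
        self.best = new_best
        return self._obs(), reward

# Usage: roll out a (here random) policy on the example objective.
env = SwarmEnv(sphere)
obs = env.reset()
obs, reward = env.step(np.random.uniform(-0.5, 0.5, size=(env.n, env.dim)))
```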