Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
This paper was motivated by the problem of how to make robots fuse and
transfer their experience so that they can effectively use prior knowledge and
quickly adapt to new environments. To address the problem, we present a
learning architecture for navigation in cloud robotic systems: Lifelong
Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge
fusion algorithm for upgrading a shared model deployed on the cloud. Then,
effective transfer learning methods in LFRL are introduced. LFRL is consistent
with human cognitive science and fits well in cloud robotic systems.
Experiments show that LFRL greatly improves the efficiency of reinforcement
learning for robot navigation. The cloud robotic system deployment also shows
that LFRL is capable of fusing prior knowledge. In addition, we release a cloud
robotic navigation-learning website based on LFRL.
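The knowledge-fusion step in such a federated setup can be sketched as weighted parameter averaging across robots' local models (a FedAvg-style illustration only; `fuse_knowledge` and the uniform weighting are assumptions, not the paper's actual fusion algorithm):

```python
import numpy as np

def fuse_knowledge(local_models, weights=None):
    """Fuse locally trained parameter sets into one shared cloud model
    by weighted averaging. Each model is a dict mapping layer names to
    weight arrays of identical shapes."""
    n = len(local_models)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting across robots
    fused = {}
    for key in local_models[0]:
        fused[key] = sum(w * m[key] for w, m in zip(weights, local_models))
    return fused

# Two robots' policy-network layers (toy 2x2 weight matrices)
robot_a = {"layer1": np.array([[1.0, 0.0], [0.0, 1.0]])}
robot_b = {"layer1": np.array([[3.0, 2.0], [2.0, 3.0]])}
shared = fuse_knowledge([robot_a, robot_b])
print(shared["layer1"])  # element-wise mean of the two matrices
```

The fused parameters would then be redeployed from the cloud so each robot starts from the shared model rather than from scratch.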
Structure-preserving semi-convex-splitting numerical scheme for a Cahn-Hilliard cross-diffusion system in lymphangiogenesis
A fully discrete semi-convex-splitting finite-element scheme with
stabilization for a degenerate Cahn-Hilliard cross-diffusion system is
analyzed. The system consists of parabolic fourth-order equations for the
volume fraction of the fiber phase and the solute concentration, modeling
pre-patterning of lymphatic vessel morphology. The existence of discrete
solutions is proved, and it is shown that the numerical scheme is energy stable
up to stabilization, conserves the solute mass, and preserves the lower and
upper bounds of the fiber phase fraction. Numerical experiments in two space
dimensions using FreeFEM illustrate the phase segregation and pattern
formation.
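The convex-splitting idea behind such schemes can be summarized in a generic form (a standard textbook illustration; the paper's stabilized semi-convex splitting for the degenerate cross-diffusion system is more involved). The potential is split into convex and concave parts, treated implicitly and explicitly:

```latex
F(\phi) = F_c(\phi) + F_e(\phi), \qquad F_c \ \text{convex}, \quad F_e \ \text{concave},
\qquad
\mu^{k+1} = F_c'(\phi^{k+1}) + F_e'(\phi^k) - \varepsilon^2 \Delta \phi^{k+1}.
```

This treatment yields a discrete energy inequality because convexity gives $F_c(\phi^{k+1}) - F_c(\phi^k) \le F_c'(\phi^{k+1})(\phi^{k+1} - \phi^k)$, with the reverse inequality holding for the concave part.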
Research on the Effects of Different Platform Sizes on the Athletes' Leaving-Platform Speed in Freestyle Skiing Aerials
Freestyle skiing aerials is an event in which China holds an advantage for winning medals at the Winter Olympic Games. This research applies mathematical modeling, combining theory with experiment and using software that calculates the athletes' leaving-platform speed, to analyze how different platform sizes affect that speed. The results indicate that increasing the platform height decreases the leaving-platform speed, and the magnitude of the decrease is related to the magnitude of the height change. To ensure the leaving-platform speed required for specific actions, the in-run sliding distance and the speed of changing postures can be adjusted.
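The height-speed relationship described above follows from energy conservation; a minimal frictionless sketch (this toy model and the function name are assumptions, not the paper's calculation software, which would also account for friction and posture changes):

```python
import math

def leaving_platform_speed(v_inrun, platform_height, g=9.81):
    """Estimate the athlete's speed on leaving the platform from
    energy conservation, neglecting friction and air drag:
        v_out = sqrt(v_inrun**2 - 2 * g * h)
    Returns None if the in-run speed cannot clear the height h.
    """
    v_sq = v_inrun**2 - 2.0 * g * platform_height
    return math.sqrt(v_sq) if v_sq >= 0 else None

# A taller platform yields a lower takeoff speed for the same in-run speed.
for h in (2.1, 3.2, 3.5):
    print(h, leaving_platform_speed(15.0, h))
```

Even this simplified model reproduces the abstract's qualitative finding: raising the platform lowers the takeoff speed, so the required speed must be recovered by adjusting the in-run.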
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
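For reference, the clipped surrogate objective that standard PPO maximizes can be stated concretely (a minimal NumPy sketch of PPO-Clip to fix notation; the paper analyzes an overparametrized neural variant of PPO/TRPO, not this exact estimator):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).
    ratio = pi_new(a|s) / pi_old(a|s) per sampled (s, a);
    advantage is the estimated advantage for that pair."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum removes the incentive to move the ratio
    # outside [1 - eps, 1 + eps], bounding each policy update.
    return np.minimum(unclipped, clipped).mean()

ratios = np.array([0.5, 1.0, 1.5])
advs = np.array([1.0, 1.0, 1.0])
print(ppo_clip_objective(ratios, advs))  # -> 0.9
```

The clipping keeps each update close to the previous policy, the same role the trust-region constraint plays in TRPO.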
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely
adopted for continuous control tasks in robotics and computer graphics.
However, recent studies have revealed that, when applied to long-term
reinforcement learning problems, model-based RP PGMs may experience chaotic and
non-smooth optimization landscapes with exploding gradient variance, which
leads to slow convergence. This is in contrast to the conventional belief that
reparameterization methods have low gradient estimation variance in problems
such as training deep generative models. To comprehend this phenomenon, we
conduct a theoretical examination of model-based RP PGMs and search for
solutions to the optimization difficulties. Specifically, we analyze the
convergence of the model-based RP PGMs and pinpoint the smoothness of function
approximators as a major factor that affects the quality of gradient
estimation. Based on our analysis, we propose a spectral normalization method
to mitigate the exploding variance issue caused by long model unrolls. Our
experimental results demonstrate that proper normalization significantly
reduces the gradient variance of model-based RP PGMs. As a result, the
performance of the proposed method is comparable or superior to other gradient
estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is
available at https://github.com/agentification/RP_PGM.
Comment: Published at NeurIPS 202
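The proposed spectral normalization can be illustrated generically: divide a weight matrix by an estimate of its largest singular value obtained via power iteration (a standard spectral-normalization sketch; the hyperparameters and integration into the model unroll in the paper's code may differ):

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Normalize a weight matrix by its largest singular value,
    estimated with power iteration, so its spectral norm is ~1.
    This caps the layer's Lipschitz constant, smoothing the
    function approximator and taming gradient variance."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the top singular value
    return W / sigma

W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # spectral norm is approximately 1
```

Bounding each layer's spectral norm bounds the Lipschitz constant of the whole network, which is the smoothness property the paper identifies as critical for stable gradients over long model unrolls.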