Acceleration in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization
methods in reinforcement learning (RL) by integrating foresight in the policy
improvement step via optimistic and adaptive updates. Leveraging the connection
between policy iteration and policy gradient methods, we view policy
optimization algorithms as iteratively solving a sequence of surrogate
objectives, local lower bounds on the original objective. We define optimism as
predictive modelling of the future behavior of a policy, and adaptivity as
taking immediate and anticipatory corrective actions to mitigate accumulating
errors from overshooting predictions or delayed responses to change. We use
this shared lens to jointly express other well-known algorithms, including
model-based policy improvement based on forward search, and optimistic
meta-learning algorithms. We analyze properties of this formulation, and show
connections to other accelerated optimization algorithms. Then, we design an
optimistic policy gradient algorithm, adaptive via meta-gradient learning, and
empirically highlight several design choices pertaining to acceleration, in an
illustrative task.
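The optimistic update described here, using the previous gradient as a prediction of the next one, can be illustrated with a generic optimistic gradient scheme on a toy concave objective. This is a minimal sketch under assumed names and step sizes, not the paper's meta-gradient algorithm:

```python
import numpy as np

def optimistic_gradient_ascent(grad, theta, steps=100, lr=0.1):
    """Optimistic gradient ascent: each update extrapolates with the
    previous gradient as a prediction of the next one (generic sketch,
    not the paper's exact adaptive algorithm)."""
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        # optimistic step: 2*g - g_prev corrects for the stale prediction
        theta = theta + lr * (2 * g - g_prev)
        g_prev = g
    return theta

# Toy concave objective f(x) = -(x - 3)^2 with gradient -2(x - 3);
# the optimum is at x = 3.
theta = optimistic_gradient_ascent(lambda x: -2 * (x - 3), np.array([0.0]))
```

When the gradient sequence changes slowly, the prediction `g_prev` is accurate and the extrapolated step anticipates the next gradient; when predictions overshoot, the correction term mitigates the accumulated error, which mirrors the adaptivity notion in the abstract.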
Gait learning for soft microrobots controlled by light fields
Soft microrobots based on photoresponsive materials and controlled by light
fields can generate a variety of different gaits. This inherent flexibility can
be exploited to maximize their locomotion performance in a given environment
and used to adapt them to changing conditions. However, because of the lack of
accurate locomotion models and the intrinsic variability among microrobots,
analytical control design is not possible. Common data-driven
approaches, on the other hand, require running prohibitive numbers of
experiments and lead to very sample-specific results. Here we propose a
probabilistic learning approach for light-controlled soft microrobots based on
Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach
results in a learning scheme that is data-efficient, enabling gait optimization
with a limited experimental budget, and robust against differences among
microrobot samples. These features are obtained by designing the learning
scheme through the comparison of different GP priors and BO settings on a
semi-synthetic data set. The developed learning scheme is validated in
microrobot experiments, resulting in a 115% improvement in a microrobot's
locomotion performance with an experimental budget of only 20 tests. These
encouraging results lead the way toward self-adaptive microrobotic systems
based on light-controlled soft microrobots and probabilistic learning control.
Comment: 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 201
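The BO-plus-GP loop the abstract describes can be sketched generically. Below is a minimal 1-D Bayesian optimization with a hand-rolled Gaussian-process surrogate and an upper-confidence-bound acquisition over a hypothetical scalar gait parameter; the kernel length-scale, UCB weight, and toy "locomotion speed" landscape are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential (RBF) kernel between 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def bo_maximize(f, steps=15, noise=1e-6, seed=0):
    """Minimal 1-D Bayesian optimization: GP posterior + UCB acquisition
    (a generic sketch of the approach, not the paper's tuned scheme)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, 3)              # initial gait-parameter samples
    y = np.array([f(x) for x in X])
    grid = np.linspace(0, 1, 201)
    for _ in range(steps):
        K = rbf(X, X) + noise * np.eye(len(X))
        Ks = rbf(grid, X)
        mu = Ks @ np.linalg.solve(K, y)   # GP posterior mean on the grid
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))
        x_next = grid[np.argmax(ucb)]     # most promising next experiment
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))       # run one "experiment"
    return X[np.argmax(y)], y.max()

# Toy "locomotion speed" landscape peaking at gait parameter 0.7
best_x, best_y = bo_maximize(lambda x: np.exp(-40 * (x - 0.7) ** 2))
```

Each loop iteration costs one experiment, which is why such a scheme can operate under the small experimental budgets (tens of tests) reported in the abstract.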
Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
In this paper, we present a robotic model-based reinforcement learning method
that combines ideas from model identification and model predictive control. We
use a feature-based representation of the dynamics that allows the dynamics
model to be fitted with a simple least squares procedure, and the features are
identified from a high-level specification of the robot's morphology,
consisting of the number and connectivity structure of its links. Model
predictive control is then used to choose the actions under an optimistic model
of the dynamics, which produces an efficient and goal-directed exploration
strategy. We present real time experimental results on standard benchmark
problems involving the pendulum, cartpole, and double pendulum systems.
Experiments indicate that our method is able to learn a range of benchmark
tasks substantially faster than the previous best methods. To evaluate our
approach on a realistic robotic control task, we also demonstrate real time
control of a simulated 7-degree-of-freedom arm.
Comment: 8 page
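The model-identification step, fitting the dynamics by a simple least-squares procedure over features, can be illustrated in miniature. The sketch below fits x' ≈ A x + B u from transition data with `numpy.linalg.lstsq`; the linear feature choice and the toy one-dimensional system are assumptions for illustration, not the paper's morphology-derived features:

```python
import numpy as np

def fit_dynamics(X, U, X_next):
    """Least-squares fit of x' ~ A x + B u from observed transitions
    (a generic sketch of feature-based model identification)."""
    Z = np.hstack([X, U])                        # features: state and action
    W, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    A = W[:X.shape[1]].T                         # state-transition part
    B = W[X.shape[1]:].T                         # control part
    return A, B

# Recover the known toy dynamics x' = 0.9 x + 0.1 u from random transitions
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
U = rng.normal(size=(100, 1))
X_next = 0.9 * X + 0.1 * U
A, B = fit_dynamics(X, U, X_next)
```

Because the fit is a single linear solve, the model can be re-estimated cheaply after every batch of experience, which is what makes the exploration loop in the abstract practical in real time.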
Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods
This work proposes a universal and adaptive second-order method for
minimizing second-order smooth, convex functions. Our algorithm achieves
$O(\sigma/\sqrt{T})$ convergence when the oracle feedback is stochastic with
variance $\sigma^2$, and improves its convergence to $O(1/T^3)$ with
deterministic oracles, where $T$ is the number of iterations. Our method also
interpolates these rates without knowing the nature of the oracle a priori,
which is enabled by a parameter-free adaptive step-size that is oblivious to
the knowledge of smoothness modulus, variance bounds and the diameter of the
constrained set. To our knowledge, this is the first universal algorithm with
such global guarantees within the second-order optimization literature.
Comment: 32 pages, 4 figures, accepted at NeurIPS 202
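The "oblivious" adaptive step size has a well-known first-order analogue: an AdaGrad-norm schedule that divides by the accumulated gradient norms and therefore needs no smoothness or variance constants in advance. The sketch below shows only that idea; it is a first-order stand-in for intuition, not the paper's second-order Extra-Newton method:

```python
import numpy as np

def adagrad_norm_descent(grad, x, steps=200, eps=1e-8):
    """Gradient descent with an AdaGrad-norm step size: the step shrinks
    as squared gradient norms accumulate, adapting to the observed
    problem without knowing smoothness or noise bounds beforehand
    (a first-order sketch of the 'oblivious step-size' idea)."""
    acc = eps                          # accumulated squared gradient norms
    for _ in range(steps):
        g = grad(x)
        acc += np.dot(g, g)
        x = x - g / np.sqrt(acc)       # step size is 1 / sqrt(acc)
    return x

# Minimize f(x) = (x - 1)^2, gradient 2(x - 1), starting far from the optimum
x = adagrad_norm_descent(lambda x: 2 * (x - 1.0), np.array([5.0]))
```

The same principle, with curvature information folded in, is what lets a single parameter-free method interpolate between the stochastic and deterministic rates quoted above.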
A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
previous recent work showed scaling of an FMM on GPU clusters, with problem
sizes in the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
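For concreteness, the particle-to-particle (P2P) kernel that the SIMD tuning targets is a direct pairwise sum over near-field particles. The sketch below evaluates it with NumPy vectorization standing in for SIMD intrinsics; the 1/r potential and the random unit-cube layout are illustrative assumptions:

```python
import numpy as np

def p2p_potential(pos, q):
    """Direct particle-to-particle potential sum phi_i = sum_j q_j / r_ij.
    This is the near-field kernel an FMM evaluates exactly; the multipole
    expansions replace the far-field part of this O(N^2) sum
    (a vectorized sketch, not the tuned SIMD kernel)."""
    d = pos[:, None, :] - pos[None, :, :]       # all pairwise displacements
    r = np.sqrt(np.sum(d * d, axis=-1))         # pairwise distances
    np.fill_diagonal(r, np.inf)                 # exclude self-interaction
    return np.sum(q[None, :] / r, axis=1)

# 50 unit charges placed randomly in the unit cube
rng = np.random.default_rng(0)
pos = rng.random((50, 3))
q = np.ones(50)
phi = p2p_potential(pos, q)
```

Because this inner loop is pure streaming arithmetic over contiguous arrays, it is exactly the kind of kernel where SIMD vectorization yields the several-fold single-core speed-ups reported above.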
New acceleration technique for the backpropagation algorithm
Artificial neural networks have been studied for many years in the hope of achieving human-like performance in pattern recognition, speech synthesis, and higher-level cognitive processes. In the connectionist model there are several interconnected processing elements, called neurons, that have limited processing capability. Even though the rate of information transmitted between these elements is limited, the complex interconnection and the cooperative interaction between them result in vastly increased computing power.

Neural network models are specified by an organized network topology of interconnected neurons. These networks have to be trained in order to be used for a specific purpose, and backpropagation is one of the most popular training methods. The speed of convergence of the standard backpropagation algorithm has seen much improvement in the recent past. Here we present a new technique for accelerating the existing backpropagation algorithm without modifying it. We use a fourth-order interpolation method for the dominant eigenvalues, and use these to change the slope of the activation function, thereby increasing the speed of convergence of the backpropagation algorithm.

Our experiments have shown significant improvement in convergence time on problems widely used in benchmarking: a three- to ten-fold decrease in convergence time is achieved, and the decrease grows as the complexity of the problem increases. The technique adjusts the energy state of the system so as to escape from local minima.
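The quantity this technique manipulates, the slope of the activation function, can be made concrete with plain backpropagation on the XOR benchmark, exposing the sigmoid slope as an explicit parameter beta. This is a minimal sketch; the fourth-order eigenvalue interpolation that chooses the slope is not reproduced here, and the network size and learning rate are illustrative:

```python
import numpy as np

def sigmoid(x, beta=1.0):
    # beta controls the slope of the activation; the acceleration
    # technique described above adapts this slope during training
    return 1.0 / (1.0 + np.exp(-beta * x))

def train_xor(beta=1.0, lr=0.5, epochs=5000, hidden=8, seed=0):
    """Plain backpropagation (MSE loss) on XOR with an explicit sigmoid
    slope beta (a minimal sketch of the setting, not the paper's
    interpolation-based slope schedule)."""
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])
    W1 = rng.normal(0, 1, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1, beta)          # forward pass
        out = sigmoid(h @ W2 + b2, beta)
        losses.append(float(np.mean((out - y) ** 2)))
        # backward pass: the sigmoid derivative scales with beta,
        # so changing the slope changes the effective step size
        d_out = (out - y) * beta * out * (1 - out)
        d_h = (d_out @ W2.T) * beta * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
    return losses

losses = train_xor()
```

Since the error gradient is multiplied by beta at every layer, steepening the activation effectively rescales the updates; that is the lever the eigenvalue-based slope adjustment pulls to speed up convergence.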