Online Trajectory Optimization Using Inexact Gradient Feedback for Time-Varying Environments
This paper considers the problem of online trajectory design under
time-varying environments. We formulate the general trajectory optimization
problem within the framework of time-varying constrained convex optimization
and propose a novel version of the online gradient ascent algorithm for such
problems. Moreover, the gradient feedback may be noisy, which allows the
proposed algorithm to be used in a range of practical applications where the
true gradient is difficult to acquire. In contrast to most of the available
literature, we characterize the offline regret of the proposed algorithm, which
is sublinear up to the path-length variations of the optimal offline solution,
the cumulative gradient, and the error in the gradient variations. Furthermore,
we establish a lower bound on the offline dynamic regret, which characterizes
the optimality of any trajectory.
To show the efficacy of the proposed algorithm, we consider two practical
problems of interest. First, we consider a device-to-device (D2D)
communication setting, where the goal is to design a user trajectory that
maximizes its connectivity to the internet. The second problem concerns the
online planning of energy-efficient trajectories for unmanned surface vehicles
(USVs) under strong disturbances in ocean environments with both static
and dynamic goal locations. Detailed simulation results demonstrate the
significance of the proposed algorithm on synthetic and real data sets. A video
on the real-world datasets can be found at
https://www.youtube.com/watch?v=FcRqqWtpf_0.
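As a rough sketch of the template such an algorithm follows, the snippet below runs online projected gradient ascent with noisy (inexact) gradient feedback on a toy time-varying objective. All names, the step size, and the toy problem are illustrative assumptions, not the paper's exact method; dynamic regret here is the standard gap sum_t f_t(x_t*) - sum_t f_t(x_t) between the per-slot optimal trajectory and the iterates.

    import numpy as np

    def online_gradient_ascent(T, x0, noisy_grad, project, eta=0.1):
        # Online projected gradient ascent with inexact feedback (sketch).
        # noisy_grad(t, x): noisy estimate of the gradient of f_t at x.
        # project(x): Euclidean projection onto the feasible set.
        x = np.asarray(x0, dtype=float)
        trajectory = [x.copy()]
        for t in range(T):
            g = noisy_grad(t, x)           # inexact feedback: true gradient + error
            x = project(x + eta * g)       # ascent step, then stay feasible
            trajectory.append(x.copy())
        return trajectory

    # Toy usage: track f_t(x) = -||x - c_t||^2 over the unit ball, with a
    # slowly moving optimum c_t and Gaussian noise on the gradient.
    rng = np.random.default_rng(0)
    c = lambda t: np.array([np.cos(0.1 * t), np.sin(0.1 * t)])
    noisy_grad = lambda t, x: -2.0 * (x - c(t)) + 0.05 * rng.standard_normal(2)
    project = lambda x: x / max(1.0, np.linalg.norm(x))
    traj = online_gradient_ascent(200, np.zeros(2), noisy_grad, project)

The drift of the moving optimum c_t over time is what the path-length term in the regret bound measures, which is why the guarantee is stated up to the variations of the optimal offline solution.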
Acceleration in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization
methods in reinforcement learning (RL) by integrating foresight in the policy
improvement step via optimistic and adaptive updates. Leveraging the connection
between policy iteration and policy gradient methods, we view policy
optimization algorithms as iteratively solving a sequence of surrogate
objectives, local lower bounds on the original objective. We define optimism as
predictive modelling of the future behavior of a policy, and adaptivity as
taking immediate and anticipatory corrective actions to mitigate accumulating
errors from overshooting predictions or delayed responses to change. We use
this shared lens to jointly express other well-known algorithms, including
model-based policy improvement based on forward search, and optimistic
meta-learning algorithms. We analyze properties of this formulation, and show
connections to other accelerated optimization algorithms. Then, we design an
optimistic policy gradient algorithm, adaptive via meta-gradient learning, and
empirically highlight several design choices pertaining to acceleration, in an
illustrative task.
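A minimal sketch of the optimistic, adaptive update idea, under stated assumptions: optimism is implemented here as the standard optimistic-gradient extrapolation 2*g_t - g_{t-1}, and adaptivity as a hypergradient-style step-size correction; both are illustrative stand-ins, and the paper's meta-gradient-learned variant is more involved.

    import numpy as np

    def optimistic_policy_step(theta, grad, prev_grad, eta, beta=1e-3):
        # Optimism: extrapolate the last two gradients to forecast the
        # next one (a standard optimistic-gradient prediction).
        predicted = 2.0 * grad - prev_grad
        # Adaptivity: hypergradient-style step-size correction (an
        # illustrative stand-in for meta-gradient learning).
        eta = eta + beta * float(np.dot(grad, prev_grad))
        return theta + eta * predicted, eta

    # Toy usage on a stationary surrogate J(theta) = -||theta - target||^2;
    # the analytic gradient stands in for a policy-gradient estimate.
    target = np.array([1.0, -0.5, 2.0])
    theta, eta, g_prev = np.zeros(3), 0.05, np.zeros(3)
    for _ in range(100):
        g = -2.0 * (theta - target)
        theta, eta = optimistic_policy_step(theta, g, g_prev, eta)
        g_prev = g

When the forecast is accurate (the gradient changes slowly), the extrapolated step anticipates the next iteration; when it overshoots, successive gradients decorrelate and the step-size correction damps subsequent updates.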
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains the scientific program, both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.