Optimism-Based Adaptive Regulation of Linear-Quadratic Systems
The main challenge for adaptive regulation of linear-quadratic systems is the
trade-off between identification and control. An adaptive policy needs to
address both the estimation of the unknown dynamics parameters (exploration)
and the regulation of the underlying system (exploitation). To this end, the
literature employs optimism-based methods that bias the identification in
favor of optimistic approximations of the true parameter. A number of
asymptotic results have been established, but their finite-time counterparts
are few and come with important restrictions.
This study establishes results for the worst-case regret of optimism-based
adaptive policies. The presented high-probability upper bounds are optimal up
to logarithmic factors. The non-asymptotic analysis of this work requires only
mild assumptions: (i) stabilizability of the system's dynamics, and (ii) a
bound on the heaviness of the tails of the noise distribution. To establish
such bounds, novel techniques are developed to comprehensively address the
probabilistic behavior of dependent random matrices with heavy-tailed
distributions.
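To make the optimism principle concrete, here is a minimal sketch, not the authors' exact algorithm: among candidate parameters consistent with the data, act as if the one promising the lowest optimal cost were true. The finite candidate search and the identity noise covariance are illustrative simplifications.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def optimistic_gain(candidates, Q, R):
        # candidates: (A, B) pairs from a confidence set around the
        # least-squares estimate (assumed stabilizable); a finite sample
        # stands in for the continuous optimization over the set.
        best, best_cost = None, np.inf
        for A, B in candidates:
            P = solve_discrete_are(A, B, Q, R)
            # Optimal average cost is tr(P W); noise covariance W = I here.
            if np.trace(P) < best_cost:
                best_cost, best = np.trace(P), (A, B)
        A, B = best
        P = solve_discrete_are(A, B, Q, R)
        # Certainty-equivalence LQR gain for the optimistic parameters.
        return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)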
Input Perturbations for Adaptive Control and Learning
This paper studies adaptive algorithms for simultaneous regulation (i.e.,
control) and estimation (i.e., learning) of Multiple-Input Multiple-Output
(MIMO) linear dynamical systems. It proposes practical, easy-to-implement
control policies based on perturbations of the input signals. Such policies are
shown to achieve a worst-case regret that scales as the square root of the time
horizon and holds uniformly over time. Further, it discusses specific settings
where such greedy policies attain the information-theoretic lower bound of
logarithmic regret. To establish the results, recent advances on
self-normalized martingales are leveraged together with a novel method of
policy decomposition.
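As a rough illustration of such a policy, a sketch under assumed names and an illustrative decay schedule rather than the paper's exact construction: add a decaying random perturbation to the greedy certainty-equivalence input.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def ce_gain(A_hat, B_hat, Q, R):
        # Certainty-equivalence LQR gain for the current estimates.
        P = solve_discrete_are(A_hat, B_hat, Q, R)
        return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

    def perturbed_input(K, x, t, sigma0=1.0):
        # Greedy input K @ x plus an exploratory Gaussian perturbation;
        # the t**(-1/4) decay of its scale is an illustrative choice.
        return K @ x + sigma0 * (t + 1) ** -0.25 * np.random.randn(K.shape[0])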
A Tour of Reinforcement Learning: The View from Continuous Control
This manuscript surveys reinforcement learning from the perspective of
optimization and control with a focus on continuous control applications. It
surveys the general formulation, terminology, and typical experimental
implementations of reinforcement learning and reviews competing solution
paradigms. In order to compare the relative merits of various techniques, this
survey presents a case study of the Linear Quadratic Regulator (LQR) with
unknown dynamics, perhaps the simplest and best-studied problem in optimal
control. The manuscript describes how merging techniques from learning theory
and control can provide non-asymptotic characterizations of LQR performance and
shows that these characterizations tend to match experimental behavior. In
turn, when revisiting more complex applications, many of the observed phenomena
in LQR persist. In particular, theory and experiment demonstrate the role and
importance of models and the cost of generality in reinforcement learning
algorithms. This survey concludes with a discussion of some of the challenges
in designing learning systems that safely and reliably interact with complex
and uncertain environments and how tools from reinforcement learning and
control might be combined to approach these challenges.
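For orientation, the LQR case study referred to above is the standard problem (this formulation is textbook material, not quoted from the survey):

    \min_{u_0, u_1, \ldots}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} x_t^\top Q x_t + u_t^\top R u_t\Big]
    \quad \text{subject to} \quad x_{t+1} = A x_t + B u_t + w_t,

whose optimal policy when $(A, B)$ is known is linear state feedback $u_t = -K x_t$, with $K = (R + B^\top P B)^{-1} B^\top P A$ and $P$ the solution of the discrete algebraic Riccati equation; the learning question is how closely this performance can be approached when $(A, B)$ must be estimated from data.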
On Adaptive Linear-Quadratic Regulators
Performance of adaptive control policies is assessed through the regret with
respect to the optimal regulator, which reflects the increase in the operating
cost due to uncertainty about the dynamics parameters. However, available
results in the literature do not provide a quantitative characterization of the
effect of the unknown parameters on the regret. Further, there are problems
regarding the efficient implementation of some of the existing adaptive
policies. Finally, results regarding the accuracy with which the system's
parameters are identified are scarce and rather incomplete.
This study aims to comprehensively address these three issues. First, by
introducing a novel decomposition of adaptive policies, we establish a sharp
expression for the regret of an arbitrary policy in terms of the deviations
from the optimal regulator. Second, we show that adaptive policies based on
slight modifications of the Certainty Equivalence scheme are efficient.
Specifically, we establish regret bounds of (nearly) square-root rate for two
families of randomized adaptive policies. The presented regret bounds are
obtained by using anti-concentration results on the random matrices employed
for randomizing the estimates of the unknown parameters. Moreover, we study
the minimal additional information about the dynamics matrices under which the
regret becomes of logarithmic order. Finally, we present the rates at which
the unknown parameters of the system are identified.
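A minimal sketch of the kind of randomized certainty-equivalence policy meant here; the Gaussian perturbation and its scale are illustrative assumptions, not the paper's exact scheme.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def randomized_ce_gain(theta_hat, n, Q, R, scale):
        # theta_hat: n x (n + d) least-squares estimate of [A B].
        # Randomize the estimate, then apply certainty equivalence; the
        # anti-concentration arguments concern such random perturbations.
        theta = theta_hat + scale * np.random.randn(*theta_hat.shape)
        A_hat, B_hat = theta[:, :n], theta[:, n:]
        P = solve_discrete_are(A_hat, B_hat, Q, R)
        return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)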
Value and Policy Iteration in Optimal Control and Adaptive Dynamic Programming
In this paper, we consider discrete-time, infinite-horizon problems of optimal
control to a terminal set of states. These are the problems that are often
taken as the starting point for adaptive dynamic programming. Under very
general assumptions, we establish the uniqueness of the solution of Bellman's
equation, and we provide convergence results for value and policy iteration.
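As a concrete instance, here is a minimal value-iteration sketch for a finite-state problem of reaching an absorbing, cost-free terminal state; the finite setup is an illustration only, since the paper's framework is far more general.

    import numpy as np

    def value_iteration(P, c, terminal, iters=1000, tol=1e-8):
        # P: (num_actions, n, n) transition matrices; c: (num_actions, n)
        # stage costs; `terminal` indexes absorbing, cost-free states.
        J = np.zeros(P.shape[1])
        for _ in range(iters):
            # Bellman backup: J(x) = min_a [c(a,x) + sum_y P(a,x,y) J(y)].
            J_new = np.min(c + P @ J, axis=0)
            J_new[terminal] = 0.0
            if np.max(np.abs(J_new - J)) < tol:
                break
            J = J_new
        return J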
Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator
We study the sample complexity of approximate policy iteration (PI) for the
Linear Quadratic Regulator (LQR), building on a recent line of work using LQR
as a testbed to understand the limits of reinforcement learning (RL) algorithms
on continuous control tasks. Our analysis quantifies the tension between policy
improvement and policy evaluation, and suggests that policy evaluation is the
dominant factor in terms of sample complexity. Specifically, we show that to
obtain a controller that is within $\varepsilon$ of the optimal LQR controller,
each step of policy evaluation requires at most $(n+d)^3/\varepsilon^2$
samples, where $n$ is the dimension of the state vector and $d$ is the
dimension of the input vector. On the other hand, only $\log(1/\varepsilon)$
policy improvement steps suffice, resulting in an overall sample complexity of
$(n+d)^3 \varepsilon^{-2} \log(1/\varepsilon)$. We furthermore build on our
analysis and construct a simple adaptive procedure based on
$\varepsilon$-greedy exploration which relies on approximate PI as a
sub-routine and obtains $T^{2/3}$ regret, improving upon a recent result of
Abbasi-Yadkori et al.
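For reference, exact policy iteration for LQR alternates a Lyapunov-equation evaluation with a greedy improvement; the paper's sample-based analysis replaces the exact evaluation step with an approximate one. A sketch of the exact iteration (standard material, not the paper's estimator):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def policy_iteration_lqr(A, B, Q, R, K, iters=50):
        # Assumes the initial gain K (with u = K x) is stabilizing.
        for _ in range(iters):
            # Policy evaluation: value matrix P of the current gain,
            # solving P = (A + BK)' P (A + BK) + Q + K' R K.
            Acl = A + B @ K
            P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
            # Policy improvement: greedy gain with respect to P.
            K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        return K, P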
Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator
We consider adaptive control of the Linear Quadratic Regulator (LQR), where
an unknown linear system is controlled subject to quadratic costs. Leveraging
recent developments in the estimation of linear systems and in robust
controller synthesis, we present the first provably polynomial-time algorithm
that provides high-probability guarantees of sub-linear regret on this problem.
We further study the interplay between regret minimization and parameter
estimation by proving a lower bound on the expected regret in terms of the
exploration schedule used by any algorithm. Finally, we conduct a numerical
study comparing our robust adaptive algorithm to other methods from the
adaptive LQR literature, and demonstrate the flexibility of our proposed method
by extending it to a demand forecasting problem subject to state constraints.
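The estimation step such methods build on can be sketched as follows: ridge regression of next states on state-input pairs. The names and the regularizer are illustrative, and the robust synthesis step is not shown.

    import numpy as np

    def estimate_dynamics(X, U, X_next, reg=1e-6):
        # X: T x n states, U: T x d inputs, X_next: T x n next states.
        Z = np.hstack([X, U])                          # T x (n + d) regressors
        theta = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]),
                                Z.T @ X_next)          # ridge least squares
        A_hat = theta[:X.shape[1]].T
        B_hat = theta[X.shape[1]:].T
        return A_hat, B_hat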
Convexity and monotonicity in nonlinear optimal control under uncertainty
We consider the problem of finite-horizon optimal control design under
uncertainty for imperfectly observed discrete-time systems with convex costs
and constraints. It is known that this problem can be cast as an
infinite-dimensional convex program when the dynamics and measurements are
linear, uncertainty is additive, and the risks associated with constraint
violations and excessive costs are measured in expectation or in the worst
case. In this paper, we extend this result to systems with convex or concave
dynamics, nonlinear measurements, more general uncertainty structures and other
coherent risk measures. In this setting, the optimal control problem can be
cast as an infinite-dimensional convex program if (1) the costs, constraints
and dynamics satisfy certain monotonicity properties, and (2) the measured
outputs can be reversibly 'purified' of the influence of the control inputs
through Q- or Youla-parameterization. The practical value of this result is
that the finite-dimensional subproblems arising in a variety of suboptimal
control methods, notably including model predictive control and the Q-design
procedure, are also convex for this class of nonlinear systems. Subproblems can
therefore be solved to global optimality using convenient modeling software and
efficient, reliable solvers. We illustrate these ideas in a numerical example.
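To illustrate the kind of finite-dimensional convex subproblem meant here, consider the classical linear/additive special case written in cvxpy; this is a nominal sketch with placeholder data, whereas the paper's contribution is extending such convexity to monotone nonlinear dynamics.

    import cvxpy as cp
    import numpy as np

    n, d, H = 4, 2, 10                       # state/input dims, horizon
    A, B = np.eye(n), np.random.randn(n, d)  # placeholder dynamics
    x0 = np.ones(n)
    x = cp.Variable((H + 1, n))
    u = cp.Variable((H, d))
    cost = cp.sum_squares(x) + cp.sum_squares(u)     # convex stage costs
    cons = [x[0] == x0, cp.max(cp.abs(u)) <= 1]      # convex constraints
    cons += [x[t + 1] == A @ x[t] + B @ u[t] for t in range(H)]
    cp.Problem(cp.Minimize(cost), cons).solve()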
Online Linear Quadratic Control
We study the problem of controlling linear time-invariant systems with known
noisy dynamics and adversarially chosen quadratic losses. We present the first
efficient online learning algorithms in this setting that guarantee
$O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our
algorithms rely on a novel SDP relaxation for the steady-state distribution of
the system. Crucially, and in contrast to previously proposed relaxations, the
feasible solutions of our SDP all correspond to "strongly stable" policies that
mix exponentially fast to a steady state.
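A sketch of one standard form of such a steady-state relaxation, under assumptions; the exact constraints and the strong-stability refinements in the paper differ. The idea is to optimize over the joint steady-state covariance of state and input rather than over policies directly.

    import cvxpy as cp
    import numpy as np

    def steady_state_sdp(A, B, Q, R, W):
        # Decision variable: joint covariance of (x, u) in steady state.
        n, d = B.shape
        Sigma = cp.Variable((n + d, n + d), PSD=True)
        AB = np.hstack([A, B])
        cost = cp.trace(Q @ Sigma[:n, :n]) + cp.trace(R @ Sigma[n:, n:])
        # Stationarity of the state marginal under x+ = A x + B u + w.
        cons = [Sigma[:n, :n] == AB @ Sigma @ AB.T + W]
        cp.Problem(cp.Minimize(cost), cons).solve()
        # A linear policy can be read off as K = Sigma_ux @ inv(Sigma_xx).
        return Sigma.value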
Adaptive Execution: Exploration and Learning of Price Impact
We consider a model in which a trader aims to maximize expected risk-adjusted
profit while trading a single security. In our model, each price change is a
linear combination of observed factors, impact resulting from the trader's
current and prior activity, and unpredictable random effects. The trader must
learn coefficients of a price impact model while trading. We propose a new
method for simultaneous execution and learning - the confidence-triggered
regularized adaptive certainty equivalent (CTRACE) policy - and establish a
poly-logarithmic finite-time expected regret bound. This bound implies that
CTRACE is efficient in the sense that its $(\epsilon, \delta)$-convergence
time is bounded by a polynomial function of $1/\epsilon$ and $\log(1/\delta)$
with high probability. In addition, we demonstrate via Monte Carlo simulation
that CTRACE outperforms the certainty equivalent policy and a recently proposed
reinforcement learning algorithm that is designed to explore efficiently in
linear-quadratic control problems.
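A minimal sketch of the two ingredients the acronym points to, regularized least squares plus a confidence trigger; the trigger rule and threshold below are illustrative assumptions, not the paper's exact conditions.

    import numpy as np

    class RegularizedRLS:
        def __init__(self, dim, reg=1.0):
            self.V = reg * np.eye(dim)    # regularized Gram matrix
            self.b = np.zeros(dim)
            self.theta = np.zeros(dim)    # price-impact coefficients

        def update(self, z, y, trigger=10.0):
            # z: features (observed factors, current/past trade activity);
            # y: observed price change.
            self.V += np.outer(z, z)
            self.b += y * z
            # Re-estimate only once the data are informative enough, here
            # when the smallest eigenvalue of V clears a threshold.
            if np.linalg.eigvalsh(self.V)[0] >= trigger:
                self.theta = np.linalg.solve(self.V, self.b)
            return self.theta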