Structural Results and Explicit Solution for Two-Player LQG Systems on a Finite Time Horizon
It is well-known that linear dynamical systems with Gaussian noise and
quadratic cost (LQG) satisfy a separation principle. Finding the optimal
controller amounts to solving two dual problems: one for control and one
for estimation. For the discrete-time finite-horizon case, each problem is a
simple forward or backward recursion. In this paper, we consider a
generalization of the LQG problem in which there are two controllers. Each
controller is responsible for one of two system inputs, but has access to
different subsets of the available measurements. Our paper has three main
contributions. First, we prove a fundamental structural result: sufficient
statistics for the controllers can be expressed as conditional means of the
global state. Second, we give explicit state-space formulae for the optimal
controller. These formulae are reminiscent of the classical LQG solution with
dual forward and backward recursions, but with the important difference that
they are intricately coupled. Lastly, we show how these recursions can be
solved efficiently, with computational complexity comparable to that of the
centralized problem.
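For concreteness, below is a minimal sketch of the two classical centralized recursions the abstract uses as its point of comparison: the backward Riccati recursion for the LQR gains and the forward recursion for the Kalman gains. All matrices are illustrative placeholders; the paper's two-controller solution couples these recursions in a way not shown here.

```python
import numpy as np

def lqr_backward(A, B, Q, R, QT, T):
    """Backward Riccati recursion: finite-horizon LQR gains K_0, ..., K_{T-1}."""
    P = QT                                   # terminal cost-to-go
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # K = (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)        # Riccati update
        gains.append(K)
    return gains[::-1]                       # gains[t] is applied at time t

def kalman_forward(A, C, W, V, Sigma0, T):
    """Forward recursion: Kalman (predictor) gains L_0, ..., L_{T-1}."""
    S = Sigma0                               # prior state covariance
    gains = []
    for _ in range(T):
        innov = C @ S @ C.T + V              # innovation covariance
        L = A @ S @ C.T @ np.linalg.inv(innov)
        S = A @ S @ A.T + W - L @ innov @ L.T  # predictive covariance update
        gains.append(L)
    return gains
```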
Optimal Local and Remote Controllers with Unreliable Communication
We consider a decentralized optimal control problem for a linear plant
controlled by two controllers, a local controller and a remote controller. The
local controller directly observes the state of the plant and can inform the
remote controller of the plant state through a packet-drop channel. We assume
that the remote controller is able to send acknowledgments to the local
controller to signal the successful receipt of transmitted packets. The
objective of the two controllers is to cooperatively minimize a quadratic
performance cost. We provide a dynamic program for this decentralized control
problem using the common information approach. Although our problem is not a
partially nested LQG problem, we obtain explicit optimal strategies for the two
controllers. In the optimal strategies, both controllers compute a common
estimate of the plant state based on the common information. The remote
controller's action is linear in the common estimated state, and the local
controller's action is linear in both the actual state and the common estimated
state.
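A schematic simulation of this control loop, assuming the acknowledgment makes each channel outcome common knowledge, may help fix ideas. The gain matrices and the model-based propagation of the common estimate on a packet drop are illustrative assumptions, not the optimal quantities derived in the paper.

```python
import numpy as np

def simulate(T, x0, A, B_loc, B_rem, K_rem, K_loc, K_tilde, p_drop, W, seed=0):
    """Schematic local/remote control loop over a packet-drop channel with acks.

    The gains and the drop-time estimate update are illustrative assumptions,
    not the optimal strategies derived in the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(x0)
    x, xhat = x0.copy(), x0.copy()          # initial state assumed commonly known
    for _ in range(T):
        # Local controller transmits x; the ack makes the outcome common knowledge.
        if rng.random() > p_drop:
            xhat = x.copy()                 # packet delivered: common estimate = true state
        # Action structure stated in the abstract:
        u_rem = K_rem @ xhat                # remote: linear in the common estimate
        u_loc = K_loc @ x + K_tilde @ xhat  # local: linear in true state and common estimate
        # Plant update with process noise.
        x = A @ x + B_loc @ u_loc + B_rem @ u_rem \
            + rng.multivariate_normal(np.zeros(n), W)
        # Common-estimate propagation: replace the unknown true state by xhat
        # (an assumed update computable from common information alone).
        xhat = A @ xhat + B_loc @ ((K_loc + K_tilde) @ xhat) + B_rem @ u_rem
    return x, xhat
```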
Solving Common-Payoff Games with Approximate Policy Iteration
For artificially intelligent learning systems to have widespread
applicability in real-world settings, it is important that they be able to
operate in a decentralized fashion. Unfortunately, decentralized control is
difficult: computing even an epsilon-optimal joint policy is NEXP-complete.
Nevertheless, a recently rediscovered insight -- that a team of agents can
coordinate via common knowledge -- has given rise to algorithms capable of
finding optimal joint policies in small common-payoff games. The Bayesian
action decoder (BAD) leverages this insight and deep reinforcement learning to
scale to games as large as two-player Hanabi. However, the approximations it
uses prevent it from discovering optimal joint policies even in games small
enough to solve by brute force. This work proposes CAPI, a novel
algorithm which, like BAD, combines common knowledge with deep reinforcement
learning. However, unlike BAD, CAPI prioritizes the propensity to discover
optimal joint policies over scalability. While this choice precludes CAPI from
scaling to games as large as Hanabi, empirical results demonstrate that, on the
games to which CAPI does scale, it is capable of discovering optimal joint
policies even when other modern multi-agent reinforcement learning algorithms
are unable to do so. Code is available at https://github.com/ssokota/capi.
Comment: AAAI 2021
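To make "small enough to solve by brute force" concrete, here is a sketch that enumerates every deterministic joint policy of a toy one-shot common-payoff game. The payoff tensor is randomly generated for illustration and is not taken from any benchmark.

```python
import itertools
import numpy as np

# A toy one-shot common-payoff game: each player privately observes a bit and
# picks an action in {0, 1}; both receive the same payoff.  The payoff tensor
# is randomly generated here purely for illustration.
rng = np.random.default_rng(0)
payoff = rng.integers(0, 10, size=(2, 2, 2, 2))  # payoff[obs1, obs2, act1, act2]

def brute_force_joint_policy(payoff):
    """Enumerate every deterministic policy (observation -> action) for both
    players and return the joint policy maximizing the expected common payoff
    under uniform observations.  Feasible only because the game is tiny."""
    policies = list(itertools.product([0, 1], repeat=2))  # action for obs 0 and obs 1
    best, best_value = None, -np.inf
    for p1, p2 in itertools.product(policies, policies):
        value = np.mean([payoff[o1, o2, p1[o1], p2[o2]]
                         for o1 in (0, 1) for o2 in (0, 1)])
        if value > best_value:
            best, best_value = (p1, p2), value
    return best, best_value

print(brute_force_joint_policy(payoff))  # optimal joint policy and its value
```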
Optimal Control for LQG Systems on Graphs---Part I: Structural Results
In this two-part paper, we identify a broad class of decentralized
output-feedback LQG systems for which the optimal control strategies have a
simple intuitive estimation structure and can be computed efficiently. Roughly,
we consider the class of systems for which the coupling of dynamics among
subsystems and the inter-controller communication is characterized by the same
directed graph. Furthermore, this graph is assumed to be a multitree; that is,
its transitive reduction has at most one directed path connecting each pair of
nodes. In this first part, we derive sufficient statistics that may be
used to aggregate each controller's growing available information. Each
controller must estimate the states of the subsystems that it affects (its
descendants) as well as the subsystems that it observes (its ancestors). The
optimal control action for a controller is a linear function of the estimate it
computes as well as the estimates computed by all of its ancestors. Moreover,
these state estimates may be updated recursively, much like a Kalman filter.
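The multitree condition above can be checked directly. The sketch below, assuming the system graph is supplied as a DAG edge list, computes the transitive reduction and verifies that it has at most one directed path between every ordered pair of nodes.

```python
from functools import lru_cache
from itertools import product

def is_multitree(n, edges):
    """Check the multitree condition: the transitive reduction of the DAG on
    nodes 0..n-1 has at most one directed path between each pair of nodes.
    `edges` is an iterable of (u, v) pairs; the graph is assumed acyclic."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)

    def reach(u, seen):                      # nodes reachable from u (DFS)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                reach(w, seen)
        return seen
    R = {u: reach(u, set()) for u in range(n)}

    # Transitive reduction: edge (u, v) is redundant iff some other successor
    # of u already reaches v.
    reduced = {u: {v for v in adj[u]
                   if not any(v in R[w] for w in adj[u] if w != v)}
               for u in range(n)}

    @lru_cache(maxsize=None)
    def npaths(u, v):                        # number of directed paths u -> v
        return 1 if u == v else sum(npaths(w, v) for w in reduced[u])

    return all(npaths(u, v) <= 1 for u, v in product(range(n), repeat=2) if u != v)

print(is_multitree(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # False: two paths 0 -> 3
print(is_multitree(4, [(0, 1), (1, 2), (1, 3)]))          # True: a directed tree
```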
Decentralized linear quadratic systems with major and minor agents and non-Gaussian noise
We consider a decentralized linear quadratic system with a major agent and a
collection of minor agents. The agents are coupled in their dynamics as well
as in a quadratic cost. In particular, the dynamics are linear; the state and
control action of the major agent affect the state evolution of all the minor
agents, but the state and control action of a minor agent do not affect the
state evolution of the major agent or of the other minor agents. The system
has partial
output feedback with partially nested information structure. In particular, the
major agent perfectly observes its own state while each minor agent perfectly
observes the state of the major agent and partially observes its own state. It
is not assumed that the noise process has a Gaussian distribution. For this
model, we characterize the structure of the optimal and the best linear
strategies. We show that the optimal control of the major agent is a linear
function of the major agent's MMSE (minimum mean squared error) estimate of the
system state and the optimal control of a minor agent is a linear function of
the major agent's MMSE estimate of the system state and a "correction term"
which depends on the difference of the minor agent's MMSE estimate of its local
state and the major agent's MMSE estimate of the minor agent's local state. The
major agent's MMSE estimate is a linear function of its observations, while the
minor agent's MMSE estimate is a non-linear function of its observations,
updated according to a non-linear Bayesian filter. We show that if we
replace the minor agent's MMSE estimate by its LLMS (linear least mean square)
estimate, then the resultant strategy is the best linear control strategy. We
prove the result using a direct proof based on conditional independence,
splitting of the state and control actions, simplification of the per-step
cost, the orthogonality principle, and completion of squares.
Comment: 15 pages, submitted to the IEEE Transactions on Automatic Control
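For reference, the LLMS estimate mentioned above has a closed form that requires only first- and second-order statistics, which is what makes the best linear strategy tractable without Gaussian assumptions. The sketch below implements that formula with illustrative Laplace-distributed data and numerically checks the orthogonality principle.

```python
import numpy as np

def llms_estimate(y, mx, my, Sxy, Syy):
    """LLMS estimate of x given y: xhat = E[x] + Cov(x,y) Cov(y)^{-1} (y - E[y]).
    By the orthogonality principle, the error x - xhat is uncorrelated with y.
    Only first- and second-order statistics are needed; no Gaussian assumption."""
    return mx + Sxy @ np.linalg.solve(Syy, y - my)

# Numerical check of orthogonality with non-Gaussian (Laplace) noise.
rng = np.random.default_rng(0)
x = rng.laplace(size=(10000, 2))
y = x @ np.array([[1.0, 0.0], [0.5, 1.0]]) + rng.laplace(scale=0.3, size=(10000, 2))
mx, my = x.mean(axis=0), y.mean(axis=0)
Sxy = np.cov(x.T, y.T)[:2, 2:]               # Cov(x, y)
Syy = np.cov(y.T)                            # Cov(y)
err = x - np.array([llms_estimate(yi, mx, my, Sxy, Syy) for yi in y])
print(np.abs(np.cov(err.T, y.T)[:2, 2:]).max())  # ~ 0: error uncorrelated with y
```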