
    Structural Results and Explicit Solution for Two-Player LQG Systems on a Finite Time Horizon

    It is well known that linear dynamical systems with Gaussian noise and quadratic cost (LQG) satisfy a separation principle: finding the optimal controller amounts to solving two separate dual problems, one for control and one for estimation. For the discrete-time finite-horizon case, each problem is a simple forward or backward recursion. In this paper, we consider a generalization of the LQG problem in which there are two controllers. Each controller is responsible for one of two system inputs but has access to a different subset of the available measurements. Our paper has three main contributions. First, we prove a fundamental structural result: sufficient statistics for the controllers can be expressed as conditional means of the global state. Second, we give explicit state-space formulae for the optimal controller. These formulae are reminiscent of the classical LQG solution, with dual forward and backward recursions, but with the important difference that the recursions are intricately coupled. Lastly, we show how these recursions can be solved efficiently, with computational complexity comparable to that of the centralized problem.
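
    For orientation, the sketch below spells out the classical centralized LQG structure that this paper generalizes: a backward Riccati recursion producing the control gains and a forward Kalman-filter recursion producing the state estimate, coupled only through the certainty-equivalent input. All matrices, dimensions, and noise levels are invented placeholders, not values from the paper.

        import numpy as np

        T = 20                                   # horizon length
        A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state dynamics (placeholder)
        B = np.array([[0.0], [0.1]])             # input matrix (placeholder)
        C = np.array([[1.0, 0.0]])               # measurement matrix (placeholder)
        Q = np.eye(2)                            # state cost
        R = np.eye(1)                            # input cost
        W = 0.01 * np.eye(2)                     # process-noise covariance
        V = 0.01 * np.eye(1)                     # measurement-noise covariance

        # Backward recursion (control Riccati equation) for the gains K[t].
        P = Q.copy()
        K = [None] * T
        for t in reversed(range(T)):
            K[t] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K[t])

        # Forward recursion (Kalman filter); by the separation principle the
        # optimal input is the certainty-equivalent u[t] = -K[t] @ xhat[t].
        rng = np.random.default_rng(0)
        x = np.array([1.0, 0.0])                 # simulated true plant state
        xhat = np.zeros(2)                       # state estimate
        Sigma = np.eye(2)                        # estimate covariance
        for t in range(T):
            u = -K[t] @ xhat
            y = C @ x + 0.1 * rng.standard_normal(1)    # noisy measurement
            L = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + V)
            xhat = A @ (xhat + L @ (y - C @ xhat)) + B @ u
            x = A @ x + B @ u + 0.1 * rng.standard_normal(2)
            Sigma = A @ (Sigma - L @ C @ Sigma) @ A.T + W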

    Optimal Local and Remote Controllers with Unreliable Communication

    We consider a decentralized optimal control problem for a linear plant controlled by two controllers: a local controller and a remote controller. The local controller directly observes the state of the plant and can inform the remote controller of the plant state through a packet-drop channel. We assume that the remote controller is able to send acknowledgments to the local controller to signal the successful receipt of transmitted packets. The objective of the two controllers is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested LQG problem, we obtain explicit optimal strategies for the two controllers. In the optimal strategies, both controllers compute a common estimate of the plant state based on the common information. The remote controller's action is linear in the common state estimate, and the local controller's action is linear in both the actual state and the common state estimate.
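
    As a rough illustration of the estimation structure described above (not code from the paper), the toy simulation below shows why the acknowledgments matter: they let the local and remote controllers update an identical common estimate, whether or not the packet arrived. The dynamics and all gains are arbitrary placeholders, not the optimal ones derived in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        A = np.array([[1.0, 0.1], [0.0, 0.9]])   # plant dynamics (placeholder)
        B_local = np.array([[0.0], [0.1]])       # local input matrix
        B_remote = np.array([[0.1], [0.0]])      # remote input matrix
        p_drop = 0.3                             # packet-drop probability
        G_remote = np.array([[0.2, 0.1]])        # placeholder gain
        G_state = np.array([[0.3, 0.0]])         # placeholder gain
        G_common = np.array([[0.1, 0.1]])        # placeholder gain

        x = np.array([1.0, -1.0])   # true plant state (seen by the local controller)
        zhat = np.zeros(2)          # common estimate (computable by both controllers)

        for t in range(50):
            # Remote action is linear in the common estimate; local action is
            # linear in the true state and the common estimate.
            u_remote = -G_remote @ zhat
            u_local = -G_state @ x - G_common @ zhat

            delivered = rng.random() > p_drop
            drive = B_local @ u_local + B_remote @ u_remote
            x_next = A @ x + drive + 0.01 * rng.standard_normal(2)

            # The acknowledgment tells both controllers whether the packet
            # arrived, so they can update the common estimate identically.
            if delivered:
                zhat = A @ x + drive       # remote learned the state exactly
            else:
                zhat = A @ zhat + drive    # both simply propagate the estimate
            x = x_next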

    Solving Common-Payoff Games with Approximate Policy Iteration

    For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate in a decentralized fashion. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is NEXP-complete. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute-force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. Code is available at https://github.com/ssokota/capi. (Comment: AAAI 2021.)
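
    To make the notion of brute-forcing optimal joint policies concrete, here is a minimal sketch (unrelated to the linked repository) that enumerates every deterministic joint policy of a tiny common-payoff game with private signals and keeps the best one; the payoff tensor and signal distribution are invented for illustration.

        import itertools
        import numpy as np

        n_signals, n_actions = 2, 2
        rng = np.random.default_rng(1)
        # payoff[s1, s2, a1, a2]: common payoff given both signals and actions.
        payoff = rng.random((n_signals, n_signals, n_actions, n_actions))
        # Independent uniform private signals for the two agents.
        p_signal = np.full(n_signals, 1.0 / n_signals)

        def expected_payoff(pi1, pi2):
            """Expected common payoff of a deterministic joint policy,
            where pi1 and pi2 map each private signal to an action."""
            total = 0.0
            for s1 in range(n_signals):
                for s2 in range(n_signals):
                    total += (p_signal[s1] * p_signal[s2]
                              * payoff[s1, s2, pi1[s1], pi2[s2]])
            return total

        # Every deterministic policy is a tuple: signal index -> action.
        policies = list(itertools.product(range(n_actions), repeat=n_signals))
        best = max(
            ((pi1, pi2) for pi1 in policies for pi2 in policies),
            key=lambda pair: expected_payoff(*pair),
        )
        print("optimal joint policy:", best, "value:", expected_payoff(*best))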

    Optimal Control for LQG Systems on Graphs—Part I: Structural Results

    In this two-part paper, we identify a broad class of decentralized output-feedback LQG systems for which the optimal control strategies have a simple, intuitive estimation structure and can be computed efficiently. Roughly, we consider the class of systems for which the coupling of dynamics among subsystems and the inter-controller communication are characterized by the same directed graph. Furthermore, this graph is assumed to be a multitree; that is, its transitive reduction can have at most one directed path connecting each pair of nodes. In this first part, we derive sufficient statistics that may be used to aggregate each controller's growing available information. Each controller must estimate the states of the subsystems that it affects (its descendants) as well as the subsystems that it observes (its ancestors). The optimal control action for a controller is a linear function of the estimate it computes as well as the estimates computed by all of its ancestors. Moreover, these state estimates may be updated recursively, much like a Kalman filter.
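
    As a small illustration of this ancestor/descendant bookkeeping, the sketch below computes, for each controller in a directed graph, the two sets of subsystems whose states it would need to estimate; the graph is a placeholder, not an example from the paper.

        # Edges point from a subsystem to the subsystems it influences.
        edges = {1: [3], 2: [3], 3: [4, 5]}
        nodes = [1, 2, 3, 4, 5]

        def reachable(graph, start):
            """All nodes reachable from start by following directed edges."""
            seen, stack = set(), [start]
            while stack:
                node = stack.pop()
                for nxt in graph.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return seen

        # Reverse the graph so ancestors become forward-reachable nodes.
        reverse = {}
        for src, dsts in edges.items():
            for dst in dsts:
                reverse.setdefault(dst, []).append(src)

        for i in nodes:
            descendants = reachable(edges, i)    # subsystems controller i affects
            ancestors = reachable(reverse, i)    # subsystems controller i observes
            print(f"controller {i}: ancestors {sorted(ancestors)}, "
                  f"descendants {sorted(descendants)}")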

    Decentralized linear quadratic systems with major and minor agents and non-Gaussian noise

    We consider a decentralized linear quadratic system with a major agent and a collection of minor agents. The agents are coupled in their dynamics as well as in a quadratic cost. In particular, the dynamics are linear; the state and control action of the major agent affect the state evolution of all the minor agents, but the state and control action of a minor agent do not affect the state evolution of the major agent or of the other minor agents. The system has partial output feedback with a partially nested information structure. In particular, the major agent perfectly observes its own state, while each minor agent perfectly observes the state of the major agent and partially observes its own state. It is not assumed that the noise process has a Gaussian distribution. For this model, we characterize the structure of the optimal and the best linear strategies. We show that the optimal control of the major agent is a linear function of the major agent's MMSE (minimum mean squared error) estimate of the system state, and the optimal control of a minor agent is a linear function of the major agent's MMSE estimate of the system state and a "correction term" which depends on the difference between the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. The major agent's MMSE estimate is a linear function of its observations, while the minor agent's MMSE estimate is a non-linear function of its observations, updated according to a non-linear Bayesian filter. We show that if we replace the minor agent's MMSE estimate by its LLMS (linear least mean square) estimate, then the resultant strategy is the best linear control strategy. We prove the result using a direct proof based on conditional independence, splitting of the state and control actions, simplification of the per-step cost, the orthogonality principle, and completion of squares. (Comment: 15 pages; submitted to the IEEE Transactions on Automatic Control.)
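
    Schematically, the optimal strategies described above have the shape sketched below: the major agent's action is linear in its estimate of the full system state, and a minor agent adds a correction proportional to the gap between its own estimate of its local state and the major agent's estimate of that state. All gains, dimensions, and estimates are placeholders, not quantities computed by the paper's method.

        import numpy as np

        n_major, n_minor = 2, 2                      # local state dimensions (invented)
        L_major = np.ones((1, n_major + n_minor))    # placeholder gain
        L_minor = np.ones((1, n_major + n_minor))    # placeholder gain
        K_corr = np.ones((1, n_minor))               # placeholder correction gain

        zhat = np.zeros(n_major + n_minor)   # major agent's estimate of the full state
        xhat_i = np.array([0.5, -0.5])       # minor agent i's estimate of its own state
        zhat_i = zhat[n_major:]              # major agent's estimate of that same state

        # Major agent: linear in the common (major-agent) estimate only.
        u_major = -L_major @ zhat
        # Minor agent: same linear term plus the correction on the estimate gap.
        u_minor = -L_minor @ zhat - K_corr @ (xhat_i - zhat_i)
        print("u_major:", u_major, "u_minor:", u_minor)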