    Near-Optimality of Finite-Memory Codes and Reinforcement Learning for Zero-Delay Coding of Markov Sources

    Full text link
    We study the problem of zero-delay coding of a Markov source over a noisy channel with feedback. We first formulate the problem as a Markov decision process (MDP) whose state is a previous belief term along with a finite memory of channel outputs and quantizers. We then approximate this state by marginalizing over all possible beliefs, so that our policies use only the finite-memory term to encode the source. Under an appropriate notion of predictor stability, we show that such policies are near-optimal for the zero-delay coding problem as the memory length increases. We also give sufficient conditions for predictor stability to hold, and propose a reinforcement learning algorithm to compute near-optimal finite-memory policies. These theoretical results are supported by simulations. Comment: Submitted to the 2024 American Control Conference.
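
    The reinforcement learning step can be pictured as ordinary tabular Q-learning run over the finite-memory state (the recent channel outputs and quantizer choices), with the per-stage distortion as the cost. The sketch below is a generic illustration under assumed names (`env_step`, `states`, and `actions` are placeholders), not the algorithm proposed in the paper.

```python
# Hypothetical sketch: tabular Q-learning over a finite-memory encoder state.
# `env_step` is an assumed simulator returning (next memory state, distortion).
import random
from collections import defaultdict

def q_learning(env_step, states, actions, steps=100_000,
               alpha=0.1, gamma=0.99, eps=0.1):
    """Generic tabular Q-learning over a finite state/action set."""
    Q = defaultdict(float)                      # Q[(state, action)] estimates
    s = random.choice(states)
    for _ in range(steps):
        # epsilon-greedy choice of quantizer (action) for the current memory state
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = min(actions, key=lambda a_: Q[(s, a_)])   # costs: pick the smallest
        s_next, cost = env_step(s, a)           # simulate source, quantizer, channel
        best_next = min(Q[(s_next, a_)] for a_ in actions)
        Q[(s, a)] += alpha * (cost + gamma * best_next - Q[(s, a)])
        s = s_next
    return Q
```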

    Sequential Decision Making in Decentralized Systems.

    Full text link
    We study sequential decision making problems in cooperative systems where different agents with different information want to achieve a common objective. The sequential nature of the decision problem implies that all decisions can be arranged in a sequence such that the information available to make the t-th decision depends only on preceding decisions. Markov decision theory provides tools for addressing sequential decision making problems with classical information structures. In this thesis, we introduce a new approach for decision making problems with non-classical information structures. This approach relies on the idea of common information between decision-makers. Intuitively, common information consists of past observations and decisions that are commonly known to the current and future decision makers. We show that a common information based approach allows us to discover new structural results for optimal decision strategies and provides a sequential decomposition of the decision-making problem. We first demonstrate this approach on two specific instances of sequential problems, namely, a real-time multi-terminal communication system and a decentralized control system with delayed sharing of information. We then show that the common information methodology applies more generally to any sequential decision making problem. Moreover, we show that our common information methodology unifies the separate sequential decomposition results available for classical and non-classical information structures. We also present sufficient conditions for simplifying common information based sequential decompositions. This simplification relies on the concept of a state sufficient for the input-output map of a coordinator that knows only the common information. Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89725/1/anayyar_1.pd
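
    One way to picture the common-information idea: a fictitious coordinator that sees only the common information selects, at each time, a prescription mapping each agent's private information to an action, so the decentralized problem becomes a centralized problem for the coordinator. The toy fragment below, with made-up agent names and observations, only illustrates that viewpoint; it is not the construction developed in the thesis.

```python
# Toy illustration of the common-information viewpoint (all names hypothetical).
def coordinator_step(common_info, prescription_rule, private_obs):
    """Coordinator picks prescriptions from common info; agents apply them."""
    # prescription_rule: common_info -> {agent: {private observation: action}}
    prescriptions = prescription_rule(common_info)
    # Each agent evaluates its prescription on its own private observation.
    return {agent: prescriptions[agent][obs] for agent, obs in private_obs.items()}

# Example: two agents with binary private observations and a fixed rule.
rule = lambda common_info: {
    "A": {0: "wait", 1: "send"},
    "B": {0: "wait", 1: "send"},
}
actions = coordinator_step(common_info=(), prescription_rule=rule,
                           private_obs={"A": 1, "B": 0})
# actions == {"A": "send", "B": "wait"}
```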

    On the Structure of Optimal Real-Time Encoders and Decoders in Noisy Communication

    Full text link

    Sequential Decomposition of Sequential Dynamic Teams: Applications to Real-Time Communication and Networked Control Systems.

    Full text link
    Optimal design of multi-agent sequential teams is investigated in this thesis. A systematic methodology is presented to convert the search for an optimal multistage design into a sequence of nested optimization problems, where at each step the best decision rule of an agent at a given time is sought. This conversion is called sequential decomposition, and it drastically simplifies the search for an optimal solution in both finite and infinite horizon problems. The main idea is as follows. A state sufficient for the input-output mapping of the system is identified. A joint probability measure on this state is an information state sufficient for performance evaluation. This information state evolves in time in a deterministic manner that depends on the choice of decision rules of the agents. Thus, these information states form a controlled Markov process in which the control actions are the decision rules of the agents. The optimal control of the time evolution of these information states results in a sequential decomposition of the problem. Applications of this methodology to real-time communication and optimal feedback control over noisy communication channels are also investigated. Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61717/1/adityam_1.pd
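
    The central object here is the information state: a probability measure whose time evolution is deterministic once the agents' decision rules are fixed, so that choosing decision rules amounts to controlling a Markov process of measures. The sketch below (the names and transition kernels are assumptions for illustration) shows such an update as a plain matrix-vector computation.

```python
# Hypothetical sketch of the deterministic information-state update.
# `kernel(rule)` is assumed to return the row-stochastic transition matrix
# induced by choosing decision rule `rule` for the current stage.
import numpy as np

def update_information_state(pi, rule, kernel):
    """One-step update of the information state pi (a probability vector).

    Once the decision rule for the current stage is fixed, the next
    information state is a deterministic function of (pi, rule).
    """
    P = kernel(rule)                  # transition matrix induced by this rule
    pi_next = pi @ P
    return pi_next / pi_next.sum()    # renormalize against round-off

# Example: two system states, two candidate decision rules.
kernels = [np.array([[0.9, 0.1], [0.2, 0.8]]),
           np.array([[0.5, 0.5], [0.6, 0.4]])]
pi0 = np.array([1.0, 0.0])
pi1 = update_information_state(pi0, rule=0, kernel=lambda r: kernels[r])
```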

    Rate-cost tradeoffs in control

    Get PDF
    Consider a distributed control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function. The most basic special case of that cost function is the mean-square deviation of the system state from the desired state. We study the fundamental tradeoff between the communication rate r bits/sec and the limsup of the expected cost b, and show a lower bound on the rate necessary to attain b. The bound applies as long as the system noise has a probability density function. If the target cost b is not too large, that bound can be closely approached by a simple lattice quantization scheme that quantizes only the innovation, that is, the difference between the controller's belief about the current state and the true state.
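
    The achievability side, quantizing only the innovation, can be pictured with a scalar toy loop in which the observer transmits a uniformly quantized version of the gap between the controller's prediction and the true state. Everything below (the plant gain, step size, and certainty-equivalent controller) is an assumed illustration, not the scheme analyzed in the paper.

```python
# Toy scalar control loop (assumed parameters) with innovation quantization.
import numpy as np

rng = np.random.default_rng(0)
a, T, delta = 2.0, 10_000, 0.5     # unstable scalar plant, horizon, quantizer step
x, x_hat = 0.0, 0.0                # true state and the controller's estimate
cost = 0.0

for _ in range(T):
    innovation = x - x_hat                      # the part the controller does not know
    q = delta * np.round(innovation / delta)    # uniform ("lattice") quantizer
    x_hat += q                                  # controller updates its belief
    u = -a * x_hat                              # certainty-equivalent control
    cost += x ** 2
    w = rng.normal()                            # process noise with a density
    x = a * x + u + w
    x_hat = a * x_hat + u                       # controller's one-step prediction

print("average squared state:", cost / T)
```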

    Rate-Cost Tradeoffs in Control

    Get PDF
    Consider a control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function in the state variables and control signal, known as the linear quadratic regulator (LQR). We study the fundamental tradeoff between the communication rate r bits/sec and the expected cost b. We obtain a lower bound on a certain rate-cost function, which quantifies the minimum directed mutual information between the channel input and output that is compatible with a target LQR cost. The rate-cost function has operational significance in multiple scenarios of interest: among others, it allows us to lower-bound the minimum communication rate for fixed- and variable-length quantization, and for control over noisy channels. We derive an explicit lower bound to the rate-cost function, which applies to vector, non-Gaussian, and partially observed systems, thereby extending and generalizing an earlier explicit expression for the scalar Gaussian system due to Tatikonda et al. [2]. The bound applies as long as the differential entropy of the system noise is not −∞. It can be closely approached by a simple lattice quantization scheme that quantizes only the innovation, that is, the difference between the controller's belief about the current state and the true state. Via a separation principle between control and communication, similar results hold for causal lossy compression of additive noise Markov sources. Apart from standard dynamic programming arguments, our technical approach leverages the Shannon lower bound, develops new estimates for data compression with coding memory, and uses some recent results on high-resolution variable-length vector quantization to prove that the new converse bounds are tight.
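
    For reference, the directed mutual information that the rate-cost function minimizes is the standard causally conditioned quantity (this is the textbook definition due to Massey, not a formula specific to this paper):

```latex
% Directed information from the input sequence X^T to the output sequence Y^T:
% the sum over time of what Y_t reveals about the inputs seen so far,
% given the past outputs.
I(X^T \to Y^T) = \sum_{t=1}^{T} I\!\left(X^t ; Y_t \mid Y^{t-1}\right)
```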

    The Power of Online Learning in Stochastic Network Optimization

    Get PDF
    In this paper, we investigate the power of online learning in stochastic network optimization with unknown system statistics a priori. We are interested in understanding how information and learning can be efficiently incorporated into system control techniques, and what the fundamental benefits of doing so are. We propose two Online Learning-Aided Control techniques, OLAC and OLAC2, that explicitly utilize past system information in current system control via a learning procedure called dual learning. We prove strong performance guarantees for the proposed algorithms: OLAC and OLAC2 achieve the near-optimal [O(ε), O([log(1/ε)]^2)] utility-delay tradeoff, and OLAC2 possesses an O(ε^(-2/3)) convergence time. OLAC and OLAC2 are probably the first algorithms that simultaneously possess an explicit near-optimal delay guarantee and sub-linear convergence time. Simulation results also confirm the superior performance of the proposed algorithms in practice. To the best of our knowledge, our attempt is the first to explicitly incorporate online learning into stochastic network optimization and to demonstrate its power in both theory and practice.
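
    The flavor of the approach can be conveyed with an ordinary drift-plus-penalty controller whose queue term is shifted by a Lagrange-multiplier estimate learned from the empirical arrival statistics. The single-queue setting, the names, and the crude multiplier estimate below are assumptions for illustration only; they are not the OLAC or OLAC2 algorithms from the paper.

```python
# Heavily simplified, hypothetical sketch of learning-aided drift-plus-penalty
# control for a single queue with unknown arrival statistics.
import random

V = 50.0                     # utility/backlog tradeoff parameter
Q = 0.0                      # backlog of a single (virtual) queue
total_arrivals = 0
service_levels = [0, 1, 2]

def service_cost(mu):
    return mu ** 2           # price paid for serving at rate mu

for t in range(1, 100_001):
    a = random.choice([0, 1])        # arrivals with unknown statistics
    total_arrivals += a

    # "Dual learning" stand-in: estimate a Lagrange-multiplier offset from the
    # empirical arrival rate, so the physical backlog need not grow large
    # before the controller starts serving.
    lam_hat = V * total_arrivals / t

    # Drift-plus-penalty decision with the learned offset added to the queue term.
    mu = min(service_levels, key=lambda m: V * service_cost(m) - (Q + lam_hat) * m)

    Q = max(Q + a - mu, 0.0)

print("final backlog:", Q, "learned offset:", lam_hat)
```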
