Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance
measures that involve both the mean and the variance of the cumulative reward.
We show that either randomized or history-based policies can improve
performance. We prove that the complexity of computing a policy that maximizes
the mean reward under a variance constraint is NP-hard for some cases, and
strongly NP-hard for others. We finally offer pseudopolynomial exact and
approximation algorithms. Comment: A full version of an ICML 2011 paper.
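The pseudopolynomial flavor of such algorithms can be illustrated by a dynamic program over (state, accumulated reward) pairs, which computes the exact law of the cumulative reward under a fixed policy; the toy MDP, rewards, and policy below are hypothetical, and this is only a minimal sketch of the augmented-state idea, not the paper's algorithm.

```python
from collections import defaultdict

# Hypothetical toy MDP: P[s][a] = list of (next_state, prob),
# R[s][a] = integer reward, and one fixed deterministic policy.
P = {0: {0: [(0, 0.5), (1, 0.5)]}, 1: {0: [(0, 0.2), (1, 0.8)]}}
R = {0: {0: 1}, 1: {0: 3}}
policy = {0: 0, 1: 0}  # action chosen in each state

def reward_distribution(s0, horizon):
    """Forward DP over (state, accumulated reward) pairs.

    Tracking the accumulated integer reward explicitly is what makes
    the computation pseudopolynomial: the table size scales with the
    magnitude of the rewards, not just the number of states."""
    dist = {(s0, 0): 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, w), p in dist.items():
            a = policy[s]
            for s2, q in P[s][a]:
                nxt[(s2, w + R[s][a])] += p * q
        dist = nxt
    # Marginalize out the state to get the cumulative-reward law.
    law = defaultdict(float)
    for (s, w), p in dist.items():
        law[w] += p
    return dict(law)

law = reward_distribution(0, horizon=4)
mean = sum(w * p for w, p in law.items())
var = sum((w - mean) ** 2 * p for w, p in law.items())
```

With the exact law in hand, both the mean and the variance of the cumulative reward follow directly, which is what mean-variance criteria need.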
Guaranteed robustness properties of multivariable, nonlinear, stochastic optimal regulators
The robustness of optimal regulators for nonlinear, deterministic and stochastic, multi-input dynamical systems is studied under the assumption that all state variables can be measured. It is shown that, under mild assumptions, such nonlinear regulators have a guaranteed infinite gain margin; moreover, they have a guaranteed 50 percent gain reduction margin and a 60-degree phase margin, in each feedback channel, provided that the system is linear in the control and the penalty on the control is quadratic, thus extending the well-known properties of LQ regulators to nonlinear optimal designs. These results are also valid for infinite horizon, average cost, stochastic optimal control problems.
On Learning with Finite Memory
We consider an infinite collection of agents who make decisions,
sequentially, about an unknown underlying binary state of the world. Each
agent, prior to making a decision, receives an independent private signal whose
distribution depends on the state of the world. Moreover, each agent also
observes the decisions of its last K immediate predecessors. We study
conditions under which the agent decisions converge to the correct value of the
underlying state. We focus on the case where the private signals have bounded
information content and investigate whether learning is possible, that is,
whether there exist decision rules for the different agents that result in the
convergence of their sequence of individual decisions to the correct state of
the world. We first consider learning in the almost sure sense and show that it
is impossible, for any value of K. We then explore the possibility of
convergence in probability of the decisions to the correct state. Here, a
distinction arises: if K equals 1, learning in probability is impossible under
any decision rule, while for K greater than or equal to 2, we design a decision rule
that achieves it. We finally consider a new model, involving forward looking
strategic agents, each of which maximizes the discounted sum (over all agents)
of the probabilities of a correct decision. (The case, studied in previous
literature, of myopic agents who maximize the probability of their own decision
being correct is an extreme special case.) We show that for any value of K, for
any equilibrium of the associated Bayesian game, and under the assumption that
each private signal has bounded information content, learning in probability
fails to obtain.
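The mechanics of the model can be sketched in a short simulation: each agent receives a private signal with bounded information content and sees the decisions of its last K predecessors. The decision rule below (majority vote over the window and the signal) is purely illustrative, not the rule constructed in the paper.

```python
import random

def run(theta, n_agents, K, p=0.6, seed=0):
    """Simulate the sequential-decision model.

    theta is the binary state of the world; p is the probability that
    a private signal matches theta (a bounded likelihood ratio, hence
    bounded information content)."""
    rng = random.Random(seed)
    decisions = []
    for _ in range(n_agents):
        signal = theta if rng.random() < p else 1 - theta
        window = decisions[-K:]  # decisions of the last K predecessors
        # Illustrative rule: majority vote over the window and the
        # private signal, breaking ties with the private signal.
        votes = window + [signal]
        if 2 * sum(votes) > len(votes):
            decisions.append(1)
        elif 2 * sum(votes) < len(votes):
            decisions.append(0)
        else:
            decisions.append(signal)
    return decisions

d = run(theta=1, n_agents=1000, K=2)
```

Under such naive rules the decision sequence can herd on the wrong value, which is consistent with the negative results above; the paper's positive result for K >= 2 requires a more carefully designed rule.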
When is a network epidemic hard to eliminate?
We consider the propagation of a contagion process (epidemic) on a network
and study the problem of dynamically allocating a fixed curing budget to the
nodes of the graph, at each time instant. For bounded degree graphs, we provide
a lower bound on the expected time to extinction under any such dynamic
allocation policy, in terms of a combinatorial quantity that we call the
resistance of the set of initially infected nodes, the available budget, and
the number of nodes n. Specifically, we consider the case where the
resistance grows linearly in n. We show that if the curing
budget is less than a certain multiple of the resistance, then the expected
time to extinction grows exponentially with n. As a corollary, if all nodes are
initially infected and the CutWidth of the graph grows linearly, while the
curing budget is less than a certain multiple of the CutWidth, then the
expected time to extinction grows exponentially in n. The combination of the
latter with our prior work establishes a fairly sharp phase transition on the
expected time to extinction (sub-linear versus exponential) based on the
relation between the CutWidth and the curing budget.
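The dynamic-allocation setting can be sketched with a discrete-time caricature of the contagion process (the paper analyzes a continuous-time SIS model); the infection probability, the highest-degree-first allocation rule, and the path-graph instance below are all illustrative assumptions, not the paper's policy or bound.

```python
import random

def extinction_time(adj, infected, r, beta=0.3, seed=0, max_steps=10_000):
    """Discrete-time SIS caricature with a per-step curing budget.

    Each step, every infected node infects each susceptible neighbor
    with probability beta; then a budget of r cures is allocated to
    infected nodes (here: highest-degree-first, purely illustrative).
    Returns the step at which the infection goes extinct."""
    rng = random.Random(seed)
    infected = set(infected)
    for t in range(1, max_steps + 1):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
        # Spend the curing budget.
        for u in sorted(infected, key=lambda u: -len(adj[u]))[:r]:
            infected.discard(u)
        if not infected:
            return t
    return max_steps

# A path graph: bounded degree, and a set of initially infected nodes
# with small resistance, so a modest budget should cure it quickly.
n = 30
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
t = extinction_time(adj, infected={0, 1, 2}, r=2)
```

On graphs with large resistance (for example, well-connected subgraphs with many initially infected nodes), the same simulation exhibits the long extinction times that the lower bound formalizes.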
Qualitative Properties of α-Weighted Scheduling Policies
We consider a switched network, a fairly general constrained queueing network
model that has been used successfully to model the detailed packet-level
dynamics in communication networks, such as input-queued switches and wireless
networks. The main operational issue in this model is that of deciding which
queues to serve, subject to certain constraints. In this paper, we study
qualitative performance properties of the well-known α-weighted
scheduling policies. The stability, in the sense of positive recurrence, of
these policies has been well understood. We establish exponential upper bounds
on the tail of the steady-state distribution of the backlog. Along the way, we
prove finiteness of the expected steady-state backlog when α < 1, a
property that was known only for α ≥ 1. Finally, we analyze the
excursions of the maximum backlog over a finite time horizon for α ≥ 1. As a consequence, for α ≥ 1, we establish the full state space
collapse property. Comment: 13 pages.
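The policy itself is simple to state: in each time slot, serve the feasible schedule maximizing the sum of q^α over the served queues. The sketch below instantiates this for an N x N input-queued switch by brute force over matchings; the queue matrix is a made-up example, and a real implementation would use a maximum-weight-matching algorithm rather than enumeration.

```python
import itertools

def alpha_weighted_matching(Q, alpha):
    """Return the matching (one output per input) maximizing
    sum over served queues of q^alpha, by brute-force enumeration
    of all permutations (fine for tiny N, illustrative only)."""
    n = len(Q)
    best, best_w = None, -1.0
    for perm in itertools.permutations(range(n)):
        w = sum(Q[i][perm[i]] ** alpha for i in range(n))
        if w > best_w:
            best, best_w = perm, w
    return best

# Hypothetical 3x3 queue-length matrix: Q[i][j] is the backlog of
# packets at input i destined for output j.
Q = [[4, 1, 0],
     [0, 2, 3],
     [5, 0, 1]]
m = alpha_weighted_matching(Q, alpha=1.0)  # alpha = 1 is max-weight
```

Varying α changes how strongly the policy favors long queues: α = 1 recovers the classical max-weight policy, while smaller α spreads service more evenly across queues.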