Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance
measures that involve both the mean and the variance of the cumulative reward.
We show that either randomized or history-based policies can improve
performance. We prove that the complexity of computing a policy that maximizes
the mean reward under a variance constraint is NP-hard for some cases, and
strongly NP-hard for others. We finally offer pseudopolynomial exact and
approximation algorithms. Comment: A full version of an ICML 2011 paper.
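The pseudopolynomial flavor of such algorithms can be illustrated by a dynamic program over (state, accumulated reward) pairs, which computes the exact law of the cumulative reward under a fixed policy; the toy MDP, rewards, and policy below are hypothetical, and this is only a minimal sketch of the augmented-state idea, not the paper's algorithm.

```python
from collections import defaultdict

# Hypothetical toy MDP: P[s][a] = list of (next_state, prob),
# R[s][a] = integer reward, and one fixed deterministic policy.
P = {0: {0: [(0, 0.5), (1, 0.5)]}, 1: {0: [(0, 0.2), (1, 0.8)]}}
R = {0: {0: 1}, 1: {0: 3}}
policy = {0: 0, 1: 0}  # action chosen in each state

def reward_distribution(s0, horizon):
    """Forward DP over (state, accumulated reward) pairs.

    Tracking the accumulated integer reward explicitly is what makes
    the computation pseudopolynomial: the table size scales with the
    magnitude of the rewards, not just the number of states."""
    dist = {(s0, 0): 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, w), p in dist.items():
            a = policy[s]
            for s2, q in P[s][a]:
                nxt[(s2, w + R[s][a])] += p * q
        dist = nxt
    # Marginalize out the state to get the cumulative-reward law.
    law = defaultdict(float)
    for (s, w), p in dist.items():
        law[w] += p
    return dict(law)

law = reward_distribution(0, horizon=4)
mean = sum(w * p for w, p in law.items())
var = sum((w - mean) ** 2 * p for w, p in law.items())
```

With the exact law in hand, both the mean and the variance of the cumulative reward follow directly, which is what mean-variance criteria need.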
Guaranteed robustness properties of multivariable, nonlinear, stochastic optimal regulators
The robustness of optimal regulators for nonlinear, deterministic and stochastic, multi-input dynamical systems is studied under the assumption that all state variables can be measured. It is shown that, under mild assumptions, such nonlinear regulators have a guaranteed infinite gain margin; moreover, they have a guaranteed 50 percent gain reduction margin and a 60-degree phase margin, in each feedback channel, provided that the system is linear in the control and the penalty on the control is quadratic, thus extending the well-known properties of LQ regulators to nonlinear optimal designs. These results are also valid for infinite horizon, average cost, stochastic optimal control problems.
On Learning with Finite Memory
We consider an infinite collection of agents who make decisions,
sequentially, about an unknown underlying binary state of the world. Each
agent, prior to making a decision, receives an independent private signal whose
distribution depends on the state of the world. Moreover, each agent also
observes the decisions of its last K immediate predecessors. We study
conditions under which the agent decisions converge to the correct value of the
underlying state. We focus on the case where the private signals have bounded
information content and investigate whether learning is possible, that is,
whether there exist decision rules for the different agents that result in the
convergence of their sequence of individual decisions to the correct state of
the world. We first consider learning in the almost sure sense and show that it
is impossible, for any value of K. We then explore the possibility of
convergence in probability of the decisions to the correct state. Here, a
distinction arises: if K equals 1, learning in probability is impossible under
any decision rule, while for K greater than or equal to 2, we design a decision rule
that achieves it. We finally consider a new model, involving forward looking
strategic agents, each of which maximizes the discounted sum (over all agents)
of the probabilities of a correct decision. (The case, studied in previous
literature, of myopic agents who maximize the probability of their own decision
being correct is an extreme special case.) We show that for any value of K, for
any equilibrium of the associated Bayesian game, and under the assumption that
each private signal has bounded information content, learning in probability
fails to obtain.
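The mechanics of the model can be sketched in a short simulation: each agent receives a private signal with bounded information content and sees the decisions of its last K predecessors. The decision rule below (majority vote over the window and the signal) is purely illustrative, not the rule constructed in the paper.

```python
import random

def run(theta, n_agents, K, p=0.6, seed=0):
    """Simulate the sequential-decision model.

    theta is the binary state of the world; p is the probability that
    a private signal matches theta (a bounded likelihood ratio, hence
    bounded information content)."""
    rng = random.Random(seed)
    decisions = []
    for _ in range(n_agents):
        signal = theta if rng.random() < p else 1 - theta
        window = decisions[-K:]  # decisions of the last K predecessors
        # Illustrative rule: majority vote over the window and the
        # private signal, breaking ties with the private signal.
        votes = window + [signal]
        if 2 * sum(votes) > len(votes):
            decisions.append(1)
        elif 2 * sum(votes) < len(votes):
            decisions.append(0)
        else:
            decisions.append(signal)
    return decisions

d = run(theta=1, n_agents=1000, K=2)
```

Under such naive rules the decision sequence can herd on the wrong value, which is consistent with the negative results above; the paper's positive result for K >= 2 requires a more carefully designed rule.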
When is a network epidemic hard to eliminate?
We consider the propagation of a contagion process (epidemic) on a network
and study the problem of dynamically allocating a fixed curing budget to the
nodes of the graph, at each time instant. For bounded degree graphs, we provide
a lower bound on the expected time to extinction under any such dynamic
allocation policy, in terms of a combinatorial quantity that we call the
resistance of the set of initially infected nodes, the available budget, and
the number of nodes n. Specifically, we consider the case where the
resistance grows linearly in n. We show that if the curing
budget is less than a certain multiple of the resistance, then the expected
time to extinction grows exponentially with n. As a corollary, if all nodes are
initially infected and the CutWidth of the graph grows linearly, while the
curing budget is less than a certain multiple of the CutWidth, then the
expected time to extinction grows exponentially in n. The combination of the
latter with our prior work establishes a fairly sharp phase transition on the
expected time to extinction (sub-linear versus exponential) based on the
relation between the CutWidth and the curing budget.
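The dynamic-allocation setting can be sketched with a discrete-time caricature of the contagion process (the paper analyzes a continuous-time SIS model); the infection probability, the highest-degree-first allocation rule, and the path-graph instance below are all illustrative assumptions, not the paper's policy or bound.

```python
import random

def extinction_time(adj, infected, r, beta=0.3, seed=0, max_steps=10_000):
    """Discrete-time SIS caricature with a per-step curing budget.

    Each step, every infected node infects each susceptible neighbor
    with probability beta; then a budget of r cures is allocated to
    infected nodes (here: highest-degree-first, purely illustrative).
    Returns the step at which the infection goes extinct."""
    rng = random.Random(seed)
    infected = set(infected)
    for t in range(1, max_steps + 1):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
        # Spend the curing budget.
        for u in sorted(infected, key=lambda u: -len(adj[u]))[:r]:
            infected.discard(u)
        if not infected:
            return t
    return max_steps

# A path graph: bounded degree, and a set of initially infected nodes
# with small resistance, so a modest budget should cure it quickly.
n = 30
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
t = extinction_time(adj, infected={0, 1, 2}, r=2)
```

On graphs with large resistance (for example, well-connected subgraphs with many initially infected nodes), the same simulation exhibits the long extinction times that the lower bound formalizes.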
Qualitative Properties of α-Weighted Scheduling Policies
We consider a switched network, a fairly general constrained queueing network
model that has been used successfully to model the detailed packet-level
dynamics in communication networks, such as input-queued switches and wireless
networks. The main operational issue in this model is that of deciding which
queues to serve, subject to certain constraints. In this paper, we study
qualitative performance properties of the well-known α-weighted
scheduling policies. The stability, in the sense of positive recurrence, of
these policies has been well understood. We establish exponential upper bounds
on the tail of the steady-state distribution of the backlog. Along the way, we
prove finiteness of the expected steady-state backlog when α < 1, a
property that was known only for α ≥ 1. Finally, we analyze the
excursions of the maximum backlog over a finite time horizon for α ≥ 1. As a consequence, for α ≥ 1, we establish the full state space
collapse property. Comment: 13 pages.
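The policy itself is simple to state: in each time slot, serve the feasible schedule maximizing the sum of q^α over the served queues. The sketch below instantiates this for an N x N input-queued switch by brute force over matchings; the queue matrix is a made-up example, and a real implementation would use a maximum-weight-matching algorithm rather than enumeration.

```python
import itertools

def alpha_weighted_matching(Q, alpha):
    """Return the matching (one output per input) maximizing
    sum over served queues of q^alpha, by brute-force enumeration
    of all permutations (fine for tiny N, illustrative only)."""
    n = len(Q)
    best, best_w = None, -1.0
    for perm in itertools.permutations(range(n)):
        w = sum(Q[i][perm[i]] ** alpha for i in range(n))
        if w > best_w:
            best, best_w = perm, w
    return best

# Hypothetical 3x3 queue-length matrix: Q[i][j] is the backlog of
# packets at input i destined for output j.
Q = [[4, 1, 0],
     [0, 2, 3],
     [5, 0, 1]]
m = alpha_weighted_matching(Q, alpha=1.0)  # alpha = 1 is max-weight
```

Varying α changes how strongly the policy favors long queues: α = 1 recovers the classical max-weight policy, while smaller α spreads service more evenly across queues.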