Discounted continuous-time constrained Markov decision processes in Polish spaces
This paper is devoted to studying constrained continuous-time Markov decision
processes (MDPs) in the class of randomized policies depending on state
histories. The transition rates may be unbounded, the reward and cost
functions may be unbounded from above and below, and the state and action
spaces are Polish spaces. The optimality criterion to be maximized is the
expected discounted rewards, and the constraints can be imposed on the expected
discounted costs. First, we give conditions for the nonexplosion of underlying
processes and the finiteness of the expected discounted rewards/costs. Second,
using a technique of occupation measures, we prove that the constrained
optimality problem for continuous-time MDPs can be transformed into an
equivalent (optimality) problem over a class of probability measures. Based on the
equivalent problem and a so-called -weak convergence of probability
measures developed in this paper, we show the existence of a constrained
optimal policy. Third, by providing a linear programming formulation of the
equivalent problem, we show the solvability of constrained optimal policies.
Finally, we use two computable examples to illustrate our main results.
Comment: Published at http://dx.doi.org/10.1214/10-AAP749 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
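The occupation-measure reduction above can be illustrated in a much simpler finite, discrete-time setting (a sketch, not the paper's continuous-time Polish-space construction; the transition matrix, rewards, and discount factor below are invented for illustration): under a fixed policy, the discounted occupation measure satisfies a linear fixed-point equation, and pairing it with the reward function recovers the expected discounted reward.

```python
# Sketch: discounted occupation measure of a fixed policy on a 2-state MDP.
# m(s) = sum_t gamma^t Pr[s_t = s] satisfies m = nu + gamma * P^T m, and
# the expected discounted reward equals the pairing <m, r>.
gamma = 0.9
P = [[0.7, 0.3],          # P[s][t]: transition matrix under the fixed policy
     [0.4, 0.6]]
r = [1.0, 2.0]            # per-state reward
nu = [1.0, 0.0]           # initial state distribution

# Value function by fixed-point iteration: V = r + gamma * P V
V = [0.0, 0.0]
for _ in range(2000):
    V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(2)) for s in range(2)]

# Occupation measure by fixed-point iteration: m = nu + gamma * P^T m
m = [0.0, 0.0]
for _ in range(2000):
    m = [nu[s] + gamma * sum(P[t][s] * m[t] for t in range(2)) for s in range(2)]

lhs = sum(nu[s] * V[s] for s in range(2))   # expected discounted reward
rhs = sum(m[s] * r[s] for s in range(2))    # same quantity via occupation measure
print(lhs, rhs)                             # the two agree
```

The total mass of m is 1/(1 - gamma), which is why optimizing over (normalized) occupation measures is a problem over probability measures, as in the abstract.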
Update or Wait: How to Keep Your Data Fresh
In this work, we study how to optimally manage the freshness of information
updates sent from a source node to a destination via a channel. A proper metric
for data freshness at the destination is the age-of-information, or simply age,
defined as the time elapsed since the moment the freshest received update
was generated at the source node (e.g., a sensor). A
reasonable update policy is the zero-wait policy, i.e., the source node submits
a fresh update once the previous update is delivered and the channel becomes
free, which achieves the maximum throughput and the minimum delay.
Surprisingly, this zero-wait policy does not always minimize the age. This
counter-intuitive phenomenon motivates us to study how to optimally control
information updates to keep the data fresh and to understand when the zero-wait
policy is optimal. We introduce a general age penalty function to characterize
the level of dissatisfaction with data staleness and formulate the average age
penalty minimization problem as a constrained semi-Markov decision problem
(SMDP) with an uncountable state space. We develop efficient algorithms to find
the optimal update policy among all causal policies, and establish necessary
and sufficient conditions for the optimality of the zero-wait policy. Our
investigation shows that the zero-wait policy is far from the optimum if (i)
the age penalty function grows quickly with respect to the age, (ii) the packet
transmission times over the channel are positively correlated over time, or
(iii) the packet transmission times are highly random (e.g., follow a
heavy-tailed distribution).
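The suboptimality of zero-wait under highly random transmission times can be checked by hand in a toy instance (a sketch with invented numbers, not the paper's model or algorithm): transmission times take values 0 or 2 with equal probability, and the alternative policy waits one time unit after an instantaneous delivery. Average age is computed exactly with the renewal-reward formula E[area per cycle] / E[cycle length].

```python
# Toy age-of-information example: two-point transmission times Y in {0, 2}.
# Cycle i runs between deliveries; age rises linearly from Y_i to Y_i + L_i,
# where L_i = wait(Y_i) + Y_{i+1}. Since the Y_i are i.i.d., the per-cycle
# expectations are exact sums over the four equally likely (Y_i, Y_{i+1}) pairs.
from itertools import product

Ys = [0.0, 2.0]           # equally likely transmission times

def avg_age(wait):
    """Average age = E[area under the age curve per cycle] / E[cycle length]."""
    area = length = 0.0
    for y0, y1 in product(Ys, Ys):          # each pair has probability 1/4
        L = wait(y0) + y1
        area += 0.25 * (y0 * L + L * L / 2.0)
        length += 0.25 * L
    return area / length

zero_wait = avg_age(lambda y: 0.0)                     # submit immediately
waiting = avg_age(lambda y: 1.0 if y == 0 else 0.0)    # pause after a fast delivery
print(zero_wait, waiting)   # waiting beats zero-wait here
```

Intuitively, after a zero-time delivery the destination already holds a fresh sample, so immediately occupying the channel with another update (which may take 2 time units) wastes service on data that barely improves freshness.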
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control often have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces.
Comment: 86 pages, 15 figures.
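The flavor of compressing an MDP into a smaller one can be shown with a generic state-aggregation sketch (this is not the paper's homogenization procedure; the chain, rewards, and partition below are invented). The 4-state chain is exactly lumpable with respect to the two blocks — block transition probabilities and rewards are constant within each block — so the coarse value function reproduces the fine one.

```python
# Generic state-aggregation sketch: compress a 4-state chain into 2 macro-states.
gamma = 0.9
P = [[0.4, 0.3, 0.2, 0.1],   # states 0,1 form block A; states 2,3 form block B
     [0.5, 0.2, 0.1, 0.2],   # every row of a block sends the same total mass
     [0.1, 0.3, 0.3, 0.3],   # to each block (exact lumpability)
     [0.2, 0.2, 0.4, 0.2]]
r = [1.0, 1.0, 5.0, 5.0]     # rewards constant within each block
blocks = [[0, 1], [2, 3]]
block_of = [0, 0, 1, 1]

# Coarse model: aggregate each representative row's mass into the blocks.
Pc = [[sum(P[s][t] for t in blocks[j]) for j in range(2)] for s in (0, 2)]
rc = [r[0], r[2]]

def value_iteration(P, r, iters=3000):
    """Fixed-point iteration of V = r + gamma * P V (fixed-policy evaluation)."""
    V = [0.0] * len(r)
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(len(r)))
             for s in range(len(r))]
    return V

V_fine = value_iteration(P, r)
V_coarse = value_iteration(Pc, rc)
err = max(abs(V_fine[s] - V_coarse[block_of[s]]) for s in range(4))
print(err)   # effectively zero: the coarse model is an exact compression here
```

When lumpability holds only approximately, the coarse solution is instead a useful initializer or bound for the fine problem, which is the regime where hierarchical decompositions pay off.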
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of available strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
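A minimal instance of the kind of WSN decision problem the survey covers can be sketched as a toy sleep/sense MDP solved by value iteration (the model below — battery levels, rewards, and harvesting probability — is hypothetical, not taken from the survey):

```python
# Hypothetical sensor-node MDP: battery levels 0..3; each slot the node either
# senses (reward 1, drains one battery unit) or sleeps (no reward; energy
# harvesting adds one unit with probability 0.5). Solved by value iteration.
gamma = 0.95
LEVELS = 4

def actions(b):
    return (["sense"] if b > 0 else []) + ["sleep"]   # sensing needs energy

def q_value(b, a, V):
    if a == "sense":                       # deterministic drain, unit reward
        return 1.0 + gamma * V[b - 1]
    up = min(b + 1, LEVELS - 1)            # harvest succeeds w.p. 0.5
    return gamma * (0.5 * V[up] + 0.5 * V[b])

V = [0.0] * LEVELS
for _ in range(2000):                      # value iteration to convergence
    V = [max(q_value(b, a, V) for a in actions(b)) for b in range(LEVELS)]

policy = [max(actions(b), key=lambda a: q_value(b, a, V)) for b in range(LEVELS)]
print(policy)   # a full battery should sense rather than idle
```

This illustrates the survey's theme: once sensing, energy, and channel dynamics are cast as states, actions, and transition probabilities, standard MDP solvers yield adaptive node policies.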