Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
Markov decision processes (MDPs) are a popular model for performance analysis
and optimization of stochastic systems. The parameters governing the
stochastic behavior of an MDP are estimated from empirical observations of a
system; their values are not known precisely. Different types of MDPs with
uncertain, imprecise or
bounded transition rates or probabilities and rewards exist in the literature.
Commonly, analysis of models with uncertainties amounts to searching for the
most robust policy which means that the goal is to generate a policy with the
greatest lower bound on performance (or, symmetrically, the lowest upper bound
on costs). However, hedging against an unlikely worst case may lead to losses
in other situations. In general, one is interested in policies that behave well
in all situations which results in a multi-objective view on decision making.
In this paper, we consider policies for the expected discounted reward
measure of MDPs with uncertain parameters. In particular, the approach is
defined for bounded-parameter MDPs (BMDPs) [8]. In this setting the worst, best
and average case performances of a policy are analyzed simultaneously, which
yields a multi-scenario multi-objective optimization problem. The paper
presents and evaluates approaches to compute the pure Pareto optimal policies
in the value vector space.
Comment: 9 pages, 5 figures, preprint for VALUETOOLS 201
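The multi-scenario, multi-objective view can be made concrete on a toy instance: enumerate the pure stationary policies of a small MDP, evaluate each under a few transition scenarios, and keep the policies whose (worst, best, average) value vectors are Pareto optimal. A minimal sketch, where all numbers, the two-scenario model, and the function names are invented for illustration, not taken from the paper:

```python
import itertools

import numpy as np

GAMMA = 0.9  # discount factor

# Two transition scenarios for a 2-state, 2-action MDP (illustrative numbers).
# scenarios[k][a][s, s'] is the transition probability under scenario k, action a.
scenarios = [
    {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.6, 0.4]])},
    {0: np.array([[0.6, 0.4], [0.5, 0.5]]),
     1: np.array([[0.8, 0.2], [0.1, 0.9]])},
]
rewards = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}  # rewards[a][s]

def evaluate(policy, P):
    """Discounted value of a pure stationary policy under one scenario:
    solve (I - gamma * P_pi) V = r_pi."""
    n = len(policy)
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    r_pi = np.array([rewards[policy[s]][s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, r_pi)

def value_vector(policy):
    """(worst, best, average) start-state value across the scenarios."""
    vals = [evaluate(policy, sc)[0] for sc in scenarios]
    return (min(vals), max(vals), sum(vals) / len(vals))

def pareto_front(policies):
    """Pure policies whose (worst, best, average) vector is undominated."""
    vecs = {p: value_vector(p) for p in policies}
    def dominates(u, v):
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))
    return [p for p in policies
            if not any(dominates(vecs[q], vecs[p]) for q in policies if q != p)]

all_policies = list(itertools.product([0, 1], repeat=2))  # one action per state
front = pareto_front(all_policies)
```

Here a policy is a tuple giving the action in each state; adding scenarios or objectives only changes `value_vector`.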
Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision processes
with function approximation. We propose a generalization of the recently
introduced emphatic temporal differences (ETD) algorithm [Sutton et al.,
2015], which encompasses the original ETD(λ), as well as several other
off-policy evaluation algorithms as special cases. We call this framework
ETD(λ, β), where our introduced parameter β controls the decay rate of an
importance-sampling term. We study conditions under which the projected
fixed-point equation underlying ETD(λ, β) involves a contraction operator,
allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β).
Our results show that the original ETD(λ) algorithm always involves a
contraction operator, and its bias is bounded. Moreover, by controlling β,
our proposed generalization allows trading off bias for variance reduction,
thereby achieving a lower total error.
Comment: arXiv admin note: text overlap with arXiv:1508.0341
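The follow-on-trace mechanics can be sketched in a few lines of linear function approximation. The trajectory format, variable names, and the exact placement of β (decaying the importance-sampling-weighted follow-on trace, so that β = γ recovers the original ETD(λ)) are my assumptions about a generalized emphatic update; this is an illustrative sketch, not the paper's pseudocode:

```python
import numpy as np

def etd_lambda_beta(episodes, phi, n_features, alpha=0.05, gamma=0.9,
                    lam=0.0, beta=0.9, interest=1.0):
    """Linear ETD(lambda, beta)-style sketch for off-policy evaluation.

    Each trajectory step is (s, rho, r, s_next, done), where rho is the
    importance-sampling ratio pi(a|s) / mu(a|s) of the taken action.
    beta decays the follow-on trace F; beta = gamma gives original ETD(lambda).
    """
    w = np.zeros(n_features)
    for traj in episodes:
        F, e, rho_prev = 0.0, np.zeros(n_features), 1.0
        for (s, rho, r, s_next, done) in traj:
            F = beta * rho_prev * F + interest        # follow-on trace
            M = lam * interest + (1.0 - lam) * F      # emphasis
            e = rho * (gamma * lam * e + M * phi(s))  # eligibility trace
            v_next = 0.0 if done else w @ phi(s_next)
            delta = r + gamma * v_next - w @ phi(s)   # TD error
            w = w + alpha * delta * e
            rho_prev = rho
    return w

# On-policy sanity check (rho = 1): chain s0 --r=0--> s1 --r=1--> terminal,
# so v(s1) = 1 and v(s0) = gamma * v(s1) = 0.9 with tabular features.
phi = lambda s: np.eye(2)[s]
episode = [(0, 1.0, 0.0, 1, False), (1, 1.0, 1.0, 1, True)]
w = etd_lambda_beta([episode] * 500, phi, n_features=2)
```

Setting `beta` below `gamma` shrinks the follow-on trace and hence the variance of the updates, at the price of bias, which is the trade-off the abstract describes.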
Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
We consider Markov decision processes under parameter uncertainty. Previous
studies have all restricted attention to the case in which uncertainties in
different states are uncoupled, which leads to conservative solutions. In
contrast, we introduce an
intuitive concept, termed "Lightning Does not Strike Twice," to model coupled
uncertain parameters. Specifically, we require that the system can deviate from
its nominal parameters only a bounded number of times. We give probabilistic
guarantees indicating that this model represents real life situations and
devise tractable algorithms for computing optimal control policies using this
concept.
Comment: ICML201
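One simple reading of the bounded-deviation idea can be sketched as value iteration on a state space augmented with the adversary's remaining deviation budget: at each step nature either follows the nominal kernel or spends one deviation on a perturbed kernel. This is an illustrative toy under that assumed model, not the paper's algorithm:

```python
import numpy as np

def budgeted_robust_vi(P_nom, P_dev, r, budget, gamma=0.9, iters=500):
    """Robust value iteration where nature may switch from the nominal
    kernel P_nom to the deviated kernel P_dev at most `budget` times.
    P_nom[a] / P_dev[a] are S x S matrices; r[a] is an S-vector.
    Returns V[b, s]: robust value with b deviations remaining."""
    A, S = len(P_nom), r[0].shape[0]
    V = np.zeros((budget + 1, S))
    for _ in range(iters):
        V_new = np.empty_like(V)
        for b in range(budget + 1):
            Q = np.empty((A, S))
            for a in range(A):
                cont = P_nom[a] @ V[b]                       # nature plays nominal
                if b > 0:                                    # nature spends one deviation
                    cont = np.minimum(cont, P_dev[a] @ V[b - 1])
                Q[a] = r[a] + gamma * cont
            V_new[b] = Q.max(axis=0)                         # controller maximizes
        V = V_new
    return V

# Tiny check: identity nominal dynamics vs. a kernel that jumps to the
# zero-reward state; one allowed deviation collapses the value of s0.
P_nom = [np.array([[1.0, 0.0], [0.0, 1.0]])]
P_dev = [np.array([[0.0, 1.0], [0.0, 1.0]])]
r = [np.array([1.0, 0.0])]
V = budgeted_robust_vi(P_nom, P_dev, r, budget=1)
```

With budget 0 this reduces to ordinary value iteration; giving the adversary more budget can only lower the value, which mirrors why the fully uncoupled (unlimited-deviation) model is conservative.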
Average optimality for continuous-time Markov decision processes in Polish spaces
This paper is devoted to studying the average optimality in continuous-time
Markov decision processes with fairly general state and action spaces. The
criterion to be maximized is expected average rewards. The transition rates of
underlying continuous-time jump Markov processes are allowed to be unbounded,
and the reward rates may have neither upper nor lower bounds. We first provide
two optimality inequalities with opposed directions, and also give suitable
conditions under which the existence of solutions to the two optimality
inequalities is ensured. Then, from the two optimality inequalities we prove
the existence of optimal (deterministic) stationary policies by using the
Dynkin formula. Moreover, we present a "semimartingale characterization" of
an optimal stationary policy. Finally, we use a generalized Potlach process
with control to illustrate the difference between our conditions and those in
the previous literature, and then further apply our results to average optimal
control problems of generalized birth--death systems, upwardly skip-free
processes and two queueing systems. The approach developed in this paper is
slightly different from the "optimality inequality approach" widely used in
the previous literature.
Comment: Published at http://dx.doi.org/10.1214/105051606000000105 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org)
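Schematically, a pair of optimality inequalities with opposed directions has the following shape, with g a candidate average reward, h a relative-value function, r(x, a) the reward rate and q(dy | x, a) the transition rates; this is only a sketch of the form such inequalities take, with the paper's technical conditions omitted:

```latex
% Schematic average-optimality inequalities (conditions omitted):
g \;\le\; \sup_{a \in A(x)} \Big\{\, r(x,a)
        + \int_{S} h_1(y)\, q(\mathrm{d}y \mid x, a) \Big\},
\qquad x \in S, \\
g \;\ge\; \sup_{a \in A(x)} \Big\{\, r(x,a)
        + \int_{S} h_2(y)\, q(\mathrm{d}y \mid x, a) \Big\},
\qquad x \in S .
```

Roughly, a stationary policy attaining the supremum in the second inequality is a candidate average-optimal policy, and applying the Dynkin formula to h along the controlled process is what turns these pointwise inequalities into bounds on the long-run average reward.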
Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach
This paper treats power-aware throughput maximization in a multi-user file
downloading system. Each user can receive a new file only after its previous
file is finished. The file state processes for each user act as coupled Markov
chains that form a generalized restless bandit system. First, an optimal
algorithm is derived for the case of one user. The algorithm maximizes
throughput subject to an average power constraint. Next, the one-user algorithm
is extended to a low complexity heuristic for the multi-user problem. The
heuristic uses a simple online index policy and its effectiveness is shown via
simulation. For simple 3-user cases where the optimal solution can be computed
offline, the heuristic is shown to be near-optimal for a wide range of
parameters.
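The flavor of such an online index heuristic can be sketched with a toy one-server simulation: in each slot, transmit to the user maximizing a throughput-minus-weighted-power index. The index form, the weight `v`, and the file-size model below are invented for illustration and are not the paper's exact policy:

```python
import random

def simulate_index_policy(users, slots, v=5.0, seed=1):
    """Serve one user per slot by the greedy index rate - v * power;
    a finished file is immediately replaced by a new one (toy model)."""
    rng = random.Random(seed)
    remaining = {u["id"]: rng.randint(3, 8) for u in users}  # packets left in file
    delivered, energy = 0, 0.0
    for _ in range(slots):
        u = max(users, key=lambda x: x["rate"] - v * x["power"])  # best index
        remaining[u["id"]] -= 1                                   # send one packet
        delivered += 1
        energy += u["power"]
        if remaining[u["id"]] == 0:           # file finished: draw the next one
            remaining[u["id"]] = rng.randint(3, 8)
    return delivered, energy

users = [{"id": 0, "rate": 3.0, "power": 0.5},
         {"id": 1, "rate": 2.0, "power": 0.1}]
delivered, energy = simulate_index_policy(users, slots=20)
```

With `v = 5` the low-power user wins the index here; sweeping `v` traces out the throughput/power trade-off that the average power constraint enforces.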
Evolutionary game of coalition building under external pressure
We study the fragmentation-coagulation (or merging and splitting)
evolutionary control model as introduced recently by one of the authors, where
small players can form coalitions to resist the pressure exerted by the
principal. The model is a continuous-time Markov chain, and the players have
a common reward to optimize. We study the behavior as the number of players
grows and show that the problem converges to a (one-player) deterministic
optimization problem in continuous time, in the infinite-dimensional state
space.