47,159 research outputs found
Relative entropy via non-sequential recursive pair substitutions
The entropy of an ergodic source is the limit of properly rescaled 1-block
entropies of sources obtained applying successive non-sequential recursive
pairs substitutions (see P. Grassberger 2002 ArXiv:physics/0207023 and D.
Benedetto, E. Caglioti and D. Gabrielli 2006 Jour. Stat. Mech. Theo. Exp. 09
doi:10.1088/1742.-5468/2006/09/P09011). In this paper we prove that the cross
entropy and the Kullback-Leibler divergence can be obtained in a similar way.Comment: 13 pages , 2 figure
Qualitative Analysis of Concurrent Mean-payoff Games
We consider concurrent games played by two-players on a finite-state graph,
where in every round the players simultaneously choose a move, and the current
state along with the joint moves determine the successor state. We study a
fundamental objective, namely, mean-payoff objective, where a reward is
associated to each transition, and the goal of player 1 is to maximize the
long-run average of the rewards, and the objective of player 2 is strictly the
opposite. The path constraint for player 1 could be qualitative, i.e., the
mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative,
i.e., a given threshold between the minimal and maximal reward. We consider the
computation of the almost-sure (resp. positive) winning sets, where player 1
can ensure that the path constraint is satisfied with probability 1 (resp.
positive probability). Our main results for qualitative path constraints are as
follows: (1) we establish qualitative determinacy results that show that for
every state either player 1 has a strategy to ensure almost-sure (resp.
positive) winning against all player-2 strategies, or player 2 has a spoiling
strategy to falsify almost-sure (resp. positive) winning against all player-1
strategies; (2) we present optimal strategy complexity results that precisely
characterize the classes of strategies required for almost-sure and positive
winning for both players; and (3) we present quadratic time algorithms to
compute the almost-sure and the positive winning sets, matching the best known
bound of algorithms for much simpler problems (such as reachability
objectives). For quantitative constraints we show that a polynomial time
solution for the almost-sure or the positive winning set would imply a solution
to a long-standing open problem (the value problem for turn-based deterministic
mean-payoff games) that is not known to be solvable in polynomial time
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy allowing us to compute succinct human
readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
- …