1,395 research outputs found
One-Counter Stochastic Games
We study the computational complexity of basic decision problems for
one-counter simple stochastic games (OC-SSGs), under various objectives.
OC-SSGs are 2-player turn-based stochastic games played on the transition graph
of classic one-counter automata. We study primarily the termination objective,
where the goal of one player is to maximize the probability of reaching counter
value 0, while the other player wishes to avoid this. Partly motivated by the
goal of understanding termination objectives, we also study certain "limit" and
"long run average" reward objectives that are closely related to some
well-studied objectives for stochastic games with rewards. Examples of problems
we address include: does player 1 have a strategy to ensure that the counter
eventually hits 0, i.e., terminates, almost surely, regardless of what player 2
does? Or that the liminf (or limsup) counter value equals infinity with a
desired probability? Or that the long run average reward is >0 with desired
probability? We show that the qualitative termination problem for OC-SSGs is in
NP intersection coNP, and is in P-time for 1-player OC-SSGs, or equivalently
for one-counter Markov Decision Processes (OC-MDPs). Moreover, we show that
quantitative limit problems for OC-SSGs are in NP intersection coNP, and are in
P-time for 1-player OC-MDPs. Both qualitative limit problems and qualitative
termination problems for OC-SSGs are already at least as hard as Condon's
quantitative decision problem for finite-state SSGs.Comment: 20 pages, 1 figure. This is a full version of a paper accepted for
publication in proceedings of FSTTCS 201
Generalized Off-Policy Actor-Critic
We propose a new objective, the counterfactual objective, unifying existing
objectives for off-policy policy gradient algorithms in the continuing
reinforcement learning (RL) setting. Compared to the commonly used excursion
objective, which can be misleading about the performance of the target policy
when deployed, our new objective better predicts such performance. We prove the
Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient
of the counterfactual objective and use an emphatic approach to get an unbiased
sample from this policy gradient, yielding the Generalized Off-Policy
Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over
existing algorithms in Mujoco robot simulation tasks, the first empirical
success of emphatic algorithms in prevailing deep RL benchmarks.Comment: NeurIPS 201
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which in each page, there are
obligatory links, facultative links and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph.Comment: 39 page
- …