Deeper model endgame analysis
A reference model of Fallible Endgame Play has been implemented and exercised with the chess engine WILHELM. Past experiments have demonstrated the value of the model and the robustness of decisions based on it: experiments agree well with a Markov Model theory. Here, the reference model is exercised on the well-known endgame KBBKN.
Aversion to ambiguity and model misspecification in dynamic stochastic environments
Preferences that accommodate aversion to subjective uncertainty and its potential misspecification in dynamic settings are a valuable tool of analysis in many disciplines. By generalizing previous analyses, we propose a tractable approach to incorporating broadly conceived responses to uncertainty. We illustrate our approach on some stylized stochastic environments. By design, these discrete time environments have revealing continuous time limits. Drawing on these illustrations, we construct recursive representations of intertemporal preferences that allow for penalized and smooth ambiguity aversion to subjective uncertainty. These recursive representations imply continuous time limiting Hamilton–Jacobi–Bellman equations for solving control problems in the presence of uncertainty. Published version
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013)
Reducing Dueling Bandits to Cardinal Bandits
We present algorithms for reducing the Dueling Bandits problem to the
conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits
problem is an online model of learning with ordinal feedback of the form "A is
preferred to B" (as opposed to cardinal feedback like "A has value 2.5"),
giving it wide applicability in learning from implicit user feedback and
revealed and stated preferences. In contrast to existing algorithms for the
Dueling Bandits problem, our reductions -- named Doubler, MultiSbm and
DoubleSbm -- provide a generic schema for translating the extensive body of
known results about conventional Multi-Armed Bandit algorithms to the Dueling
Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in
both finite and infinite settings, and conjecture about the performance of
DoubleSbm which empirically outperforms the other two as well as previous
algorithms in our experiments. In addition, we provide the first almost optimal
regret bound in terms of second order terms, such as the differences between
the values of the arms.
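
To make the idea of such a reduction concrete, here is a minimal Python sketch. It is not the paper's Doubler, MultiSbm or DoubleSbm; it is one simple illustrative scheme under stated assumptions. Two independent cardinal bandit learners (plain UCB1, chosen here only for illustration) each pick one side of the duel, and the ordinal outcome "A is preferred to B" is turned into a 0/1 reward for each learner. The names `UCB1`, `two_learner_reduction` and the preference matrix `pref` are hypothetical.

```python
import math
import random


class UCB1:
    """Plain UCB1 learner for rewards in [0, 1] (illustrative choice)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        # Play every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward


def two_learner_reduction(pref, horizon, seed=0):
    """Run duels chosen by two cardinal learners.

    pref[a][b] is the assumed probability that arm a is preferred to arm b.
    Returns how often each arm took part in a duel.
    """
    rng = random.Random(seed)
    n_arms = len(pref)
    left, right = UCB1(n_arms), UCB1(n_arms)
    plays = [0] * n_arms
    for _ in range(horizon):
        a, b = left.select(), right.select()
        a_wins = rng.random() < pref[a][b]  # ordinal feedback only
        # Convert the ordinal outcome into cardinal 0/1 rewards.
        left.update(a, 1.0 if a_wins else 0.0)
        right.update(b, 0.0 if a_wins else 1.0)
        plays[a] += 1
        plays[b] += 1
    return plays
```

With a preference matrix in which one arm beats every other arm with high probability, both learners should concentrate their choices on that arm as the horizon grows. Any regret guarantee for this particular scheme is not claimed here.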
Transitions between homophilic and heterophilic modes of cooperation
Cooperation is ubiquitous in biological and social systems. Previous studies
revealed that a preference toward similar appearance promotes cooperation, a
phenomenon called tag-mediated cooperation or communitarian cooperation. This
effect is enhanced when a spatial structure is incorporated, because space
allows agents sharing an identical tag to regroup to form locally cooperative
clusters. In spatially distributed settings, one can also consider migration of
organisms, which has the potential to further promote evolution of cooperation by
facilitating spatial clustering. However, migration has not yet been considered in
spatial tag-mediated cooperation models. Here we show, using computer
simulations of a spatial model of evolutionary games with organismal migration,
that tag-based segregation and homophilic cooperation arise for a wide range of
parameters. At the same time, our results also show another evolutionarily
stable outcome, where a high level of heterophilic cooperation is maintained in
spatially well-mixed patterns. We found that these two different forms of
tag-mediated cooperation appear alternately as the parameter for temptation to
defect is increased. Comment: 16 pages, 7 figures