Deeper model endgame analysis

Author: Andrist Rafael B
Haworth Guy McCrossan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

A reference model of Fallible Endgame Play has been implemented and exercised with the chess-engine WILHELM. Past experiments have demonstrated the value of the model and the robustness of decisions based on it: experiments agree well with a Markov Model theory. Here, the reference model is exercised on the well-known endgame KBBKN

Central Archive at the University of Reading

Elsevier - Publisher Connector

Aversion to ambiguity and model misspecification in dynamic stochastic environments

Author: Hansen Lars Peter
Miao Jianjun
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 28/08/2018

Preferences that accommodate aversion to subjective uncertainty and its potential misspecification in dynamic settings are a valuable tool of analysis in many disciplines. By generalizing previous analyses, we propose a tractable approach to incorporating broadly conceived responses to uncertainty. We illustrate our approach on some stylized stochastic environments. By design, these discrete time environments have revealing continuous time limits. Drawing on these illustrations, we construct recursive representations of intertemporal preferences that allow for penalized and smooth ambiguity aversion to subjective uncertainty. These recursive representations imply continuous time limiting Hamilton–Jacobi–Bellman equations for solving control problems in the presence of uncertainty.

Boston University Institutional Repository (OpenBU)

Probabilistic inverse reinforcement learning in unknown environments

Author: Dimitrakakis Christos
Tossou Aristide
Publication date: 01/01/2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library

Reducing Dueling Bandits to Cardinal Bandits

Author: Ailon Nir
Joachims Thorsten
Karnin Zohar
Publication date: 14/05/2014

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions (named Doubler, MultiSbm and DoubleSbm) provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of DoubleSbm, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.

arXiv.org e-Print Archive

CiteSeerX
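
The ordinal-versus-cardinal distinction in the abstract above can be made concrete with a small sketch. It is not the authors' Doubler or MultiSbm procedure, and it is not claimed to match DoubleSbm; it only illustrates the general idea of feeding duel outcomes to ordinary bandit learners as 0/1 rewards. The choice of UCB1 as the cardinal learner, the linear preference link in `duel`, and all names (`UCB1`, `duel`, `run_reduction`) are assumptions made for illustration.

```python
import math
import random


class UCB1:
    """Ordinary (cardinal) UCB1 learner over k arms with rewards in [0, 1]."""

    def __init__(self, k):
        self.counts = [0] * k   # number of pulls per arm
        self.sums = [0.0] * k   # total reward per arm
        self.t = 0              # number of selections made so far

    def select(self):
        self.t += 1
        # Pull every arm once before using confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm

        def index(a):
            mean = self.sums[a] / self.counts[a]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[a])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def duel(utilities, a, b, rng):
    """Ordinal feedback only: True if arm a is preferred to arm b.

    Assumed (hypothetical) linear link:
    P(a beats b) = 1/2 + (u_a - u_b) / 2, with utilities in [0, 1].
    """
    p = 0.5 + (utilities[a] - utilities[b]) / 2.0
    return rng.random() < p


def run_reduction(utilities, rounds, seed=0):
    """Two cardinal learners each propose an arm; the duel outcome is
    handed back as a 0/1 reward (1 to the winner's learner, 0 to the loser's)."""
    rng = random.Random(seed)
    k = len(utilities)
    left, right = UCB1(k), UCB1(k)
    plays = [0] * k
    for _ in range(rounds):
        a, b = left.select(), right.select()
        a_wins = duel(utilities, a, b, rng)
        left.update(a, 1.0 if a_wins else 0.0)
        right.update(b, 0.0 if a_wins else 1.0)
        plays[a] += 1
        plays[b] += 1
    return plays


if __name__ == "__main__":
    # Hidden utilities are never shown to the learners; only duel outcomes are.
    print(run_reduction([0.0, 0.2, 1.0], rounds=5000))
```

The learners never observe the utilities themselves, only which of the two proposed arms won, which is the ordinal feedback the abstract describes. No regret guarantee from the paper is claimed for this sketch.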

Transitions between homophilic and heterophilic modes of cooperation

Author: Bersini Hugues
Ichinose Genki
Saito Masaya
Sayama Hiroki
Publication venue: 'Journal of Artificial Societies and Social Simulation'
Publication date: 01/10/2015

Cooperation is ubiquitous in biological and social systems. Previous studies revealed that a preference toward similar appearance promotes cooperation, a phenomenon called tag-mediated cooperation or communitarian cooperation. This effect is enhanced when a spatial structure is incorporated, because space allows agents sharing an identical tag to regroup to form locally cooperative clusters. In spatially distributed settings, one can also consider migration of organisms, which has a potential to further promote evolution of cooperation by facilitating spatial clustering. However, it has not yet been considered in spatial tag-mediated cooperation models. Here we show, using computer simulations of a spatial model of evolutionary games with organismal migration, that tag-based segregation and homophilic cooperation arise for a wide range of parameters. In the meantime, our results also show another evolutionarily stable outcome, where a high level of heterophilic cooperation is maintained in spatially well-mixed patterns. We found that these two different forms of tag-mediated cooperation appear alternately as the parameter for temptation to defect is increased. Comment: 16 pages, 7 figures

arXiv.org e-Print Archive

The Open Repository @Binghamton (The ORB)

DI-fusion