16,542 research outputs found

    Deeper model endgame analysis

    A reference model of Fallible Endgame Play has been implemented and exercised with the chess-engine WILHELM. Past experiments have demonstrated the value of the model and the robustness of decisions based on it: experiments agree well with a Markov Model theory. Here, the reference model is exercised on the well-known endgame KBBKN.

    Aversion to ambiguity and model misspecification in dynamic stochastic environments

    Preferences that accommodate aversion to subjective uncertainty and its potential misspecification in dynamic settings are a valuable tool of analysis in many disciplines. By generalizing previous analyses, we propose a tractable approach to incorporating broadly conceived responses to uncertainty. We illustrate our approach on some stylized stochastic environments. By design, these discrete time environments have revealing continuous time limits. Drawing on these illustrations, we construct recursive representations of intertemporal preferences that allow for penalized and smooth ambiguity aversion to subjective uncertainty. These recursive representations imply continuous time limiting Hamilton–Jacobi–Bellman equations for solving control problems in the presence of uncertainty.

    Probabilistic inverse reinforcement learning in unknown environments

    We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Reducing Dueling Bandits to Cardinal Bandits

    We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions, named Doubler, MultiSbm and DoubleSbm, provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of DoubleSbm, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.

    Transitions between homophilic and heterophilic modes of cooperation

    Cooperation is ubiquitous in biological and social systems. Previous studies revealed that a preference toward similar appearance promotes cooperation, a phenomenon called tag-mediated cooperation or communitarian cooperation. This effect is enhanced when a spatial structure is incorporated, because space allows agents sharing an identical tag to regroup to form locally cooperative clusters. In spatially distributed settings, one can also consider migration of organisms, which has a potential to further promote evolution of cooperation by facilitating spatial clustering. However, it has not yet been considered in spatial tag-mediated cooperation models. Here we show, using computer simulations of a spatial model of evolutionary games with organismal migration, that tag-based segregation and homophilic cooperation arise for a wide range of parameters. In the meantime, our results also show another evolutionarily stable outcome, where a high level of heterophilic cooperation is maintained in spatially well-mixed patterns. We found that these two different forms of tag-mediated cooperation appear alternately as the parameter for temptation to defect is increased. Comment: 16 pages, 7 figures