Search CORE

11,029 research outputs found

From ranking to intransitive preference learning: rock-paper-scissors and beyond

Author: De Baets Bernard
Pahikkala Tapio
Salakoski Tapio
Tsivtsivadze Evgeni
Waegeman Willem
Publication date: 01/01/2009

Probabilistic inverse reinforcement learning in unknown environments

Author: Dimitrakakis Christos
Tossou Aristide
Publication date: 01/01/2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library

Playing Atari with Deep Reinforcement Learning

Author: Antonoglou Ioannis
Graves Alex
Kavukcuoglu Koray
Mnih Volodymyr
Riedmiller Martin
Silver David
Wierstra Daan
Publication date: 01/01/2013

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them. Comment: NIPS Deep Learning Workshop 201
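
The abstract describes training with a variant of Q-learning. As a rough illustration only, the sketch below shows the standard tabular Q-learning update that such variants build on. It is not the authors' model: the paper uses a convolutional network as the value function, whereas this sketch assumes a plain dictionary, and the names `q_learning_update`, `alpha`, and `gamma` are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q is a dict mapping (state, action) pairs to estimated returns;
    missing entries are treated as 0.0 (an assumption of this sketch).
    """
    # Best estimated value obtainable from the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    # Bootstrapped target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    # Move the old estimate a fraction alpha toward the target.
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


q_table = {}
new_value = q_learning_update(
    q_table, state=0, action="right", reward=1.0, next_state=1,
    actions=["left", "right"], alpha=0.5, gamma=0.9,
)
```

With an empty table, the target equals the reward, so the estimate moves halfway from 0.0 toward 1.0.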

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

Evolutionary games on graphs

Author: György Szabó
Gábor Fáth
Publication venue: 'Elsevier BV'
Publication date: 24/09/2007

Game theory is one of the key paradigms behind many scientific disciplines from biology to behavioral sciences to economics. In its evolutionary form and especially when the interacting agents are linked in a specific social network the underlying solution concepts and methods are very similar to those applied in non-equilibrium statistical physics. This review gives a tutorial-type overview of the field for physicists. The first three sections introduce the necessary background in classical and evolutionary game theory from the basic definitions to the most important results. The fourth section surveys the topological complications implied by non-mean-field-type social network structures in general. The last three sections discuss in detail the dynamic behavior of three prominent classes of models: the Prisoner's Dilemma, the Rock-Scissors-Paper game, and Competing Associations. The major theme of the review is in what sense and how the graph structure of interactions can modify and enrich the picture of long term behavioral patterns emerging in evolutionary games. Comment: Review, final version, 133 pages, 65 figure
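
For readers unfamiliar with the Rock-Scissors-Paper game mentioned above, the following is a minimal sketch of one Euler step of replicator dynamics in a well-mixed (mean-field) population. It is only a baseline illustration under assumed payoffs of +1, -1, and 0 for win, loss, and tie; it does not reproduce any of the graph-structured models the review discusses, and the names `PAYOFF` and `replicator_step` are hypothetical.

```python
# Assumed zero-sum payoffs; strategy order: rock, scissors, paper.
PAYOFF = [
    [0, 1, -1],   # rock beats scissors, loses to paper
    [-1, 0, 1],   # scissors beats paper, loses to rock
    [1, -1, 0],   # paper beats rock, loses to scissors
]


def replicator_step(x, dt=0.01):
    """Advance strategy frequencies x by one Euler step of size dt."""
    n = len(x)
    # Expected payoff of each pure strategy against the current population.
    fitness = [sum(PAYOFF[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Population-average payoff.
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    # Strategies doing better than average grow; the others shrink.
    return [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]


uniform = replicator_step([1 / 3, 1 / 3, 1 / 3])
shifted = replicator_step([0.5, 0.3, 0.2])
```

The uniform mixture is a fixed point of this step, while any other mixture shifts in the direction of the strategy that beats the currently most common one.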

arXiv.org e-Print Archive

Crossref

Hide and Seek in Arizona

Author: Jason Shachat
Mark Walker
Robert W. Rosenthal

Laboratory subjects repeatedly played one of two variations of a simple two-person zero-sum game of "hide and seek." Three puzzling departures from the prescriptions of equilibrium theory are found in the data: an asymmetry related to the player's role in the game; an asymmetry across the game variations; and positive serial correlation in subjects' play. Possible explanations for these departures are considered. Minimax, mixed strategy, experiment
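
As background for the equilibrium prescriptions mentioned above, here is a minimal sketch of how the minimax mixed strategies of a 2x2 zero-sum game can be computed from the indifference conditions. The payoff numbers are purely hypothetical and are not taken from the experiment; the function name `solve_2x2_zero_sum` is also an assumption, and the sketch presumes the game has no pure-strategy saddle point.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Solve the zero-sum game with row-player payoffs [[a, b], [c, d]].

    Returns (p, q, value): p is the probability of the first row,
    q the probability of the first column, value the game value.
    Assumes there is no saddle point in pure strategies.
    """
    denom = Fraction(a - b - c + d)
    if denom == 0:
        raise ValueError("degenerate game: indifference conditions have no unique solution")
    # Row player mixes so that the column player is indifferent.
    p = Fraction(d - c) / denom
    # Column player mixes so that the row player is indifferent.
    q = Fraction(d - b) / denom
    value = Fraction(a * d - b * c) / denom
    return p, q, value


# Hypothetical payoffs to the seeker: 2 for a find at spot A, 1 at spot B, 0 for a miss.
p, q, value = solve_2x2_zero_sum(2, 0, 0, 1)
```

In this made-up example both players choose spot A with probability 1/3, and the seeker's expected payoff is 2/3.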

Research Papers in Economics

Dynamics in atomic signaling games

Author: Fox Michael J.
Shamma Jeff S.
Touri Behrouz
Publication date: 20/12/2013

We study an atomic signaling game under stochastic evolutionary dynamics. There is a finite number of players who repeatedly update from a finite number of available languages/signaling strategies. Players imitate the most fit agents with high probability or mutate with low probability. We analyze the long-run distribution of states and show that, for sufficiently small mutation probability, its support is limited to efficient communication systems. We find that this behavior is insensitive to the particular choice of evolutionary dynamic, a property that is due to the game having a potential structure with a potential function corresponding to average fitness. Consequently, the model supports conclusions similar to those found in the literature on language competition. That is, we show that efficient languages eventually predominate the society while reproducing the empirical phenomenon of linguistic drift. The emergence of efficiency in the atomic case can be contrasted with results for non-atomic signaling games that establish the non-negligible possibility of convergence, under replicator dynamics, to states of unbounded efficiency loss.

arXiv.org e-Print Archive

CiteSeerX