284 research outputs found

    Assessing the Potential of Classical Q-learning in General Game Playing

    Get PDF
    After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning research aimed at AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex; source code: https://github.com/wh1992v/ggp-rl), to allow comparison to Banerjee et al. We find that Q-learning converges to a high win rate in GGP. For the ϵ-greedy strategy, we propose a first enhancement, the dynamic ϵ algorithm. In addition, inspired by (Gelly & Silver, ICML 2007), we use online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show that, if augmented by appropriate enhancements, classical table-based Q-learning can perform well in small games. Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594
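
    For orientation, a minimal Python sketch of the tabular Q-learning loop with a decaying ("dynamic") ϵ schedule as described in this abstract. The environment interface (reset/legal_actions/step) and all hyperparameters are illustrative assumptions, not the paper's settings.

```python
import random
from collections import defaultdict

def dynamic_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.999):
    """Anneal exploration as learning progresses (illustrative schedule)."""
    return max(eps_min, eps_start * decay ** episode)

def q_learning(env, episodes=10000, alpha=0.1, gamma=0.99):
    """Tabular Q-learning with epsilon-greedy action selection."""
    Q = defaultdict(float)  # maps (state, action) -> estimated value
    for ep in range(episodes):
        s, done = env.reset(), False
        eps = dynamic_epsilon(ep)
        while not done:
            actions = env.legal_actions(s)
            if random.random() < eps:
                # Explore; per the abstract, QM-learning would replace this
                # random pick with a small Monte Carlo search.
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])  # exploit
            s2, r, done = env.step(a)
            # Standard one-step Q-learning backup.
            best_next = 0.0 if done else max(Q[(s2, b)] for b in env.legal_actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```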

    Towards More Data-Aware Application Integration (extended version)

    Full text link
    Although most business application data is stored in relational databases, the programming languages and wire formats in integration middleware systems are not table-centric. To avoid costly format conversions and data shipments, and to exploit faster computation, the trend is to "push down" integration operations closer to the storage representation. We address the alternative case of defining declarative, table-centric integration semantics within standard integration systems. For that, we replace the current operator implementations of the well-known Enterprise Integration Patterns with equivalent "in-memory" table processing, and show a practical realization in a conventional integration system for a non-reliable, "data-intensive" messaging example. The results of the runtime analysis show that table-centric processing is promising already for standard, "single-record" message routing and transformations, and can potentially surpass record-wise processing in message throughput for "multi-record" table messages. Comment: 18 pages, extended version of the contribution to the British International Conference on Databases (BICOD), 2015, Edinburgh, Scotland
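
    To illustrate the table-centric operator idea, here is a hedged sketch of an Enterprise Integration Patterns content-based router recast as a set-oriented operation over a whole "multi-record" table message; the Row type and route_table function are hypothetical names for illustration, not the paper's implementation.

```python
from typing import Callable, Dict, List

# A "table message" is a batch of records, not a single record.
Row = Dict[str, object]

def route_table(table: List[Row], predicate: Callable[[Row], bool]) -> Dict[str, List[Row]]:
    """Content-based router over a whole table: split a multi-record
    message into per-channel tables in one set-oriented pass."""
    channels: Dict[str, List[Row]] = {"match": [], "rest": []}
    for row in table:
        channels["match" if predicate(row) else "rest"].append(row)
    return channels

# Usage: route an order batch by amount; each output channel stays a table,
# preserving "multi-record" batching end to end.
orders = [{"id": 1, "amount": 500}, {"id": 2, "amount": 50}]
out = route_table(orders, lambda r: r["amount"] >= 100)
print(out["match"], out["rest"])
```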

    Towards general cooperative game playing

    Get PDF
    Attempts to develop generic approaches to game playing have been around for several years in the field of Artificial Intelligence. However, games that involve explicit cooperation among otherwise competitive players (cooperative negotiation games) have not been addressed by current approaches. Yet, such games provide a much richer set of features, related to social aspects of interactions, which makes them appealing for envisioning real-world applications. This work proposes a generic agent architecture, Alpha, to tackle cooperative negotiation games, combining elements such as search strategies, negotiation, opponent modeling and trust management. The architecture is then validated in the context of two different games that fall in this category: Diplomacy and Werewolves. Alpha agents are tested in several scenarios, against other state-of-the-art agents. Besides highlighting the promising performance of the agents, the role of each architectural component in each game is assessed. © Springer International Publishing AG, part of Springer Nature 2018
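
    A purely hypothetical Python skeleton of how an Alpha-style agent might compose the four components named in the abstract (search, negotiation, opponent modeling, trust management); every class and method name here is an assumption for illustration, not taken from the paper.

```python
class AlphaAgent:
    """Sketch of an agent composing search, negotiation,
    opponent modeling and trust management (names assumed)."""

    def __init__(self, search, negotiator, opponent_model, trust):
        self.search = search                  # move/strategy search
        self.negotiator = negotiator          # proposes and evaluates deals
        self.opponent_model = opponent_model  # predicts others' behavior
        self.trust = trust                    # tracks who honors agreements

    def act(self, state, proposals):
        # Weigh each incoming deal by its searched value, discounted by
        # how much the proposer can be trusted to keep it.
        for p in proposals:
            value_with_deal = self.search.evaluate(state, assuming=p)
            baseline = self.search.evaluate(state)
            if value_with_deal * self.trust.score(p.sender) > baseline:
                self.negotiator.accept(p)
            else:
                self.negotiator.reject(p)
        return self.search.best_move(state)

    def observe(self, outcome):
        # Learn from results: update predictions and penalize broken deals.
        self.opponent_model.update(outcome)
        self.trust.update(outcome)
```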

    Composition with Target Constraints

    Full text link
    It is known that the composition of schema mappings, each specified by source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO tgd). We consider the question of what happens when target constraints are allowed. Specifically, we consider the question of specifying the composition of standard schema mappings (those specified by st-tgds, target egds, and a weakly acyclic set of target tgds). We show that SO tgds, even with the assistance of arbitrary source constraints and target constraints, cannot specify in general the composition of two standard schema mappings. Therefore, we introduce source-to-target second-order dependencies (st-SO dependencies), which are similar to SO tgds, but allow equations in the conclusion. We show that st-SO dependencies (along with target egds and target tgds) are sufficient to express the composition of every finite sequence of standard schema mappings, and further, every st-SO dependency specifies such a composition. In addition to this expressive power, we show that st-SO dependencies enjoy other desirable properties. In particular, they have a polynomial-time chase that generates a universal solution. This universal solution can be used to find the certain answers to unions of conjunctive queries in polynomial time. It is easy to show that the composition of an arbitrary number of standard schema mappings is equivalent to the composition of only two standard schema mappings. We show that, surprisingly, the analogous result holds also for schema mappings specified by just st-tgds (no target constraints). This is proven by showing that every SO tgd is equivalent to an unnested SO tgd (one where there is no nesting of function symbols). Similarly, we prove unnesting results for st-SO dependencies, with the same types of consequences. Comment: This paper is an extended version of: M. Arenas, R. Fagin, and A. Nash. Composition with Target Constraints. In 13th International Conference on Database Theory (ICDT), pages 129-142, 2010
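
    For orientation, the schematic shapes of the dependency classes this abstract compares; these are generic forms in LaTeX, not examples from the paper.

```latex
% st-tgd: conjunctive source premise, existential target conclusion
\forall \bar{x}\, \bigl( \varphi_S(\bar{x}) \rightarrow \exists \bar{y}\, \psi_T(\bar{x}, \bar{y}) \bigr)

% SO tgd: existentially quantified function symbols over a conjunction of
% implications; equalities may appear in premises but not in conclusions
\exists f_1 \cdots \exists f_k\, \bigwedge_{i} \forall \bar{x}_i\, \bigl( \varphi_i \rightarrow \psi_i \bigr)

% st-SO dependency: same shape as an SO tgd, except that a conclusion
% \psi_i may additionally contain equations such as f(\bar{x}) = g(\bar{x})
```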

    The Multi-Agent Programming Contest: A résumé

    Full text link
    The Multi-Agent Programming Contest, MAPC, is an annual event organized since 2005 out of Clausthal University of Technology. Its aim is to investigate the potential of using decentralized, autonomously acting intelligent agents by providing a complex scenario to be solved in a competitive environment. For this we need suitable benchmarks where agent-based systems can shine. We present previous editions of the contest as well as its current scenario and results from its use in the 2019 MAPC, with a special focus on its suitability. We conclude with lessons learned over the years. Comment: Submitted to the proceedings of the Multi-Agent Programming Contest 2019, to appear in Springer Lect. Notes Computer Challenges Series https://www.springer.com/series/1652

    Evaluating a reinforcement learning algorithm with a general intelligence test

    Full text link
    In this paper we apply the recent notion of anytime universal intelligence tests to the evaluation of a popular reinforcement learning algorithm, Q-learning. We show that a general approach to intelligence evaluation of AI algorithms is feasible. This top-down (theory-derived) approach is based on generating environments under a Solomonoff universal distribution, instead of using a pre-defined set of specific tasks such as mazes, problem repositories, etc. This first application of a general intelligence test to a reinforcement learning algorithm raises the issue of task-specific vs. general AI agents. This, in turn, suggests new avenues for AI agent evaluation and AI competitions, and also conveys some further insights about the performance of specific algorithms. © 2011 Springer-Verlag. Insa Cabrera, J.; Dowe, D.L.; Hernández-Orallo, J. (2011). Evaluating a reinforcement learning algorithm with a general intelligence test. In: Advances in Artificial Intelligence. Springer Verlag (Germany). 7023:1-11. https://doi.org/10.1007/978-3-642-25274-7_1
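
    As a compact statement of the quantity such a test approximates: the Legg-Hutter universal intelligence measure weights an agent π's expected value in each environment μ by a Solomonoff-style prior. The formula below follows Legg and Hutter (2007), not the paper itself.

```latex
% K(\mu) is the Kolmogorov complexity of environment \mu, so simple
% environments receive exponentially more weight; V^\pi_\mu is agent \pi's
% expected cumulative reward in \mu, and E is the environment class.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V^{\pi}_{\mu}
```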

    Neural Networks for State Evaluation in General Game Playing

    Full text link
    Unlike traditional game playing, General Game Playing is concerned with agents capable of playing classes of games. Given the rules of an unknown game, the agent is supposed to play well without human intervention. For this purpose, agent systems that use deterministic game-tree search need to automatically construct a state value function to guide search. Successful systems of this type use evaluation functions derived solely from the game rules, thus neglecting further improvement through experience. In addition, these functions are fixed in their form and do not necessarily capture the game's real state value function. In this work we present an approach for obtaining evaluation functions on the basis of neural networks that overcomes the aforementioned problems. A network initialization extracted from the game rules ensures reasonable behavior without the need for prior training. Later training, however, can lead to significant improvements in evaluation quality, as our results indicate.
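
    A minimal sketch of such a learned evaluation function: a one-hidden-layer network mapping a binary state-feature vector to a value in (0, 1), trained toward observed outcomes. The rule-derived initialization the paper relies on is not reproduced here; the random initialization, network size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class StateEvaluator:
    """Tiny feedforward state-evaluation network: features -> value in (0, 1)."""

    def __init__(self, n_features, n_hidden=16):
        # Random initialization for illustration; the paper instead derives
        # initial weights from the game rules.
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.W2 = rng.normal(0.0, 0.1, n_hidden)

    def value(self, x):
        self.h = np.tanh(self.W1 @ x)                     # hidden activations
        return 1.0 / (1.0 + np.exp(-(self.W2 @ self.h)))  # sigmoid output

    def train_step(self, x, target, lr=0.05):
        """One gradient step on squared error toward an observed outcome."""
        v = self.value(x)
        dz = (v - target) * v * (1.0 - v)        # d(loss)/d(output logit)
        dW2 = dz * self.h
        dh = dz * self.W2 * (1.0 - self.h ** 2)  # backprop through tanh
        self.W1 -= lr * np.outer(dh, x)
        self.W2 -= lr * dW2

# Usage: train a 9-feature evaluator (e.g. Tic-Tac-Toe cells) toward a win (1.0).
ev = StateEvaluator(n_features=9)
x = rng.integers(0, 2, 9).astype(float)
for _ in range(100):
    ev.train_step(x, target=1.0)
```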

    Cognition in Context: Phenomenology, Situated Robotics and the Frame Problem

    Get PDF
    The frame problem is the difficulty of explaining how non-magical systems think and act in ways that are adaptively sensitive to context-dependent relevance. Influenced centrally by Heideggerian phenomenology, Hubert Dreyfus has argued that the frame problem is, in part, a consequence of the assumption (made by mainstream cognitive science and artificial intelligence) that intelligent behaviour is representation-guided behaviour. Dreyfus’ Heideggerian analysis suggests that the frame problem dissolves if we reject representationalism about intelligence and recognize that human agents realize the property of thrownness (the property of being always already embedded in a context). I argue that this positive proposal is incomplete until we understand exactly how the properties in question may be instantiated in machines like us. So, working within a broadly Heideggerian conceptual framework, I pursue the character of a representation-shunning thrown machine. As part of this analysis, I suggest that the frame problem is, in truth, a two-headed beast. The intra-context frame problem challenges us to say how a purely mechanistic system may achieve appropriate, flexible and fluid action within a context. The inter-context frame problem challenges us to say how a purely mechanistic system may achieve appropriate, flexible and fluid action in worlds in which adaptation to new contexts is open-ended and in which the number of potential contexts is indeterminate. Drawing on the field of situated robotics, I suggest that the intra-context frame problem may be neutralized by systems of special-purpose adaptive couplings, while the inter-context frame problem may be neutralized by systems that exhibit the phenomenon of continuous reciprocal causation. I also defend the view that while continuous reciprocal causation is in conflict with representational explanation, special-purpose adaptive coupling, as well as its associated agential phenomenology, may feature representations. My proposal has been criticized recently by Dreyfus, who accuses me of propagating a cognitivist misreading of Heidegger, one that, because it maintains a role for representation, leads me seriously astray in my handling of the frame problem. I close by responding to Dreyfus’ concerns.