613 research outputs found

    Assessing the Potential of Classical Q-learning in General Game Playing

    Get PDF
    After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interests in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee &\& Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex)\footnote{source code: https://github.com/wh1992v/ggp-rl}, to allow comparison to Banerjee et al.. We find that Q-learning converges to a high win rate in GGP. For the ϵ\epsilon-greedy strategy, we propose a first enhancement, the dynamic ϵ\epsilon algorithm. In addition, inspired by (Gelly &\& Silver, ICML 2007) we combine online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show, if augmented by appropriate enhancements, that classical table-based Q-learning can perform well in small games.Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594

    Towards More Data-Aware Application Integration (extended version)

    Full text link
    Although most business application data is stored in relational databases, programming languages and wire formats in integration middleware systems are not table-centric. Due to costly format conversions, data-shipments and faster computation, the trend is to "push-down" the integration operations closer to the storage representation. We address the alternative case of defining declarative, table-centric integration semantics within standard integration systems. For that, we replace the current operator implementations for the well-known Enterprise Integration Patterns by equivalent "in-memory" table processing, and show a practical realization in a conventional integration system for a non-reliable, "data-intensive" messaging example. The results of the runtime analysis show that table-centric processing is promising already in standard, "single-record" message routing and transformations, and can potentially excel the message throughput for "multi-record" table messages.Comment: 18 Pages, extended version of the contribution to British International Conference on Databases (BICOD), 2015, Edinburgh, Scotlan

    Statistical GGP Game Decomposition

    Get PDF
    International audienceThis paper presents a statistical approach for the decomposition of games in the General Game Playing framework. General game players can drastically decrease game search cost if they hold a decomposed version of the game. Previous works on decomposition rely on syn-tactical structures, which can be missing from the game description, or on the disjunctive normal form of the rules, which is very costly to compute. We offer an approach to decompose single or multi-player games which can handle the different classes of compound games described in Game Description Language (parallel games, serial games, multiple games). Our method is based on a statistical analysis of relations between actions and fluents. We tested our program on 597 games. Given a timeout of 1 hour and few playouts (1k), our method successfully provides an expert-like decomposition for 521 of them. With a 1 minute timeout and 5k playouts, it provides a decomposition for 434 of them

    Towards general cooperative game playing

    Get PDF
    Attempts to develop generic approaches to game playing have been around for several years in the field of Artificial Intelligence. However, games that involve explicit cooperation among otherwise competitive players cooperative negotiation games have not been addressed by current approaches. Yet, such games provide a much richer set of features, related with social aspects of interactions, which make them appealing for envisioning real-world applications. This work proposes a generic agent architecture Alpha to tackle cooperative negotiation games, combining elements such as search strategies, negotiation, opponent modeling and trust management. The architecture is then validated in the context of two different games that fall in this category Diplomacy and Werewolves. Alpha agents are tested in several scenarios, against other state-of-the-art agents. Besides highlighting the promising performance of the agents, the role of each architectural component in each game is assessed. (c) Springer International Publishing AG, part of Springer Nature 2018

    Assessing the Potential of Classical Q-learning in General Game Playing

    Get PDF
    After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interests in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex), to allow comparison to Banerjee et al. We find that Q-learning converges to a high win rate in GGP. For the ϵ" role="presentation" style="display: inline-table; line-height: normal; letter-spacing: normal; word-spacing: normal; overflow-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border-width: 0px; border-style: initial; position: relative;">ϵ-greedy strategy, we propose a first enhancement, the dynamic ϵ" role="presentation" style="display: inline-table; line-height: normal; letter-spacing: normal; word-spacing: normal; overflow-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border-width: 0px; border-style: initial; position: relative;">ϵ algorithm. In addition, inspired by (Gelly & Silver, ICML 2007) we combine online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show, if augmented by appropriate enhancements, that classical table-based Q-learning can perform well in small games.Computer Systems, Imagery and Medi

    Composition with Target Constraints

    Full text link
    It is known that the composition of schema mappings, each specified by source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO tgd). We consider the question of what happens when target constraints are allowed. Specifically, we consider the question of specifying the composition of standard schema mappings (those specified by st-tgds, target egds, and a weakly acyclic set of target tgds). We show that SO tgds, even with the assistance of arbitrary source constraints and target constraints, cannot specify in general the composition of two standard schema mappings. Therefore, we introduce source-to-target second-order dependencies (st-SO dependencies), which are similar to SO tgds, but allow equations in the conclusion. We show that st-SO dependencies (along with target egds and target tgds) are sufficient to express the composition of every finite sequence of standard schema mappings, and further, every st-SO dependency specifies such a composition. In addition to this expressive power, we show that st-SO dependencies enjoy other desirable properties. In particular, they have a polynomial-time chase that generates a universal solution. This universal solution can be used to find the certain answers to unions of conjunctive queries in polynomial time. It is easy to show that the composition of an arbitrary number of standard schema mappings is equivalent to the composition of only two standard schema mappings. We show that surprisingly, the analogous result holds also for schema mappings specified by just st-tgds (no target constraints). This is proven by showing that every SO tgd is equivalent to an unnested SO tgd (one where there is no nesting of function symbols). Similarly, we prove unnesting results for st-SO dependencies, with the same types of consequences.Comment: This paper is an extended version of: M. Arenas, R. Fagin, and A. Nash. Composition with Target Constraints. In 13th International Conference on Database Theory (ICDT), pages 129-142, 201

    Ontologies on the semantic web

    Get PDF
    As an informational technology, the World Wide Web has enjoyed spectacular success. In just ten years it has transformed the way information is produced, stored, and shared in arenas as diverse as shopping, family photo albums, and high-level academic research. The “Semantic Web” was touted by its developers as equally revolutionary but has not yet achieved anything like the Web’s exponential uptake. This 17 000 word survey article explores why this might be so, from a perspective that bridges both philosophy and IT

    Evaluating a reinforcement learning algorithm with a general intelligence test

    Full text link
    In this paper we apply the recent notion of anytime universal intelligence tests to the evaluation of a popular reinforcement learning algorithm, Q-learning. We show that a general approach to intelligence evaluation of AI algorithms is feasible. This top-down (theory-derived) approach is based on a generation of environments under a Solomonoff universal distribution instead of using a pre-defined set of specific tasks, such as mazes, problem repositories, etc. This first application of a general intelligence test to a reinforcement learning algorithm brings us to the issue of task-specific vs. general AI agents. This, in turn, suggests new avenues for AI agent evaluation and AI competitions, and also conveys some further insights about the performance of specific algorithms. © 2011 Springer-Verlag.We are grateful for the funding from the Spanish MEC and MICINN for projects TIN2009-06078-E/TIN, Consolider-Ingenio CSD2007-00022 and TIN2010-21062-C02, for MEC FPU grant AP2006-02323, and Generalitat Valenciana for Prometeo/2008/051.Insa Cabrera, J.; Dowe, DL.; Hernández Orallo, J. (2011). Evaluating a reinforcement learning algorithm with a general intelligence test. En Advances in Artificial Intelligence. Springer Verlag (Germany). 7023:1-11. https://doi.org/10.1007/978-3-642-25274-7_1S1117023Dowe, D.L., Hajek, A.R.: A non-behavioural, computational extension to the Turing Test. In: Intl. Conf. on Computational Intelligence & multimedia applications (ICCIMA 1998), Gippsland, Australia, pp. 101–106 (1998)Genesereth, M., Love, N., Pell, B.: General game playing: Overview of the AAAI competition. AI Magazine 26(2), 62 (2005)Hernández-Orallo, J.: Beyond the Turing Test. J. Logic, Language & Information 9(4), 447–466 (2000)Hernández-Orallo, J.: A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In: Hutter, M., et al. (eds.) 3rd Intl. Conf. on Artificial General Intelligence, Atlantis, pp. 182–183 (2010)Hernández-Orallo, J.: On evaluating agent performance in a fixed period of time. In: Hutter, M., et al. (eds.) 3rd Intl. Conf. on Artificial General Intelligence, pp. 25–30. Atlantis Press (2010)Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence 174(18), 1508–1539 (2010)Legg, S., Hutter, M.: A universal measure of intelligence for artificial agents. Intl. Joint Conf. on Artificial Intelligence, IJCAI 19, 1509 (2005)Legg, S., Hutter, M.: Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4), 391–444 (2007)Levin, L.A.: Universal sequential search problems. Problems of Information Transmission 9(3), 265–266 (1973)Li, M., Vitányi, P.: An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer-Verlag New York, Inc. (2008)Sanghi, P., Dowe, D.L.: A computer program capable of passing IQ tests. In: Proc. 4th ICCS International Conference on Cognitive Science (ICCS 2003), Sydney, Australia, pp. 570–575 (2003)Solomonoff, R.J.: A formal theory of inductive inference. Part I. Information and Control 7(1), 1–22 (1964)Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: Proc. of the 23rd Intl. Conf. on Machine Learning, ICML 2006, New York, pp. 881–888 (2006)Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction. The MIT press (1998)Turing, A.M.: Computing machinery and intelligence. Mind 59, 433–460 (1950)Veness, J., Ng, K.S., Hutter, M., Silver, D.: Reinforcement learning via AIXI approximation. In: Proc. 24th Conf. on Artificial Intelligence (AAAI 2010), pp. 605–611 (2010)Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine learning 8(3), 279–292 (1992)Weyns, D., Parunak, H.V.D., Michel, F., Holvoet, T., Ferber, J.: Environments for multiagent systems state-of-the-art and research challenges. In: Weyns, D., Van Dyke Parunak, H., Michel, F. (eds.) E4MAS 2004. LNCS (LNAI), vol. 3374, pp. 1–47. Springer, Heidelberg (2005)Whiteson, S., Tanner, B., White, A.: The Reinforcement Learning Competitions. The AI magazine 31(2), 81–94 (2010)Woergoetter, F., Porr, B.: Reinforcement learning. Scholarpedia 3(3), 1448 (2008)Zatuchna, Z., Bagnall, A.: Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior 17(1), 28–57 (2009
    corecore