10 research outputs found

    Analysis, design and implementation of the game Fútbol en papel and application and comparison of some classical and bio-inspired artificial intelligence techniques to generate automatic controllers

    The main objective of this Bachelor's thesis (Trabajo Fin de Grado) is to implement a version of the game Fútbol en papel (paper soccer) in Java and to carry out a study and comparison of the classical Artificial Intelligence techniques used in turn-based strategy games and techniques inspired by nature. In addition, the complexity of this domain (the game Fútbol en papel) is analyzed, and an overview of the techniques used in turn-based strategy games is presented.
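    The abstract names the classical techniques only in passing. As a minimal sketch of the classical baseline such a comparison starts from (written in Python rather than the thesis's Java, and using a tiny take-away game as a stand-in for Fútbol en papel, whose rules are not reproduced here):

        # Exhaustive minimax for a toy take-away game: each move removes 1-3
        # stones, and whoever takes the last stone wins. This stands in for
        # Futbol en papel, whose rules the abstract does not give.

        def successors(stones):
            """All pile sizes reachable in one move (remove 1, 2, or 3 stones)."""
            return [stones - k for k in (1, 2, 3) if stones - k >= 0]

        def minimax(stones, maximizing):
            """Value for the maximizing player: +1 win, -1 loss under perfect play."""
            if stones == 0:
                # The previous player took the last stone; the side to move lost.
                return -1 if maximizing else 1
            values = [minimax(s, not maximizing) for s in successors(stones)]
            return max(values) if maximizing else min(values)

        def best_move(stones):
            """Pick the successor position with the best minimax value."""
            return max(successors(stones), key=lambda s: minimax(s, False))

        print(best_move(7))  # prints 4: leaving a multiple of 4 stones wins

    A nature-inspired controller would replace this exhaustive search with, for example, an evolved evaluation policy; the thesis compares the two families on the actual game.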

    Computer Chess: From Idea to DeepMind

    Computer chess has stimulated human imagination for some two hundred and fifty years. In 1769 Baron Wolfgang von Kempelen promised Empress Maria Theresia in public: “I will invent a machine for a more compelling spectacle [than the magnetism tricks by Pelletier] within half a year.” The idea of an intelligent chess machine was born. In 1770 the first demonstration was given. The real development of artificial intelligence (AI) began in 1950 and involves many well-known names, such as Turing and Shannon. One of the first AI research areas was chess. In 1997 a high point was reached: world champion Garry Kasparov was defeated by Deep Blue. The techniques used included searching, knowledge representation, parallelism, and distributed systems. Adaptivity, machine learning, and the recently developed deep learning mechanism were added to the computer chess research techniques only later. The major breakthrough for games in general (including chess) took place in 2017, when (1) the AlphaGo Zero program defeated the world-champion program AlphaGo by 100-0 and (2) the technique of deep learning also proved applicable to chess. In the autumn of 2017 the Stockfish program was beaten by AlphaZero by 28-0 (with 72 draws, resulting in a 64-36 victory). However, the end of this disruptive advance is not yet in reach. In fact, we have just started. The next milestone will be to determine the theoretical game value of chess (won, draw, or lost). This achievement will certainly be followed by other surprising developments.
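    The closing milestone, determining the theoretical game value of chess, is far beyond current means, but the underlying computation is plain exhaustive minimax over game outcomes. As an illustrative sketch (not from the article), here it is for tic-tac-toe, a game small enough to solve outright:

        # Computes the game-theoretic value of tic-tac-toe by exhaustive search:
        # +1 if X wins, 0 for a draw, -1 if O wins, all under perfect play.

        from functools import lru_cache

        LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

        def winner(board):
            for a, b, c in LINES:
                if board[a] != '.' and board[a] == board[b] == board[c]:
                    return board[a]
            return None

        @lru_cache(maxsize=None)
        def value(board, player):
            """Value for X with `player` to move: +1 win, 0 draw, -1 loss."""
            w = winner(board)
            if w:
                return 1 if w == 'X' else -1
            moves = [i for i, c in enumerate(board) if c == '.']
            if not moves:
                return 0                     # full board, no line: a draw
            nxt = 'O' if player == 'X' else 'X'
            children = [value(board[:i] + player + board[i + 1:], nxt) for i in moves]
            return max(children) if player == 'X' else min(children)

        print(value('.' * 9, 'X'))  # prints 0: perfect play ends in a draw

    Chess would need the same computation over roughly 10^44 positions, which is why the milestone remains open.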

    Intelligent strategy for two-person non-random perfect information zero-sum game.

    Tong Kwong-Bun. Thesis submitted in December 2002. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 77-[80]). Abstracts in English and Chinese.
    Contents:
    Chapter 1, Introduction: 1.1 An Overview; 1.2 Tree Search (1.2.1 Minimax Algorithm; 1.2.2 The Alpha-Beta Algorithm; 1.2.3 Alpha-Beta Enhancements; 1.2.4 Selective Search); 1.3 Construction of Evaluation Function; 1.4 Contribution of the Thesis; 1.5 Structure of the Thesis.
    Chapter 2, The Probabilistic Forward Pruning Framework: 2.1 Introduction; 2.2 The Generalized Probabilistic Forward Cuts Heuristic; 2.3 The GPC Framework (2.3.1 The Alpha-Beta Algorithm; 2.3.2 The NegaScout Algorithm; 2.3.3 The Memory-enhanced Test Algorithm); 2.4 Summary.
    Chapter 3, The Fast Probabilistic Forward Pruning Framework: 3.1 Introduction; 3.2 The Fast GPC Heuristic (3.2.1 The Alpha-Beta Algorithm; 3.2.2 The NegaScout Algorithm; 3.2.3 The Memory-enhanced Test Algorithm); 3.3 Performance Evaluation (3.3.1 Determination of the Parameters; 3.3.2 Result of Experiments); 3.4 Summary.
    Chapter 4, The Node-Cutting Heuristic: 4.1 Introduction; 4.2 Move Ordering (4.2.1 Quality of Move Ordering); 4.3 Node-Cutting Heuristic; 4.4 Performance Evaluation (4.4.1 Determination of the Parameters; 4.4.2 Result of Experiments); 4.5 Summary.
    Chapter 5, The Integrated Strategy: 5.1 Introduction; 5.2 Combination of GPC, FGPC and Node-Cutting Heuristic; 5.3 Performance Evaluation; 5.4 Summary.
    Chapter 6, Conclusions and Future Works: 6.1 Conclusions; 6.2 Future Works.
    Appendix A, Examples; Appendix B, The Rules of Chinese Checkers; Appendix C, Application to Chinese Checkers; Bibliography.
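    The listing gives only the table of contents, not the GPC or node-cutting heuristics themselves, so the sketch below shows just the classical alpha-beta baseline of Chapter 1.2.2 (in Python, with the game tree supplied explicitly as nested lists whose leaves are static evaluation scores):

        # Classical alpha-beta pruning over an explicit game tree. The
        # thesis's GPC and node-cutting heuristics, not described in this
        # listing, prune further on top of this baseline.

        import math

        def alphabeta(node, alpha, beta, maximizing):
            """Minimax value of `node`, skipping branches outside (alpha, beta)."""
            if isinstance(node, (int, float)):      # leaf: static evaluation
                return node
            if maximizing:
                value = -math.inf
                for child in node:
                    value = max(value, alphabeta(child, alpha, beta, False))
                    alpha = max(alpha, value)
                    if alpha >= beta:               # beta cutoff: opponent avoids this line
                        break
                return value
            value = math.inf
            for child in node:
                value = min(value, alphabeta(child, alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:                   # alpha cutoff
                    break
            return value

        tree = [[3, 5], [6, [9, 8]], [1, 2]]        # a small, arbitrary game tree
        print(alphabeta(tree, -math.inf, math.inf, True))  # prints 6

    NegaScout and the memory-enhanced test driver (Chapter 2.3) refine this skeleton with narrow search windows; forward pruning, the thesis's subject, additionally cuts nodes alpha-beta alone would still visit, at some controlled risk of error.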

    Efficient Preference-based Reinforcement Learning

    Common reinforcement learning algorithms assume access to a numeric feedback signal. Numeric feedback carries a high amount of information and can be maximized efficiently. However, defining a numeric feedback signal can be difficult in practice due to several limitations, and badly defined values may lead to an unintended outcome. For humans, it is usually easier to define qualitative feedback signals than quantitative ones. Hence, we want to solve reinforcement learning problems with a qualitative signal, potentially capable of overcoming several of the limitations of numeric feedback. Preferences have several advantages over other qualitative settings, such as ordinal feedback or advice: they are scale-free and do not require assumptions about the optimal outcome. However, preferences are difficult to use for solving sequential decision problems, because it is unknown which decisions are responsible for an observed preference. Hence, we analyze different approaches for learning from preferences, show the design principles that can be used, and discuss the advantages and problems that occur. We also survey the field of preference-based reinforcement learning and categorize its algorithms according to these design principles. Efficiency is of special interest in this setting: because preferences depend on human evaluation, it is important to keep the number of required preferences low. Hence, our focus is on the efficient use of preferences. Generalization is central here, as generalizing the obtained preferences to models not yet evaluated keeps the number of required preferences low; we therefore consider methods able to do so. However, this introduces uncertain feedback, and the exploration/exploitation problem already known from classical reinforcement learning has to be reconsidered with preferences in mind. We show how to solve this dual exploration problem efficiently by interleaving both tasks in an undirected manner; we use undirected exploration methods because they scale better to high-dimensional spaces. Furthermore, human feedback has to be assumed to be error-prone, and we analyze the problems that arise from human evaluation. We show that noise is the most substantial problem when dealing with human preferences and present a solution to it.
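    As a concrete point of reference (a common building block in this literature, not code from the thesis): pairwise preferences are often turned into a numeric utility with a Bradley-Terry style logistic model fitted by gradient ascent. The feature vectors, learning rate, and toy data below are illustrative assumptions:

        # Fit a linear utility function from pairwise preferences with a
        # Bradley-Terry / logistic model; illustrative, not the thesis method.

        import numpy as np

        def fit_utility(prefs, dim, lr=0.1, epochs=200):
            """prefs: (x_win, x_lose) pairs with x_win the preferred item.
            Returns weights w so that w @ x ranks preferred items higher."""
            w = np.zeros(dim)
            for _ in range(epochs):
                for x_win, x_lose in prefs:
                    # Bradley-Terry: P(x_win preferred) = sigmoid(w.x_win - w.x_lose)
                    p = 1.0 / (1.0 + np.exp(-(w @ x_win - w @ x_lose)))
                    # Gradient ascent on the log-likelihood of this preference
                    w += lr * (1.0 - p) * (x_win - x_lose)
            return w

        # Toy usage: the hidden criterion is the first feature component.
        rng = np.random.default_rng(0)
        pairs = []
        for _ in range(100):
            a, b = rng.normal(size=2), rng.normal(size=2)
            pairs.append((a, b) if a[0] > b[0] else (b, a))
        print(fit_utility(pairs, dim=2))  # first weight clearly dominates

    The logistic likelihood also gives a natural account of noisy evaluators: an erroneous preference is merely improbable rather than contradictory, which matters given the abstract's point that noise is the dominant problem with human preferences.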

    An Improvement to the Scout Tree Search Algorithm
