61 research outputs found

    A hybridisation technique for game playing using the upper confidence for trees algorithm with artificial neural networks

    Get PDF
    In the domain of strategic game playing, the use of statistical techniques such as the Upper Confidence for Trees (UCT) algorithm, has become the norm as they offer many benefits over classical algorithms. These benefits include requiring no game-specific strategic knowledge and time-scalable performance. UCT does not incorporate any strategic information specific to the game considered, but instead uses repeated sampling to effectively brute-force search through the game tree or search space. The lack of game-specific knowledge in UCT is thus both a benefit but also a strategic disadvantage. Pattern recognition techniques, specifically Neural Networks (NN), were identified as a means of addressing the lack of game-specific knowledge in UCT. Through a novel hybridisation technique which combines UCT and trained NNs for pruning, the UCTNN algorithm was derived. The NN component of UCT-NN was trained using a UCT self-play scheme to generate game-specific knowledge without the need to construct and manage game databases for training purposes. The UCT-NN algorithm is outlined for pruning in the game of Go-Moku as a candidate case-study for this research. The UCT-NN algorithm contained three major parameters which emerged from the UCT algorithm, the use of NNs and the pruning schemes considered. Suitable methods for finding candidate values for these three parameters were outlined and applied to the game of Go-Moku on a 5 by 5 board. An empirical investigation of the playing performance of UCT-NN was conducted in comparison to UCT through three benchmarks. The benchmarks comprise a common randomly moving opponent, a common UCTmax player which is given a large amount of playing time, and a pair-wise tournament between UCT-NN and UCT. The results of the performance evaluation for 5 by 5 Go-Moku were promising, which prompted an evaluation of a larger 9 by 9 Go-Moku board. The results of both evaluations indicate that the time allocated to the UCT-NN algorithm directly affects its performance when compared to UCT. The UCT-NN algorithm generally performs better than UCT in games with very limited time-constraints in all benchmarks considered except when playing against a randomly moving player in 9 by 9 Go-Moku. In real-time and near-real-time Go-Moku games, UCT-NN provides statistically significant improvements compared to UCT. The findings of this research contribute to the realisation of applying game-specific knowledge to the UCT algorithm

    Symbolic Search in Planning and General Game Playing

    Get PDF
    Search is an important topic in many areas of AI. Search problems often result in an immense number of states. This work addresses this by using a special datastructure, BDDs, which can represent large sets of states efficiently, often saving space compared to explicit representations. The first part is concerned with an analysis of the complexity of BDDs for some search problems, resulting in lower or upper bounds on BDD sizes for these. The second part is concerned with action planning, an area where the programmer does not know in advance what the search problem will look like. This part presents symbolic algorithms for finding optimal solutions for two different settings, classical and net-benefit planning, as well as several improvements to these algorithms. The resulting planner was able to win the International Planning Competition IPC 2008. The third part is concerned with general game playing, which is similar to planning in that the programmer does not know in advance what game will be played. This work proposes algorithms for instantiating the input and solving games symbolically. For playing, a hybrid player based on UCT and the solver is presented

    General-Purpose Planning Algorithms In Partially-Observable Stochastic Games

    Get PDF
    Partially observable stochastic games (POSGs) are difficult domains to plan in because they feature multiple agents with potentially opposing goals, parts of the world are hidden from the agents, and some actions have random outcomes. It is infeasible to solve a large POSG optimally. While it may be tempting to design a specialized algorithm for finding suboptimal solutions to a particular POSG, general-purpose planning algorithms can work just as well, but with less complexity and domain knowledge required. I explore this idea in two different POSGs: Navy Defense and Duelyst. In Navy Defense, I show that a specialized algorithm framework, goal-driven autonomy, which requires a complex subsystem separate from the planner for explicitly reasoning about goals, is unnecessary, as simple general planners such as hindsight optimization exhibit implicit goal reasoning and have strong performance. In Duelyst, I show that a specialized expert-rule-based AI can be consistently beaten by a simple general planner using only a small amount of domain knowledge. I also introduce a modification to Monte Carlo tree search that increases performance when rollouts are slow and there are time constraints on planning

    Deterministic and Probabilistic Binary Search in Graphs

    Full text link
    We consider the following natural generalization of Binary Search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node qq, the algorithm learns either that qq is the target, or is given an edge out of qq that lies on a shortest path from qq to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability p>12p > \frac{1}{2} (a known constant), and an (adversarial) incorrect one with probability 1āˆ’p1-p. Our main positive result is that when p=1p = 1 (i.e., all answers are correct), logā”2n\log_2 n queries are always sufficient. For general pp, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than (1āˆ’Ī“)logā”2n1āˆ’H(p)+o(logā”n)+O(logā”2(1/Ī“))(1 - \delta)\frac{\log_2 n}{1 - H(p)} + o(\log n) + O(\log^2 (1/\delta)) queries, and identifies the target correctly with probability at leas 1āˆ’Ī“1-\delta. Here, H(p)=āˆ’(plogā”p+(1āˆ’p)logā”(1āˆ’p))H(p) = -(p \log p + (1-p) \log(1-p)) denotes the entropy. The first bound is achieved by the algorithm that iteratively queries a 1-median of the nodes not ruled out yet; the second bound by careful repeated invocations of a multiplicative weights algorithm. Even for p=1p = 1, we show several hardness results for the problem of determining whether a target can be found using KK queries. Our upper bound of logā”2n\log_2 n implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best-possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query rr nodes each in kk rounds, we show membership in Ī£2kāˆ’1\Sigma_{2k-1} in the polynomial hierarchy, and hardness for Ī£2kāˆ’5\Sigma_{2k-5}

    Learning Curricula in Open-Ended Worlds

    Full text link
    Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.Comment: PhD dissertatio

    Robot Planning in Adversarial Environments Using Tree Search Techniques

    Get PDF
    One of the main advantages of robots is that they can be used in environments that are dangerous for humans. Robots can not only be used for tasks in known and safe areas but also in environments that may have adversaries. When planning the robot's actions in such scenarios, we have to consider the outcomes of a robot's actions based on the actions taken by the adversary, as well as the information available to the robot and the adversary. The goal of this dissertation is to design planning strategies that improve the robot's performance in adversarial environments. Specifically, we study how the availability of information affects the planning process and the outcome. We also study how to improve the computational efficiency by exploiting the structural properties of the underlying setting. We adopt a game-theoretic formulation and study two scenarios: adversarial active target tracking and reconnaissance in environments with adversaries. A conservative approach is to plan the robot's action assuming a worst-case adversary with complete knowledge of the robot's state and objective. We start with such a "symmetric" information game for the adversarial target tracking scenario with noisy sensing. By using the properties of the Kalman filter, we design a pruning strategy to improve the efficiency of a tree search algorithm. We investigate the performance limits of the asymmetric version where the adversary can inject false sensing data. We then study a reconnaissance scenario where the robot and the adversary have symmetric information. We design an algorithm that allows a robot to scan more area while avoiding being detected by the adversary. The symmetric adversarial model may yield too conservative plans when the adversary may not have the same information as the robot. Furthermore, the information available to the adversary may change during execution. We then investigate the dynamic version of this asymmetric information game and show how much the robot can exploit the asymmetry in information using tree search techniques. Specifically, we study scenarios where the information available to the adversary changes during execution. We devise a new algorithm for this asymmetric information game with theoretical performance guarantees and evaluate those approaches through experiments. We use qualitative examples to show how the new algorithm can outperform symmetric minimax and use quantitative experiments to show how much the improvement is

    Approximate inference in graphical models

    Get PDF
    Probability theory provides a mathematically rigorous yet conceptually flexible calculus of uncertainty, allowing the construction of complex hierarchical models for real-world inference tasks. Unfortunately, exact inference in probabilistic models is often computationally expensive or even intractable. A close inspection in such situations often reveals that computational bottlenecks are confined to certain aspects of the model, which can be circumvented by approximations without having to sacrifice the model's interesting aspects. The conceptual framework of graphical models provides an elegant means of representing probabilistic models and deriving both exact and approximate inference algorithms in terms of local computations. This makes graphical models an ideal aid in the development of generalizable approximations. This thesis contains a brief introduction to approximate inference in graphical models (Chapter 2), followed by three extensive case studies in which approximate inference algorithms are developed for challenging applied inference problems. Chapter 3 derives the first probabilistic game tree search algorithm. Chapter 4 provides a novel expressive model for inference in psychometric questionnaires. Chapter 5 develops a model for the topics of large corpora of text documents, conditional on document metadata, with a focus on computational speed. In each case, graphical models help in two important ways: They first provide important structural insight into the problem; and then suggest practical approximations to the exact probabilistic solution.This work was supported by a scholarship from Microsoft Research, Ltd

    JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing

    Full text link
    Approximate query processing over dynamic databases, i.e., under insertions/deletions, has applications ranging from high-frequency trading to internet-of-things analytics. We present JanusAQP, a new dynamic AQP system, which supports SUM, COUNT, AVG, MIN, and MAX queries under insertions and deletions to the dataset. JanusAQP extends static partition tree synopses, which are hierarchical aggregations of datasets, into the dynamic setting. This paper contributes new methods for: (1) efficient initialization of the data synopsis in the presence of incoming data, (2) maintenance of the data synopsis under insertions/deletions, and (3) re-optimization of the partitioning to reduce the approximation error. JanusAQP reduces the error of a state-of-the-art baseline by more than 60% using only 10% storage cost. JanusAQP can process more than 100K updates per second in a single node setting and keep the query latency at a millisecond level
    • ā€¦
    corecore