46 research outputs found

    Selecting Computations: Theory and Applications

    Full text link
    Sequential decision problems are often approximately solvable by simulating possible future action sequences. {\em Metalevel} decision procedures have been developed for selecting {\em which} action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian {\em selection problems}, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go. Comment: 10 pages, UAI 201
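The bandit-based control mentioned here (e.g. UCB1, the index rule behind UCT-style control of Monte Carlo tree search) selects which simulation to run next by adding an exploration bonus to each candidate's mean reward. A minimal sketch, assuming per-arm pull counts and summed rewards; the function name and signature are illustrative, not from the paper:

```python
import math

def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the index of the arm maximizing the UCB1 score:
    mean reward + c * sqrt(ln(total pulls) / pulls of this arm)."""
    n_total = sum(counts)
    # Pull any arm that has never been tried before comparing scores.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [totals[a] / counts[a]
              + c * math.sqrt(math.log(n_total) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)
```

The paper's argument, roughly, is that in one-shot selection problems this bandit objective (which charges a cost for every pull of a seemingly inferior arm) is the wrong metalevel model; the sketch only makes the baseline concrete.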

    Analysis of different MCTS implementations of artificial intelligence for the Children of the Galaxy computer game

    Get PDF
    Monte Carlo Tree Search (MCTS) is a popular game AI algorithm that searches the state space of a game while using randomized playouts to evaluate new states. Many papers have been published on various adjustments to the original algorithm; however, work comparing several of these variants against one another does not seem to exist. This lack of data can make it difficult to decide which variant to use without implementing and testing them all, which is potentially quite time-consuming. The aim of this thesis is therefore twofold: first, to create such a comparison in a specific setting, and second, to introduce a new variant, WP MCTS, based on the idea that one should be able to gather more information from a playout by examining all the states encountered during its computation. For our setting, we chose battles between small armies in a 4X computer game called Children of the Galaxy. The results presented here indicate that many, though not all, tested variants outperform basic MCTS in this setting.
Department of Software and Computer Science Education, Faculty of Mathematics and Physics
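The core mechanism described above (randomized playouts used to score new states) can be illustrated with flat Monte Carlo evaluation on a toy game. This is a hypothetical example on simple Nim, not the thesis's WP MCTS or its Children of the Galaxy battles:

```python
import random

def playout(stones, player, rng):
    """Finish a game of Nim (take 1-3 stones per turn; taking the last
    stone wins) with uniformly random moves; return the winning player."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player

def best_move(stones, player, rng, n_playouts=200):
    """Score each legal move by the fraction of random playouts the
    mover wins from the resulting state; return the best-scoring move."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        wins = 0
        for _ in range(n_playouts):
            child = stones - take
            if child == 0:
                wins += 1  # taking the last stone wins immediately
            elif playout(child, 1 - player, rng) == player:
                wins += 1
        scores[take] = wins / n_playouts
    return max(scores, key=scores.get)
```

MCTS proper grows a tree over these playouts instead of scoring moves independently, and WP MCTS, per the thesis, additionally extracts information from the intermediate states a playout visits rather than only its terminal result.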

    Balancing exploration and exploitation: task-targeted exploration for scientific decision-making

    Get PDF
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, September 2022.
How do we collect observational data that reveal fundamental properties of scientific phenomena? This is a key challenge in modern scientific discovery. Scientific phenomena are complex: they have high-dimensional and continuous state, exhibit chaotic dynamics, and generate noisy sensor observations. Additionally, scientific experimentation often requires significant time, money, and human effort. In the face of these challenges, we propose to leverage autonomous decision-making to augment and accelerate human scientific discovery. Autonomous decision-making in scientific domains faces an important and classical challenge: balancing exploration and exploitation when making decisions under uncertainty. This thesis argues that efficient decision-making in real-world, scientific domains requires task-targeted exploration: exploration strategies that are tuned to a specific task. By quantifying the change in task performance due to exploratory actions, we enable decision-makers that can contend with highly uncertain real-world environments, performing exploration parsimoniously to improve task performance. The thesis presents three novel paradigms for task-targeted exploration that are motivated by and applied to real-world scientific problems. We first consider exploration in partially observable Markov decision processes (POMDPs) and present two novel planners that leverage task-driven information measures to balance exploration and exploitation. These planners drive robots in simulation and oceanographic field trials to robustly identify plume sources and track targets with stochastic dynamics. We next consider the exploration-exploitation trade-off in online learning paradigms, a robust alternative to POMDPs when the environment is adversarial or difficult to model. We present novel online learning algorithms that balance exploitative and exploratory plays optimally under real-world constraints, including delayed feedback, partial predictability, and short regret horizons. We use these algorithms to perform model selection for subseasonal temperature and precipitation forecasting, achieving state-of-the-art forecasting accuracy. The human scientific endeavor is poised to benefit from our emerging capacity to integrate observational data into the process of model development and validation. Realizing the full potential of these data requires autonomous decision-makers that can contend with the inherent uncertainty of real-world scientific domains. This thesis highlights the critical role that task-targeted exploration plays in efficient scientific decision-making and proposes three novel methods to achieve task-targeted exploration in real-world oceanographic and climate science applications.
This material is based upon work supported by the NSF Graduate Research Fellowship Program and a Microsoft Research PhD Fellowship, as well as the Department of Energy / National Nuclear Security Administration under Award Number DE-NA0003921, the Office of Naval Research under Award Number N00014-17-1-2072, and DARPA under Award Number HR001120C0033
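The classical building block behind this kind of expert-weighted model selection is the exponential-weights (Hedge) update, which shifts probability toward forecasters with low cumulative loss. A minimal sketch of that baseline only; the thesis's algorithms extend it to delayed feedback, partial predictability, and short regret horizons, and the function here is illustrative:

```python
import math

def hedge_weights(cum_losses, eta=0.5):
    """Exponential-weights (Hedge) distribution over experts:
    weight_i is proportional to exp(-eta * cumulative loss of expert i)."""
    # Subtract the minimum loss for numerical stability; this leaves
    # the normalized weights unchanged.
    m = min(cum_losses)
    scores = [math.exp(-eta * (loss - m)) for loss in cum_losses]
    z = sum(scores)
    return [s / z for s in scores]
```

The ensemble forecast is then the weight-averaged combination of the individual models' outputs; a smaller eta keeps exploration broad across experts, while a larger eta exploits the current leader more aggressively.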

    Spartan Daily, May 6, 1987

    Get PDF
    Volume 88, Issue 62
    https://scholarworks.sjsu.edu/spartandaily/7587/thumbnail.jp

    Suffolk Journal, Vol. 52, No. 9, 11/03/1993

    Get PDF
    https://dc.suffolk.edu/journal/1941/thumbnail.jp

    Active Perception by Interaction with Other Agents in a Predictive Coding Framework: Application to Internet of Things Environment

    Get PDF
    Predicting the state of an agent's partially-observable environment is a problem of interest in many domains. Typically in the real world, the environment consists of multiple agents, not necessarily working towards a common goal. Though each agent's goal and sensory observations are unique, one agent might have acquired knowledge that may benefit another. In essence, the knowledge base regarding the environment is distributed among the agents. An agent can sample this distributed knowledge base by communicating with other agents. Since an agent does not store the entire knowledge base, its model can be small and its inference efficient and fault-tolerant. However, the agent needs to learn when, with whom, and what to communicate (and, more generally, interact) in different situations. This dissertation presents an agent model that actively and selectively communicates with other agents to predict the state of its environment efficiently. Communication is a challenge when the internal models of other agents are unknown and unobservable. The proposed agent learns communication policies as mappings from its belief state to when, with whom, and what to communicate. The policies are learned using predictive coding in an online manner, without any reinforcement. The proposed agent model is evaluated on widely-studied applications, such as human activity recognition from multimodal, multisource, and heterogeneous sensor data, and transferring knowledge across sensor networks. In these applications, either each sensor or each sensor network is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state of the art, even though our model has significantly fewer parameters and infers the state in a localized manner. The learned policy reduces the number of communications. The agent is tolerant to communication failures and can recognize the reliability of each agent from its communication messages. To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting the state of its environment.