46 research outputs found
Selecting Computations: Theory and Applications
Sequential decision problems are often approximately solvable by simulating
possible future action sequences. Metalevel decision procedures have been
developed for selecting which action sequences to simulate, based on
estimating the expected improvement in decision quality that would result from
any particular simulation; an example is the recent work on using bandit
algorithms to control Monte Carlo tree search in the game of Go. In this paper
we develop a theoretical basis for metalevel decisions in the statistical
framework of Bayesian selection problems, arguing (as others have done)
that this is more appropriate than the bandit framework. We derive a number of
basic results applicable to Monte Carlo selection problems, including the first
finite sampling bounds for optimal policies in certain cases; we also provide a
simple counterexample to the intuitive conjecture that an optimal policy will
necessarily reach a decision in all cases. We then derive heuristic
approximations in both Bayesian and distribution-free settings and demonstrate
their superiority to bandit-based heuristics in one-shot decision problems and
in Go. Comment: 10 pages, UAI 201
Analysis of different MCTS implementations of artificial intelligence for the Children of the Galaxy computer game
Monte Carlo Tree Search (MCTS) is a popular game AI algorithm that searches the state space of a game while using randomized playouts to evaluate new states. Many papers have been published about various adjustments to the original algorithm; however, work that compares several of these variants with one another does not seem to exist. This lack of data can make it difficult to decide which variant to use without implementing and testing them, which is potentially quite time-consuming. The aim of this thesis is therefore twofold: first, to create such a comparison in a specific setting, and second, to introduce a new variant, WP MCTS, which is based on the idea that one should be able to gather more information from a playout by looking at all the states encountered during its computation than by scoring only the final state. For our setting, we chose battles between small armies in a 4X computer game called Children of the Galaxy. The results presented here indicate that many, though not all, tested variants outperform basic MCTS in this setting.
Department of Software and Computer Science Education, Faculty of Mathematics and Physics
Balancing exploration and exploitation: task-targeted exploration for scientific decision-making
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution September 2022. How do we collect observational data that reveal fundamental properties of scientific phenomena? This is a key challenge in modern scientific discovery. Scientific phenomena are complex: they have high-dimensional and continuous state, exhibit chaotic dynamics, and generate noisy sensor observations. Additionally, scientific experimentation often requires significant time, money, and human effort. In the face of these challenges, we propose to leverage autonomous decision-making to augment and accelerate human scientific discovery.
Autonomous decision-making in scientific domains faces an important and classical challenge: balancing exploration and exploitation when making decisions under uncertainty. This thesis argues that efficient decision-making in real-world, scientific domains requires task-targeted exploration—exploration strategies that are tuned to a specific task. By quantifying the change in task performance due to exploratory actions, we enable decision-makers that can contend with highly uncertain real-world environments, performing exploration parsimoniously to improve task performance.
The thesis presents three novel paradigms for task-targeted exploration that are motivated by and applied to real-world scientific problems. We first consider exploration in partially observable Markov decision processes (POMDPs) and present two novel planners that leverage task-driven information measures to balance exploration and exploitation. These planners drive robots in simulation and oceanographic field trials to robustly identify plume sources and track targets with stochastic dynamics. We next consider the exploration-exploitation trade-off in online learning paradigms, a robust alternative to POMDPs when the environment is adversarial or difficult to model. We present novel online learning algorithms that balance exploitative and exploratory plays optimally under real-world constraints, including delayed feedback, partial predictability, and short regret horizons.
We use these algorithms to perform model selection for subseasonal temperature and precipitation forecasting, achieving state-of-the-art forecasting accuracy.
The human scientific endeavor is poised to benefit from our emerging capacity to integrate observational data into the process of model development and validation. Realizing the full potential of these data requires autonomous decision-makers that can contend with the inherent uncertainty of real-world scientific domains. This thesis highlights the critical role that task-targeted exploration plays in efficient scientific decision-making and proposes three novel methods to achieve task-targeted exploration in real-world oceanographic and climate science applications. This material is based upon work supported by the NSF Graduate Research Fellowship Program and a Microsoft Research PhD Fellowship, as well as the Department of Energy / National Nuclear Security Administration under Award Number DE-NA0003921, the Office of Naval Research under Award Number N00014-17-1-2072, and DARPA under Award Number HR001120C0033
Algorithmic Assurances and Self-Assessment of Competency Boundaries In Autonomous Systems
As long as autonomous systems have existed, we have created methods by which we can have an assurance that the systems are working as expected; this process has largely been ad hoc and unprincipled. With the rapid increase of technology that is more autonomous and accessible than ever before, weaknesses of the current approach for making assurances are becoming apparent. Algorithmic assurances are explicitly and formally designed behaviors and properties of autonomous systems that encourage the system's appropriate use by influencing human trust. Practically, we demonstrate the utility of algorithmic assurances in addressing the challenge of enabling an autonomous delivery vehicle to assess its own limitations, or competency boundaries. Given such a capability, an autonomous system can convey information to a user to help them appropriately delegate tasks to it. Autonomous self-assessment requires high-level meta-reasoning about the suitability of assumptions, models, algorithms, approximations, and data that collectively comprise the underlying autonomy. We apply one framework for meta-reasoning on decision-making agents, called Factorized Machine Self-Confidence (FaMSeC), to the MDP class of planning problems. We show that analysis of expected cumulative reward distributions leads to insightful FaMSeC factors that can be generalized to a wide range of autonomous decision-making problems. Data from a user study involving supervision of a simulated autonomous delivery vehicle indicate that communication of self-confidence assessments does, in fact, help users delegate tasks to an MDP-based robot more effectively.
Spartan Daily, May 6, 1987
Volume 88, Issue 62 https://scholarworks.sjsu.edu/spartandaily/7587/thumbnail.jp
Suffolk Journal, Vol. 52, No. 9, 11/03/1993
https://dc.suffolk.edu/journal/1941/thumbnail.jp
Active Perception by Interaction with Other Agents in a Predictive Coding Framework: Application to Internet of Things Environment
Predicting the state of an agent's partially-observable environment is a problem of interest in many domains. Typically in the real world, the environment consists of multiple agents, not necessarily working towards a common goal. Though the goal and sensory observation for each agent is unique, one agent might have acquired some knowledge that may benefit the other. In essence, the knowledge base regarding the environment is distributed among the agents. An agent can sample this distributed knowledge base by communicating with other agents. Since an agent is not storing the entire knowledge base, its model can be small and its inference can be efficient and fault-tolerant. However, the agent needs to learn when, with whom and what to communicate (or, more generally, interact) under different situations. This dissertation presents an agent model that actively and selectively communicates with other agents to predict the state of its environment efficiently. Communication is a challenge when the internal models of other agents are unknown and unobservable. The proposed agent learns communication policies as mappings from its belief state to when, with whom and what to communicate. The policies are learned using predictive coding in an online manner, without any reinforcement. The proposed agent model is evaluated on widely-studied applications, such as human activity recognition from multimodal, multisource and heterogeneous sensor data, and transferring knowledge across sensor networks. In the applications, either each sensor or each sensor network is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state-of-the-art, even though our model has significantly fewer parameters and infers the state in a localized manner. The learned policy reduces the number of communications. The agent is tolerant to communication failures and can recognize the reliability of each agent from its communication messages.
To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting the state of its environment.
Novel approaches to MRI of glioma
Gliomas are extremely heterogeneous, both morphologically and biologically, which contributes to a very poor prognosis. Current imaging of glioma is insufficient for a thorough diagnosis, therapy assessment and prognosis prediction. Moreover, refined and more sophisticated imaging techniques could help further our knowledge of gliomas.
In order to facilitate proliferation, cancer cells undergo a change in structure and an increase in metabolism that results in distortion and disruption of tissue architecture. Gliomas are characterised by an increase in cells of variable sizes, as well as changes in the tissue microstructure. Diffusion-Weighted Imaging (DWI) and the apparent diffusion coefficient (ADC) have been extensively studied as potential imaging biomarkers for cellularity and tissue architecture. However, several studies have shown partial overlap in the measured values between tumour subtypes. Moreover, ADC is influenced by several factors and does not provide detailed information on the tissue microstructure. Vascular, Extracellular and Restricted Diffusion for Cytometry in Tumours (VERDICT) is a novel diffusion model that infers tissue microstructure compartments from conventional DWI measurements. This model derives metrics for the intracellular, intravascular and extracellular-extravascular spaces, providing a more detailed interpretation of the tissue microstructure. To date, VERDICT has been applied to xenograft models of colorectal cancer and patient studies of prostate cancer, and its feasibility in glioma has recently been shown. In this PhD I have applied a shortened version of the VERDICT method to image intratumoral and intertumoral heterogeneity in glioma. The results have also been validated with histology as part of a prospective study.
Gliomas also exhibit a significant increase in mitotic activity within the tumour. The increased number of mitoses alters cell density, which in turn affects the total tissue sodium concentration, because the sodium concentration is approximately ten-fold higher in the extracellular than in the intracellular space. In addition, there is a decrease in Na+/K+-ATPase activity in tumours due to ATP depletion, which contributes to the disturbance of sodium homeostasis. Non-invasive detection of 23Na with MRI has the potential to quantify sodium concentration and therefore could be an imaging probe of cell morphology and membrane function within the tumour microenvironment, as well as a method of probing tissue heterogeneity. During my PhD, a novel 23Na-MRI technique has been used to evaluate sodium distribution within glioma and in the surrounding tissue.
Metabolic reprogramming is one of the major driving forces for determining glioma growth and invasion. Therefore, the non-invasive characterization of metabolic intratumoral, peritumoral and intertumoral heterogeneity in vivo could help to better stratify patients and to develop novel therapeutic strategies targeting cancer-specific metabolic pathways. 13C magnetic resonance imaging (MRI) using dynamic nuclear polarization (DNP) is a novel technique that allows non-invasive assessment of the metabolism of hyperpolarized (HP) 13C-labelled molecules in vivo, such as the exchange of [1-13C]pyruvate to [1-13C]lactate in tumours (Warburg effect). Part of my PhD has focused on developing and translating HP [1-13C]pyruvate MRI to explore metabolic reprogramming in glioma and the surrounding microenvironment.
The overall aim of my PhD has been to develop novel approaches to imaging glioma with MRI to probe both the architectural and metabolic changes of glioma. The preliminary evidence suggests that these tools can phenotype tumours more deeply than conventional imaging approaches. Although the main focus of this work has been gliomas, the techniques developed and presented here may be applied to study other pathological conditions within the brain, which raises the possibility of other potential clinical applications for this work.