290 research outputs found

    Accelerating Empowerment Computation with UCT Tree Search

    Models of intrinsic motivation present an important means to produce sensible behaviour in the absence of extrinsic rewards. Applications in video games are varied, and range from intrinsically motivated general game-playing agents to non-player characters such as companions and enemies. The information-theoretic quantity of Empowerment is a particularly promising candidate motivation to produce believable, generic and robust behaviour. However, while it can be used in the absence of external reward functions that would need to be crafted and learned, empowerment is computationally expensive. In this paper, we propose a modified UCT tree search method to mitigate empowerment's computational complexity in discrete and deterministic scenarios. We demonstrate how to modify a Monte-Carlo Search Tree with UCT to realise empowerment maximisation, and discuss three additional modifications that facilitate better sampling. We evaluate the approach both quantitatively, by analysing how close our approach gets to the baseline of exhaustive empowerment computation with varying amounts of computational resources, and qualitatively, by analysing the resulting behaviour in a Minecraft-like scenario.
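
    As a rough illustration of the exhaustive baseline mentioned in the abstract above: in a discrete, deterministic world, n-step empowerment reduces to the base-2 logarithm of the number of distinct states reachable by some n-step action sequence. The sketch below assumes a hypothetical grid world in which the agent's sensor state is simply its position; the action set, grid size and function names are illustrative and not taken from the paper, and this is not the paper's UCT method.

```python
from itertools import product
from math import log2

# Hypothetical action set for a 4-connected grid world (illustrative only).
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}


def step(state, action, width, height, walls):
    """Deterministic transition: move unless blocked by a wall or the border."""
    x, y = state
    dx, dy = ACTIONS[action]
    nxt = (x + dx, y + dy)
    if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in walls:
        return state
    return nxt


def empowerment(state, horizon, width, height, walls=frozenset()):
    """n-step empowerment in bits, by enumerating every action sequence."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a, width, height, walls)
        reachable.add(s)
    # Deterministic case: channel capacity = log2(number of distinct end states).
    return log2(len(reachable))
```

    The enumeration visits len(ACTIONS) ** horizon sequences, which makes concrete why the exhaustive computation becomes expensive as the horizon grows and why a sampling-based search is attractive.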

    An Information-Motivated Exploration Agent to Locate Stationary Persons with Wireless Transmitters in Unknown Environments

    Unmanned Aerial Vehicles (UAVs) show promise in a variety of applications and recently were explored in the area of Search and Rescue (SAR) for finding victims. In this paper we consider the problem of finding multiple unknown stationary transmitters in a discrete simulated unknown environment, where the goal is to locate all transmitters in as short a time as possible. Existing solutions in the UAV search space typically search for a single target, assume a simple environment, assume that target properties are known, or make other unrealistic assumptions. We simulate large, complex environments with limited a priori information about the environment and transmitter properties. We propose a Bayesian search algorithm, Information Exploration Behaviour (IEB), that maximizes predicted information gain at each search step, incorporating information from multiple sensors whilst making minimal assumptions about the scenario. This search method is inspired by the information theory concept of empowerment. Our algorithm shows significant speed-up compared to baseline algorithms, being orders of magnitude faster than a random agent and 10 times faster than a lawnmower strategy, even in complex scenarios. The IEB agent is able to make use of received transmitter signals from unknown sources and incorporate both an exploration and search strategy.

    Object-Oriented Dynamics Learning through Multi-Level Abstraction

    Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties for common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization over novel environments for learning environment models. We also demonstrate that learned dynamics models enable efficient planning in unseen environments, comparable to true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations. Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 202

    Future state maximisation as an intrinsic motivation for decision making

    The concept of an “intrinsic motivation” is used in the psychology literature to distinguish between behaviour which is motivated by the expectation of an immediate, quantifiable reward (“extrinsic motivation”) and behaviour which arises because it is inherently useful, interesting or enjoyable. Examples of the latter can include curiosity-driven behaviour such as exploration and the accumulation of knowledge, as well as developing skills that might not be immediately useful but that have the potential to be re-used in a variety of different future situations. In this thesis, we examine a candidate for an intrinsic motivation with wide-ranging applicability which we refer to as “future state maximisation”. Loosely speaking this is the idea that, taking everything else to be equal, decisions should be made so as to maximally keep one's options open, or to give the maximal amount of control over what one can potentially do in the future. Our goal is to study how this principle can be applied in a quantitative manner, as well as identifying examples of systems where doing so could be useful in either explaining or generating behaviour. We consider a number of examples; however, our primary application is to a model of collective motion in which we consider a group of agents equipped with simple visual sensors, moving around in two dimensions. In this model, agents aim to make decisions about how to move so as to maximise the amount of control they have over the potential visual states that they can access in the future. We find that with each agent following this simple, low-level motivational principle a swarm spontaneously emerges in which the agents exhibit rich collective behaviour, remaining cohesive and highly aligned. Remarkably, the emergent swarm also shares a number of features which are observed in real flocks of starlings, including scale-free correlations and marginal opacity.
We go on to explore how the model can be developed to allow us to manipulate and control the swarm, as well as looking at heuristics which are able to mimic future state maximisation whilst requiring significantly less computation, and so which could plausibly operate under animal cognition.

    Automated Gadget Discovery in Science

    In recent years, reinforcement learning (RL) has become increasingly successful in its application to science and the process of scientific discovery in general. However, while RL algorithms learn to solve increasingly complex problems, interpreting the solutions they provide becomes ever more challenging. In this work, we gain insights into an RL agent's learned behavior through a post-hoc analysis based on sequence mining and clustering. Specifically, frequent and compact subroutines, used by the agent to solve a given task, are distilled as gadgets and then grouped by various metrics. This process of gadget discovery develops in three stages: first, we use an RL agent to generate data; then, we employ a mining algorithm to extract gadgets; and finally, the obtained gadgets are grouped by a density-based clustering algorithm. We demonstrate our method by applying it to two quantum-inspired RL environments. First, we consider simulated quantum optics experiments for the design of high-dimensional multipartite entangled states where the algorithm finds gadgets that correspond to modern interferometer setups. Second, we consider a circuit-based quantum computing environment where the algorithm discovers various gadgets for quantum information processing, such as quantum teleportation. This approach for analyzing the policy of a learned agent is agent and environment agnostic and can yield interesting insights into any agent's policy.

    Deep reinforcement learning in robotics and dialog systems

    U+16E99

    The general understanding and professional practice of graphic design have been shaped by the perspectives, needs, and desires of white, cis-gendered, heterosexual men in imperialist, capitalist societies. The tools, substrates, professional networks, institutions, processes, theories, grammars, and values that have come to define the discipline have been formed from this position. Consequently, graphic design primarily serves the needs of the settler in settler colonial regimes like the United States. This reality has prompted many designers like myself who come from colonized communities or whose identity troubles this rubric to question the framework of the discipline and our position within it. My thesis is rooted within this broader inquiry, which for me, as a Black and Indigenous person, began a few years ago through the emergence of two decolonial movements in the communities I call home: the BlackLivesMatter movement in Minneapolis following the live-streamed extra-judicial killing of Philando Castile by a White police officer; and the NoDAPL movement in Standing Rock which sought to prevent the construction of an oil pipeline across the river my tribe depends upon for water. The inquiries that evolved from the social and political contexts in which I began my formal design education have particular salience now amidst current manifestations of colonial oppression: a deadly global pandemic that has disproportionately claimed the lives of Black and Indigenous people due to the violence of structural inequities in the United States; the resurgence of the Keystone XL oil pipeline threatening the ecological sovereignty and well-being of numerous indigenous communities in the Midwest, including my own; and the nationwide uprisings sparked by the extra-judicial killing of George Floyd by Minneapolis police.
As I write this, I can hear the whir of police and military helicopters surveilling the streets of Providence for protesters out past state-mandated curfew. My background and the urgent socio-political contexts surrounding my design education have forced me to seek out creative and subversive methodologies to bend a design discipline defined for the service of settler colonialism towards ongoing decolonial movements in Black and Indigenous communities. Using design in the service of decolonial movements will require new articulations of tools, substrates, networks, institutions, processes, theories, grammars, and values. Fortunately, there is a long tradition to draw from in marginalized communities of repurposing tools not designed for us to meet our own needs. Decolonization is not a destination along a binary array. Rather, it is a vector traversed through a lifelong practice seeking what lies beyond the decolonial horizon. In a decolonial design practice, design and the products of design are not an end; for endeavors with no end, process is the product. Design is the work that leads to and through the personal, interpersonal, and systemic work of decolonization. A radical design practice uses craft as a vehicle for the beyond, one of many possible methods that activate the decolonial moments, gestures, and utterances between people that triangulate new vectors for our collective liberation and help carry us there. As such, rather than catalog design works, the images in this publication utter a personal narrative of formative moments that transpired through the work of design. I am the work design helps make. U+16E99 is one articulation of a decolonial design practice uttered through the poetic grammars of Black, Indigenous, Queer, and Feminist thinkers, makers, and organizers. 
It is an attempt to define a trajectory for my own creative practice that centers my values, needs, and desires, while navigating the demands, precarities, and limitations of the academic institutions and settler colonial contexts in which this mapping takes place.

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
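
    To make the map/shuffle/reduce decomposition of a discrete Life generation concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration rather than the authors' optimized streaming algorithms: it runs in a single Python process instead of on a cluster, uses an unbounded grid, does not implement strip partitioning, and the names mapper, reducer and life_step are hypothetical.

```python
from collections import defaultdict


def mapper(cell):
    """Emit a marker for the live cell itself and a count of 1 per neighbour."""
    x, y = cell
    yield cell, "alive"
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0):
                yield (x + dx, y + dy), 1


def reducer(cell, values):
    """Apply Conway's rules to one cell, given all values grouped under its key."""
    alive = "alive" in values
    neighbours = sum(v for v in values if v != "alive")
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    if neighbours == 3 or (alive and neighbours == 2):
        yield cell


def life_step(live_cells):
    """One generation: map every live cell, group by key (shuffle), reduce."""
    grouped = defaultdict(list)
    for cell in live_cells:
        for key, value in mapper(cell):
            grouped[key].append(value)
    return {out for key, values in grouped.items() for out in reducer(key, values)}
```

    In an actual streaming job the mapper and reducer would read and write key/value text lines and the framework would perform the grouping; the dictionary in life_step only stands in for that shuffle phase.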

    Service design : on the evolution of design expertise
