
    The Trust-Based Interactive Partially Observable Markov Decision Process

    Cooperative agent and robot systems are designed so that each agent works toward the same common good. The problem is that the software systems are extremely complex and can be subverted by an adversary to either break the system or, potentially worse, create sneaky agents that are willing to cooperate when the stakes are low and take selfish, greedy actions when the rewards rise. This research focuses on the ability of a group of agents to reason about each other's trustworthiness and to decide whether to cooperate. A trust-based interactive partially observable Markov decision process (TI-POMDP) is developed to model the trust interactions between agents, enabling them to select the best course of action from the current state. The TI-POMDP is a novel approach to multiagent cooperation based on an interactive partially observable Markov decision process (I-POMDP) augmented with trust relationships. Experiments using the Defender simulation demonstrate the TI-POMDP's ability to accurately track the trust levels of agents with hidden agendas. The TI-POMDP provides agents with the information needed to make decisions based on their level of trust and their model of the environment. Testing demonstrates that agents quickly identify the hidden trust levels and mitigate the impact of a deceitful agent, in comparison with a trust vector model. Agents using the TI-POMDP model achieved 3.8 times the average reward of agents using a trust vector model.
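
    The core mechanism described above, maintaining a probability distribution over another agent's hidden trustworthiness and updating it from observed behaviour, can be illustrated with a small Bayes-filter sketch in Python. The states, actions, and probabilities below are hypothetical and are not taken from the paper's TI-POMDP or its Defender experiments.

        # Minimal sketch (not the paper's TI-POMDP): Bayesian filtering of a belief
        # over another agent's hidden trust level, updated from observed actions.
        # All states, actions, and probabilities are assumed for illustration.

        TRUST_LEVELS = ["trustworthy", "deceitful"]        # hidden states
        ACTIONS = ["cooperate", "defect"]                   # observable actions

        # P(observed action | hidden trust level), assumed values.
        OBS_MODEL = {
            "trustworthy": {"cooperate": 0.9, "defect": 0.1},
            "deceitful":   {"cooperate": 0.4, "defect": 0.6},
        }

        def update_belief(belief, observed_action):
            """One Bayes-filter step: posterior over trust levels given an observed action."""
            posterior = {
                level: belief[level] * OBS_MODEL[level][observed_action]
                for level in TRUST_LEVELS
            }
            norm = sum(posterior.values())
            return {level: p / norm for level, p in posterior.items()}

        belief = {"trustworthy": 0.5, "deceitful": 0.5}     # uniform prior
        for action in ["cooperate", "defect", "defect"]:    # observed behaviour
            belief = update_belief(belief, action)
        print(belief)   # probability mass shifts toward "deceitful"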

    Distributed Synthesis in Continuous Time

    We introduce a formalism modelling communication of distributed agents strictly in continuous time. Within this framework, we study the problem of synthesising local strategies for individual agents such that a specified set of goal states is reached, or reached with at least a given probability. The flow of time is modelled explicitly based on continuous-time randomness, with two natural implications: first, the non-determinism stemming from interleaving disappears; second, when we restrict to a subclass of non-urgent models, the quantitative value problem for two players can be solved in EXPTIME. Indeed, the explicit continuous time enables players to communicate their states by delaying synchronisation (which is unrestricted for non-urgent models). In general, the problems are undecidable already for two players in the quantitative case and three players in the qualitative case. The qualitative undecidability is shown by a reduction to decentralized POMDPs, for which we provide the strongest (and rather surprising) undecidability result so far.

    Sample Efficient Bayesian Reinforcement Learning

    Artificial Intelligence (AI) has been an active field of research for over a century now. The research field of AI may be grouped into various tasks expected of an intelligent agent, two major ones being learning and inference, and planning. The act of storing new knowledge is known as learning, while inference refers to the act of extracting conclusions from the agent's limited knowledge base. The two are tightly knit through the design of that knowledge base. The process of deciding long-term actions or plans given the agent's current knowledge is called planning. Reinforcement Learning (RL) brings these two tasks together by posing a seemingly benign question: "How to act optimally in an unknown environment?" This requires the agent to learn about its environment as well as plan actions given its current knowledge of it. In RL, the environment can be represented by a mathematical model, and we associate an intrinsic value with the actions the agent may choose. In this thesis, we present a novel Bayesian algorithm for the problem of RL. Bayesian RL is a widely explored area of research but is constrained by scalability and performance issues. We provide first steps towards a rigorous analysis of these types of algorithms. Bayesian algorithms are characterized by the belief they maintain over their unknowns, which is updated based on the collected evidence. This differs from the traditional approach in RL in terms of problem formulation and formal guarantees. Our novel algorithm combines aspects of planning and learning due to its inherent Bayesian formulation. It does so in a more scalable fashion, with formal PAC guarantees. We also give insights on the application of the Bayesian framework to the estimation of model and value, in a joint work on Bayesian backward induction for RL.
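
    The Bayesian-RL loop the abstract describes, keeping a belief over the unknown environment, acting under that belief, and updating it with collected evidence, can be sketched with a simple posterior-sampling example. This is only an illustration of the general framework, not the thesis's algorithm; the Bernoulli bandit setup and its parameters below are assumptions for the sketch.

        # Illustrative sketch only (not the thesis's algorithm): Thompson sampling
        # on a two-armed Bernoulli bandit shows the core Bayesian-RL loop: maintain
        # a belief over the unknown environment, act using it, update it with evidence.
        import random

        TRUE_MEANS = [0.3, 0.7]            # unknown to the agent, used only to simulate rewards
        alpha = [1.0, 1.0]                 # Beta-prior success counts per arm
        beta = [1.0, 1.0]                  # Beta-prior failure counts per arm

        for step in range(1000):
            # Sample a plausible environment from the current belief and act greedily in it.
            samples = [random.betavariate(alpha[a], beta[a]) for a in range(2)]
            arm = max(range(2), key=lambda a: samples[a])
            reward = 1 if random.random() < TRUE_MEANS[arm] else 0
            # Update the belief with the observed evidence.
            alpha[arm] += reward
            beta[arm] += 1 - reward

        print([alpha[a] / (alpha[a] + beta[a]) for a in range(2)])   # posterior mean per arm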

    PLGRIM: Hierarchical Value Learning for Large-scale Exploration in Unknown Environments

    In order for an autonomous robot to efficiently explore an unknown environment, it must account for uncertainty in sensor measurements, hazard assessment, localization, and motion execution. Making decisions for maximal reward in a stochastic setting requires value learning and policy construction over a belief space, i.e., a probability distribution over all possible robot-world states. However, belief space planning in a large spatial environment over long temporal horizons suffers from severe computational challenges. Moreover, constructed policies must safely adapt to unexpected changes in the belief at runtime. This work proposes a scalable value learning framework, PLGRIM (Probabilistic Local and Global Reasoning on Information roadMaps), that bridges the gap between (i) local, risk-aware resiliency and (ii) global, reward-seeking mission objectives. Leveraging hierarchical belief space planners with information-rich graph structures, PLGRIM addresses large-scale exploration problems while providing locally near-optimal coverage plans. We validate our proposed framework with high-fidelity dynamic simulations in diverse environments and on physical robots in Martian-analog lava tubes.
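
    As a rough illustration of decision making over a belief space, i.e. a probability distribution over possible world states, the sketch below picks the action with the highest expected reward under the current belief. It is a one-step toy, not PLGRIM's hierarchical planner, and the states, actions, and reward values are hypothetical.

        # Minimal sketch (not PLGRIM): one-step lookahead over a belief state,
        # choosing the action with the highest expected reward under a probability
        # distribution over possible world states. All values are assumed.

        BELIEF = {"corridor_clear": 0.7, "corridor_blocked": 0.3}   # P(world state)

        # Reward of each action in each possible world state (assumed values).
        REWARD = {
            "advance": {"corridor_clear": 10.0, "corridor_blocked": -50.0},
            "scan":    {"corridor_clear": -1.0, "corridor_blocked": -1.0},
            "retreat": {"corridor_clear": -5.0, "corridor_blocked":  2.0},
        }

        def expected_reward(action, belief):
            """Expectation of the action's reward over the belief distribution."""
            return sum(p * REWARD[action][state] for state, p in belief.items())

        best = max(REWARD, key=lambda a: expected_reward(a, BELIEF))
        print(best, {a: expected_reward(a, BELIEF) for a in REWARD})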