    Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

    We pose an active perception problem in which an autonomous agent actively interacts with a second agent that may behave adversarially. Given the uncertainty about the other agent's intent, the objective is to collect further evidence that helps discriminate potential threats. The main technical challenges are the partial observability of the other agent's intent, the modeling of the adversary, and the corresponding uncertainty modeling. An adversarial agent may also try to mislead the autonomous agent with a deceptive strategy learned from past experience. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's actions, the resulting policy is more robust to unmodeled adversarial strategies. We show empirically that, against an adversary that adapts to and exploits the autonomous agent's policy, this policy is more robust than a standard robust approach based on a Chance-Constrained Partially Observable Markov Decision Process.
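
    The abstract gives no implementation details. As a generic illustration only, the sketch shows two ingredients it names: a Bayesian belief update over discrete hypotheses about the other agent's intent, and the Boltzmann (softmax) policy form used in maximum entropy reinforcement learning, pi(a | b) proportional to exp(Q(b, a) / alpha). The hypotheses, likelihoods, Q-values, and the temperature `alpha` are hypothetical; this is not the authors' method and it omits the generative adversary model and the training loop.

```python
import math

def belief_update(prior, likelihoods):
    """Bayes' rule over a discrete set of hypotheses about the other agent's intent."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [u / z for u in unnormalized]

def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha).

    This is the policy form that arises in maximum entropy RL, where alpha
    weights the entropy bonus. The Q-values here are hypothetical inputs.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(dist):
    """Shannon entropy in nats; higher means a less predictable policy."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```

For instance, `belief_update([0.5, 0.5], [0.2, 0.6])` returns approximately `[0.25, 0.75]`. Raising `alpha` spreads probability across actions and raises the policy's entropy, which is one way to read "minimizing the predictability" of the agent's actions.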

    Monte Carlo Planning method estimates planning horizons during interactive social exchange

    Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent's preference for equity with their partner, beliefs about the partner's appetite for equity, beliefs about the partner's model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, these complexities make IPOMDPs extremely challenging to solve computationally. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and can therefore be used to invert the model, inferring its parameters from observed behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
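
    The abstract does not spell out the authors' Monte-Carlo tree search variant or the IPOMDP structure. For orientation only, the sketch shows plain UCT (MCTS with upper-confidence-bound selection) with an explicit planning horizon, on a hypothetical two-action toy (`toy_actions`, `toy_step`) rather than the trust task; it has no nested beliefs about the partner. It illustrates how the chosen action can depend on how many steps ahead the planner looks.

```python
import math
import random

class Node:
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # action -> Node

def uct_search(root_state, actions, step, horizon, iterations=2000, c=1.4, seed=0):
    """Plain UCT with a fixed planning horizon; returns the most-visited root action."""
    rng = random.Random(seed)
    root = Node()

    def rollout(state, depth):
        # Random default policy until the horizon is reached.
        total = 0.0
        while depth < horizon:
            state, r = step(state, rng.choice(actions(state)))
            total += r
            depth += 1
        return total

    def ucb(parent, child):
        mean = child.value_sum / child.visits
        return mean + c * math.sqrt(math.log(parent.visits) / child.visits)

    def simulate(node, state, depth):
        if depth >= horizon:
            return 0.0
        untried = [a for a in actions(state) if a not in node.children]
        if untried:
            # Expansion: add one new child, then estimate its value with a rollout.
            a = rng.choice(untried)
            child = node.children[a] = Node()
            next_state, r = step(state, a)
            ret = r + rollout(next_state, depth + 1)
        else:
            # Selection: descend to the child with the highest UCB score.
            a = max(node.children, key=lambda b: ucb(node, node.children[b]))
            child = node.children[a]
            next_state, r = step(state, a)
            ret = r + simulate(child, next_state, depth + 1)
        # Backup: the child's statistics estimate Q(state, a) within the horizon.
        child.visits += 1
        child.value_sum += ret
        return ret

    for _ in range(iterations):
        simulate(root, root_state, 0)
        root.visits += 1
    return max(root.children, key=lambda a: root.children[a].visits)

def toy_actions(state):
    return ["invest", "keep"]

def toy_step(pending, action):
    # Hypothetical toy dynamics: investing costs 1 now and pays back 3 on the next step.
    if action == "invest":
        return 3, pending - 1
    return 0, pending
```

With `horizon=1` the planner prefers `"keep"` (0 beats -1), while with `horizon=2` it prefers `"invest"` (-1 + 3 beats 0). This kind of horizon dependence is what lets observed choices carry information about an agent's planning horizon.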

    A Projective Simulation Scheme for Partially-Observable Multi-Agent Systems

    We introduce a form of partial observability into the projective simulation (PS) learning method. This is done by adding a belief projection operator and an observability parameter to the original framework for analyzing the efficiency of the PS model. We provide theoretical formulations, network representations, and situated scenarios derived from the invasion toy problem as a starting point for multi-agent PS models. Comment: 28 pages, 21 figures
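
    The belief projection operator and the observability parameter are not specified in the abstract. The sketch is only the basic two-layer projective simulation agent (percept clips connected to action clips by h-values, hopping probability h / sum(h), damping rate `gamma`, reward scale `lam`) trained on an invasion-game toy in which the agent is rewarded for moving in the direction indicated by the attacker's symbol. The `observability` argument, which replaces the shown symbol with a random one with probability 1 - observability, is a hypothetical stand-in for partial observability and is not the paper's operator.

```python
import random

class PSAgent:
    """Basic two-layer projective simulation: percept clips -> action clips."""

    def __init__(self, percepts, actions, gamma=0.0, lam=1.0, seed=0):
        self.actions = list(actions)
        self.h = {p: {a: 1.0 for a in self.actions} for p in percepts}  # all h-values start at 1
        self.gamma = gamma  # damping rate toward h = 1 (forgetting)
        self.lam = lam      # reward scale
        self.rng = random.Random(seed)

    def probabilities(self, percept):
        # Hopping probability from a percept clip to each action clip: h / sum(h).
        row = self.h[percept]
        total = sum(row.values())
        return {a: w / total for a, w in row.items()}

    def act(self, percept):
        probs = self.probabilities(percept)
        return self.rng.choices(self.actions, weights=[probs[a] for a in self.actions])[0]

    def learn(self, percept, action, reward):
        # Damping: every h-value relaxes toward 1 at rate gamma.
        for row in self.h.values():
            for a in row:
                row[a] -= self.gamma * (row[a] - 1.0)
        # Reinforce only the edge that was just traversed.
        self.h[percept][action] += self.lam * reward

def train_invasion(agent, episodes, observability=1.0, seed=1):
    """Invasion-game toy; `observability` is a hypothetical noise parameter."""
    rng = random.Random(seed)
    directions = ["left", "right"]
    for _ in range(episodes):
        true_dir = rng.choice(directions)
        shown = true_dir if rng.random() < observability else rng.choice(directions)
        action = agent.act(shown)
        reward = 1.0 if action == true_dir else 0.0
        agent.learn(shown, action, reward)
```

With `gamma=0` and full observability, rewarded edges grow by `lam` while unrewarded ones stay at 1, so after about a thousand episodes the agent almost always picks the matching direction. Lowering `observability` limits how well any policy can do, since some shown symbols carry no information about the attacker's move.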

    Repeated Multimarket Contact with Private Monitoring: A Belief-Free Approach

    This paper studies repeated games in which two players play multiple duopolistic games simultaneously (multimarket contact). A key assumption is that each player receives a noisy and private signal about the other's actions (private monitoring, or observation errors). There has been no game-theoretic support for whether multimarket contact facilitates collusion, in the sense that more collusive equilibria, measured by per-market profits, exist than in the benchmark case of a single market. Belief-free strategies are a candidate equilibrium for the benchmark case. We are the first to construct a non-trivial class of strategies that exhibits the effect of multimarket contact from the perspectives of simplicity and mild punishment. Strategies must be simple because firms in a cartel must coordinate with each other without communication. Punishment must be mild enough that it does not cut into even the minimum profits the cartel requires. We therefore focus on two-state automaton strategies in which a player remains cooperative in at least one market even while punishing a traitor. Furthermore, we identify an additional condition (partial indifference) under which the collusive equilibrium yields the optimal payoff. Comment: Accepted for the 9th Intl. Symp. on Algorithmic Game Theory; an extended version was accepted at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
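
    The paper's belief-free construction and its partial-indifference condition are not reproduced here. The sketch only illustrates the structural idea from the abstract: a two-state automaton whose punishing state still cooperates in at least one market, played under private monitoring where each player sees a noisy version of the rival's actions. The number of markets, the one-period punishment rule, and the noise level `eps` are hypothetical choices, and the transitions are not tuned to make either player indifferent, so this is not an equilibrium strategy.

```python
import random

MARKETS = 2  # hypothetical number of simultaneous markets

def actions_for(state):
    # "R" (reward) cooperates in every market; "P" (punish) defects only in market 0.
    if state == "R":
        return ["C"] * MARKETS
    return ["D"] + ["C"] * (MARKETS - 1)

def next_state(state, signals):
    # Hypothetical transitions: any perceived defection triggers one period of
    # punishment, after which the player returns to "R" (a mild, short punishment).
    if state == "R" and "D" in signals:
        return "P"
    return "R"

def noisy_signal(actions, eps, rng):
    # Private monitoring: each market's action is misperceived independently with probability eps.
    flip = {"C": "D", "D": "C"}
    return [flip[a] if rng.random() < eps else a for a in actions]

def simulate(periods, eps, seed=0):
    rng = random.Random(seed)
    states = ["R", "R"]
    history = []
    for _ in range(periods):
        acts = [actions_for(s) for s in states]
        history.append(acts)
        # Each player privately sees a noisy version of the other player's actions.
        seen_by_0 = noisy_signal(acts[1], eps, rng)
        seen_by_1 = noisy_signal(acts[0], eps, rng)
        states = [next_state(states[0], seen_by_0), next_state(states[1], seen_by_1)]
    return history
```

With `eps=0` both players cooperate in every market forever; with noise, misperceived defections trigger short punishment phases, but in every period each player still cooperates in at least one market.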