Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
We pose an active perception problem where an autonomous agent actively
interacts with a second agent with potentially adversarial behaviors. Given the
uncertainty in the intent of the other agent, the objective is to collect
further evidence to help discriminate potential threats. The main technical
challenges are the partial observability of the agent intent, the adversary
modeling, and the corresponding uncertainty modeling. Note that an adversary
agent may act to mislead the autonomous agent by using a deceptive strategy
that is learned from past experiences. We propose an approach that combines
belief space planning, generative adversary modeling, and maximum entropy
reinforcement learning to obtain a stochastic belief space policy. By
accounting for various adversarial behaviors in the simulation framework and
minimizing the predictability of the autonomous agent's action, the resulting
policy is more robust to unmodeled adversarial strategies. This improved
robustness is empirically shown against an adversary that adapts to and
exploits the autonomous agent's policy when compared with a standard
Chance-Constraint Partially Observable Markov Decision Process robust approach.
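The link between maximum entropy and an unpredictable, stochastic policy in the abstract above can be illustrated in the simplest one-step case. The sketch below is not the paper's method (which learns a belief space policy with deep reinforcement learning); it only assumes a hypothetical list of action values `q_values` and a hypothetical `temperature` parameter, and shows that a Boltzmann (softmax) policy becomes less predictable as the entropy weight grows.

```python
import math


def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy: pi(a) is proportional to exp(Q(a) / temperature).

    For a single decision this distribution maximizes
    expected Q + temperature * entropy(pi).
    `q_values` and `temperature` are illustrative assumptions.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical action values for three candidate actions.
q = [1.0, 2.0, 3.0]
greedy_like = soft_policy(q, temperature=0.1)   # nearly deterministic
exploratory = soft_policy(q, temperature=10.0)  # close to uniform
```

A higher temperature spreads probability mass across actions, which is the sense in which an adversary has less to exploit; how this trades off against task reward in the full belief space setting is a question the paper addresses empirically.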
Monte Carlo Planning method estimates planning horizons during interactive social exchange
Reciprocating interactions represent a central feature of all human
exchanges. They have been the target of various recent experiments, with
healthy participants and psychiatric populations engaging as dyads in
multi-round exchanges such as a repeated trust task. Behaviour in such
exchanges involves complexities related to each agent's preference for equity
with their partner, beliefs about the partner's appetite for equity, beliefs
about the partner's model of their partner, and so on. Agents may also plan
different numbers of steps into the future. Providing a computationally precise
account of the behaviour is an essential step towards understanding what
underlies choices. A natural framework for this is that of an interactive
partially observable Markov decision process (IPOMDP). However, the various
complexities make IPOMDPs inordinately computationally challenging. Here, we
show how to approximate the solution for the multi-round trust task using a
variant of the Monte-Carlo tree search algorithm. We demonstrate that the
algorithm is efficient and effective, and therefore can be used to invert
observations of behavioural choices. We use generated behaviour to elucidate
the richness and sophistication of interactive inference.
A Projective Simulation Scheme for Partially-Observable Multi-Agent Systems
We introduce a form of partial observability to the projective simulation
(PS) learning method. This is done by adding a belief projection operator and an
observability parameter to the original framework of the efficiency of the PS
model. I provide theoretical formulations, network representations, and
situated scenarios derived from the invasion toy problem as a starting point
for some multi-agent PS models.
Comment: 28 pages, 21 figures
Repeated Multimarket Contact with Private Monitoring: A Belief-Free Approach
This paper studies repeated games where two players play multiple duopolistic
games simultaneously (multimarket contact). A key assumption is that each
player receives a noisy and private signal about the other's actions (private
monitoring or observation errors). So far there has been no game-theoretic
evidence on whether multimarket contact facilitates collusion, in the sense
that more collusive equilibria in terms of per-market profits exist than under
a benchmark case of one market. One equilibrium candidate under the benchmark
case is the class of belief-free strategies. We are the first to construct a non-trivial class of
strategies that exhibits the effect of multimarket contact from the
perspectives of simplicity and mild punishment. Strategies must be simple
because firms in a cartel must coordinate with each other without communication.
Punishment must be mild enough that it does not hurt even the minimum
required profits in the cartel. We thus focus on two-state automaton strategies
in which a player remains cooperative in at least one market even when he or
she punishes a traitor. Furthermore, we identify an additional condition
(partial indifference), under which the collusive equilibrium yields the
optimal payoff.
Comment: Accepted for the 9th Intl. Symp. on Algorithmic Game Theory; an
extended version was accepted at the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20)
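The two-state automaton strategies described in the abstract above can be pictured with a small sketch. Everything concrete here is an assumption made for illustration: the state names, the signal labels `"g"`/`"b"`, and especially the deterministic transition rule (the equilibrium strategies in the paper may well use different, possibly randomized, transitions). The only property taken from the abstract is that the player still cooperates in at least one market while punishing.

```python
from dataclasses import dataclass

COOPERATE, DEFECT = "C", "D"

# Action per market (market 1, market 2) in each automaton state.
# In the hypothetical "punish" state the player defects in one market only,
# so cooperation continues in at least one market.
ACTIONS = {
    "reward": (COOPERATE, COOPERATE),
    "punish": (COOPERATE, DEFECT),
}


@dataclass
class TwoStateAutomaton:
    """Illustrative two-state strategy for play across two markets."""

    state: str = "reward"

    def act(self):
        """Return the pair of actions prescribed by the current state."""
        return ACTIONS[self.state]

    def observe(self, signals):
        """Update the state from private, noisy signals (one per market).

        Assumed rule: any bad signal "b" triggers punishment; otherwise the
        player returns to (or stays in) the reward state.
        """
        self.state = "punish" if "b" in signals else "reward"


player = TwoStateAutomaton()
first_actions = player.act()
player.observe(("g", "b"))       # a bad private signal from market 2
punish_actions = player.act()
player.observe(("g", "g"))       # good signals in both markets
final_state = player.state
```

Because each player sees only a private signal, the two players' states can drift apart; the appeal of belief-free strategies is that a player's best response does not depend on guessing the opponent's current state.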