59 research outputs found

    Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning

    We approach the problem of understanding how people interact with each other in collaborative settings, especially when individuals know little about their teammates, via Multiagent Inverse Reinforcement Learning (MIRL), where the goal is to infer the reward functions guiding the behavior of each individual given trajectories of a team's behavior during some task. Unlike current MIRL approaches, we do not assume that team members know each other's goals a priori; rather, we assume that they collaborate by adapting to the goals of others, perceived by observing their behavior, while jointly performing the task. To address this problem, we propose a novel approach to MIRL via Theory of Mind (MIRL-ToM). For each agent, we first use ToM reasoning to estimate a posterior distribution over baseline reward profiles given their demonstrated behavior. We then perform MIRL via decentralized equilibrium, employing single-agent Maximum Entropy IRL to infer a reward function for each agent while simulating the behavior of the other teammates according to the time-varying distribution over profiles. We evaluate our approach in a simulated two-player search-and-rescue operation in which the agents, playing different roles, must search for and evacuate victims in the environment. Our results show that the choice of baseline profiles is paramount to the recovery of the ground-truth rewards, and that MIRL-ToM is able to recover the rewards used by agents interacting with both known and unknown teammates.
    Comment: Accepted as a full paper at AAMAS 2023.
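    The two-stage procedure described in this abstract (a ToM posterior over candidate reward profiles, followed by per-agent Maximum Entropy IRL against teammates simulated from that posterior) can be pictured with a minimal sketch of the first stage. Everything below (the profile policies, states, and actions) is invented for illustration and is not taken from the paper:

```python
import numpy as np

def tom_posterior(prior, profile_policies, trajectory):
    """Recursive Bayesian (ToM) update of P(profile | observed behavior).

    prior:            initial distribution over candidate reward profiles
    profile_policies: one policy per profile, pi[state][action] = probability
    trajectory:       observed (state, action) pairs for one teammate
    Returns the time-varying distribution over profiles, one entry per step.
    """
    post = np.asarray(prior, dtype=float)
    dist_over_time = [post.copy()]
    for state, action in trajectory:
        # Likelihood of the observed action under each candidate profile.
        likelihood = np.array([pi[state][action] for pi in profile_policies])
        post = post * likelihood
        post = post / post.sum()
        dist_over_time.append(post.copy())
    return dist_over_time

# Toy example: two candidate profiles over two states and two actions.
profile_policies = [
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.8, 1: 0.2}},  # e.g. a "rescue"-leaning profile
    {0: {0: 0.2, 1: 0.8}, 1: {0: 0.3, 1: 0.7}},  # e.g. an "explore"-leaning profile
]
trajectory = [(0, 0), (1, 0), (0, 0)]
dist_over_time = tom_posterior([0.5, 0.5], profile_policies, trajectory)
# In the second stage, dist_over_time would drive the simulation of teammates
# inside single-agent Maximum Entropy IRL for the modeled agent.
```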

    Probabilistic Grammars for Plan Recognition

    A decision maker operating in the presence of a planning agent must often try to determine the plan of action driving the agent's behavior. Modeling the uncertainty inherent in planning domains poses a difficult challenge. If the domain representation includes an explicit probabilistic model, then the inference mechanism can compute a probability distribution over possible hypotheses, providing a sound, decision-theoretic basis for selecting optimal actions. The recognizer bases its conclusions on its uncertain a priori knowledge about the agent's mental state, its decision process, the world state, and the world's dynamics, which can be summarized by a probability distribution. It then uses its partial observations about the world to infer properties of the agent and its plan.

    This work first applies existing research in probabilistic context-free grammars (PCFGs) to specify this causal model and answer certain queries. This dissertation then presents new inference algorithms that generate a Bayesian network representation of the PCFG distribution. The new inference algorithms extend the set of possible queries to include posterior probability distributions over nonterminal symbols and flexible evidence handling that permits missing terminals and observations of nonterminals. However, the PCFG independence assumptions restrict the domains for which the methods are applicable.

    The second phase of this work defines a new model, the probabilistic state-dependent grammar (PSDG). A PSDG adds an explicit model of the external world and the agent's mental state to a PCFG model of plan selection. Production probabilities are conditioned on the values of these state variables, allowing the domain specification to capture the effect of planning context on the selection process. The model also represents the world dynamics through state transition probabilities. A PSDG's explicit partition between plan and state variables facilitates both domain specification and inference, as illustrated by specifications of two example domains. The first is a driver model, constructed from scratch, that captures the planning process of a driver maneuvering on a highway. A second illustration, in the domain of air combat, demonstrates the translation of a pre-existing specification in another language into a roughly equivalent PSDG model.

    The PSDG model also provides practical inference algorithms. As in the PCFG case, algorithms can generate a Bayesian network representation of the underlying probability distribution. However, this work also presents specialized algorithms that exploit the particular independence properties of the PSDG language to maintain a more efficient summary of evidence in the form of a belief state. The final combination of the PSDG language model and algorithms extends the range of plan recognition domains for which practical probabilistic inference is possible.
    Ph.D. dissertation, Applied Sciences: Computer Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/131760/2/9929927.pd
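    As a rough illustration of the state-dependent productions described above, the sketch below defines a toy driver-style grammar in which the choice between passing and following depends on a single state variable. The rules, state variables, and probabilities are invented for illustration and are not drawn from the dissertation's actual driver model (which also models world dynamics via state transition probabilities):

```python
import random

# Production probabilities conditioned on state variables (PSDG-style idea).
# Nonterminals map state -> {right-hand side: probability}; other symbols
# are treated as terminal actions.
PRODUCTIONS = {
    'Maneuver': lambda state: {
        ('Pass',):   0.7 if state['left_lane_clear'] else 0.05,
        ('Follow',): 0.3 if state['left_lane_clear'] else 0.95,
    },
    'Pass':   lambda state: {('signal', 'change_left', 'accelerate'): 1.0},
    'Follow': lambda state: {('track_speed',): 1.0},
}

def expand(symbol, state):
    """Sample a terminal action sequence by expanding `symbol` top-down,
    drawing each production from its state-conditioned distribution."""
    if symbol not in PRODUCTIONS:       # terminal action
        return [symbol]
    dist = PRODUCTIONS[symbol](state)
    rhs = random.choices(list(dist), weights=dist.values())[0]
    return [t for s in rhs for t in expand(s, state)]

print(expand('Maneuver', {'left_lane_clear': True}))
```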

    Probabilistic state-dependent grammars for plan recognition

    Techniques for plan recognition under uncertainty require a stochastic model of the plan-generation process. We introduce probabilistic state-dependent grammars (PSDGs) to represent an agent’s plan-generation process. The PSDG language model extends probabilistic context-free grammars (PCFGs) by allowing production probabilities to depend on an explicit model of the planning agent’s internal and external state. Given a PSDG description of the plan-generation process, we can then use inference algorithms that exploit the particular independence properties of the PSDG language to efficiently answer plan-recognition queries. The combination of the PSDG language model and inference algorithms extends the range of plan-recognition domains for which practical probabilistic inference is possible, as illustrated by applications in traffic monitoring and air combat.
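    The plan-recognition queries mentioned here can be pictured, in heavily simplified form, as recursive belief-state filtering over plan hypotheses. The sketch below reduces each plan hypothesis to a fixed action-emission distribution with a simple plan-persistence transition model; it is an illustrative analogy under those assumptions, not the PSDG inference algorithm itself:

```python
import numpy as np

def filter_plans(prior, emission, switch_prob, observed_actions):
    """Predict-update filtering: P(plan hypothesis | actions so far) per step.

    prior:    (num_plans,) initial distribution over plan hypotheses
    emission: (num_plans, num_actions) array of P(action | plan)
    Assumes at least two plan hypotheses.
    """
    num_plans = len(prior)
    # Plan-persistence transition model: stay with prob. 1 - switch_prob,
    # otherwise switch uniformly to one of the other hypotheses.
    T = np.full((num_plans, num_plans), switch_prob / (num_plans - 1))
    np.fill_diagonal(T, 1.0 - switch_prob)
    belief = np.asarray(prior, dtype=float)
    beliefs = []
    for a in observed_actions:
        belief = T.T @ belief              # predict: possible plan switch
        belief = belief * emission[:, a]   # update: likelihood of observed action
        belief = belief / belief.sum()
        beliefs.append(belief.copy())
    return beliefs

# Two plan hypotheses over three actions; the agent performs action 0 twice.
emission = np.array([[0.8, 0.1, 0.1],
                     [0.2, 0.4, 0.4]])
print(filter_plans([0.5, 0.5], emission, switch_prob=0.1, observed_actions=[0, 0]))
```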

    Generalized Queries on Probabilistic Context-Free Grammars

    Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been developed, and applied to a variety of pattern-recognition problems. We extend the class of queries that can be answered in several ways: (1) allowing missing tokens in a sentence or sentence fragment, (2) supporting queries about intermediate structure, such as the presence of particular nonterminals, and (3) flexible conditioning on a variety of types of evidence. Our method works by constructing a Bayesian network to represent the distribution of parse trees induced by a given PCFG. The network structure mirrors that of the chart in a standard parser, and is generated using a similar dynamic-programming approach. We present an algorithm for constructing Bay..
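    For context, the baseline query that this work generalizes, the probability of a complete sentence under a PCFG, is computed by the standard inside (CKY-style) dynamic program over the same chart structure that the Bayesian-network construction mirrors. The sketch below is a minimal inside-probability computation for a grammar in Chomsky normal form, with a toy grammar invented for illustration:

```python
from collections import defaultdict

def inside_probability(grammar, lexicon, start, words):
    """Inside algorithm: P(sentence | PCFG) for a grammar in Chomsky normal form.

    grammar: {(A, B, C): p} for binary rules A -> B C
    lexicon: {(A, w): p}    for lexical rules A -> w
    """
    n = len(words)
    chart = defaultdict(float)  # (i, j, A) -> inside probability of A over words[i:j]
    for i, w in enumerate(words):
        for (A, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in grammar.items():
                    left = chart.get((i, k, B), 0.0)
                    right = chart.get((k, j, C), 0.0)
                    if left and right:
                        chart[(i, j, A)] += p * left * right
    return chart.get((0, n, start), 0.0)

# Toy grammar: S -> NP VP; NP -> 'planes'; VP -> 'fly'
grammar = {('S', 'NP', 'VP'): 1.0}
lexicon = {('NP', 'planes'): 1.0, ('VP', 'fly'): 1.0}
print(inside_probability(grammar, lexicon, 'S', ['planes', 'fly']))  # 1.0
```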