
    Changing minds: Children's inferences about third party belief revision

    By the age of 5, children explicitly represent that agents can have both true and false beliefs based on epistemic access to information (e.g., Wellman, Cross, & Watson, 2001). Children also begin to understand that agents can view identical evidence and draw different inferences from it (e.g., Carpendale & Chandler, 1996). However, much less is known about when, and under what conditions, children expect other agents to change their minds. Here, inspired by formal ideal observer models of learning, we investigate children's expectations of the dynamics that underlie third parties' belief revision. We introduce an agent who has prior beliefs about the location of a population of toys and then observes evidence that, from an ideal observer perspective, either does or does not justify revising those beliefs. We show that children's inferences on behalf of third parties are consistent with the ideal observer perspective, but not with a number of alternative possibilities, including that children expect other agents to be influenced only by their prior beliefs, only by the sampling process, or only by the observed data. Rather, children integrate all three factors in determining how and when agents will update their beliefs from evidence.
    Funding: National Science Foundation (U.S.), Division of Computing and Communication Foundations (1231216); Division of Research on Learning in Formal and Informal Settings (0744213); STC Center for Brains, Minds and Machines (Award CCF-1231216).
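    The belief-revision logic summarized above, in which an agent's prior beliefs, the sampling process, and the observed data are integrated, can be illustrated with a minimal Bayesian ideal-observer sketch. The hypotheses, likelihoods, and numbers below are hypothetical, not the authors' stimuli or model:

```python
# Minimal ideal-observer sketch (illustrative, not the authors' model): an agent
# starts with a prior over two hypotheses about where a population of toys is
# located and updates it after seeing a sample. Whether the belief flips depends
# jointly on the prior, the sampling process, and the data.

def posterior(prior_h1, likelihood_h1, likelihood_h2):
    """Bayes rule for two hypotheses H1 vs. H2."""
    unnorm_h1 = prior_h1 * likelihood_h1
    unnorm_h2 = (1.0 - prior_h1) * likelihood_h2
    return unnorm_h1 / (unnorm_h1 + unnorm_h2)

# Hypothetical prior: the agent strongly believes the toys are in box A (H1).
prior_h1 = 0.9

# Evidence: three toys are drawn and all are found in box B.
# Under random sampling this is unlikely if H1 is true (0.1 per draw) and likely
# if H2 is true (0.9 per draw), so the data should overturn the prior...
random_sampling = posterior(prior_h1, likelihood_h1=0.1 ** 3, likelihood_h2=0.9 ** 3)
# ...but if the sampler deliberately searched box B, the same data are
# uninformative (equal likelihood under both hypotheses) and the prior stands.
selective_sampling = posterior(prior_h1, likelihood_h1=1.0, likelihood_h2=1.0)

print(f"posterior on H1 after random sampling:    {random_sampling:.3f}")  # ~0.012, revise
print(f"posterior on H1 after selective sampling: {selective_sampling:.3f}")  # 0.900, retain
```

    Under random sampling the same three observations overwhelm a strong prior, while under deliberate search they leave it untouched; this is the kind of joint dependence the ideal-observer account predicts children to attribute to third parties.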

    Trajectory Modeling via Random Utility Inverse Reinforcement Learning

    We consider the problem of modeling trajectories of drivers in a road network from the perspective of inverse reinforcement learning. Cars are detected by sensors placed on sparsely distributed points on the street network of a city. As rational agents, drivers are trying to maximize some reward function unknown to an external observer. We apply the concept of random utility from econometrics to model the unknown reward function as a function of observed and unobserved features. In contrast to current inverse reinforcement learning approaches, we do not assume that agents act according to a stochastic policy; rather, we assume that agents act according to a deterministic optimal policy and show that randomness in the data arises because the exact rewards are not fully observed by an external observer. We introduce the concept of extended state to cope with unobserved features and develop a Markov decision process formulation of drivers' decisions. We present theoretical results that guarantee the existence of solutions and show that maximum entropy inverse reinforcement learning is a particular case of our approach. Finally, we illustrate Bayesian inference on model parameters through a case study with real trajectory data from a large city in Brazil.
    Comment: 31 pages; expanded version, with the addition of proofs not present in the first version.
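    As a rough illustration of the random-utility idea (an illustrative sketch, not the paper's extended-state MDP formulation), the snippet below treats each action's reward as an observed feature term plus an unobserved shock. The agent maximizes deterministically, yet the observer, who never sees the shock, records softmax-like choice frequencies; with i.i.d. Gumbel shocks this is exactly the maximum-entropy/logit form, which is the sense in which maximum entropy IRL appears as a special case. The feature values and weights are hypothetical:

```python
# Random-utility sketch (illustrative only). Each action's reward is
#   r(a) = theta . phi(a) + eps(a),
# where phi(a) is observed by the external observer and eps(a) is not.
# The agent deterministically picks argmax_a r(a); to the observer the choices
# look stochastic, and with i.i.d. Gumbel eps their frequencies match a softmax
# over theta . phi(a).

import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])                # hypothetical reward weights
phi = np.array([[1.0, 0.0],                  # observed features of 3 actions
                [0.5, 1.0],
                [0.0, 2.0]])
observed_reward = phi @ theta

# Simulate many deterministic choices with unobserved Gumbel utility shocks.
n = 200_000
eps = rng.gumbel(size=(n, 3))
choices = np.argmax(observed_reward + eps, axis=1)
empirical = np.bincount(choices, minlength=3) / n

softmax = np.exp(observed_reward) / np.exp(observed_reward).sum()
print("empirical choice frequencies:", np.round(empirical, 3))
print("softmax over observed reward:", np.round(softmax, 3))  # agrees closely
```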

    A learning-based approach to multi-agent decision-making

    We propose a learning-based methodology to reconstruct private information held by a population of interacting agents in order to predict an exact outcome of the underlying multi-agent interaction process, here identified as a stationary action profile. We envision a scenario where an external observer, endowed with a learning procedure, is allowed to make queries and observe the agents' reactions through private action-reaction mappings, whose collective fixed point corresponds to a stationary profile. By adopting a smart query process to iteratively collect informative data and update parametric estimates, we establish sufficient conditions to assess the asymptotic properties of the proposed learning-based methodology so that, if convergence happens, it can only be towards a stationary action profile. This fact yields two main consequences: i) learning locally exact surrogates of the action-reaction mappings allows the external observer to succeed in its prediction task, and ii) since we work with assumptions so general that a stationary profile is not even guaranteed to exist, the established sufficient conditions also act as certificates for the existence of such a desirable profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
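    The query-then-predict procedure described above can be sketched in a toy linear setting (two agents with affine reaction maps; illustrative numbers, not the paper's general assumptions): the observer probes each agent at a few opponent actions, fits surrogate reaction maps from the observed reactions, and iterates the surrogates; if the iteration converges, its limit is a stationary action profile.

```python
# Toy sketch of the query-based prediction idea (a linear special case, not the
# paper's general setting). Two agents have private affine reaction maps
# x1 = a1*x2 + b1 and x2 = a2*x1 + b2. The observer queries a few opponent
# actions, fits surrogate maps by least squares, and iterates them to a
# predicted fixed point (stationary action profile).

import numpy as np

# Private (unknown to the observer) reaction maps; |a1*a2| < 1, so a fixed point exists.
react1 = lambda x2: 0.5 * x2 + 1.0
react2 = lambda x1: -0.3 * x1 + 2.0

# Query phase: probe each agent at a few opponent actions and record the reactions.
queries = np.array([-1.0, 0.0, 1.0, 2.0])
X = np.column_stack([queries, np.ones_like(queries)])
coef1, _, _, _ = np.linalg.lstsq(X, react1(queries), rcond=None)  # [a1_hat, b1_hat]
coef2, _, _, _ = np.linalg.lstsq(X, react2(queries), rcond=None)  # [a2_hat, b2_hat]

# Prediction phase: iterate the learned surrogates; if this converges, the limit
# is a fixed point of the surrogate maps (and, here, of the true maps as well).
x1, x2 = 0.0, 0.0
for _ in range(100):
    x1 = coef1[0] * x2 + coef1[1]
    x2 = coef2[0] * x1 + coef2[1]

print(f"predicted stationary profile: x1={x1:.3f}, x2={x2:.3f}")
# True fixed point of x1 = 0.5*x2 + 1, x2 = -0.3*x1 + 2 is roughly (1.739, 1.478).
```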

    Neural computations underlying inverse reinforcement learning in the human brain

    In inverse reinforcement learning, an observer infers the reward distribution available for actions in the environment solely through observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL as opposed to a simple imitation strategy, in which the actions of the other agent are copied instead of inferring the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that anterior dorsomedial prefrontal cortex encoded inferences about action-values within the value space of the agent as opposed to that of the observer, demonstrating that inverse RL is an abstract cognitive process divorceable from the values and concerns of the observer him/herself.
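    The model comparison at the heart of the study contrasts two observational strategies, which the sketch below caricatures with simple update rules (hypothetical choice data and parameters, not the authors' fitted models): an inverse-RL observer inverts an assumed softmax choice rule to infer the machines' reward probabilities in the agent's own value space, while an imitation observer merely copies the agent's choice frequencies.

```python
# Illustrative contrast between inverse RL and imitation when watching another
# agent repeatedly choose between two slot machines (not the study's models).
#   - Inverse RL: infer the probability that each machine yields the agent's
#     preferred outcome, assuming the agent softmax-chooses on its own values.
#   - Imitation: simply track how often the agent picks each machine.

import numpy as np

def softmax(values, beta=3.0):
    e = np.exp(beta * (values - values.max()))
    return e / e.sum()

# Hypothetical grid over the reward probability of machine 0 (machine 1 = 1 - p).
p_grid = np.linspace(0.01, 0.99, 99)
log_post = np.zeros_like(p_grid)             # flat prior over p
imitation_counts = np.zeros(2)

observed_choices = [0, 0, 1, 0, 0, 0, 1, 0]  # the agent mostly picks machine 0
for a in observed_choices:
    # Inverse RL: likelihood of choice a under each candidate p, via the agent's softmax.
    choice_probs = np.array([softmax(np.array([p, 1 - p]))[a] for p in p_grid])
    log_post += np.log(choice_probs)
    # Imitation: copy action statistics without inferring any reward structure.
    imitation_counts[a] += 1

post = np.exp(log_post - log_post.max())
post /= post.sum()
print("inverse-RL estimate of machine 0 reward prob:", round(float(p_grid @ post), 3))
print("imitation choice frequencies:", imitation_counts / imitation_counts.sum())
```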

    Cooperative Adaptive Learning Control for a Group of Nonholonomic UGVs by Output Feedback

    A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track desired reference trajectories. For the vehicle states, the positions of the vehicles can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, a radial basis function (RBF) neural network (NN) is used to estimate the unknown dynamics of the vehicle online, and the NN weight convergence and estimation accuracy are guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without full knowledge of the vehicle state and vehicle model. In addition, every vehicle in the system is able to learn the unmodeled dynamics along the union of trajectories experienced by all vehicle agents, so that the learned knowledge can be reused to follow any reference trajectory encountered in the learning phase. The learning-based tracking convergence and consensus learning results, as well as the use of learned knowledge for tracking experienced trajectories, are established using the Lyapunov method. Simulations are given to show the effectiveness of this algorithm.
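    The two main ingredients named in the abstract, a high-gain observer for the unmeasured velocities and an RBF network as the parametric approximator of the unknown dynamics, can be sketched on a scalar double integrator. The gains, basis centers, and dynamics below are illustrative; this is not the chapter's vehicle model, and the CDL weight-update and consensus laws are omitted:

```python
# Rough sketch of a high-gain observer plus an RBF regressor (illustrative only).
# The observer reconstructs the unmeasured velocity from position measurements;
# the RBF features phi(z) are what would multiply the NN weights W in the full
# approximator W^T * phi(z) of the unknown dynamics.

import numpy as np

def rbf_features(z, centers, width=1.0):
    """Gaussian RBF regressor vector phi(z) evaluated at the (estimated) state z."""
    return np.exp(-np.sum((centers - z) ** 2, axis=1) / (2 * width ** 2))

# Unknown dynamics f(x, v) that the NN would learn; here a damped pendulum term.
f = lambda x, v: -np.sin(x) - 0.2 * v

dt, eps = 1e-3, 0.05                      # Euler step and (small) observer parameter
x, v = 1.0, 0.0                           # true state (only x is measured)
xh, vh = 0.0, 0.0                         # observer estimates
centers = np.array([[i, j] for i in (-1, 0, 1) for j in (-1, 0, 1)], dtype=float)

for _ in range(20_000):
    # True plant (double integrator with unknown forcing f).
    x, v = x + dt * v, v + dt * f(x, v)
    # High-gain observer driven only by the position measurement y = x.
    e = x - xh
    xh, vh = xh + dt * (vh + (2 / eps) * e), vh + dt * ((1 / eps ** 2) * e)

phi = rbf_features(np.array([xh, vh]), centers)   # regressor for the NN weight update
print(f"true velocity {v:.3f}, observer estimate {vh:.3f}")
print("RBF regressor dimension:", phi.shape[0])
```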