Changing minds: Children's inferences about third party belief revision
By the age of 5, children explicitly represent that agents can have both true and false beliefs based on epistemic access to information (e.g., Wellman, Cross, & Watson, 2001). Children also begin to understand that agents can view identical evidence and draw different inferences from it (e.g., Carpendale & Chandler, 1996). However, much less is known about when, and under what conditions, children expect other agents to change their minds. Here, inspired by formal ideal observer models of learning, we investigate children's expectations of the dynamics that underlie third parties' belief revision. We introduce an agent who has prior beliefs about the location of a population of toys and then observes evidence that, from an ideal observer perspective, either does, or does not justify revising those beliefs. We show that children's inferences on behalf of third parties are consistent with the ideal observer perspective, but not with a number of alternative possibilities, including that children expect other agents to be influenced only by their prior beliefs, only by the sampling process, or only by the observed data. Rather, children integrate all three factors in determining how and when agents will update their beliefs from evidence.
National Science Foundation (U.S.). Division of Computing and Communication Foundations (1231216)
National Science Foundation (U.S.). Division of Research on Learning in Formal and Informal Settings (0744213)
National Science Foundation (U.S.) (STC Center for Brains, Minds and Machines Award CCF-1231216)
National Science Foundation (U.S.) (0744213)
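The ideal-observer logic this study builds on can be illustrated as a Bayesian belief update in which the sampling process determines the likelihood of the evidence. A minimal sketch, with all probabilities and labels invented for illustration (this is not the authors' model):

```python
def update_belief(prior, likelihood):
    """Bayes-update a belief distribution over hypotheses."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# The agent's prior belief: the toys are mostly in the red box (invented numbers).
prior = {"red": 0.8, "blue": 0.2}

# Randomly sampled evidence: three blue toys in a row. Under "red" such a
# sample is unlikely, so an ideal observer should revise toward "blue".
random_sample = {"red": 0.1 ** 3, "blue": 0.9 ** 3}
revised = update_belief(prior, random_sample)

# Deliberately selected blue toys carry no information about the population,
# so the belief should not change.
deliberate = {"red": 1.0, "blue": 1.0}
unchanged = update_belief(prior, deliberate)
```

The same three draws thus either do or do not justify revision, depending on how they were sampled, which is the distinction the children's inferences are tested against.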
Trajectory Modeling via Random Utility Inverse Reinforcement Learning
We consider the problem of modeling trajectories of drivers in a road network
from the perspective of inverse reinforcement learning. Cars are detected by
sensors placed on sparsely distributed points on the street network of a city.
As rational agents, drivers are trying to maximize some reward function unknown
to an external observer. We apply the concept of random utility from
econometrics to model the unknown reward function as a function of observed and
unobserved features. In contrast to current inverse reinforcement learning
approaches, we do not assume that agents act according to a stochastic policy;
rather, we assume that agents act according to a deterministic optimal policy
and show that randomness in data arises because the exact rewards are not fully
observed by an external observer. We introduce the concept of extended state to
cope with unobserved features and develop a Markov decision process formulation
of drivers' decisions. We present theoretical results that guarantee the
existence of solutions and show that maximum entropy inverse reinforcement
learning is a particular case of our approach. Finally, we illustrate Bayesian
inference on model parameters through a case study with real trajectory data
from a large city in Brazil.
Comment: 31 pages; expanded version, with the addition of proofs not present
in the first version.
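The connection the abstract draws to maximum entropy IRL follows a classical random-utility result: a deterministic optimal choice over rewards plus i.i.d. Gumbel-distributed unobserved utility reproduces softmax (logit) choice probabilities. A numerical sketch with invented reward values (not from the paper):

```python
import math
import random

random.seed(0)

# Illustrative deterministic rewards for three route alternatives.
rewards = [1.0, 0.5, 0.0]

def gumbel():
    """Sample standard Gumbel noise, the classical random-utility error term."""
    u = random.random()
    return -math.log(-math.log(u))

# Each simulated driver acts deterministically optimally on reward + noise;
# randomness arises only because the noise is unobserved by the analyst.
n = 200_000
counts = [0, 0, 0]
for _ in range(n):
    utilities = [r + gumbel() for r in rewards]
    counts[utilities.index(max(utilities))] += 1

empirical = [c / n for c in counts]

# Softmax choice probabilities, as in maximum entropy IRL.
z = sum(math.exp(r) for r in rewards)
softmax = [math.exp(r) / z for r in rewards]
```

The empirical choice frequencies match the softmax probabilities, illustrating why the stochastic policy of maximum entropy IRL emerges as a special case of the deterministic-policy random-utility view.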
A learning-based approach to multi-agent decision-making
We propose a learning-based methodology to reconstruct private information
held by a population of interacting agents in order to predict an exact outcome
of the underlying multi-agent interaction process, here identified as a
stationary action profile. We envision a scenario where an external observer,
endowed with a learning procedure, is allowed to make queries and observe the
agents' reactions through private action-reaction mappings, whose collective
fixed point corresponds to a stationary profile. By adopting a smart query
process to iteratively collect sensible data and update parametric estimates,
we establish sufficient conditions to assess the asymptotic properties of the
proposed learning-based methodology so that, if convergence happens, it can
only be towards a stationary action profile. This fact yields two main
consequences: i) learning locally-exact surrogates of the action-reaction
mappings allows the external observer to succeed in its prediction task, and
ii) because we work under assumptions so general that a stationary profile is
not even guaranteed to exist, the established sufficient conditions also act as
certificates for the existence of such a desirable profile. Extensive numerical
simulations involving typical competitive multi-agent control and
decision-making problems illustrate the practical effectiveness of the proposed
learning-based approach.
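The notion of a stationary action profile as the collective fixed point of private action-reaction mappings can be sketched with a toy two-agent example. The mappings below are invented linear contractions; the actual methodology learns parametric surrogates of such mappings from queries, which this sketch omits:

```python
# Each agent's private action-reaction mapping (invented for illustration).
def reaction_1(a2):
    """Agent 1's reaction to agent 2's action."""
    return 0.5 * a2 + 1.0

def reaction_2(a1):
    """Agent 2's reaction to agent 1's action."""
    return 0.25 * a1 + 0.5

# The external observer queries an initial profile and iterates on the
# observed reactions; the iteration converges to the collective fixed point,
# i.e. a stationary action profile.
a1, a2 = 0.0, 0.0
for _ in range(100):
    a1, a2 = reaction_1(a2), reaction_2(a1)
```

Because both mappings are contractions here, convergence is immediate; the paper's contribution is establishing conditions under which the learning procedure, if it converges at all, can only converge to such a fixed point, even when one is not known to exist a priori.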
Neural computations underlying inverse reinforcement learning in the human brain
In inverse reinforcement learning, an observer infers the reward distribution available for actions in the environment solely through observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL as opposed to a simple imitation strategy, in which the actions of the other agent are copied instead of inferring the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that anterior dorsomedial prefrontal cortex encoded inferences about action-values within the value space of the agent as opposed to that of the observer, demonstrating that inverse RL is an abstract cognitive process divorceable from the values and concerns of the observer him/herself.
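The contrast between the two candidate strategies can be made concrete in a toy version of the task. All machine names, probabilities, and preferences below are invented for illustration, not taken from the study:

```python
# Two slot machines, each yielding "peanut" with some probability, "fruit"
# otherwise (invented numbers).
p_peanut = {"left": 0.8, "right": 0.2}

# The observed agent is known to prefer peanuts and mostly chooses "left".
observed_choices = ["left"] * 9 + ["right"]

# Imitation strategy: simply copy the agent's modal action.
imitation_choice = max(set(observed_choices), key=observed_choices.count)

# Inverse RL strategy: infer the reward structure (the agent chooses "left"
# because it maximizes peanuts), then act on one's own preference. An observer
# who prefers fruit should therefore choose the machine yielding fewest peanuts.
my_choice = min(p_peanut, key=p_peanut.get)
```

The two strategies make opposite predictions whenever the observer's preferences differ from the observed agent's, which is what allows the formal model comparison to distinguish them.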
Cooperative Adaptive Learning Control for a Group of Nonholonomic UGVs by Output Feedback
A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track desired reference trajectories. Of the vehicle states, the positions can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, a radial basis function (RBF) neural network (NN) is used to estimate the unknown vehicle dynamics online, and NN weight convergence and estimation accuracy are guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without full knowledge of the vehicle state and vehicle model. In addition, any vehicle in the system is able to learn the unmodeled dynamics along the union of trajectories experienced by all vehicle agents, such that the learned knowledge can be reused to follow any reference trajectory defined in the learning phase. The learning-based tracking convergence and consensus learning results, as well as the use of learned knowledge for tracking experienced trajectories, are shown using the Lyapunov method. A simulation is given to show the effectiveness of this algorithm.
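The RBF-network approximation at the core of deterministic learning can be sketched in one dimension: a network with fixed Gaussian centers adapts its weights until it reproduces an "unknown" dynamics term along the visited states. All centers, widths, gains, and the target function below are invented for illustration, and simple gradient descent stands in for the chapter's Lyapunov-based adaptation law:

```python
import math

centers = [0.5 * i for i in range(-4, 5)]   # 9 Gaussian centers on [-2, 2]
width = 0.5

def phi(x):
    """Gaussian RBF regressor vector at state x."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def rbf(x, weights):
    """Network output: weighted sum of the regressors."""
    return sum(w * p for w, p in zip(weights, phi(x)))

unknown_dynamics = math.sin                  # stand-in for unmodeled dynamics
samples = [0.1 * i for i in range(-20, 21)]  # states along a "trajectory"
weights = [0.0] * len(centers)
lr = 0.1

# Weight adaptation driven by the approximation error at each visited state.
for _ in range(2000):
    for x in samples:
        feats = phi(x)
        err = sum(w * p for w, p in zip(weights, feats)) - unknown_dynamics(x)
        weights = [w - lr * err * p for w, p in zip(weights, feats)]
```

After adaptation, the network reproduces the unknown term along the experienced states, which is the sense in which learned knowledge can be stored and reused for previously experienced trajectories.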