    Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

    Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
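The selection loop described above can be sketched as follows: one tabular Q-table per objective, all updated in parallel with Q-learning from the same transition (each with its own reward or shaped reward), and the greedy action of the most confident objective used for acting. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name is hypothetical, exploration (e.g. epsilon-greedy) is omitted, and the confidence measure (the gap between the best and second-best action values, scaled by how often the state was visited) is a hypothetical stand-in, not the measure proposed in the paper.

```python
import math
from collections import defaultdict


class AdaptiveObjectiveSelection:
    """Hypothetical sketch: one Q-table per correlated objective, learned in parallel.

    The confidence() measure is an illustrative stand-in, NOT the paper's metric.
    """

    def __init__(self, n_objectives, actions, alpha=0.1, gamma=0.99):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        # q[i][(state, action)] -> value estimate for objective i
        self.q = [defaultdict(float) for _ in range(n_objectives)]
        # visit counts for (state, action), shared by all objectives
        self.visits = defaultdict(int)

    def confidence(self, i, state):
        # Assumed measure: gap between the two best action values of
        # objective i, scaled by the square root of the state's visit count.
        values = sorted((self.q[i][(state, a)] for a in self.actions), reverse=True)
        gap = values[0] - values[1] if len(values) > 1 else 0.0
        n = sum(self.visits[(state, a)] for a in self.actions)
        return gap * math.sqrt(n)

    def select_action(self, state):
        # Act greedily w.r.t. the objective that is most confident in this state.
        best = max(range(len(self.q)), key=lambda i: self.confidence(i, state))
        return max(self.actions, key=lambda a: self.q[best][(state, a)])

    def update(self, state, action, rewards, next_state, done):
        # rewards[i] is the (possibly shaped) reward signal for objective i;
        # every objective's Q-table gets its own Q-learning update.
        self.visits[(state, action)] += 1
        for i, r in enumerate(rewards):
            target = r
            if not done:
                target += self.gamma * max(
                    self.q[i][(next_state, a)] for a in self.actions
                )
            self.q[i][(state, action)] += self.alpha * (target - self.q[i][(state, action)])


agent = AdaptiveObjectiveSelection(2, [0, 1], alpha=0.1, gamma=0.9)
# Objective 0 sees no reward for action 1 in state 0; objective 1 (e.g. a shaped
# signal) sees a reward of 1.0, so it is the more confident one in state 0.
agent.update(0, 1, [0.0, 1.0], 1, True)
print(agent.select_action(0))
```

Because the objectives are assumed to be correlated, following whichever objective is currently most confident should not trade one objective off against another; the choice of confidence measure is the part that would have to be replaced by the paper's own metric.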

    Judgments of effort exerted by others are influenced by received rewards

    Estimating invested effort is a core dimension for evaluating one’s own and others’ actions, and views on the relationship between effort and rewards are deeply ingrained in various societal attitudes. Internal representations of effort, however, are inherently noisy, e.g. due to the variability of sensorimotor and visceral responses to physical exertion. The uncertainty in effort judgments is further aggravated when there is no direct access to the internal representations of exertion, such as when estimating the effort of another person. Bayesian cue integration suggests that this uncertainty can be resolved by incorporating additional cues that are predictive of effort, e.g. received rewards. We hypothesized that judgments about the effort spent on a task would be influenced by the magnitude of received rewards. Additionally, we surmised that such influence might further depend on individual beliefs regarding the relationship between hard work and prosperity, as exemplified by a conservative work ethic. To test these predictions, participants performed an effortful task interleaved with a partner and were informed about the obtained reward before rating either their own or the partner’s effort. We show that higher rewards led to higher estimates of exerted effort in self-judgments, and this effect was even more pronounced for other-judgments. In both types of judgment, computational modelling revealed that reward information and sensorimotor markers of exertion were combined in a Bayes-optimal manner to reduce uncertainty. Remarkably, the extent to which rewards influenced effort judgments was associated with conservative world-views, indicating links between this phenomenon and general beliefs about the relationship between effort and earnings in society.
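For two independent Gaussian cues about the same quantity, the Bayes-optimal combination is a precision-weighted average: each cue is weighted by its inverse variance, and the fused variance is smaller than either input. The sketch below illustrates that textbook rule; it assumes the reward cue has already been mapped onto the same effort scale as the sensorimotor cue, and the function name and all numbers are hypothetical, not taken from the study or its fitted model.

```python
def integrate_cues(mu_sensory, var_sensory, mu_reward, var_reward):
    """Fuse two independent Gaussian cues about the same quantity (effort).

    Each cue is weighted by its precision (inverse variance); the fused
    variance is never larger than either input variance.
    """
    prec_s = 1.0 / var_sensory
    prec_r = 1.0 / var_reward
    mu = (prec_s * mu_sensory + prec_r * mu_reward) / (prec_s + prec_r)
    var = 1.0 / (prec_s + prec_r)
    return mu, var


# Hypothetical numbers on an arbitrary 0-10 effort scale.
self_mu, self_var = integrate_cues(5.0, 4.0, 7.0, 1.0)  # sharper sensorimotor cue
other_mu, _ = integrate_cues(5.0, 9.0, 7.0, 1.0)        # noisier cue for another person
print(round(self_mu, 2), round(self_var, 2), round(other_mu, 2))  # 6.6 0.8 6.8
```

Under this rule, a noisier sensorimotor cue pulls the combined estimate further toward the reward-implied value. That is one way to read the stronger reward effect reported for other-judgments, not necessarily the authors' exact model.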

    Reinforcement Learning from Demonstration

    Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for learning performance, and that when prior knowledge is available, RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL had been evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state space size. A natural follow-up question after this first evaluation was how other previous works compare to interactive RL. Our second contribution is an analysis of a user study in which naive human teachers demonstrated a real-world object-catching task with a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was that RL has high potential despite poor usability due to its slow learning rate. As an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that, regardless of skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. 
Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations to recover a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.
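A standard way to add a shaping reward without changing which policies are optimal is potential-based shaping (Ng, Harada and Russell, 1999): the agent receives r + F(s, s'), with F(s, s') = γΦ(s') − Φ(s). The sketch below assumes that a function recovered from demonstrations is used as the potential Φ; the abstract does not say exactly how the Static and Dynamic IRL Shaping algorithms apply the recovered function, the inverse RL step itself is omitted, and the function names and potential values are hypothetical.

```python
def shaping_term(phi_s, phi_next, gamma, terminal=False):
    """Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s).

    The potential of a terminal state is taken to be 0.
    """
    return gamma * (0.0 if terminal else phi_next) - phi_s


def discounted_return(rewards, potentials=None, gamma=0.99):
    """Discounted return of one episode, optionally with shaping added.

    potentials[t] is Phi(s_t) for t = 0..T (len(rewards) + 1 entries);
    the last transition is treated as reaching a terminal state.
    """
    total = 0.0
    T = len(rewards)
    for t, r in enumerate(rewards):
        if potentials is not None:
            r += shaping_term(potentials[t], potentials[t + 1], gamma,
                              terminal=(t == T - 1))
        total += gamma ** t * r
    return total


# Hypothetical episode: sparse reward at the end, potentials rising toward the goal.
rewards = [0.0, 0.0, 1.0]
phi = [0.5, 0.7, 0.9, 0.0]
plain = discounted_return(rewards, gamma=0.9)
shaped = discounted_return(rewards, phi, gamma=0.9)
print(round(plain, 3), round(shaped, 3))  # 0.81 0.31
```

The shaping terms telescope, so for any episode ending in a terminal state the shaped return differs from the plain one by exactly −Φ(s₀). This is why potential-based shaping can speed up learning through denser intermediate feedback without altering the ranking of policies.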

    Using informative behavior to increase engagement while learning from human reward

    In this work, we address a relatively unexplored aspect of designing agents that learn from human reward. We investigate how an agent’s non-task behavior can affect a human trainer’s training and agent learning. We use the TAMER framework, which facilitates the training of agents by human-generated reward signals, i.e., judgements of the quality of the agent’s actions, as the foundation for our investigation. Starting from the premise that the interaction between the agent and the trainer should be bi-directional, we propose two new training interfaces to increase a human trainer’s active involvement in the training process and thereby improve the agent’s task performance. One provides information on the agent’s uncertainty, a metric calculated as data coverage; the other provides information on its performance. Our results from a 51-subject user study show that these interfaces can induce the trainers to train longer and give more feedback. The agent’s performance, however, increases only in response to performance-oriented information, not to shared uncertainty levels. These results suggest that the organizational maxim about human behavior, “you get what you measure” (i.e., sharing metrics with people causes them to focus on optimizing those metrics while de-emphasizing other objectives), also applies to the training of agents. Using principal component analysis, we show how trainers in the two conditions train agents differently. In addition, by simulating the influence of the agent’s uncertainty-informative behavior on a human’s training behavior, we show that trainers could be distracted by the agent sharing its uncertainty levels about its actions, giving poor feedback for the sake of reducing the agent’s uncertainty without improving the agent’s performance.
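The core of TAMER is to learn a model Ĥ(s, a) of the trainer's reward signal by regression and to act greedily (myopically) with respect to it. The minimal tabular sketch below illustrates only that idea: full TAMER also assigns credit for delayed feedback, which is omitted here. The class name is hypothetical, and the `uncertainty()` method is a hypothetical data-coverage measure (it shrinks as feedback accumulates in a state), not the metric used in the study.

```python
from collections import defaultdict


class TamerSketch:
    """Hypothetical tabular sketch of the TAMER idea (no delayed credit assignment)."""

    def __init__(self, actions, lr=0.2):
        self.actions = list(actions)
        self.lr = lr
        self.h_hat = defaultdict(float)     # (state, action) -> predicted human reward
        self.n_feedback = defaultdict(int)  # feedback events received per state

    def act(self, state):
        # Myopic greedy choice w.r.t. the learned human-reward model.
        return max(self.actions, key=lambda a: self.h_hat[(state, a)])

    def give_feedback(self, state, action, h):
        # Incremental regression step toward the trainer's signal h.
        key = (state, action)
        self.h_hat[key] += self.lr * (h - self.h_hat[key])
        self.n_feedback[state] += 1

    def uncertainty(self, state):
        # Assumed coverage-style measure: 1.0 when no feedback has been seen.
        return 1.0 / (1 + self.n_feedback[state])


agent = TamerSketch(["left", "right"])
agent.give_feedback("s0", "right", 1.0)   # trainer approves of "right" in s0
agent.give_feedback("s1", "left", -1.0)   # trainer disapproves of "left" in s1
print(agent.act("s0"), agent.act("s1"), agent.uncertainty("s0"))  # right right 0.5
```

Displaying a quantity such as `uncertainty()` to the trainer is the kind of interface the study compares against displaying task performance.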

    Fruit scent and observer colour vision shape food-selection strategies in wild capuchin monkeys

    The senses play critical roles in helping animals evaluate foods, including fruits that can change both in colour and scent during ripening to attract frugivores. Although numerous studies have assessed the impact of colour on fruit selection, comparatively little is known about fruit scent and how olfactory and visual data are integrated during foraging. We combine 25 months of behavioural data on 75 wild, white-faced capuchins (Cebus imitator) with measurements of fruit colours and scents from 18 dietary plant species. We show that the frequency of fruit-directed olfactory behaviour is positively correlated with increases in the volume of fruit odours produced during ripening. Monkeys with red-green colour blindness sniffed fruits more often, indicating that increased reliance on olfaction is a behavioural strategy that mitigates decreased capacity to detect red-green colour contrast. These results demonstrate a complex interaction among fruit traits, sensory capacities and foraging strategies, which help explain variation in primate behaviour. Published version: https://www.nature.com/articles/s41467-019-10250-9

    Integrative function in rat visual system

    A vital function of the brain is to acquire information about the events in the environment and to respond appropriately. The brain needs to integrate the incoming information from multiple senses to improve the quality of the sensory signal. It also needs to be able to distribute processing resources to optimise integration across modalities based on the reliability and salience of the incoming signals. This thesis aimed to investigate two aspects of the way in which the brain integrates information from the external environment: multisensory integration and selective attention. The hooded rat was used as the experimental animal model. In Chapter 2 of this thesis, I investigate the multisensory properties of neurons in the superior colliculus (SC), a midbrain structure involved in attentive and orienting behaviours. I first establish that in rat SC, spiking activity is elevated by whisker or visual stimuli, but rarely both, when those stimuli are presented in isolation. I then show that visually responsive sites are mainly found in superficial layers, whereas whisker-responsive sites are found in intermediate layers. Finally, I show that there are robust suppressive interactions between these two modalities. In Chapter 3, I develop a rodent behavioural paradigm that can easily be paired with electrophysiological measurements. The design is adaptable to a variety of detection and discrimination tasks. Head position is restricted by a central nose-poke without head fixation, and the eyes can be monitored continuously via video camera. In Chapter 4, I ask whether selective spatial visual attention can be demonstrated in rats using the paradigms developed in Chapter 3. Selective attention is the process by which the brain focuses on significant external events. Does being able to predict the likely side of the stimulus modulate the speed and accuracy of stimulus detection? 
To address this question, I varied the probability with which the signal was presented on the left or right screen. My results suggest that rats have the capacity for spatial attention engaged by top-down mechanisms that have access to the predictability of stimulus location. In summary, my thesis presents a paradigm to study visual behaviour, multisensory integration and selective spatial attention in rats. Over the last decade, rats have gained popularity as a viable animal model in sensory systems neuroscience because of access to an array of genetic tools, in vivo electrophysiology, and imaging techniques. As such, the paradigms developed here provide a useful preparation to complement the existing well-established primate models.

    Interactive Imitation Learning in Robotics: A Survey

    Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL) in which human feedback is provided intermittently during robot execution, allowing online improvement of the robot's behavior. In recent years, IIL has increasingly started to carve out its own space as a promising data-driven alternative for solving complex robotic tasks. The advantages of IIL are its data efficiency, as the human feedback guides the robot directly towards an improved behavior, and its robustness, as the distribution mismatch between the teacher and learner trajectories is minimized by providing feedback directly over the learner's trajectories. Nevertheless, despite the opportunities that IIL presents, its terminology, structure, and applicability are neither clear nor unified in the literature, slowing down its development and, therefore, the research of innovative formulations and discoveries. In this article, we attempt to facilitate research in IIL and lower entry barriers for new practitioners by providing a survey of the field that unifies and structures it. In addition, we aim to raise awareness of its potential, of what has been accomplished, and of which research questions remain open. We organize the most relevant works in IIL in terms of human-robot interaction (i.e., types of feedback), interfaces (i.e., means of providing feedback), learning (i.e., models learned from feedback and function approximators), user experience (i.e., human perception about the learning process), applications, and benchmarks. Furthermore, we analyze similarities and differences between IIL and RL, providing a discussion on how the concepts of offline, online, off-policy and on-policy learning should be transferred to IIL from the RL literature. We particularly focus on robotic applications in the real world and discuss their implications, limitations, and promising future areas of research.