
    Self-Modification of Policy and Utility Function in Rational Agents

    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201
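
    As a toy illustration (not from the paper; the actions, utilities and numbers below are invented), the following Python sketch contrasts an agent whose value function judges the future with its current utility function against one that judges it with the post-modification utility; only the latter is tempted by a "wireheading" self-modification.

        def current_utility(outcome):
            # Reward only for actually completing the original task.
            return 1.0 if outcome == "task_done" else 0.0

        def modified_utility(outcome):
            return 1.0  # "wireheaded" utility, satisfied by anything

        def outcome_after(action):
            # After MODIFY the agent idles; after KEEP it pursues the task.
            return "idle" if action == "MODIFY" else "task_done"

        def effort(action):
            return 0.1 if action == "KEEP" else 0.0  # pursuing the task costs a little

        def value(action, use_current_utility):
            outcome = outcome_after(action)
            if use_current_utility or action == "KEEP":
                u = current_utility   # the future judged by today's goals
            else:
                u = modified_utility  # the future judged by the new goals
            return u(outcome) - effort(action)

        for use_current in (True, False):
            best = max(["KEEP", "MODIFY"], key=lambda a: value(a, use_current))
            print(f"evaluating with current utility = {use_current}: prefers {best}")
        # -> evaluating with the current utility prefers KEEP (goal preservation);
        #    evaluating with the post-modification utility prefers MODIFY.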

    Active MR k-space Sampling with Reinforcement Learning

    Deep learning approaches have recently shown great promise in accelerating magnetic resonance imaging (MRI) acquisition. The majority of existing work has focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction model. We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it. Experiments on a large-scale public MRI dataset of knees show that our proposed models significantly outperform the state-of-the-art in active MRI acquisition over a large range of acceleration factors. Comment: Presented at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 202
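
    A rough, illustrative sketch of the sequential decision framing (not the authors' method or code): a toy 1-D "k-space", a trivial inverse-FFT reconstructor, and a one-step greedy policy standing in for the learned RL policy, with the reward taken as the drop in reconstruction error per acquired frequency.

        import numpy as np

        # Toy 1-D stand-in for active k-space sampling as a sequential decision
        # process (real systems use 2-D k-space, a learned reconstructor, and a
        # trained policy network).
        rng = np.random.default_rng(0)
        signal = rng.standard_normal(64)
        kspace = np.fft.fft(signal)

        def reconstruct(mask):
            return np.real(np.fft.ifft(kspace * mask))

        def error(mask):
            return float(np.mean((reconstruct(mask) - signal) ** 2))

        budget = 16                       # acceleration: acquire 16 of 64 lines
        mask = np.zeros(64)
        for step in range(budget):
            # Greedy one-step policy: pick the unacquired frequency whose
            # acquisition most reduces reconstruction error (reward = error drop).
            candidates = np.flatnonzero(mask == 0)
            gains = []
            for k in candidates:
                trial = mask.copy()
                trial[k] = 1
                gains.append(error(mask) - error(trial))
            mask[candidates[int(np.argmax(gains))]] = 1
        print("final reconstruction error:", error(mask))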

    Lilia, A Showcase for Fast Bootstrap of Conversation-Like Dialogues Based on a Goal-Oriented System

    Recently, many works have proposed to cast human-machine interaction in a sentence generation scheme. Neural network models can learn how to generate a probable sentence based on the user's statement along with a partial view of the dialogue history. While appealing to some extent, these approaches require huge training sets of general-purpose data and lack a principled way to intertwine language generation with information retrieval from back-end resources to fuel the dialogue with up-to-date and precise knowledge. As a practical alternative, in this paper we present Lilia, a showcase for fast bootstrap of conversation-like dialogues based on a goal-oriented system. First, a comparison of goal-oriented and conversational system features is presented, then a conversion process is described for the fast bootstrap of a new system, finalised with on-line training of the system's main components. Lilia is dedicated to a chitchat task, where speakers exchange viewpoints on a displayed image while trying collaboratively to derive its author's intention. Evaluations with user trials showed its efficiency in a realistic setup.

    Bayesian optimization for materials design

    We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when material designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality.
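
    A minimal sketch of the Gaussian process regression plus expected improvement loop described above, assuming scikit-learn and SciPy; the 1-D objective function, kernel choice and hyperparameters are placeholders, not the paper's.

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import Matern

        def objective(x):                      # placeholder "material property"
            return -np.sin(3 * x) - x ** 2 + 0.7 * x

        rng = np.random.default_rng(1)
        X = rng.uniform(-2, 2, size=(5, 1))    # initial designs
        y = objective(X).ravel()

        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

        def expected_improvement(Xc, gp, y_best, xi=0.01):
            # EI = (mu - y_best - xi) * Phi(z) + sigma * phi(z)
            mu, sigma = gp.predict(Xc, return_std=True)
            sigma = np.maximum(sigma, 1e-9)
            z = (mu - y_best - xi) / sigma
            return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

        grid = np.linspace(-2, 2, 400).reshape(-1, 1)
        for _ in range(10):                    # sequential design loop
            gp.fit(X, y)
            ei = expected_improvement(grid, gp, y.max())
            x_next = grid[int(np.argmax(ei))]  # next "experiment" to run
            X = np.vstack([X, [x_next]])
            y = np.append(y, objective(x_next))
        print("best design found:", X[np.argmax(y)].item(), "value:", y.max())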

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
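
    As a compact, hedged example of combining a deep architecture with reinforcement learning (a DQN-style update, assuming PyTorch; the network sizes, hyperparameters and fake batch below are invented for illustration):

        import torch
        import torch.nn as nn

        # A small CNN maps raw image observations to Q-values, and learning
        # regresses onto the one-step TD target computed by a frozen target net.
        class QNet(nn.Module):
            def __init__(self, n_actions):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
                    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
                    nn.Flatten(),
                )
                self.head = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
                                          nn.Linear(256, n_actions))

            def forward(self, x):
                return self.head(self.body(x))

        q_net, target_net = QNet(4), QNet(4)
        target_net.load_state_dict(q_net.state_dict())
        opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

        # One illustrative update on a fake batch of 84x84 frame stacks.
        obs      = torch.randn(32, 4, 84, 84)
        actions  = torch.randint(0, 4, (32,))
        rewards  = torch.randn(32)
        next_obs = torch.randn(32, 4, 84, 84)
        done     = torch.zeros(32)

        q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rewards + 0.99 * (1 - done) * target_net(next_obs).max(1).values
        loss = nn.functional.smooth_l1_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()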

    Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

    This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
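
    This is not the competition-winning method, but a minimal sketch of the underlying idea of neuroevolution: evolve the weights of a policy network by Gaussian mutation and truncation selection, with the simulator's episode return replaced here by a stand-in fitness function.

        import numpy as np

        rng = np.random.default_rng(0)
        n_weights = 64                      # flattened policy-network parameters

        def fitness(w):
            # Placeholder for episode return from the hovering simulator; a
            # simple deterministic function keeps the example self-contained.
            return -float(np.sum((w - 0.5) ** 2))

        pop = rng.standard_normal((20, n_weights))
        for gen in range(50):
            scores = np.array([fitness(w) for w in pop])
            parents = pop[np.argsort(scores)[-5:]]          # truncation selection
            idx = rng.integers(0, len(parents), size=15)
            children = parents[idx] + 0.1 * rng.standard_normal((15, n_weights))
            pop = np.vstack([parents, children])            # elitism + mutants
        print("best fitness:", max(fitness(w) for w in pop))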

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, along with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory has existed for TD-learning for several years. This can partly be transferred to a neuronal framework, too. On the other hand, only now has a more complete theory also emerged for differential Hebbian rules. In general, rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior when applying them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, to provide some guidance for possible applications.
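
    A discretised, illustrative sketch of the two rule families (the article treats the time-continuous neuronal versions; the stimuli, traces and constants here are invented): a TD-style update driven by the δ-error times an eligibility trace of the input, versus a differential Hebbian update driven by a presynaptic trace times the derivative of postsynaptic activity.

        import numpy as np

        T, dt = 200, 0.01
        t = np.arange(T) * dt
        x1 = np.exp(-((t - 0.5) ** 2) / 0.002)       # earlier stimulus
        x2 = np.exp(-((t - 0.7) ** 2) / 0.002)       # later stimulus / reward proxy

        # TD-style rule: delta-error times an eligibility trace of the input.
        w_td, trace, gamma, alpha = 0.0, 0.0, 0.98, 0.05
        for k in range(T - 1):
            v_now, v_next = w_td * x1[k], w_td * x1[k + 1]
            delta = x2[k] + gamma * v_next - v_now   # delta-error, x2 as reward
            trace = 0.9 * trace + x1[k]              # input eligibility trace
            w_td += alpha * delta * trace * dt

        # Differential Hebbian rule: presynaptic trace times d(post)/dt.
        w_dh, eta = 0.0, 0.05
        pre_trace = np.convolve(x1, np.exp(-t / 0.05))[:T] * dt   # filtered input
        dpost = np.gradient(x2, dt)                  # derivative of postsynaptic activity
        w_dh += eta * np.sum(pre_trace * dpost) * dt

        print("TD weight change:", w_td, " differential Hebbian weight change:", w_dh)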

    Feasibility of ‘parkrun’ for people with knee osteoarthritis: A mixed methods pilot study

    Objective/Design: This uncontrolled mixed methods pilot study enrolled people with knee OA not meeting physical activity guidelines. Participants were asked to walk in four consecutive parkrun events supervised by an exercise physiologist/physiotherapist. Feasibility was assessed by recruitment data (numbers screened and time to enrol 15 participants), adherence to the protocol, acceptability (measured by confidence, enjoyment, difficulty ratings and qualitative interviews), and safety (adverse events). Secondary measures were changes in knee pain, function, stiffness, and physical activity levels. Results: Participants (n = 17) were enrolled over 11 months and recruitment was slower than anticipated. Fourteen participants attended all four parkruns and three of these participants shortened the 5 km course to ∼3 km. Across all four parkruns, 75% of participants reported high confidence that they could complete the upcoming parkrun and the majority (87%) enjoyed participating. Most participants rated parkrun either slightly difficult (38.5%) or moderately difficult (35%), and two mild adverse events were reported. Participants showed improvements in knee pain, function, stiffness, and physical activity levels. Conclusions: This pilot study demonstrates parkrun's feasibility, acceptability, safety and its potential to improve knee OA symptoms and physical activity levels. Participating in parkrun was acceptable and enjoyable for some, but not all, participants. The scalability, accessibility and wide appeal of parkrun support the development of larger programs of research to evaluate the use of parkrun for people with knee OA.

    Uridine-derived ribose fuels glucose-restricted pancreatic cancer.

    Pancreatic ductal adenocarcinoma (PDA) is a lethal disease notoriously resistant to therapy [1,2]. This is mediated in part by a complex tumour microenvironment [3], low vascularity [4], and metabolic aberrations [5,6]. Although altered metabolism drives tumour progression, the spectrum of metabolites used as nutrients by PDA remains largely unknown. Here we identified uridine as a fuel for PDA in glucose-deprived conditions by assessing how more than 175 metabolites impacted metabolic activity in 21 pancreatic cell lines under nutrient restriction. Uridine utilization strongly correlated with the expression of uridine phosphorylase 1 (UPP1), which we demonstrate liberates uridine-derived ribose to fuel central carbon metabolism and thereby support redox balance, survival and proliferation in glucose-restricted PDA cells. In PDA, UPP1 is regulated by KRAS-MAPK signalling and is augmented by nutrient restriction. Consistently, tumours expressed high UPP1 compared with non-tumoural tissues, and UPP1 expression correlated with poor survival in cohorts of patients with PDA. Uridine is available in the tumour microenvironment, and we demonstrated that uridine-derived ribose is actively catabolized in tumours. Finally, UPP1 deletion restricted the ability of PDA cells to use uridine and blunted tumour growth in immunocompetent mouse models. Our data identify uridine utilization as an important compensatory metabolic process in nutrient-deprived PDA cells, suggesting a novel metabolic axis for PDA therapy.

    Methods for specifying the target difference in a randomised controlled trial: the Difference ELicitation in TriAls (DELTA) systematic review
