
    Self-Modification of Policy and Utility Function in Rational Agents

    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201
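
    As a toy illustration (not from the paper; the actions, utilities and numbers below are invented), the following Python sketch contrasts an agent whose value function judges the future with its current utility function against one that judges it with the post-modification utility; only the latter is tempted by a "wireheading" self-modification.

        def current_utility(outcome):
            # Reward only for actually completing the original task.
            return 1.0 if outcome == "task_done" else 0.0

        def modified_utility(outcome):
            return 1.0  # "wireheaded" utility, satisfied by anything

        def outcome_after(action):
            # After MODIFY the agent idles; after KEEP it pursues the task.
            return "idle" if action == "MODIFY" else "task_done"

        def effort(action):
            return 0.1 if action == "KEEP" else 0.0  # pursuing the task costs a little

        def value(action, use_current_utility):
            outcome = outcome_after(action)
            if use_current_utility or action == "KEEP":
                u = current_utility   # the future judged by today's goals
            else:
                u = modified_utility  # the future judged by the new goals
            return u(outcome) - effort(action)

        for use_current in (True, False):
            best = max(["KEEP", "MODIFY"], key=lambda a: value(a, use_current))
            print(f"evaluating with current utility = {use_current}: prefers {best}")
        # -> evaluating with the current utility prefers KEEP (goal preservation);
        #    evaluating with the post-modification utility prefers MODIFY.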

    Active MR k-space Sampling with Reinforcement Learning

    Deep learning approaches have recently shown great promise in accelerating magnetic resonance imaging (MRI) acquisition. The majority of existing work has focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction model. We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it. Experiments on a large-scale public MRI dataset of knees show that our proposed models significantly outperform the state-of-the-art in active MRI acquisition over a large range of acceleration factors. Comment: Presented at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 202
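
    A rough, illustrative sketch of the sequential decision framing (not the authors' method or code): a toy 1-D "k-space", a trivial inverse-FFT reconstructor, and a one-step greedy policy standing in for the learned RL policy, with the reward taken as the drop in reconstruction error per acquired frequency.

        import numpy as np

        # Toy 1-D stand-in for active k-space sampling as a sequential decision
        # process (real systems use 2-D k-space, a learned reconstructor, and a
        # trained policy network).
        rng = np.random.default_rng(0)
        signal = rng.standard_normal(64)
        kspace = np.fft.fft(signal)

        def reconstruct(mask):
            return np.real(np.fft.ifft(kspace * mask))

        def error(mask):
            return float(np.mean((reconstruct(mask) - signal) ** 2))

        budget = 16                       # acceleration: acquire 16 of 64 lines
        mask = np.zeros(64)
        for step in range(budget):
            # Greedy one-step policy: pick the unacquired frequency whose
            # acquisition most reduces reconstruction error (reward = error drop).
            candidates = np.flatnonzero(mask == 0)
            gains = []
            for k in candidates:
                trial = mask.copy()
                trial[k] = 1
                gains.append(error(mask) - error(trial))
            mask[candidates[int(np.argmax(gains))]] = 1
        print("final reconstruction error:", error(mask))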

    Lilia, A Showcase for Fast Bootstrap of Conversation-Like Dialogues Based on a Goal-Oriented System

    Recently, many works have proposed to cast human-machine interaction in a sentence generation scheme. Neural network models can learn how to generate a probable sentence based on the user's statement along with a partial view of the dialogue history. While appealing to some extent, these approaches require huge training sets of general-purpose data and lack a principled way to intertwine language generation with information retrieval from back-end resources to fuel the dialogue with up-to-date and precise knowledge. As a practical alternative, in this paper we present Lilia, a showcase for fast bootstrap of conversation-like dialogues based on a goal-oriented system. First, a comparison of goal-oriented and conversational system features is presented, then a conversion process is described for the fast bootstrap of a new system, finalised with on-line training of the system's main components. Lilia is dedicated to a chitchat task, where speakers exchange viewpoints on a displayed image while trying collaboratively to derive its author's intention. Evaluations with user trials showed its efficiency in a realistic setup.

    Bayesian optimization for materials design

    We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when material designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality.
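
    A minimal sketch of the Gaussian process regression plus expected improvement loop described above, assuming scikit-learn and SciPy; the 1-D objective function, kernel choice and hyperparameters are placeholders, not the paper's.

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import Matern

        def objective(x):                      # placeholder "material property"
            return -np.sin(3 * x) - x ** 2 + 0.7 * x

        rng = np.random.default_rng(1)
        X = rng.uniform(-2, 2, size=(5, 1))    # initial designs
        y = objective(X).ravel()

        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

        def expected_improvement(Xc, gp, y_best, xi=0.01):
            # EI = (mu - y_best - xi) * Phi(z) + sigma * phi(z)
            mu, sigma = gp.predict(Xc, return_std=True)
            sigma = np.maximum(sigma, 1e-9)
            z = (mu - y_best - xi) / sigma
            return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

        grid = np.linspace(-2, 2, 400).reshape(-1, 1)
        for _ in range(10):                    # sequential design loop
            gp.fit(X, y)
            ei = expected_improvement(grid, gp, y.max())
            x_next = grid[int(np.argmax(ei))]  # next "experiment" to run
            X = np.vstack([X, [x_next]])
            y = np.append(y, objective(x_next))
        print("best design found:", X[np.argmax(y)].item(), "value:", y.max())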

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
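
    As a compact, hedged example of combining a deep architecture with reinforcement learning (a DQN-style update, assuming PyTorch; the network sizes, hyperparameters and fake batch below are invented for illustration):

        import torch
        import torch.nn as nn

        # A small CNN maps raw image observations to Q-values, and learning
        # regresses onto the one-step TD target computed by a frozen target net.
        class QNet(nn.Module):
            def __init__(self, n_actions):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
                    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
                    nn.Flatten(),
                )
                self.head = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
                                          nn.Linear(256, n_actions))

            def forward(self, x):
                return self.head(self.body(x))

        q_net, target_net = QNet(4), QNet(4)
        target_net.load_state_dict(q_net.state_dict())
        opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

        # One illustrative update on a fake batch of 84x84 frame stacks.
        obs      = torch.randn(32, 4, 84, 84)
        actions  = torch.randint(0, 4, (32,))
        rewards  = torch.randn(32)
        next_obs = torch.randn(32, 4, 84, 84)
        done     = torch.zeros(32)

        q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rewards + 0.99 * (1 - done) * target_net(next_obs).max(1).values
        loss = nn.functional.smooth_l1_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()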

    Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

    This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
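
    This is not the competition-winning method, but a minimal sketch of the underlying idea of neuroevolution: evolve the weights of a policy network by Gaussian mutation and truncation selection, with the simulator's episode return replaced here by a stand-in fitness function.

        import numpy as np

        rng = np.random.default_rng(0)
        n_weights = 64                      # flattened policy-network parameters

        def fitness(w):
            # Placeholder for episode return from the hovering simulator; a
            # simple deterministic function keeps the example self-contained.
            return -float(np.sum((w - 0.5) ** 2))

        pop = rng.standard_normal((20, n_weights))
        for gen in range(50):
            scores = np.array([fitness(w) for w in pop])
            parents = pop[np.argsort(scores)[-5:]]          # truncation selection
            idx = rng.integers(0, len(parents), size=15)
            children = parents[idx] + 0.1 * rng.standard_normal((15, n_weights))
            pop = np.vstack([parents, children])            # elitism + mutants
        print("best fitness:", max(fitness(w) for w in pop))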

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, along with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory has existed for TD-learning for several years. This can partly be transferred to a neuronal framework, too. On the other hand, only now has a more complete theory also emerged for differential Hebbian rules. In general, rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior when applying them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, to provide some guidance for possible applications.
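
    A discretised, illustrative sketch of the two rule families (the article treats the time-continuous neuronal versions; the stimuli, traces and constants here are invented): a TD-style update driven by the δ-error times an eligibility trace of the input, versus a differential Hebbian update driven by a presynaptic trace times the derivative of postsynaptic activity.

        import numpy as np

        T, dt = 200, 0.01
        t = np.arange(T) * dt
        x1 = np.exp(-((t - 0.5) ** 2) / 0.002)       # earlier stimulus
        x2 = np.exp(-((t - 0.7) ** 2) / 0.002)       # later stimulus / reward proxy

        # TD-style rule: delta-error times an eligibility trace of the input.
        w_td, trace, gamma, alpha = 0.0, 0.0, 0.98, 0.05
        for k in range(T - 1):
            v_now, v_next = w_td * x1[k], w_td * x1[k + 1]
            delta = x2[k] + gamma * v_next - v_now   # delta-error, x2 as reward
            trace = 0.9 * trace + x1[k]              # input eligibility trace
            w_td += alpha * delta * trace * dt

        # Differential Hebbian rule: presynaptic trace times d(post)/dt.
        w_dh, eta = 0.0, 0.05
        pre_trace = np.convolve(x1, np.exp(-t / 0.05))[:T] * dt   # filtered input
        dpost = np.gradient(x2, dt)                  # derivative of postsynaptic activity
        w_dh += eta * np.sum(pre_trace * dpost) * dt

        print("TD weight change:", w_td, " differential Hebbian weight change:", w_dh)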

    Feasibility of ‘parkrun’ for people with knee osteoarthritis: A mixed methods pilot study

    Objective/Design: This uncontrolled mixed methods pilot study enrolled people with knee OA not meeting physical activity guidelines. Participants were asked to walk in four consecutive parkrun events supervised by an exercise physiologist/physiotherapist. Feasibility was assessed by recruitment data (numbers screened and time to enrol 15 participants), adherence to the protocol, acceptability (measured by confidence, enjoyment, difficulty ratings and qualitative interviews), and safety (adverse events). Secondary measures were changes in knee pain, function, stiffness, and physical activity levels. Results: Participants (n = 17) were enrolled over 11 months and recruitment was slower than anticipated. Fourteen participants attended all four parkruns and three of these participants shortened the 5 km course to ∼3 km. Across all four parkruns, 75% of participants reported high confidence that they could complete the upcoming parkrun and the majority (87%) enjoyed participating. Most participants rated parkrun either slightly difficult (38.5%) or moderately difficult (35%), and two mild adverse events were reported. Participants showed improvements in knee pain, function, stiffness, and physical activity levels. Conclusions: This pilot study demonstrates parkrun's feasibility, acceptability, safety and its potential to improve knee OA symptoms and physical activity levels. Participating in parkrun was acceptable and enjoyable for some, but not all, participants. The scalability, accessibility and wide appeal of parkrun support the development of larger programs of research to evaluate the use of parkrun for people with knee OA.

    Uridine-derived ribose fuels glucose-restricted pancreatic cancer.

    Pancreatic ductal adenocarcinoma (PDA) is a lethal disease notoriously resistant to therapy [1,2]. This is mediated in part by a complex tumour microenvironment [3], low vascularity [4], and metabolic aberrations [5,6]. Although altered metabolism drives tumour progression, the spectrum of metabolites used as nutrients by PDA remains largely unknown. Here we identified uridine as a fuel for PDA in glucose-deprived conditions by assessing how more than 175 metabolites impacted metabolic activity in 21 pancreatic cell lines under nutrient restriction. Uridine utilization strongly correlated with the expression of uridine phosphorylase 1 (UPP1), which we demonstrate liberates uridine-derived ribose to fuel central carbon metabolism and thereby support redox balance, survival and proliferation in glucose-restricted PDA cells. In PDA, UPP1 is regulated by KRAS-MAPK signalling and is augmented by nutrient restriction. Consistently, tumours expressed high UPP1 compared with non-tumoural tissues, and UPP1 expression correlated with poor survival in cohorts of patients with PDA. Uridine is available in the tumour microenvironment, and we demonstrated that uridine-derived ribose is actively catabolized in tumours. Finally, UPP1 deletion restricted the ability of PDA cells to use uridine and blunted tumour growth in immunocompetent mouse models. Our data identify uridine utilization as an important compensatory metabolic process in nutrient-deprived PDA cells, suggesting a novel metabolic axis for PDA therapy.

    Methods for specifying the target difference in a randomised controlled trial: the Difference ELicitation in TriAls (DELTA) systematic review
