
    Avoiding Wireheading with Value Reinforcement Learning

    How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent's actions. The constraint is defined in terms of the agent's belief distributions, and does not require an explicit specification of which actions constitute wireheading. Comment: Artificial General Intelligence (AGI) 2016
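    A toy sketch of the general idea (illustrative only, not the paper's formal VRL construction or its wireheading constraint): the reward signal is treated as evidence about which utility function is correct, and actions are chosen to maximise expected utility under the current belief rather than raw reward. The candidate utility functions, likelihood, and action set below are assumptions made for the example.

```python
# Toy value-learning sketch: reward is evidence about an unknown utility
# function, not the quantity to maximise directly. The hypothesis set and
# likelihood below are illustrative assumptions, not the paper's formalism.

# Hypothetical candidate utility functions over (state, action) pairs.
CANDIDATE_UTILITIES = {
    "u_work": lambda s, a: 1.0 if a == "work" else 0.0,
    "u_rest": lambda s, a: 1.0 if a == "rest" else 0.0,
}

belief = {name: 0.5 for name in CANDIDATE_UTILITIES}  # prior over utilities


def observe_reward(state, action, reward):
    """Bayesian update: utilities that predict the observed reward gain mass."""
    global belief
    likelihood = {
        name: max(1e-9, 1.0 - min(1.0, abs(u(state, action) - reward)))
        for name, u in CANDIDATE_UTILITIES.items()
    }
    total = sum(belief[n] * likelihood[n] for n in belief)
    belief = {n: belief[n] * likelihood[n] / total for n in belief}


def choose_action(state, actions):
    """Maximise expected utility under the current belief, not raw reward."""
    def expected_utility(a):
        return sum(belief[n] * u(state, a) for n, u in CANDIDATE_UTILITIES.items())
    return max(actions, key=expected_utility)


observe_reward("s0", "work", reward=1.0)
print(belief, choose_action("s0", ["work", "rest"]))
```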

    Rule Value Reinforcement Learning for Cognitive Agents

    RVRL (Rule Value Reinforcement Learning) is a new algorithm which extends an existing learning framework that models the environment of a situated agent using a probabilistic rule representation. The algorithm attaches values to learned rules by adapting reinforcement learning. Structure captured by the rules is used to form a policy. The resulting rule values represent the utility of taking an action if the rule's conditions are present in the agent's current percept. Advantages of the new framework are demonstrated through examples in a predator-prey environment.
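    A minimal sketch of the core idea, assuming a simplified rule representation (the data structures and update below are illustrative, not the published RVRL algorithm): each rule carries a learned value, rules whose conditions match the current percept compete to select an action, and a Q-learning-style backup adjusts the values of the rules that fired.

```python
# Illustrative rule-value learning: rules whose conditions match the percept
# propose actions, and an RL-style backup adjusts the values of fired rules.
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: frozenset   # percept features that must all be present
    action: str
    value: float = 0.0      # learned utility of `action` when the rule matches


def matching_rules(rules, percept):
    return [r for r in rules if r.conditions <= percept]


def select_action(rules, percept, default="noop"):
    matched = matching_rules(rules, percept)
    return max(matched, key=lambda r: r.value).action if matched else default


def update_rule_values(rules, percept, action, reward, next_percept,
                       alpha=0.1, gamma=0.9):
    """Q-learning-style backup applied to every rule that fired for `action`."""
    next_best = max((r.value for r in matching_rules(rules, next_percept)),
                    default=0.0)
    for rule in matching_rules(rules, percept):
        if rule.action == action:
            rule.value += alpha * (reward + gamma * next_best - rule.value)


rules = [Rule(frozenset({"prey_visible"}), "chase"),
         Rule(frozenset({"predator_visible"}), "flee")]
update_rule_values(rules, {"prey_visible"}, "chase", reward=1.0,
                   next_percept={"prey_visible"})
print(select_action(rules, {"prey_visible"}))
```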

    Do value-added estimates add value? Accounting for learning dynamics

    Evaluations of educational programs commonly assume that what children learn persists over time. The authors compare learning in Pakistani public and private schools using dynamic panel methods that account for three key empirical challenges to widely used value-added models: imperfect persistence, unobserved student heterogeneity, and measurement error. Their estimates suggest that only a fifth to a half of learning persists between grades and that private schools increase average achievement by 0.25 standard deviations each year. In contrast, estimates from commonly used value-added models significantly understate the impact of private schools on student achievement and/or overstate persistence. These results have implications for program evaluation and value-added accountability system design. Keywords: Education For All, Tertiary Education, Secondary Education, Primary Education, Teaching and Learning
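    A small simulation sketch of the persistence issue (parameter values are illustrative assumptions, and this is not the paper's dynamic-panel estimator): when achievement follows A_t = beta*A_{t-1} + delta*private + mu_i + e_t with partial persistence beta, a gain-score estimator that imposes beta = 1 understates the yearly private-school effect delta.

```python
# Illustrative simulation: achievement A_t = beta*A_{t-1} + delta*private
# + mu_i + e_t. Imposing full persistence (beta = 1), as restricted
# value-added models do, understates the private-school effect when true
# persistence is partial. All parameter values are assumptions.
import random

random.seed(0)
beta, delta = 0.35, 0.25            # partial persistence, yearly school effect
rows = []
for _ in range(20000):
    private = 1 if random.random() < 0.5 else 0
    mu = random.gauss(0, 0.3)       # unobserved student heterogeneity
    a = random.gauss(0, 0.5)        # initial achievement
    for _grade in range(3):         # same school type in every prior grade
        a = beta * a + delta * private + mu + random.gauss(0, 0.2)
    a_prev = a
    a_now = beta * a_prev + delta * private + mu + random.gauss(0, 0.2)
    rows.append((private, a_prev, a_now))


def mean(xs):
    return sum(xs) / len(xs)


def gains(group):
    return [a_now - a_prev for p, a_prev, a_now in rows if p == group]


# Gain-score estimator: difference in mean achievement gains (imposes beta = 1).
print(f"true yearly effect:  {delta:.2f}")
print(f"gain-score estimate: {mean(gains(1)) - mean(gains(0)):.2f}")
```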

    Learning Contextual Reward Expectations for Value Adaptation

    Substantial evidence indicates that subjective value is adapted to the statistics of reward expected within a given temporal context. However, how these contextual expectations are learned is poorly understood. To examine such learning, we exploited a recent observation that participants performing a gambling task adjust their preferences as a function of context. We show that, in the absence of contextual cues providing reward information, an average reward expectation was learned from recent past experience. Learning dependent on contextual cues emerged when two contexts alternated at a fast rate, whereas both cue-independent and cue-dependent forms of learning were apparent when two contexts alternated at a slower rate. Motivated by these behavioral findings, we reanalyzed a previous fMRI data set to probe the neural substrates of learning contextual reward expectations. We observed a form of reward prediction error related to average reward such that, at option presentation, activity in ventral tegmental area/substantia nigra and ventral striatum correlated positively and negatively, respectively, with the actual and predicted value of options. Moreover, an inverse correlation between activity in ventral tegmental area/substantia nigra (but not striatum) and predicted option value was greater in participants showing enhanced choice adaptation to context. The findings help clarify the mechanisms underlying the learning of contextual reward expectations.
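    A minimal delta-rule sketch of the two forms of learning described above (the learning rates and structure are illustrative assumptions, not the authors' computational model): a cue-independent average reward expectation is updated from recent outcomes, and a separate expectation is maintained per contextual cue.

```python
# Delta-rule sketch: a cue-independent running average of recent rewards and
# cue-dependent expectations keyed by context. Parameters are illustrative.

def make_learner(alpha=0.2):
    expectations = {"global": 0.0, "by_context": {}}

    def update(context, reward):
        # Cue-independent learning: average reward from recent experience.
        prediction_error = reward - expectations["global"]
        expectations["global"] += alpha * prediction_error
        # Cue-dependent learning: one expectation per contextual cue.
        ctx_expected = expectations["by_context"].get(context, 0.0)
        expectations["by_context"][context] = ctx_expected + alpha * (reward - ctx_expected)
        return prediction_error

    def expected(context, use_context_cue=True):
        if use_context_cue and context in expectations["by_context"]:
            return expectations["by_context"][context]
        return expectations["global"]

    return update, expected


update, expected = make_learner()
for reward in [1.0, 0.5, 2.0, 1.5]:       # outcomes in a "high-value" block
    update("high-value block", reward)
print(expected("high-value block"), expected("low-value block"))
```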

    Learning, Structural Instability and Present Value Calculations

    Present value calculations require predictions of cash flows at both near and distant future points in time. Such predictions are generally surrounded by considerable uncertainty and may critically depend on assumptions about parameter values as well as the form and stability of the data generating process underlying the cash flows. This paper presents new theoretical results for the existence of the infinite sum of discounted expected future values under uncertainty about the parameters characterizing the growth rate of the cash flow process. Furthermore, we explore the consequences for present values of relaxing the stability assumption in a way that allows for past and future breaks to the underlying cash flow process. We find that such breaks can lead to considerable changes in present values.
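    A Monte Carlo sketch of the kind of calculation involved (the distributions, break probability, and discount rate are illustrative assumptions, and the horizon is truncated rather than infinite, so this does not reproduce the paper's convergence conditions): present value is the discounted sum of simulated cash flows whose growth rate is uncertain and can shift after a structural break.

```python
# Present value under growth-rate uncertainty with a possible structural
# break in the cash-flow process. All numbers below are illustrative.
import random


def simulated_pv(cf0=1.0, discount=0.08, horizon=200, n_paths=10000,
                 break_prob=0.02, seed=1):
    random.seed(seed)
    total = 0.0
    for _ in range(n_paths):
        growth = random.gauss(0.03, 0.02)          # uncertain growth parameter
        cash_flow, pv = cf0, 0.0
        for t in range(1, horizon + 1):
            if random.random() < break_prob:       # structural break: the growth
                growth = random.gauss(0.01, 0.03)  # regime of cash flows shifts
            cash_flow *= 1.0 + growth
            pv += cash_flow / (1.0 + discount) ** t
        total += pv
    return total / n_paths


print(f"mean present value: {simulated_pv():.2f}")
print(f"without breaks:     {simulated_pv(break_prob=0.0):.2f}")
```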

    Comparing policy gradient and value function based reinforcement learning methods in simulated electrical power trade

    In electrical power engineering, reinforcement learning algorithms can be used to model the strategies of electricity market participants. However, traditional value-function-based reinforcement learning algorithms suffer from convergence issues when used with value function approximators. Function approximation is required in this domain to capture the characteristics of the complex and continuous multivariate problem space. The contribution of this paper is the comparison of policy gradient reinforcement learning methods, using artificial neural networks for policy function approximation, with traditional value-function-based methods in simulations of electricity trade. The methods are compared using an AC optimal power flow-based power exchange auction market model and a reference electric power system model.
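    A minimal REINFORCE sketch of the policy-gradient side of the comparison (the bandit-style market below is an illustrative stand-in for the AC optimal power flow auction model, and a softmax over discrete bids stands in for the neural-network policies): policy parameters are updated directly from sampled profits rather than through a learned value function.

```python
# REINFORCE with a softmax policy over discrete bid prices; the stochastic
# "market" is a toy stand-in for a simulated power exchange auction.
import math
import random

ACTIONS = [10.0, 20.0, 30.0]          # candidate bid prices (illustrative)
theta = [0.0] * len(ACTIONS)          # policy parameters, one per action


def softmax(prefs):
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def market_profit(bid, cost=15.0):
    """Toy clearing rule: dispatched (and paid the clearing price) if the bid clears."""
    clearing_price = random.gauss(22.0, 3.0)
    return (clearing_price - cost) if bid <= clearing_price else 0.0


def reinforce_step(alpha=0.05):
    probs = softmax(theta)
    a = random.choices(range(len(ACTIONS)), weights=probs)[0]
    reward = market_profit(ACTIONS[a])
    # Policy-gradient update: d log pi(a) / d theta_k = 1[k == a] - pi(k)
    for k in range(len(theta)):
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += alpha * reward * grad_log


for _ in range(5000):
    reinforce_step()
print("bid selection probabilities:", [round(p, 3) for p in softmax(theta)])
```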