Search CORE

2,262 research outputs found

Deep Ordinal Reinforcement Learning

Author: C Wirth
CJ Watkins
RS Sutton
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/07/2019

Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variant, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical variants on a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals.
Comment: replaced figures for better visibility; added GitHub repository; more details about the source of experimental results; updated target value calculation for standard and ordinal Deep Q-Networks.
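A tabular sketch of how Q-learning might be converted to ordinal rewards follows. It rests on our own assumptions rather than the paper's exact formulation: each state-action pair stores a probability distribution over reward ranks instead of a scalar value, actions are ranked by their average probability of superiority over the other actions (the chance that a draw from one action's rank distribution beats a draw from another's, with ties counting half), and the bootstrap target mixes the observed rank with the greedy successor's distribution. `OrdinalQTable`, `superiority` and this mixing rule are hypothetical.

```python
from collections import defaultdict


def superiority(p, q):
    """P(X > Y) + 0.5 * P(X == Y) for independent ranks X ~ p and Y ~ q.

    p and q are probability lists over ordinal ranks 0..K-1,
    where a higher index means a better reward.
    """
    better = sum(p[i] * q[j] for i in range(len(p)) for j in range(i))
    ties = sum(p[i] * q[i] for i in range(len(p)))
    return better + 0.5 * ties


class OrdinalQTable:
    """Hypothetical tabular learner keeping a rank distribution per (state, action)."""

    def __init__(self, n_actions, n_ranks, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions  # assumed to be at least 2
        self.n_ranks = n_ranks
        self.alpha = alpha
        self.gamma = gamma
        uniform = [1.0 / n_ranks] * n_ranks
        # Every unseen (state, action) pair starts with a uniform rank distribution.
        self.dist = defaultdict(lambda: list(uniform))

    def scores(self, state):
        """Average superiority of each action over the other actions in `state`."""
        dists = [self.dist[(state, a)] for a in range(self.n_actions)]
        return [
            sum(superiority(dists[a], dists[b]) for b in range(self.n_actions) if b != a)
            / (self.n_actions - 1)
            for a in range(self.n_actions)
        ]

    def greedy(self, state):
        s = self.scores(state)
        return max(range(self.n_actions), key=lambda a: s[a])

    def update(self, state, action, rank, next_state, done):
        one_hot = [1.0 if k == rank else 0.0 for k in range(self.n_ranks)]
        if done:
            target = one_hot
        else:
            # Assumed bootstrap: mix the observed rank with the greedy successor's
            # distribution, renormalised so the target stays a probability vector.
            nxt = self.dist[(next_state, self.greedy(next_state))]
            target = [(o + self.gamma * n) / (1.0 + self.gamma) for o, n in zip(one_hot, nxt)]
        old = self.dist[(state, action)]
        # Convex combination of two distributions, so the result still sums to 1.
        self.dist[(state, action)] = [
            (1 - self.alpha) * o + self.alpha * t for o, t in zip(old, target)
        ]
```

For instance, after repeatedly observing the top rank for action 0 and the bottom rank for action 1 in a terminal state `"s0"`, `greedy("s0")` returns 0. Comparing rank distributions, rather than summing ranks, avoids treating the gaps between ordinal levels as meaningful numbers.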

arXiv.org e-Print Archive

Crossref

Learning from Monte Carlo Rollouts with Opponent Models for Playing Tron

Author: AL Samuel
CJ Watkins
D Silver
G Tesauro
J Baxter
J Schmidhuber
L Kocsis
M Otterlo van
RS Sutton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2018

This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be effectively applied to generate state trajectories for all possible actions, from which improved action estimates can be computed. This makes it possible to extend experience replay so that the state-action values of all actions in a given game state are updated simultaneously. The results show that experience replay that updates the Q-values of all actions simultaneously strongly outperforms conventional experience replay, which only updates the Q-value of the performed action. The results also show that short and long rollout horizons during training lead to similarly good performance against two fixed opponents.
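The replay idea described above can be sketched in tabular form. This is a minimal illustration under our own assumptions: the paper uses multi-layer perceptrons over vision grids, whereas here Q-values live in a plain dictionary, and `simulate`, `rollout_targets`, `replay_update_all` and `replay_update_performed` are hypothetical names. The point of contrast is updating every action of a replayed state from rollout estimates versus updating only the performed action.

```python
import random
from statistics import mean


def rollout_targets(state, actions, simulate, n_rollouts=10, horizon=5, rng=None):
    """Monte Carlo estimate of the return of every available action in `state`.

    `simulate(state, action, horizon, rng)` is a hypothetical callable that plays
    `action`, lets a learned opponent model pick the opponent's moves, follows a
    rollout policy for at most `horizon` steps and returns the observed return.
    """
    if rng is None:
        rng = random.Random(0)
    return {
        a: mean(simulate(state, a, horizon, rng) for _ in range(n_rollouts))
        for a in actions
    }


def replay_update_all(q, state, targets, alpha=0.1):
    """Replay variant: move Q(state, a) toward its rollout estimate for every action a."""
    for a, target in targets.items():
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (target - old)


def replay_update_performed(q, state, action, target, alpha=0.1):
    """Conventional replay: only the action actually taken is updated."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
```

With a Tron simulator plugged in as `simulate`, one replayed state yields targets for every available move, not just the one the agent made, so each stored state contributes more learning signal.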

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Pseudorehearsal in value function approximation

Author: A Robins
B Baddeley
CJ Watkins
J Gama
JL McClelland
JN Tsitsiklis
KP Murphy
M Frean
M Hattori
M McCloskey
R Coop
R Ratcliff
RJ Williams
RM French
RS Sutton
S Adam
Publication date: 21/03/2017

Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We found that pseudorehearsal appears to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters.
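Pseudorehearsal for Q-learning with function approximation might look roughly like the sketch below. Under our own assumptions, a linear Q-function stands in for the approximators studied in the paper: random pseudo-inputs are drawn, the model's current Q-values for them are stored as pseudo-items, and each subsequent TD update is interleaved with gradient steps toward one pseudo-item so that previously learned outputs are not simply overwritten. `LinearQ`, `make_pseudo_items`, `train_with_pseudorehearsal`, the uniform input range and the one-pseudo-item-per-update schedule are all hypothetical choices.

```python
import random


class LinearQ:
    """Hypothetical linear Q-function: Q(s, a) = w[a] . s for a feature vector s."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, s, a):
        return sum(wi * si for wi, si in zip(self.w[a], s))

    def sgd_step(self, s, a, target, lr):
        """One squared-error gradient step moving Q(s, a) toward `target`."""
        error = target - self.q(s, a)
        self.w[a] = [wi + lr * error * si for wi, si in zip(self.w[a], s)]


def make_pseudo_items(model, n_items, n_features, n_actions, rng=None):
    """Sample random inputs and record the model's current outputs as their targets."""
    if rng is None:
        rng = random.Random(0)
    items = []
    for _ in range(n_items):
        s = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        items.append((s, [model.q(s, a) for a in range(n_actions)]))
    return items


def train_with_pseudorehearsal(model, transitions, pseudo_items, lr=0.05, gamma=0.99):
    """Interleave TD(0) Q-learning updates with rehearsal of pseudo-items.

    `transitions` are (s, a, r, s_next, done) tuples. Each real update is followed
    by steps toward one pseudo-item, pulling the model back toward the outputs it
    produced before the new data arrived.
    """
    n_actions = len(model.w)
    for i, (s, a, r, s_next, done) in enumerate(transitions):
        best_next = 0.0 if done else max(model.q(s_next, b) for b in range(n_actions))
        model.sgd_step(s, a, r + gamma * best_next, lr)
        ps, ptargets = pseudo_items[i % len(pseudo_items)]
        for b, t in enumerate(ptargets):
            model.sgd_step(ps, b, t, lr)
```

The number of pseudo-items, their input range and how often they are rehearsed are examples of the parameters that would need sensible initial values in such a scheme.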

arXiv.org e-Print Archive

Crossref

Adherence and persistence to direct oral anticoagulants in atrial fibrillation: a population-based study

Author: Antoniou S
Banerjee A
Benedetto V
Burnell J
Gichuru P
Marshall T
Ryan R
Schilling RJ
Strain WD
Sutton CJ
Watkins C
Publication date: 24/12/2019

Background: Despite simpler regimens than vitamin K antagonists (VKAs) for stroke prevention in atrial fibrillation (AF), adherence (taking drugs as prescribed) and persistence (continuation of drugs) to direct oral anticoagulants are suboptimal, yet understudied in electronic health records (EHRs). Objective: We investigated (1) time trends at individual and system levels, and (2) the risk factors for and associations between adherence and persistence. Methods: In UK primary care EHRs (The Health Information Network 2011–2016), we investigated adherence and persistence at 1 year for oral anticoagulants (OACs) in adults with incident AF. Baseline characteristics were analysed by OAC and adherence/persistence status. Risk factors for non-adherence and non-persistence were assessed using Cox and logistic regression. Patterns of adherence and persistence were analysed. Results: Among 36 652 individuals with incident AF, cardiovascular comorbidities (median CHA2DS2-VASc [Congestive heart failure, Hypertension, Age ≥75 years, Diabetes mellitus, Stroke, Vascular disease, Age 65–74 years, Sex category] score 3) and polypharmacy (median number of drugs 6) were common. Adherence was 55.2% (95% CI 54.6 to 55.7), 51.2% (95% CI 50.6 to 51.8), 66.5% (95% CI 63.7 to 69.2), 63.1% (95% CI 61.8 to 64.4) and 64.7% (95% CI 63.2 to 66.1) for all OACs, VKAs, dabigatran, rivaroxaban and apixaban, respectively. One-year persistence was 65.9% (95% CI 65.4 to 66.5), 63.4% (95% CI 62.8 to 64.0), 61.4% (95% CI 58.3 to 64.2), 72.3% (95% CI 70.9 to 73.7) and 78.7% (95% CI 77.1 to 80.1) for all OACs, VKAs, dabigatran, rivaroxaban and apixaban, respectively. Risk of non-adherence and non-persistence increased over time at individual and system levels. Increasing comorbidity was associated with reduced risk of non-adherence and non-persistence across all OACs.
Overall rates of ‘primary non-adherence’ (stopping after the first prescription), ‘non-adherent non-persistence’ and ‘persistent adherence’ were 3.5%, 26.5% and 40.2%, respectively, differing across OACs. Conclusions: Adherence and persistence to OACs are low at 1 year, with heterogeneity across drugs and over time at individual and system levels. Better understanding of contributory factors will inform interventions to improve adherence and persistence across OACs in individuals and populations.

UCL Discovery

Metabolic state alters economic decision making under risk in humans

Background: Animals' attitudes to risk are profoundly influenced by metabolic state (hunger and baseline energy stores). Specifically, animals often express a preference for risky (more variable) food sources when below a metabolic reference point (hungry), and for safe (less variable) food sources when sated. Circulating hormones report the status of energy reserves and acute nutrient intake to widespread targets in the central nervous system that regulate feeding behaviour, including brain regions strongly implicated in risk- and reward-based decision-making in humans. Despite this, the influence of physiological state per se on economic decisions in humans has not previously been considered. We hypothesised that baseline metabolic reserves and alterations in metabolic state would systematically modulate decision-making and financial risk-taking in humans. Methodology/Principal Findings: We used a controlled feeding manipulation and assayed decision-making preferences across different metabolic states following a meal. To elicit risk preferences, we presented a sequence of 200 paired lotteries; the subjects' task was to select their preferred option from each pair. We also measured prandial suppression of circulating acyl-ghrelin (a centrally acting orexigenic hormone signalling acute nutrient intake) and circulating leptin levels (providing an assay of energy reserves). We show both immediate and delayed effects on risky decision-making following a meal, and that these changes correlate with an individual's baseline leptin and changes in acyl-ghrelin levels, respectively. Conclusions/Significance: We show that human risk preferences are exquisitely sensitive to current metabolic state, in a direction consistent with ecological models of feeding behaviour but not predicted by normative economic theory. These substantive effects of state changes on economic decisions perhaps reflect shared, evolutionarily conserved neurobiological mechanisms.
We suggest that this sensitivity of human risk preference to current metabolic state has significant implications both for real-world economic transactions and for aberrant decision-making in eating disorders and obesity.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

Oxford University Research Archive

MPG.PuRe

Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

Author: A Arleo
B Espiau
C Breazeal
CJ Watkins
DJ Foster
F Chaumette
F Wörgötter
Florentin Wörgötter
G Tesauro
GJ Gordon
J Peters
J Soechting
M Moussa
M Tamosiunaite
Minija Tamosiunaite
R Dillmann
R Horaud
R Sutton
RJ Williams
RS Sutton
T Strösslin
Tamim Asfour
V Ruis de Angulo
Publication venue: Springer-Verlag
Publication date: 01/01/2009

Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF per joint. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution, usually within about 20 trials. The efficiency of our method in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined, in situations where other types of learning might be difficult.
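A receptive-field approximator of the kind described can be sketched with Gaussian kernels placed on a regular grid. The Gaussian shape, the normalisation of activations, the grid layout over [0, 1]^4 and the names `ReceptiveFieldApproximator`, `activations`, `value` and `update` are our assumptions, not details taken from the paper; the sketch also leaves out how continuous actions are selected and only shows how an estimate at one point is stored in, and read back from, overlapping kernels.

```python
import itertools
import math


class ReceptiveFieldApproximator:
    """Hypothetical approximator built from overlapping Gaussian receptive fields.

    Centres lie on a regular grid over [0, 1]^dims (per_axis must be >= 2). With
    dims=4 and 10 centres per axis this gives 10**4 = 10,000 kernels.
    """

    def __init__(self, dims=4, per_axis=10, width=0.1):
        axis = [i / (per_axis - 1) for i in range(per_axis)]
        self.centres = list(itertools.product(axis, repeat=dims))
        self.weights = [0.0] * len(self.centres)
        self.width = width

    def activations(self, x):
        """Normalised Gaussian activation of every kernel at point x."""
        raw = [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
            for c in self.centres
        ]
        total = sum(raw) or 1.0  # guard against every activation underflowing to 0
        return [r / total for r in raw]

    def value(self, x):
        """Estimate at x: activation-weighted sum of kernel weights."""
        return sum(w * a for w, a in zip(self.weights, self.activations(x)))

    def update(self, x, target, lr=0.5):
        """Move the estimate at x toward `target`; each kernel moves in proportion to its activation."""
        acts = self.activations(x)
        error = target - sum(w * a for w, a in zip(self.weights, acts))
        self.weights = [w + lr * error * a for w, a in zip(self.weights, acts)]
```

With `dims=4` and `per_axis=10` the default grid has 10,000 kernels, the same order as the number quoted above. Because neighbouring kernels overlap, an update at one point also shifts the estimate slightly at nearby points, which is what lets such an approximator generalise across a continuous space.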

Crossref

Springer - Publisher Connector

Vytautas Magnus University Institutional Repository (VMU ePub)

PubMed Central

Catastrophizing mediates the relationship between the personal belief in a just world and pain outcomes among chronic pain support group attendees

Author: A Furnham
A Hood
AK Rosenstiel
AM Elliot
C Dalbert
C Knussen
C Ramirez-Maestre
Christina Knussen
CJ Main
CL Hafer
CL Park
DC Turk
DP Goldberg
G Nudelman
GA Brenes
GL Stonerock Jr
H Alves
HC Lench
HJA Samwel
IM Lipkus
J Greenberg
J McParland
JL McParland
Joanna L. McParland
K Pulvers
L Begue
M Agrawal
M Corey
M Korff Von
M Roland
ME Robinson
MJ Lerner
MJ Muller
MJ Sullivan
MJL Sullivan
MO Roland
PJ Quartana
R Bulman
R Severeijns
RM Sutton
RR Edwards
RS Lazarus
S Folkman
U Wernecke
W Scott
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Health-related research suggests that the belief in a just world can act as a personal resource that protects against the adverse effects of pain and illness. However, little is currently known about how this belief, particularly in relation to one’s own life, might influence pain. Following the suggestions of previous research, the present study undertook a secondary data analysis to investigate pain catastrophizing as a mediator of the relationship between the personal just world belief and chronic pain outcomes in a sample of chronic pain support group attendees. Partially supporting the hypotheses, catastrophizing was negatively correlated with the personal just world belief and mediated the relationship between this belief and pain and disability, but not distress. Suggestions are made for future research and intervention development.

Crossref

Springer - Publisher Connector

PubMed Central

ResearchOnline@GCU

Job retention vocational rehabilitation for employed people with inflammatory arthritis: adaptations to the Workwell trial due to the impact of the COVID-19 pandemic

Author: Ching A
Cotterill S
Culley J
Forshaw D
Haig A
Hammond A
Parker J
Sutton CJ
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/12/2022

There are high levels of work disability, absenteeism (sick leave) and presenteeism (reduced productivity) amongst people with inflammatory arthritis. Workwell is a multi-centre, randomised controlled trial of job retention vocational rehabilitation for employed people with inflammatory arthritis. The trial tested the effectiveness and cost-effectiveness of the Workwell programme compared to receipt of written self-help information only. Both arms continued to receive usual care. In March 2020, due to the COVID-19 pandemic, the Workwell trial paused recruitment and intervention delivery. To restart successfully, protocol amendments were rapidly submitted and changes to existing trial procedures were made. The Workwell protocol was adapted in response both to the practical issues likely faced by many clinical research studies active across NHS sites during the pandemic and to additional trial-specific challenges. A key eligibility criterion for the trial required participants to be in paid work for at least 15 hours per week. However, UK national lockdowns led to a substantial proportion of the workforce suddenly being furloughed or unable to work, and many people with arthritis taking immunosuppressive medications were asked to shield. Thus, the number of eligible participants fell. Those continuing to work were harder to identify, as hospital clinics moved to remote delivery, and harder to screen, consent and then treat, as hospital research staff and clinical therapists were redeployed. New recruitment and consent strategies were applied and, where sites had reduced capacity, responsibilities were absorbed by the trial management team. Remote intervention delivery and electronic data capture were also implemented. By rapidly adapting the Workwell protocol and procedures, the trial successfully reopened to recruitment in July 2020, only four months after the trial paused. We were able to achieve recruitment figures above the pre-COVID target and maintain a high retention rate.
In addition, we found many of the protocol changes beneficial, as they streamlined trial procedures and thus improved efficiency. Many strategies implemented in response to the pandemic are likely to become standard practice in future trials of a similar design and methodology.

University of Salford Institutional Repository

Behavioral determinants as predictors of return to work after long-term sickness absence: an application of the theory of planned behavior

Background: The aim of this prospective, longitudinal cohort study was to analyze the association between the three behavioral determinants of the theory of planned behavior (TPB) model (attitude, subjective norm and self-efficacy) and the time to return to work (RTW) in employees on long-term sick leave. Methods: The study was based on a sample of 926 employees on sickness absence (maximum duration of 12 weeks). The employees filled out a baseline questionnaire and were subsequently followed until the tenth month after reporting sick. The TPB determinants were measured at baseline. Work attitude was measured with a Dutch-language version of the Work Involvement Scale. Subjective norm was measured with a self-structured scale reflecting a person's perception of social support and social pressure. Self-efficacy was measured with the three subscales of a standardised Dutch version of the general self-efficacy scale (ALCOS): willingness to expend effort in completing the behavior, persistence in the face of adversity, and willingness to initiate behavior. Cox proportional hazards regression analyses were used to identify behavioral determinants of the time to RTW. Results: Median time to RTW was 160 days. In the univariate analysis, all potential prognostic factors were significantly associated (P < 0.15) with time to RTW: work attitude, social support, and the three subscales of self-efficacy. The final multivariate model with time to RTW as the predicted outcome included work attitude, social support and willingness to expend effort in completing the behavior as significant predictive factors. Conclusions: This prospective, longitudinal cohort study showed that work attitude, social support and willingness to expend effort in completing the behavior are significantly associated with a shorter time to RTW in employees on long-term sickness absence.
This provides suggestive evidence for the relevance of behavioral characteristics in predicting the duration of sickness absence. Addressing these behavioral determinants may be a promising approach when developing interventions focused on RTW in employees on long-term sick leave.

CiteSeerX

Crossref

Proceedings - University of Groningen

University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

University of Groningen Digital Archive

Dissertations of the University of Groningen