2,262 research outputs found

    Deep Ordinal Reinforcement Learning

    Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation by the example of Q-learning and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants exhibit a performance that is comparable to the numerical variations for a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals. Comment: replaced figures for better visibility, added GitHub repository, more details about source of experimental results, updated target value calculation for standard and ordinal Deep Q-Networ

    Learning from Monte Carlo Rollouts with Opponent Models for Playing Tron

    This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be effectively applied to generate state trajectories for all possible actions, from which improved action estimates can be computed. This extends experience replay by making it possible to update the state-action values of all actions in a given game state simultaneously. The results show that experience replay that updates the Q-values of all actions simultaneously strongly outperforms conventional experience replay, which only updates the Q-value of the performed action. The results also show that using short or long rollout horizons during training leads to similarly good performance against two fixed opponents.

    Pseudorehearsal in value function approximation

    Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters.

    Adherence and persistence to direct oral anticoagulants in atrial fibrillation: a population-based study

    Background: Despite simpler regimens than vitamin K antagonists (VKAs) for stroke prevention in atrial fibrillation (AF), adherence (taking drugs as prescribed) and persistence (continuation of drugs) to direct oral anticoagulants are suboptimal, yet understudied in electronic health records (EHRs). Objective: We investigated (1) time trends at individual and system levels, and (2) the risk factors for and associations between adherence and persistence. Methods: In UK primary care EHR (The Health Information Network 2011–2016), we investigated adherence and persistence at 1 year for oral anticoagulants (OACs) in adults with incident AF. Baseline characteristics were analysed by OAC and adherence/persistence status. Risk factors for non-adherence and non-persistence were assessed using Cox and logistic regression. Patterns of adherence and persistence were analysed. Results: Among 36 652 individuals with incident AF, cardiovascular comorbidities (median CHA2DS2VASc [Congestive heart failure, Hypertension, Age≥75 years, Diabetes mellitus, Stroke, Vascular disease, Age 65-74 years, Sex category] 3) and polypharmacy (median number of drugs 6) were common. Adherence was 55.2% (95% CI 54.6 to 55.7), 51.2% (95% CI 50.6 to 51.8), 66.5% (95% CI 63.7 to 69.2), 63.1% (95% CI 61.8 to 64.4) and 64.7% (95% CI 63.2 to 66.1) for all OACs, VKA, dabigatran, rivaroxaban and apixaban, respectively. One-year persistence was 65.9% (95% CI 65.4 to 66.5), 63.4% (95% CI 62.8 to 64.0), 61.4% (95% CI 58.3 to 64.2), 72.3% (95% CI 70.9 to 73.7) and 78.7% (95% CI 77.1 to 80.1) for all OACs, VKA, dabigatran, rivaroxaban and apixaban, respectively. Risk of non-adherence and non-persistence increased over time at individual and system levels. Increasing comorbidity was associated with reduced risk of non-adherence and non-persistence across all OACs. Overall rates of ‘primary non-adherence’ (stopping after first prescription), ‘non-adherent non-persistence’ and ‘persistent adherence’ were 3.5%, 26.5% and 40.2%, respectively, differing across OACs. Conclusions: Adherence and persistence to OACs are low at 1 year, with heterogeneity across drugs and over time at individual and system levels. Better understanding of contributory factors will inform interventions to improve adherence and persistence across OACs in individuals and populations.

    Metabolic state alters economic decision making under risk in humans

    Background: Animals' attitudes to risk are profoundly influenced by metabolic state (hunger and baseline energy stores). Specifically, animals often express a preference for risky (more variable) food sources when below a metabolic reference point (hungry), and safe (less variable) food sources when sated. Circulating hormones report the status of energy reserves and acute nutrient intake to widespread targets in the central nervous system that regulate feeding behaviour, including brain regions strongly implicated in risk and reward based decision-making in humans. Despite this, physiological influences per se have not been considered previously to influence economic decisions in humans. We hypothesised that baseline metabolic reserves and alterations in metabolic state would systematically modulate decision-making and financial risk-taking in humans. Methodology/Principal Findings: We used a controlled feeding manipulation and assayed decision-making preferences across different metabolic states following a meal. To elicit risk-preference, we presented a sequence of 200 paired lotteries, subjects' task being to select their preferred option from each pair. We also measured prandial suppression of circulating acyl-ghrelin (a centrally-acting orexigenic hormone signalling acute nutrient intake), and circulating leptin levels (providing an assay of energy reserves). We show both immediate and delayed effects on risky decision-making following a meal, and that these changes correlate with an individual's baseline leptin and changes in acyl-ghrelin levels respectively. Conclusions/Significance: We show that human risk preferences are exquisitely sensitive to current metabolic state, in a direction consistent with ecological models of feeding behaviour but not predicted by normative economic theory. These substantive effects of state changes on economic decisions perhaps reflect shared evolutionarily conserved neurobiological mechanisms. We suggest that this sensitivity in human risk-preference to current metabolic state has significant implications for both real-world economic transactions and for aberrant decision-making in eating disorders and obesity.

    Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four degree-of-freedom (DOF) reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution, usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.

    Catastrophizing mediates the relationship between the personal belief in a just world and pain outcomes among chronic pain support group attendees

    Health-related research suggests the belief in a just world can act as a personal resource that protects against the adverse effects of pain and illness. However, little is currently known about how this belief, particularly in relation to one’s own life, might influence pain. Consistent with the suggestions of previous research, the present study undertook a secondary data analysis to investigate pain catastrophizing as a mediator of the relationship between the personal just world belief and chronic pain outcomes in a sample of chronic pain support group attendees. Partially supporting the hypotheses, catastrophizing was negatively correlated with the personal just world belief and mediated the relationship between this belief and pain and disability, but not distress. Suggestions for future research and intervention development are made.

    Job retention vocational rehabilitation for employed people with inflammatory arthritis: adaptations to the Workwell trial due to the impact of the COVID-19 pandemic

    There are high levels of work disability, absenteeism (sick leave) and presenteeism (reduced productivity) amongst people with inflammatory arthritis. Workwell is a multi-centre, randomised controlled trial of job retention vocational rehabilitation for employed people with inflammatory arthritis. The trial tested the effectiveness and cost-effectiveness of the Workwell programme compared to receipt of written self-help information only. Both arms continued to receive usual care. In March 2020, due to the COVID-19 pandemic, the Workwell trial paused recruitment and intervention delivery. To successfully re-start, protocol amendments were rapidly submitted and changes to existing trial procedures made. The Workwell protocol was adapted in response to both the practical issues likely faced by many clinical research studies active across NHS sites during the pandemic and additional trial-specific challenges. A key eligibility criterion for the trial required participants to be in paid work for at least 15 hours per week. However, UK national lockdowns led to a substantial proportion of the workforce suddenly being furloughed or unable to work, and many people with arthritis taking immunosuppressive medications were asked to shield. Thus, the number of eligible participants reduced. Those continuing to work were harder to identify, as hospital clinics moved to remote delivery, and harder to screen, consent and then treat, as hospital research staff and clinical therapists were re-deployed. New recruitment and consent strategies were applied and, where sites had reduced capacity, responsibilities were absorbed by the trial management team. Remote intervention delivery and electronic data capture were also implemented. By rapidly adapting the Workwell protocol and procedures, the trial successfully reopened to recruitment in July 2020, only four months after trial pause. We were able to achieve recruitment figures above the pre-COVID target and maintain a high retention rate. In addition, we found many of the protocol changes beneficial, as these streamlined trial procedures, thus improving efficiency. It is likely that many strategies implemented in response to the pandemic may become standard practice in future research within trials of a similar design and methodology.

    Behavioral determinants as predictors of return to work after long-term sickness absence: an application of the theory of planned behavior

    Background: The aim of this prospective, longitudinal cohort study was to analyze the association between the three behavioral determinants of the theory of planned behavior (TPB) model (attitude, subjective norm and self-efficacy) and the time to return-to-work (RTW) in employees on long-term sick leave. Methods: The study was based on a sample of 926 employees on sickness absence (maximum duration of 12 weeks). The employees filled out a baseline questionnaire and were subsequently followed until the tenth month after reporting sick. The TPB determinants were measured at baseline. Work attitude was measured with a Dutch language version of the Work Involvement Scale. Subjective norm was measured with a self-structured scale reflecting a person's perception of social support and social pressure. Self-efficacy was measured with the three subscales of a standardised Dutch version of the general self-efficacy scale (ALCOS): willingness to expend effort in completing the behavior, persistence in the face of adversity, and willingness to initiate behavior. Cox proportional hazards regression analyses were used to identify behavioral determinants of the time to RTW. Results: Median time to RTW was 160 days. In the univariate analysis, all potential prognostic factors were significantly associated (P < 0.15) with time to RTW: work attitude, social support, and the three subscales of self-efficacy. The final multivariate model with time to RTW as the predicted outcome included work attitude, social support and willingness to expend effort in completing the behavior as significant predictive factors. Conclusions: This prospective, longitudinal cohort study showed that work attitude, social support and willingness to expend effort in completing the behavior are significantly associated with a shorter time to RTW in employees on long-term sickness absence. This provides suggestive evidence for the relevance of behavioral characteristics in the prediction of duration of sickness absence. It may be a promising approach to address the behavioral determinants in the development of interventions focusing on RTW in employees on long-term sick leave.