34 research outputs found

    Answer Set Programming for Non-Stationary Markov Decision Processes

    Full text link
    Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Get PDF
    Abstract Background Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres. Methods This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries. Results In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia. Conclusion This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high– and low–middle–income countries

    Heuristic Selection of Actions in Multiagent Reinforcement Learning ∗

    No full text
    This work presents a new algorithm, called Heuristically Accelerated Minimax-Q (HAMMQ), that allows the use of heuristics to speed up the wellknown Multiagent Reinforcement Learning algorithm Minimax-Q. A heuristic function H that influences the choice of the actions characterises the HAMMQ algorithm. This function is associated with a preference policy that indicates that a certain action must be taken instead of another. A set of empirical evaluations were conducted for the proposed algorithm in a simplified simulator for the robot soccer domain, and experimental results show that even very simple heuristics enhances significantly the performance of the multiagent reinforcement learning algorithm.
    corecore