2,425 research outputs found

    Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

    Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply; this is because agents, even in cooperative games, could have conflicting directions of policy updates. As a result, achieving a guaranteed improvement on the joint policy where each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to cooperative MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO do not need agents to share parameters, nor do they need any restrictive assumptions on decomposability of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, thereby establishing a new state of the art.

    Settling the Variance of Multi-Agent Policy Gradients

    Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents’ explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients
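
    The variance-reduction role of a baseline mentioned in this abstract can be illustrated with a minimal single-agent sketch. It is not the paper's optimal baseline (OB) and not a multi-agent setup: it assumes a hypothetical three-armed bandit with a softmax policy and uses the mean reward as the baseline. The function name `pg_samples` and all numbers are illustrative assumptions.

```python
import numpy as np

def pg_samples(theta, rewards, baseline, n_samples, rng):
    """Per-sample score-function gradient estimates for a softmax bandit policy."""
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    actions = rng.choice(len(theta), size=n_samples, p=probs)
    # Gradient of log softmax w.r.t. theta is one_hot(action) - probs.
    score = np.eye(len(theta))[actions] - probs
    # Subtracting a constant baseline leaves the expected gradient unchanged.
    return (rewards[actions] - baseline)[:, None] * score

rng = np.random.default_rng(0)
theta = np.zeros(3)                      # uniform policy (assumed for illustration)
rewards = np.array([10.0, 11.0, 12.0])   # hypothetical deterministic rewards
value = rewards.mean()                   # expected reward under the uniform policy

no_baseline = pg_samples(theta, rewards, 0.0, 20_000, rng)
with_baseline = pg_samples(theta, rewards, value, 20_000, rng)

# Total variance of the estimator, summed over gradient components.
var_no = no_baseline.var(axis=0).sum()
var_with = with_baseline.var(axis=0).sum()
print(f"variance without baseline: {var_no:.3f}")
print(f"variance with baseline:    {var_with:.3f}")
```

    Both estimators target the same gradient, but in this toy setting the large common reward offset inflates the variance of the estimator that has no baseline.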

    The evaluation of a Taiwanese training program in smoking cessation and the trainees' adherence to a practice guideline

    Background: The Taiwanese government began reimbursement for smoking cessation in 2002. Certification from a training program was required for physicians who wanted reimbursement. The program certified 6,009 physicians up to 2007. The objective of this study is to evaluate the short- and long-term efficacy of the training program. Methods: For the short-term evaluation, all trainees in 2007 were recruited. For the long-term evaluation, 2,000 trainees who received training from 2002 to 2006 were randomly selected by computer and recruited. Course satisfaction, knowledge, confidence in providing smoking cessation services, and adherence to a practice guideline were evaluated by questionnaire. Results: Trainees reported high satisfaction with the training program. There was a significant difference between pre- and post-test knowledge scores. Confidence in providing services was lower in the long-term evaluation than in the short-term evaluation. For adherence to a practice guideline, 86% asked about smoking status, 88% advised smokers to quit, 76% assessed the smoker's willingness to quit, 59% assisted smokers in quitting, and 60% arranged follow-up visits for smokers. The incentive of reimbursement was the most significant factor affecting confidence and adherence. Conclusions: The training program was satisfactory and effective. Adherence to a practice guideline in our study was better than in studies from other countries without physician training.

    8-hydroxy-2'-deoxyguanosine, a major mutagenic oxidative DNA lesion, and DNA strand breaks in nasal respiratory epithelium of children exposed to urban pollution.

    Southwest metropolitan Mexico City children are repeatedly exposed to high levels of a complex mixture of air pollutants, including ozone, particulate matter, aldehydes, metals, and nitrogen oxides. We explored nasal cell 8-hydroxy-2'-deoxyguanosine (8-OHdG), a major mutagenic lesion producing G→T transversion mutations, using an immunohistochemical method, and DNA single strand breaks (ssb) using the single cell gel electrophoresis assay as biomarkers of oxidant exposure. Nasal biopsies from the posterior inferior turbinate were examined in children in grades one through five, including 12 controls from a low-polluted coastal town and 87 Mexico City children. Each biopsy was divided for the 8-OHdG and DNA ssb assays. There was an age-dependent increase in the percentage of nasal cells with DNA tails > 10 µm in Mexico City children: 19 ± 9% for control cells, and 43 ± 4, 50 ± 16, 56 ± 17, 60 ± 17 and 73 ± 14%, respectively, for first through fifth graders (p < 0.05). Nasal ssb were significantly higher in fifth graders than in first graders (p < 0.05). Higher levels (2.3- to 3-fold) of specific nuclear staining for 8-OHdG were observed in exposed children as compared to controls (p < 0.05). These results suggest that DNA damage is present in nasal epithelial cells in Mexico City children. Persistent oxidative DNA damage may ultimately result in a selective growth of preneoplastic nasal initiated cells in this population and the potential for nasal neoplasms may increase with age. The combination of 8-OHdG and DNA ssb should be useful for monitoring oxidative damage in people exposed to polluted atmospheres.

    Appropriate criteria for identification of near-miss maternal morbidity in tertiary care facilities: A cross sectional study

    Background: The study of severe maternal morbidity survivors (near miss) may be an alternative or a complement to the study of maternal death events as a health care indicator. However, there is still controversy regarding the criteria for identification of near-miss maternal morbidity. This study aimed to characterize near-miss maternal morbidity according to different sets of criteria. Methods: A descriptive study in a tertiary center including 2,929 women who delivered there between July 2003 and June 2004. Possible cases of near miss were screened daily by checking different sets of criteria proposed elsewhere. The main outcome measures were: rate of near miss and its primary determinant factors, criteria for its identification, total hospital stay, ICU stay, and number and kind of special procedures performed. Results: There were two maternal deaths, and 124 cases of near miss were identified, with 102 of them admitted to the ICU (80.9%). Among the 126 special procedures performed, the most frequent were central venous access, echocardiography and invasive mechanical ventilation. The mean hospital stay was 10.3 (± 13.24) days. Hospital stay and the number of special procedures performed were significantly higher when the organ dysfunction based criteria were applied. Conclusion: The adoption of a two-level screening strategy may lead to the development of a consistent severe maternal morbidity surveillance system, but further research is needed before worldwide near-miss criteria can be assumed.

    Long-term medical utilization following ventilator-associated pneumonia in acute stroke and traumatic brain injury patients: a case-control study

    Background: The economic burden of ventilator-associated pneumonia (VAP) during the index hospitalization has been confirmed in previous studies. However, the long-term economic impact is still unclear. The aim of this study is to examine the effect of VAP on medical utilization in the long term. Methods: This is a retrospective case-control study. Study subjects were patients experiencing their first traumatic brain injury, acute hemorrhagic stroke, or acute ischemic stroke during 2004. All subjects underwent endotracheal intubation in the emergency room (ER) on the day of admission or the day before admission, were transferred to the intensive care unit (ICU) and were mechanically ventilated for 48 hours or more. A total of 943 patients who developed VAP were included as the case group, and each was matched with two control patients without VAP by age (± 2 years), gender, diagnosis, date of admission (± 1 month) and hospital size, resulting in a total of 2,802 patients in the study. Using robust regression and Poisson regression models we examined the effect of VAP on medical utilization including hospitalization expenses, outpatient expenses, total medical expenses, number of ER visits, number of readmissions, number of hospitalization days and number of ICU days, during the index hospitalization and during the following 2-year period. Results: Patients in the VAP group had higher hospitalization expenses, longer length of stay in hospital and in ICU, and a greater number of readmissions than the control group patients. Conclusions: VAP has a significant impact on medical expenses and utilization, both during the index hospitalization during which VAP developed and in the longer term.