495 research outputs found

    Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

    Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply; this is because agents, even in cooperative games, could have conflicting directions of policy updates. As a result, achieving a guaranteed improvement on the joint policy where each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to cooperative MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO do not need agents to share parameters, nor do they need any restrictive assumptions on decomposability of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraftII tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, thereby establishing a new state of the art.

    Settling the Variance of Multi-Agent Policy Gradients

    Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents’ explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients

    Psychologizing indexes of societal progress: Accounting for cultural diversity in preferred developmental pathways

    Since the Second World War, the dominating paradigm of societal development has focused on economic growth. While economic growth has improved the quality of human life in a variety of ways, we posit that the identification of economic growth as the primary societal goal is culture-blind because preferences for developmental pathways likely vary between societies. We argue that the cultural diversity of developmental goals and the pathways leading to these goals could be reflected in a culturally sensitive approach to assessing societal development. For the vast majority of post-materialistic societies, it is an urgent necessity to prepare culturally sensitive compasses on how to develop next, and to start conceptualizing growth in a more nuanced and culturally responsive way. Furthermore, we propose that cultural sensitivity in measuring societal growth could also be applied to existing development indicators (e.g. the Human Development Index). We call for cultural researchers, in cooperation with development economists and other social scientists, to prepare a new cultural map of developmental goals, and to create and adapt development indexes that are more culturally sensitive. This innovation could ultimately help social planners understand the diverse pathways of development and assess the degree to which societies are progressing in a self-determined and indigenously valued manner.

    Nurses' Alumnae Association Bulletin - Volume 16 Number 1

    Alumnae Notes; ANA Biennial Convention; Cancer of the Cervix, Uterus and Ovaries; Committee Reports; Digest of Alumnae Association Meetings; Greetings from Miss Childs; Greetings from the President; Graduation Awards - 1950; Isotopes and the Nurse - Dr. T.P. Eberhard; Marriages; Necrology; New Arrivals; Nursing Care in Heart Disease with Pulmonary Infarction; Nursing Care of a Mitral Commissurotomy; Physical Advances at Jefferson - 1950; Policies of the Private Duty Nurses' Registry; Staff Activities, 1950-1951; Students' Corner; The Department of Surgical Research - Drs. Templeton and Gibbon; White Haven and Barton Memorial Division

    Analysis of acoustic emission during the melting of embedded indium particles in an aluminum matrix: a study of plastic strain accommodation during phase transformation

    Acoustic emission is used here to study melting and solidification of embedded indium particles in the size range of 0.2 to 3 μm in diameter and to show that dislocation generation occurs in the aluminum matrix to accommodate a 2.5% volume change. The volume-averaged acoustic energy produced by indium particle melting is similar to that reported for bainite formation upon continuous cooling. A mechanism of prismatic loop generation is proposed to accommodate the volume change, and an upper limit to the geometrically necessary increase in dislocation density is calculated as 4.1 x 10^9 cm^-2 for the Al-17In alloy. Thermomechanical processing is also used to change the size and distribution of the indium particles within the aluminum matrix. Dislocation generation with accompanying acoustic emission occurs when the melting indium particles are associated with grain boundaries, or upon solidification, where the solid-liquid interfaces act as free surfaces to facilitate dislocation generation. Acoustic emission is not observed for indium particles that require superheating and exhibit elevated melting temperatures. The acoustic emission work corroborates relaxation mechanisms previously proposed in internal friction studies and indicates that the superheat observed for melting of these micron-sized particles is a result of matrix constraint. Comment: Presented at "Atomistic Effects in Migrating Interphase Interfaces - Recent Progress and Future Study" TMS 201

    Pembrolizumab in microsatellite instability high or mismatch repair deficient cancers: updated analysis from the phase II KEYNOTE-158 study

    Background: Pembrolizumab demonstrated durable antitumor activity in 233 patients with previously treated advanced microsatellite instability high (MSI-H) or mismatch repair deficient (dMMR) solid tumors in the phase II multicohort KEYNOTE-158 (NCT02628067) study. Herein, we report safety and efficacy outcomes with longer follow-up for more patients with previously treated advanced MSI-H/dMMR noncolorectal cancers who were included in cohort K of the KEYNOTE-158 study. Patients and methods: Eligible patients with previously treated advanced noncolorectal MSI-H/dMMR solid tumors, measurable disease as per RECIST v1.1, and Eastern Cooperative Oncology Group performance status of 0 or 1 received pembrolizumab 200 mg Q3W for 35 cycles or until disease progression or unacceptable toxicity. The primary endpoint was objective response rate (ORR) as per RECIST v1.1 by independent central radiologic review. Results: Three hundred and fifty-one patients with various tumor types were enrolled in KEYNOTE-158 cohort K. The most common tumor types were endometrial (22.5%), gastric (14.5%), and small intestine (7.4%). Median time from first dose to database cut-off (5 October 2020) was 37.5 months (range, 0.2-55.6 months). ORR among 321 patients in the efficacy population (patients who received ≥1 dose of pembrolizumab enrolled ≥6 months before the data cut-off date) was 30.8% [95% confidence interval (CI) 25.8% to 36.2%]. Median duration of response was 47.5 months (range, 2.1+ to 51.1+ months; ‘+’ indicates no progressive disease by the time of last disease assessment). Median progression-free survival was 3.5 months (95% CI 2.3-4.2 months) and median overall survival was 20.1 months (95% CI 14.1-27.1 months). Treatment-related adverse events (AEs) occurred in 227 patients (64.7%). Grade 3-4 treatment-related AEs occurred in 39 patients (11.1%); 3 (0.9%) had grade 5 treatment-related AEs (myocarditis, pneumonia, and Guillain–Barré syndrome, n = 1 each). Conclusions: Pembrolizumab demonstrated clinically meaningful and durable benefit, with a high ORR of 30.8%, long median duration of response of 47.5 months, and manageable safety across a range of heavily pretreated, advanced MSI-H/dMMR noncolorectal cancers, providing support for use of pembrolizumab in this setting.

    Health-related quality of life in patients treated with pembrolizumab for microsatellite instability–high/mismatch repair–deficient advanced solid tumours: Results from the KEYNOTE-158 study

    Background: In the KEYNOTE-158 study (NCT02628067), pembrolizumab showed a high objective response rate and durable clinical benefit for patients with previously treated, unresectable/metastatic microsatellite instability–high (MSI-H)/mismatch repair–deficient (dMMR) non-colorectal solid tumours. We present health-related quality of life (HRQoL) results from the MSI-H/dMMR population (cohort K). Patients and methods: Eligible patients had previously treated MSI-H/dMMR advanced non-colorectal solid tumours, measurable disease per RECIST v1.1, and ECOG performance status ≤1. Patients received pembrolizumab 200 mg Q3W for 35 cycles (2 years). The EORTC Quality of Life Questionnaire (QLQ-C30) and EQ-5D-3L were administered at baseline, at regular intervals throughout treatment, and 30 days after treatment discontinuation. Prespecified analyses (exploratory endpoints) included the magnitude of change from baseline to post-baseline timepoints in all patients and by the best overall response for QLQ-C30 global health status (GHS)/QoL, QLQ-C30 functional/symptom scales/items, and EQ-5D-3L visual analogue scale (VAS) score. Results: At data cutoff (October 5, 2020), 351 patients were enrolled, of whom 311 and 315 completed baseline QLQ-C30 and EQ-5D-3L questionnaires, respectively. QLQ-C30 GHS/QoL scores improved from baseline to week 9 (mean [95% CI] change, 3.07 [0.19–5.94]), then remained stable or improved by week 111, with greater improvements observed in patients with a best response of complete response (CR) or partial response (PR) (10.85 [6.36–15.35]). Patients with CR/PR showed improvements in physical (5.58 [1.91–9.25]), role (9.88 [3.80–15.97]), emotional (5.62 [1.56–9.68]), and social (8.33 [2.70–13.97]) functioning, and stable cognitive functioning (1.74 [−1.45 to 4.94]). Conclusions: Pembrolizumab generally improved or preserved HRQoL in patients with previously treated MSI-H/dMMR advanced non-colorectal solid tumours.

    Age-related changes in global motion coherence: conflicting haemodynamic and perceptual responses

    Our aim was to use both behavioural and neuroimaging data to identify indicators of perceptual decline in motion processing. We employed a global motion coherence task and functional Near Infrared Spectroscopy (fNIRS). Healthy adults (n = 72, aged 18-85) were recruited into the following groups: young (n = 28, mean age = 28), middle-aged (n = 22, mean age = 50), and older adults (n = 23, mean age = 70). Participants were assessed on their motion coherence thresholds at 3 different speeds using a psychophysical design. As expected, we report age group differences in motion processing, as demonstrated by higher motion coherence thresholds in older adults. Crucially, we add correlational data showing that global motion perception declines linearly as a function of age. The associated fNIRS recordings provide a clear physiological correlate of global motion perception. The crux of this study lies in the robust linear correlation between age and haemodynamic response for both measures of oxygenation. We hypothesise that there is an increase in neural recruitment, necessitating an increase in metabolic need and blood flow, which presents as a higher oxygenated haemoglobin response. We report age-related changes in motion perception, with poorer behavioural performance (high motion coherence thresholds) associated with an increased haemodynamic response.