492 research outputs found

    Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

    Trust region methods have rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, in multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply carry over, because agents, even in cooperative games, can have conflicting directions of policy updates. As a result, achieving a guaranteed improvement of the joint policy when each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to cooperative MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop the Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO require neither parameter sharing among agents nor any restrictive assumptions on the decomposability of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, thereby establishing a new state of the art.
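
    The sequential policy update scheme can be illustrated with a minimal sketch, assuming a HAPPO-style clipped surrogate: agents are updated one at a time in a given order, and each agent's advantage is reweighted by the product of the new-to-old probability ratios of the agents already updated. The function names, the clipping parameter `eps=0.2`, and the simplification of treating each agent's post-update ratio as a given array (rather than the result of an actual optimisation step) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def agent_surrogate(ratio, prev_factor, advantage, eps=0.2):
    """Clipped surrogate objective for the agent currently being updated.

    ratio:       pi_new(a|s) / pi_old(a|s) for this agent, per sample
    prev_factor: product of the same ratios for agents already updated, per sample
    advantage:   joint advantage estimate A(s, a), per sample
    """
    weight = prev_factor * advantage  # advantage reweighted by earlier agents' changes
    unclipped = ratio * weight
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * weight
    return float(np.mean(np.minimum(unclipped, clipped)))


def sequential_surrogates(ratios, advantage, order, eps=0.2):
    """Evaluate each agent's surrogate in the given update order.

    ratios maps agent id -> per-sample ratio *after* that agent's (hypothetical) update.
    """
    factor = np.ones_like(advantage)
    objectives = {}
    for agent in order:
        objectives[agent] = agent_surrogate(ratios[agent], factor, advantage, eps)
        factor = factor * ratios[agent]  # later agents see this agent's updated policy
    return objectives


advantage = np.array([1.0, 1.0])
ratios = {0: np.array([1.5, 1.5]), 1: np.array([1.0, 1.0])}
print(sequential_surrogates(ratios, advantage, order=[0, 1]))
```

    Carrying the running product forward means that an agent updated later optimises against the policy changes already made by earlier agents, rather than against the stale joint policy, which is the intuition behind updating agents sequentially instead of simultaneously.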

    Settling the Variance of Multi-Agent Policy Gradients

    Policy gradient (PG) methods are popular reinforcement learning (RL) methods in which a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, first, quantifying the contributions of the number of agents and of the agents' exploration to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. Relative to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. For use with deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG method in MARL. On the Multi-Agent MuJoCo and StarCraft benchmarks, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients
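
    To illustrate what a variance-minimising baseline means, here is a minimal single-state sketch for one agent with a softmax policy over discrete actions, holding the other agents' actions fixed so that Q depends only on this agent's action. It uses the standard result that, among scalar baselines, the one minimising the total variance of the score-function gradient weights Q by the squared norm of the gradient of log pi. All function names are hypothetical; this is neither the released code nor the paper's deep-network surrogate.

```python
import numpy as np


def softmax(logits):
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()


def grad_log_softmax(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: onehot(action) - pi."""
    g = -softmax(logits)
    g[action] += 1.0
    return g


def weighted_optimal_baseline(logits, q_values):
    """Baseline minimising Var[grad log pi(a) * (Q(a) - b)] for a ~ pi."""
    p = softmax(logits)
    sq_norms = np.array([np.sum(grad_log_softmax(logits, a) ** 2) for a in range(len(p))])
    w = p * sq_norms  # weight each action by pi(a) * ||grad log pi(a)||^2
    return float(np.dot(w, q_values) / w.sum())


def gradient_variance(logits, q_values, baseline):
    """Total variance (trace of covariance) of the single-sample gradient estimate."""
    p = softmax(logits)
    grads = np.array([grad_log_softmax(logits, a) for a in range(len(p))])
    g = grads * (q_values - baseline)[:, None]
    mean = p @ g  # unaffected by the baseline, since E[grad log pi] = 0
    return float(p @ np.sum(g ** 2, axis=1) - np.sum(mean ** 2))


logits = np.array([0.5, -0.2, 1.0])
q = np.array([1.0, 3.0, -2.0])
b_opt = weighted_optimal_baseline(logits, q)
print(b_opt, gradient_variance(logits, q, b_opt), gradient_variance(logits, q, softmax(logits) @ q))
```

    Because the baseline leaves the expected gradient unchanged, only the second moment depends on b; it is quadratic in b, and its minimiser is the weighted average above rather than the plain mean of Q.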

    Psychologizing indexes of societal progress: Accounting for cultural diversity in preferred developmental pathways

    Since the Second World War, the dominant paradigm of societal development has focused on economic growth. While economic growth has improved the quality of human life in a variety of ways, we posit that the identification of economic growth as the primary societal goal is culture-blind because preferences for developmental pathways likely vary between societies. We argue that the cultural diversity of developmental goals and the pathways leading to these goals could be reflected in a culturally sensitive approach to assessing societal development. For the vast majority of post-materialistic societies, it is an urgent necessity to prepare culturally sensitive compasses on how to develop next, and to start conceptualizing growth in a more nuanced and culturally responsive way. Furthermore, we propose that cultural sensitivity in measuring societal growth could also be applied to existing development indicators (e.g. the Human Development Index). We call for cultural researchers, in cooperation with development economists and other social scientists, to prepare a new cultural map of developmental goals, and to create and adapt development indexes that are more culturally sensitive. This innovation could ultimately help social planners understand the diverse pathways of development and assess the degree to which societies are progressing in a self-determined and indigenously valued manner.

    Nurses' Alumnae Association Bulletin - Volume 16 Number 1

    Alumnae Notes; ANA Biennial Convention; Cancer of the Cervix, Uterus and Ovaries; Committee Reports; Digest of Alumnae Association Meetings; Greetings from Miss Childs; Greetings from the President; Graduation Awards - 1950; Isotopes and the Nurse - Dr. T.P. Eberhard; Marriages; Necrology; New Arrivals; Nursing Care in Heart Disease with Pulmonary Infarction; Nursing Care of a Mitral Commissurotomy; Physical Advances at Jefferson - 1950; Policies of the Private Duty Nurses' Registry; Staff Activities, 1950-1951; Students' Corner; The Department of Surgical Research - Drs. Templeton and Gibbon; White Haven and Barton Memorial Division

    Analysis of acoustic emission during the melting of embedded indium particles in an aluminum matrix: a study of plastic strain accommodation during phase transformation

    Acoustic emission is used here to study the melting and solidification of embedded indium particles 0.2 to 3 μm in diameter, and to show that dislocation generation occurs in the aluminum matrix to accommodate a 2.5% volume change. The volume-averaged acoustic energy produced by indium particle melting is similar to that reported for bainite formation upon continuous cooling. A mechanism of prismatic loop generation is proposed to accommodate the volume change, and an upper limit to the geometrically necessary increase in dislocation density is calculated as 4.1 × 10⁹ cm⁻² for the Al-17In alloy. Thermomechanical processing is also used to change the size and distribution of the indium particles within the aluminum matrix. Dislocation generation with accompanying acoustic emission occurs when the melting indium particles are associated with grain boundaries, or upon solidification, where the solid-liquid interfaces act as free surfaces that facilitate dislocation generation. Acoustic emission is not observed for indium particles that require superheating and exhibit elevated melting temperatures. The acoustic emission results corroborate relaxation mechanisms proposed in prior internal friction studies, and indicate that the superheat observed for melting of these micron-sized particles is a result of matrix constraint.
    Comment: Presented at "Atomistic Effects in Migrating Interphase Interfaces - Recent Progress and Future Study" TMS 201

    Age-related changes in global motion coherence: conflicting haemodynamic and perceptual responses

    Our aim was to use both behavioural and neuroimaging data to identify indicators of perceptual decline in motion processing. We employed a global motion coherence task and functional Near Infrared Spectroscopy (fNIRS). Healthy adults (n = 72, aged 18-85) were recruited into the following groups: young (n = 28, mean age = 28), middle-aged (n = 22, mean age = 50), and older adults (n = 23, mean age = 70). Participants were assessed on their motion coherence thresholds at three different speeds using a psychophysical design. As expected, we report age group differences in motion processing, demonstrated by higher motion coherence thresholds in older adults. Crucially, we add correlational data showing that global motion perception declines linearly as a function of age. The associated fNIRS recordings provide a clear physiological correlate of global motion perception. The crux of this study lies in the robust linear correlation between age and haemodynamic response for both measures of oxygenation. We hypothesise that there is an increase in neural recruitment, necessitating an increase in metabolic need and blood flow, which presents as a higher oxygenated haemoglobin response. We report age-related changes in motion perception, with poorer behavioural performance (higher motion coherence thresholds) associated with an increased haemodynamic response.

    Introduction to a culturally sensitive measure of well-being: Combining life satisfaction and interdependent happiness across 49 different cultures

    How can one conclude that well-being is higher in country A than in country B, when well-being is being measured according to the way people in country A think about well-being? We address this issue by proposing a new culturally sensitive method for comparing societal levels of well-being. We support our reasoning with data on life satisfaction and interdependent happiness, focusing on individual and family, collected mostly from students across forty-nine countries. We demonstrate that the relative idealization of the two types of well-being varies across cultural contexts and is associated with culturally different models of selfhood. Furthermore, we show that rankings of societal well-being based on life satisfaction tend to underestimate the contribution of interdependent happiness. We introduce a new culturally sensitive method for calculating societal well-being, and examine its construct validity by testing for associations with the experience of emotions and with individualism-collectivism. This new culturally sensitive approach represents a slight, yet important, improvement in measuring well-being.

    Feasibility of hydraulic separation in a novel anaerobic-anoxic upflow reactor for biological nutrient removal

    This contribution deals with a novel anaerobic-anoxic reactor for biological nutrient removal (BNR) from wastewater, termed AnoxAn. In the AnoxAn reactor, the anaerobic and anoxic zones for phosphate removal and denitrification are integrated in a single continuous upflow sludge blanket reactor, aiming at high compactness and efficiency. Its application is envisaged in cases where retrofitting existing wastewater treatment plants for BNR, or constructing new ones, is limited by the available surface area. The environmental conditions are divided vertically inside the reactor, with the anaerobic zone at the bottom and the anoxic zone above. The capability of the AnoxAn configuration to establish two hydraulically separated zones inside the single reactor was assessed by means of hydraulic characterization experiments and model simulations. Residence time distribution (RTD) experiments in clean water were performed in a bench-scale (48.4 L) AnoxAn prototype. The required hydraulic separation between the anaerobic and anoxic zones, as well as adequate mixing within the individual zones, was obtained through selected mixing devices. The observed behaviour was described by a hydraulic model consisting of continuous stirred tank reactors and plug-flow reactors. The impact of the denitrification process in the anoxic zone on the hydraulic separation was subsequently evaluated through model simulations. The desired hydraulic behaviour proved feasible, involving little mixing between the anaerobic and anoxic zones (mixing flowrate 40.2% of influent flowrate) and negligible nitrate concentration in the anaerobic zone (less than 0.1 mg N L⁻¹) when denitrification was considered.
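
    Hydraulic models built from ideal reactor elements predict an RTD that can be compared with tracer data. As a minimal sketch, assuming one plug-flow delay followed by N equal CSTRs in series (a generic tanks-in-series arrangement, not the actual AnoxAn model structure or its fitted parameters, which the abstract does not give), the exit-age distribution E(t) and its moments can be computed as follows; all parameter values and units are hypothetical.

```python
import math

import numpy as np


def rtd_pfr_then_cstrs(t, tau_pfr, n_tanks, tau_tank):
    """E(t) for a plug-flow delay tau_pfr followed by n_tanks equal CSTRs in series.

    Each CSTR has mean residence time tau_tank; the tanks-in-series part is an
    Erlang (gamma) density shifted by the plug-flow delay.
    """
    s = np.clip(t - tau_pfr, 0.0, None)  # time elapsed since leaving the plug-flow section
    e = s ** (n_tanks - 1) * np.exp(-s / tau_tank) / (math.factorial(n_tanks - 1) * tau_tank**n_tanks)
    return np.where(t >= tau_pfr, e, 0.0)


def trapezoid(y, t):
    """Trapezoidal integral of samples y over grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


def rtd_moments(t, e):
    """Area, mean residence time and variance of a sampled RTD curve."""
    area = trapezoid(e, t)
    mean = trapezoid(t * e, t) / area
    var = trapezoid((t - mean) ** 2 * e, t) / area
    return area, mean, var


# Hypothetical example: 2 h plug-flow delay, then 3 tanks of 4 h each.
t = np.linspace(0.0, 200.0, 200_001)
e = rtd_pfr_then_cstrs(t, tau_pfr=2.0, n_tanks=3, tau_tank=4.0)
print(rtd_moments(t, e))  # analytically: area 1, mean 2 + 3*4 = 14 h, variance 3*4**2 = 48 h^2
```

    Fitting such a model to measured tracer curves (for example by adjusting the delay, the number of tanks and an exchange flow between zones) is one common way to quantify mixing between compartments; the exchange-flow element is not included in this sketch.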