129 research outputs found

    Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

    Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixing of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks

    High fidelity progressive reinforcement learning for agile maneuvering UAVs

    In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL), and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with the high-fidelity agent and environment models, the guidance and control policies build agile maneuvering based on fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are later used to design reinforcement-learning-based adaptive flight control laws, allowing the vehicle to be controlled over a wide range of operating conditions, including model changes due to payload, voltage, and damage to actuators and electronic speed controllers (ESCs). We later design outer flight guidance and control laws. This paper summarizes our current work and progress.

    Catalyzing next-generation Artificial Intelligence through NeuroAI

    Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities - inherited from over 500 million years of evolution - that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI

    Musicians as Researchers - Insight or Insanity?

    In the current university context, many highly-proficient music performers enrol in higher education degrees by research. While at first glance those enrolled may seem to be moving from an area of expertise to an area of inexperience, in many cases the individual may in fact have already developed a range of research skills in the course of becoming highly proficient in their chosen field. Many expert musicians seek to further develop their craft through embarking on research degrees and/or seek inspiration through what they aim to discover. Research is a highly valued skill among many musicians pursuing fine music making. In this paper, we will investigate the motivations of musicians for enrolling in a higher degree by research, including the reasons why they choose research as a way of expanding their skills as performers, and the expected outcomes of their research studies

    Knowledge Hub on the Integrated Assessment of Chemical Contaminants and their Effects on the Marine Environment

    In a time of environmental awareness, spurred on by the possibility that our world is threatened by climate change, it is important to remember that there are other anthropogenic pressures, which are also essential for addressing the protection of the marine and coastal environment. Pollution is a global, complex issue that contributes to biodiversity loss and poor environmental health and comes from the production and release of many of the synthetic chemicals that we use in our daily lives. Chemical contaminants are often underrepresented as a major contributor of environmental deterioration. The Joint Programming Initiative Healthy and Productive Seas and Oceans (JPI Oceans) established in 2018 the JPI Oceans Knowledge Hub on the integrated assessment of chemical contaminants and their effects on the marine environment. The purpose of the Knowledge Hub was to provide recommendations on how to improve the methodological basis for marine chemical status assessment. The work has resulted in the following policy paper which focuses on improving the efficiency and implementation of integrated assessment methodology of effects of chemicals of emerging concern. Substantial additional knowledge of biological effects is needed to achieve Good Environmental Status (GES) of our oceans and coastal areas. The Knowledge Hub is represented by highly skilled scientists and policy makers, appointed by the JPI Oceans Management Board, to ensure that the recommendations provided are useful for policy making

    A call for action: Improve reporting of research studies to increase the scientific basis for regulatory decision-making

    Publisher's version (published article). This is a call for action to scientific journals to introduce reporting requirements for toxicity and ecotoxicity studies. Such reporting requirements will support the use of peer‐reviewed research studies in regulatory decision‐making. Moreover, this could improve the reliability and reproducibility of published studies in general and make better use of the resources spent in research. Nordic Council of Minister

    Plasmin generation potential and recanalization in acute ischaemic stroke; an observational cohort study of stroke biobank samples

    Rationale: More than half of patients who receive thrombolysis for acute ischaemic stroke fail to recanalize. Elucidating biological factors which predict recanalization could identify therapeutic targets for increasing thrombolysis success. Hypothesis: We hypothesize that individual patient plasmin potential, as measured by in vitro response to recombinant tissue-type plasminogen activator (rt-PA), is a biomarker of rt-PA response, and that patients with greater plasmin response are more likely to recanalize early. Methods: This study will use historical samples from the Barcelona Stroke Thrombolysis Biobank, comprised of 350 pre-thrombolysis plasma samples from ischaemic stroke patients who received serial transcranial-Doppler (TCD) measurements before and after thrombolysis. The plasmin potential of each patient will be measured using the level of plasmin-antiplasmin complex (PAP) generated after in-vitro addition of rt-PA. Levels of antiplasmin, plasminogen, t-PA activity, and PAI-1 activity will also be determined. Association between plasmin potential variables and time to recanalization [assessed on serial TCD using the thrombolysis in brain ischemia (TIBI) score] will be assessed using Cox proportional hazards models, adjusted for potential confounders. Outcomes: The primary outcome will be time to recanalization detected by TCD (defined as TIBI ≥4). Secondary outcomes will be recanalization within 6-h and recanalization and/or haemorrhagic transformation at 24-h. This analysis will utilize an expanded cohort including ~120 patients from the Targeting Optimal Thrombolysis Outcomes (TOTO) study. Discussion: If association between proteolytic response to rt-PA and recanalization is confirmed, future clinical treatment may customize thrombolytic therapy to maximize outcomes and minimize adverse effects for individual patients.

    Thomas Lillicrap … Timothy Kleinig … Simon Koblar, Monica Anne Hamilton-Bruce … et al

    Understanding hereditary diseases using the dog and human as companion model systems

    Animal models are requisite for genetic dissection of, and improved treatment regimens for, human hereditary diseases. While several animals have been used in academic and industrial research, the primary model for dissection of hereditary diseases has been the many strains of the laboratory mouse. However, given its greater (than the mouse) genetic similarity to the human, high number of naturally occurring hereditary diseases, unique population structure, and the availability of the complete genome sequence, the purebred dog has emerged as a powerful model for study of diseases. The major advantage the dog provides is that it is afflicted with approximately 450 hereditary diseases, about half of which have remarkable clinical similarities to corresponding diseases of the human. In addition, humankind has a strong desire to cure diseases of the dog so these two facts make the dog an ideal clinical and genetic model. This review highlights several of these shared hereditary diseases. Specifically, the canine models discussed herein have played important roles in identification of causative genes and/or have been utilized in novel therapeutic approaches of interest to the dog and human