
    High fidelity progressive reinforcement learning for agile maneuvering UAVs

    In this work, we present a high-fidelity, model-based progressive reinforcement learning method for designing the control system of an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL), and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with high-fidelity agent and environment models, the guidance and control policies build up agile maneuvering capability on top of fundamental control laws. First, we describe the development of high-fidelity mathematical models using frequency-domain system identification. These models are then used to design reinforcement-learning-based adaptive flight control laws that keep the vehicle controllable over a wide range of operating conditions, including model changes caused by payload, voltage, and damage to actuators and electronic speed controllers (ESCs). We then design the outer flight guidance and control laws. This paper summarizes our current work and progress.
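Frequency-domain system identification, mentioned above, estimates a vehicle's frequency response from measured input and output signals. The Python sketch below illustrates only that core idea under simplifying assumptions: a hypothetical first-order discrete plant (`a = 0.9`, `b = 0.1`), a single sinusoidal excitation, and a single-frequency DFT ratio. It is not the authors' identification pipeline, which the abstract does not detail; practical tools typically rely on frequency sweeps, spectral averaging, and coherence checks.

```python
import cmath
import math

def simulate_first_order(u, a=0.9, b=0.1):
    """Hypothetical plant y[k+1] = a*y[k] + b*u[k], starting at rest (y[0] = 0)."""
    y = [0.0]
    for uk in u[:-1]:
        y.append(a * y[-1] + b * uk)
    return y

def frequency_response_at(u, y, omega):
    """Estimate H(e^{j*omega}) as the ratio of single-frequency DFTs Y(omega) / U(omega)."""
    U = sum(uk * cmath.exp(-1j * omega * k) for k, uk in enumerate(u))
    Y = sum(yk * cmath.exp(-1j * omega * k) for k, yk in enumerate(y))
    return Y / U

N_TRANSIENT, N_WINDOW, CYCLES = 500, 400, 10
omega = 2 * math.pi * CYCLES / N_WINDOW        # whole number of periods in the window
u = [math.sin(omega * k) for k in range(N_TRANSIENT + N_WINDOW)]
y = simulate_first_order(u)

# Discard the start-up transient and analyze only the steady-state window.
H_est = frequency_response_at(u[N_TRANSIENT:], y[N_TRANSIENT:], omega)
H_true = 0.1 / (cmath.exp(1j * omega) - 0.9)   # analytic response of the same plant
gain_db = 20 * math.log10(abs(H_est))
phase_deg = math.degrees(cmath.phase(H_est))
```

Because the analysis window spans a whole number of excitation periods and the transient has decayed, the ratio recovers the plant's analytic response b / (exp(j*omega) - a) up to rounding error; repeating it over many frequencies yields a Bode plot from which a model can be fitted.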

    vrAIn: a deep learning approach tailoring computing and radio resources in virtualized RANs

    Proceedings of: 25th Annual International Conference on Mobile Computing and Networking (MobiCom'19), October 21-25, 2019, Los Cabos, Mexico.
    The virtualization of radio access networks (vRAN) is the last milestone in the NFV revolution. However, the complex dependencies between computing and radio resources make vRAN resource control particularly daunting. We present vrAIn, a dynamic resource controller for vRANs based on deep reinforcement learning. First, we use an autoencoder to project high-dimensional context data (traffic and signal quality patterns) into a latent representation. Then, we use a deep deterministic policy gradient (DDPG) algorithm based on an actor-critic neural network structure and a classifier to map (encoded) contexts into resource control decisions. We have implemented vrAIn using an open-source LTE stack over different platforms. Our results show that vrAIn successfully derives appropriate compute and radio control actions irrespective of the platform and context: (i) it provides savings in computational capacity of up to 30% over CPU-unaware methods; (ii) it improves the probability of meeting QoS targets by 25% over static allocation policies using similar CPU resources on average; (iii) upon CPU capacity shortage, it improves throughput performance by 25% over state-of-the-art schemes; and (iv) it performs close to optimal policies resulting from an offline oracle. To the best of our knowledge, this is the first work that thoroughly studies the computational behavior of vRANs, and the first approach to a model-free solution that does not need to assume any particular vRAN platform or system conditions.
    The work of University Carlos III of Madrid was supported by the H2020 5GMoNArch project (grant agreement no. 761445) and the H2020 5G-TOURS project (grant agreement no. 856950). The work of NEC Laboratories Europe was supported by the H2020 5GTRANSFORMER project (grant agreement no. 761536) and the 5GROWTH project (grant agreement no. 856709). The work of the University of Cartagena was supported by Grant AEI/FEDER TEC2016-76465-C2-1-R (AIM) and Grant FPU14/03701.
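The abstract names DDPG with an actor-critic structure. Two standard DDPG ingredients are the bootstrapped critic target, y = r + gamma * Q'(s', mu'(s')), computed with target networks, and the soft (Polyak) update that slowly moves those target networks toward the online ones. The Python sketch below illustrates both, assuming hypothetical linear actor and critic weights in place of neural networks; it does not model vrAIn's autoencoder, its classifier, or its compute and radio action space.

```python
from typing import List, Sequence

def linear(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear function approximator (stands in for a neural network)."""
    return sum(w * x for w, x in zip(weights, features))

def critic_target(reward: float, next_state: Sequence[float], gamma: float,
                  actor_target_w: Sequence[float], critic_target_w: Sequence[float],
                  done: bool = False) -> float:
    """TD target y = r + gamma * Q'(s', mu'(s')), using the target actor and critic."""
    next_action = linear(actor_target_w, next_state)                      # a' = mu'(s')
    q_next = linear(critic_target_w, list(next_state) + [next_action])    # Q'(s', a')
    return reward + (0.0 if done else gamma * q_next)

def soft_update(target_w: Sequence[float], online_w: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_w, target_w)]

# Toy usage with made-up weights: critic input is the state followed by the action.
y = critic_target(1.0, [1.0, 2.0], 0.9,
                  actor_target_w=[0.5, 0.25], critic_target_w=[1.0, 1.0, 2.0])
new_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5)
```

In practice tau is kept small so that the critic's regression target changes slowly, which is the stabilizing purpose of the target networks; tau = 0.5 here only keeps the arithmetic easy to follow.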

    Vector-based navigation using grid-like representations in artificial agents

    Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space and is critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types [1, 2]. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
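Path integration and vector-based navigation can be stated explicitly: integrate self-motion to track position, then compute the direct vector to a goal. In the paper these quantities are read out of learned grid-like units in a recurrent network; the Python sketch below instead computes them in closed form with ordinary dead reckoning, purely to illustrate what the network is asked to represent. The function names and the (heading, speed, dt) input format are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]

def path_integrate(start: Point, steps: Iterable[Tuple[float, float, float]]) -> Point:
    """Dead-reckon position from self-motion samples (heading_rad, speed, dt)."""
    x, y = start
    for heading, speed, dt in steps:
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
    return x, y

def goal_vector(position: Point, goal: Point) -> Tuple[float, float]:
    """Direct (shortcut) vector from position to goal as (heading_rad, distance)."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)

# Outbound path: 3 units east, then 4 units north (speed 2 for 2 time units).
pos = path_integrate((0.0, 0.0), [(0.0, 1.0, 3.0), (math.pi / 2, 2.0, 2.0)])
heading, dist = goal_vector(pos, (0.0, 0.0))
```

Here the agent ends near (3, 4) after a 7-unit outbound path, and the shortcut home is a single 5-unit leg, the kind of direct trajectory the abstract refers to as vector-based navigation.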

    Plasmin Generation Potential and Recanalization in Acute Ischaemic Stroke; an Observational Cohort Study of Stroke Biobank Samples.

    Rationale: More than half of patients who receive thrombolysis for acute ischaemic stroke fail to recanalize. Elucidating biological factors that predict recanalization could identify therapeutic targets for increasing thrombolysis success. Hypothesis: We hypothesize that individual patient plasmin potential, as measured by in vitro response to recombinant tissue-type plasminogen activator (rt-PA), is a biomarker of rt-PA response, and that patients with greater plasmin response are more likely to recanalize early. Methods: This study will use historical samples from the Barcelona Stroke Thrombolysis Biobank, comprising 350 pre-thrombolysis plasma samples from ischaemic stroke patients who received serial transcranial Doppler (TCD) measurements before and after thrombolysis. The plasmin potential of each patient will be measured as the level of plasmin-antiplasmin complex (PAP) generated after in vitro addition of rt-PA. Levels of antiplasmin, plasminogen, t-PA activity, and PAI-1 activity will also be determined. The association between plasmin potential variables and time to recanalization [assessed on serial TCD using the thrombolysis in brain ischemia (TIBI) score] will be assessed using Cox proportional hazards models adjusted for potential confounders. Outcomes: The primary outcome will be time to recanalization detected by TCD (defined as TIBI ≥4). Secondary outcomes will be recanalization within 6 h, and recanalization and/or haemorrhagic transformation at 24 h. This analysis will use an expanded cohort that includes ~120 patients from the Targeting Optimal Thrombolysis Outcomes (TOTO) study. Discussion: If an association between proteolytic response to rt-PA and recanalization is confirmed, future clinical treatment may tailor thrombolytic therapy to individual patients to maximize outcomes and minimize adverse effects.
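The planned analysis uses Cox proportional hazards models. In the standard form of that model, with plasmin potential (for example, the PAP level) as the exposure of interest, the hazard and its partial likelihood can be written as below. The symbols are illustrative assumptions rather than the study's stated specification: the event is the first TCD reading with TIBI ≥4 at time t_i, delta_i = 1 if recanalization was observed (0 if censored), z_i holds the adjusted confounders, and R(t_i) is the set of patients still at risk at t_i.

```latex
\begin{aligned}
h_i(t) &= h_0(t)\,\exp\!\left(\beta\,\mathrm{PAP}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
\qquad \mathrm{HR} = e^{\beta},\\
L(\beta,\boldsymbol{\gamma}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta\,\mathrm{PAP}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta\,\mathrm{PAP}_j + \boldsymbol{\gamma}^{\top}\mathbf{z}_j\right)}.
\end{aligned}
```

The baseline hazard h_0(t) cancels from the partial likelihood, so beta is estimated without specifying it; under the stated hypothesis, a hazard ratio above 1 per unit of plasmin potential would indicate faster recanalization. This form assumes no tied event times; tied TCD readings would need a correction such as Breslow's or Efron's.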

    Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems

    We present a machine learning framework for multi-agent systems to learn both the optimal policy for maximizing rewards and an encoding of high-dimensional visual observations. The encoding is useful for sharing local visual observations with other agents under communication resource constraints. The actor-encoder encodes the raw images and chooses an action based on local observations and messages sent by the other agents. The machine learning agent generates not only an actuator command for the physical device but also a communication message for the other agents. We formulate a reinforcement learning problem that extends the action space to include the communication action. The feasibility of the reinforcement learning framework is demonstrated using a 3D simulation environment with two collaborating agents. The environment provides realistic visual observations that the two agents use and share. Comment: AIAA SciTech 201
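The key modeling step above is extending the action space so that a single decision covers both the actuator command and the outgoing message. The Python sketch below shows that idea with small, hypothetical discrete vocabularies and a greedy choice over value estimates; it omits the actor-encoder, the image encoding, and the learning rule, and all names are illustrative only.

```python
from itertools import product
from typing import Sequence, Tuple

# Hypothetical discrete vocabularies; the paper's actual action and message spaces may differ.
MOTOR_ACTIONS = ("forward", "turn_left", "turn_right")
MESSAGES = (0, 1, 2, 3)          # a 2-bit message per time step (bandwidth budget)

# Extended action space: every (actuator command, message) pair is one joint action.
JOINT_ACTIONS = list(product(MOTOR_ACTIONS, MESSAGES))

def greedy_joint_action(q_values: Sequence[float]) -> Tuple[str, int]:
    """Return the (motor command, message) pair with the highest estimated value."""
    if len(q_values) != len(JOINT_ACTIONS):
        raise ValueError("expected one value per joint action")
    best = max(range(len(JOINT_ACTIONS)), key=lambda i: q_values[i])
    return JOINT_ACTIONS[best]

q = [0.0] * len(JOINT_ACTIONS)
q[5] = 1.0
motor, message = greedy_joint_action(q)   # ("turn_left", 1)
```

A flat product grows multiplicatively (here 3 × 4 = 12 joint actions), so one alternative design is a policy with separate output heads for the command and the message.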

    Design and evaluation of advanced intelligent flight controllers

    Reinforcement learning based methods may be capable of solving adaptive optimal control problems for nonlinear dynamical systems. This work presents a proof of concept for applying reinforcement learning based methods to robust and adaptive flight control tasks. A framework for designing and examining these methods is introduced using the open Research Civil Aircraft Model (RCAM) and optimality criteria. A state-of-the-art robust flight controller, the incremental nonlinear dynamic inversion (INDI) controller, serves as the reference controller. Two intelligent control methods are introduced and examined. The deep deterministic policy gradient (DDPG) controller is selected as a promising actor-critic reinforcement learning method that is currently attracting considerable attention in robotics. In addition, an adaptive version of a proportional-integral-derivative (PID) controller, the PID neural network (PIDNN) controller, is selected as the second method. The results show that all controllers are able to control the aircraft model. Moreover, the PIDNN controller exhibits improved reference tracking if a good initial guess of its weights is available. The DDPG algorithm, in turn, is able to control the nonlinear aircraft model while minimizing a multi-objective value function. This work provides insight into the usability of selected intelligent controllers as flight control functions and compares them with state-of-the-art flight control functions.
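The PIDNN controller above is an adaptive variant of the classical PID law u = Kp*e + Ki*integral(e dt) + Kd*de/dt. As a reference point, the Python sketch below implements a fixed-gain discrete PID loop on a hypothetical first-order plant; the gains, sample time, and plant are illustrative assumptions. A PIDNN would instead adapt the controller online through its network weights, which this sketch does not attempt, and the example is unrelated to the RCAM setup used in the work.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PID:
    """Fixed-gain discrete PID controller (positional form)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt                  # rectangular integration
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical first-order plant dx/dt = -x + u, integrated with forward Euler.
pid = PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01)
x = 0.0
for _ in range(3000):                                     # 30 s of simulated time
    u = pid.update(setpoint=1.0, measurement=x)
    x += pid.dt * (-x + u)
```

The integral term is what removes the steady-state error here: at equilibrium the plant needs u = 1 with zero error, which only the accumulated integral can supply.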

    Knowledge Hub on the Integrated Assessment of Chemical Contaminants and their Effects on the Marine Environment

    In a time of environmental awareness, spurred on by the possibility that our world is threatened by climate change, it is important to remember that other anthropogenic pressures must also be addressed to protect the marine and coastal environment. Pollution is a complex global issue that contributes to biodiversity loss and poor environmental health; it stems from the production and release of many of the synthetic chemicals that we use in our daily lives. Chemical contaminants are often underrepresented as a major contributor to environmental deterioration. In 2018, the Joint Programming Initiative Healthy and Productive Seas and Oceans (JPI Oceans) established the JPI Oceans Knowledge Hub on the integrated assessment of chemical contaminants and their effects on the marine environment. The purpose of the Knowledge Hub was to provide recommendations on how to improve the methodological basis for marine chemical status assessment. The work has resulted in this policy paper, which focuses on improving the efficiency and implementation of integrated assessment methodology for the effects of chemicals of emerging concern. Substantial additional knowledge of biological effects is needed to achieve Good Environmental Status (GES) of our oceans and coastal areas. The Knowledge Hub comprises highly skilled scientists and policy makers, appointed by the JPI Oceans Management Board, to ensure that its recommendations are useful for policy making.

    Plasmin generation potential and recanalization in acute ischaemic stroke; an observational cohort study of stroke biobank samples

    Thomas Lillicrap … Timothy Kleinig … Simon Koblar, Monica Anne Hamilton-Bruce … et al