High fidelity progressive reinforcement learning for agile maneuvering UAVs
In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL) and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with the high-fidelity agent and environment models, the guidance and control policies build agile maneuvering on top of fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are later used to design reinforcement-learning-based adaptive flight control laws that allow the vehicle to be controlled over a wide range of operating conditions, covering model changes such as payload, voltage and damage to actuators and electronic speed controllers (ESCs). We later design outer flight guidance and control laws. Our current work and progress are summarized in this paper.
vrAIn: a deep learning approach tailoring computing and radio resources in virtualized RANs
Proceeding of: 25th Annual International Conference on Mobile Computing and Networking (MobiCom'19), October 21-25, 2019, Los Cabos, Mexico. The virtualization of radio access networks (vRAN) is the
last milestone in the NFV revolution. However, the complex
dependencies between computing and radio resources make
vRAN resource control particularly daunting. We present
vrAIn, a dynamic resource controller for vRANs based on
deep reinforcement learning. First, we use an autoencoder
to project high-dimensional context data (traffic and signal
quality patterns) into a latent representation. Then, we use a
deep deterministic policy gradient (DDPG) algorithm based
on an actor-critic neural network structure and a classifier
to map (encoded) contexts into resource control decisions.
We have implemented vrAIn using an open-source LTE
stack over different platforms. Our results show that vrAIn
successfully derives appropriate compute and radio control
actions irrespective of the platform and context: (i) it provides
savings in computational capacity of up to 30% over
CPU-unaware methods; (ii) it improves the probability of
meeting QoS targets by 25% over static allocation policies
using similar CPU resources on average; (iii) upon CPU capacity
shortage, it improves throughput performance by 25%
over state-of-the-art schemes; and (iv) it performs close to optimal
policies resulting from an offline oracle. To the best of
our knowledge, this is the first work that thoroughly studies
the computational behavior of vRANs, and the first approach
to a model-free solution that does not need to assume any
particular vRAN platform or system conditions. The work of
University Carlos III of Madrid was supported by H2020 5GMoNArch
project (grant agreement no. 761445) and H2020
5G-TOURS project (grant agreement no. 856950). The work
of NEC Laboratories Europe was supported by H2020 5GTRANSFORMER
project (grant agreement no. 761536) and
5GROWTH project (grant agreement no. 856709). The work
of University of Cartagena was supported by Grant AEI/FEDER
TEC2016-76465-C2-1-R (AIM) and Grant FPU14/03701.
Vector-based navigation using grid-like representations in artificial agents
Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space and is critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
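The two operations named in this abstract, path integration and vector-based navigation, can be illustrated with a minimal geometric sketch. It is plain dead reckoning in 2D and does not reproduce the recurrent network or grid-like units described in the paper; the function names and the (speed, heading, dt) step format are assumptions made for illustration.

```python
import math

def integrate_path(start, steps):
    """Path integration: accumulate self-motion into a position estimate.

    `steps` is a sequence of (speed, heading_radians, dt) tuples, a
    hypothetical format chosen for this sketch.
    """
    x, y = start
    for speed, heading, dt in steps:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y

def vector_to_goal(position, goal):
    """Vector-based navigation: distance and bearing of the direct path."""
    dx = goal[0] - position[0]
    dy = goal[1] - position[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Move east for 2 s, then north for 3 s, at unit speed.
position = integrate_path((0.0, 0.0), [(1.0, 0.0, 2.0), (1.0, math.pi / 2, 3.0)])
distance, bearing = vector_to_goal(position, (2.0, 0.0))
```

With an accurate position estimate, the agent can head straight for the goal rather than retracing its route, which is the shortcut behaviour the abstract mentions.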
Plasmin Generation Potential and Recanalization in Acute Ischaemic Stroke; an Observational Cohort Study of Stroke Biobank Samples.
Rationale: More than half of patients who receive thrombolysis for acute ischaemic stroke fail to recanalize. Elucidating biological factors which predict recanalization could identify therapeutic targets for increasing thrombolysis success. Hypothesis: We hypothesize that individual patient plasmin potential, as measured by in vitro response to recombinant tissue-type plasminogen activator (rt-PA), is a biomarker of rt-PA response, and that patients with greater plasmin response are more likely to recanalize early. Methods: This study will use historical samples from the Barcelona Stroke Thrombolysis Biobank, comprised of 350 pre-thrombolysis plasma samples from ischaemic stroke patients who received serial transcranial-Doppler (TCD) measurements before and after thrombolysis. The plasmin potential of each patient will be measured using the level of plasmin-antiplasmin complex (PAP) generated after in-vitro addition of rt-PA. Levels of antiplasmin, plasminogen, t-PA activity, and PAI-1 activity will also be determined. Association between plasmin potential variables and time to recanalization [assessed on serial TCD using the thrombolysis in brain ischemia (TIBI) score] will be assessed using Cox proportional hazards models, adjusted for potential confounders. Outcomes: The primary outcome will be time to recanalization detected by TCD (defined as TIBI ≥4). Secondary outcomes will be recanalization within 6-h and recanalization and/or haemorrhagic transformation at 24-h. This analysis will utilize an expanded cohort including ~120 patients from the Targeting Optimal Thrombolysis Outcomes (TOTO) study. Discussion: If association between proteolytic response to rt-PA and recanalization is confirmed, future clinical treatment may customize thrombolytic therapy to maximize outcomes and minimize adverse effects for individual patients
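For readers unfamiliar with the analysis named in the Methods, the general form of a Cox proportional hazards model is sketched below. The abstract does not give the exact specification, so the symbols are assumptions: t is time to recanalization, x is a covariate vector holding a plasmin potential variable plus the adjustment confounders, and h_0 is an unspecified baseline hazard.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Under this form, a hazard ratio above 1 for a plasmin potential variable would correspond to earlier recanalization per unit increase in that variable, holding the other covariates fixed.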
Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems
We present a machine learning framework for multi-agent systems to learn both
the optimal policy for maximizing the rewards and the encoding of the high
dimensional visual observation. The encoding is useful for sharing local visual
observations with other agents under communication resource constraints. The
actor-encoder encodes the raw images and chooses an action based on local
observations and messages sent by the other agents. The machine learning agent
generates not only an actuator command to the physical device, but also a
communication message to the other agents. We formulate a reinforcement
learning problem, which extends the action space to consider the communication
action as well. The feasibility of the reinforcement learning framework is
demonstrated using a 3D simulation environment with two collaborating agents.
The environment provides realistic visual observations to be used and shared
between the two agents. Comment: AIAA SciTech 201
Design and evaluation of advanced intelligent flight controllers
Reinforcement learning based methods may be suitable for solving adaptive optimal control problems for nonlinear dynamical systems. This work presents a proof of concept for applying reinforcement learning based methods to robust and adaptive flight control tasks. A framework for designing and examining these methods is introduced by means of the open research civil aircraft model (RCAM) and optimality criteria. A state-of-the-art robust flight controller, the incremental nonlinear dynamic inversion (INDI) controller, serves as a reference controller. Two intelligent control methods are introduced and examined. The deep deterministic policy gradient (DDPG) controller is selected as a promising actor-critic reinforcement learning method that is currently gaining much traction in the field of robotics. In addition, an adaptive version of a proportional-integral-derivative (PID) controller, the PID neural network (PIDNN) controller, is selected as the second method. The results show that all controllers are able to control the aircraft model. Moreover, the PIDNN controller exhibits improved reference tracking if a good initial guess of its weights is available. In turn, the DDPG algorithm is able to control the nonlinear aircraft model while minimizing a multi-objective value function. This work provides insight into the usability of selected intelligent controllers as flight control functions as well as a comparison to state-of-the-art flight control functions.
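As background for the PIDNN controller mentioned above, here is a minimal sketch of the fixed-gain discrete PID law that it adapts. This is the textbook controller, not the paper's PIDNN: the gains here are constants, whereas the abstract describes a network whose weights are adapted. The class name, gain values and time step are assumptions for illustration.

```python
class PID:
    """Textbook discrete PID controller with fixed gains."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running integral of the error
        self.prev_error = None   # no derivative on the first sample

    def update(self, error, dt):
        """Return the control output for the current tracking error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical gains and a 0.1 s time step.
controller = PID(kp=2.0, ki=0.5, kd=0.1)
first = controller.update(error=1.0, dt=0.1)
second = controller.update(error=0.5, dt=0.1)
```

The remark in the abstract about needing a good initial guess of the weights is consistent with this structure: starting an adaptive scheme from gains that already stabilize the loop gives it a workable baseline to refine.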
Knowledge Hub on the Integrated Assessment of Chemical Contaminants and their Effects on the Marine Environment
In a time of environmental awareness, spurred on by the possibility that our world is threatened by climate change, it is important to remember that there are other anthropogenic pressures which must also be addressed to protect the marine and coastal environment. Pollution is a global, complex issue that contributes to biodiversity loss and poor environmental health and comes from the production and release of many of the synthetic chemicals that we use in our daily lives. Chemical contaminants are often underrepresented as a major contributor to environmental deterioration.
The Joint Programming Initiative Healthy and Productive Seas and Oceans (JPI Oceans) established in 2018 the JPI Oceans Knowledge Hub on the integrated assessment of chemical contaminants and their effects on the marine environment. The purpose of the Knowledge Hub was to provide recommendations on how to improve the methodological basis for marine chemical status assessment.
The work has resulted in the following policy paper which focuses on improving the efficiency and implementation of integrated assessment methodology of effects of chemicals of emerging concern. Substantial additional knowledge of biological effects is needed to achieve Good Environmental Status (GES) of our oceans and coastal areas. The Knowledge Hub is represented by highly skilled scientists and policy makers, appointed by the JPI Oceans Management Board, to ensure that the recommendations provided are useful for policy making
Plasmin generation potential and recanalization in acute ischaemic stroke; an observational cohort study of stroke biobank samples
Rationale: More than half of patients who receive thrombolysis for acute ischaemic stroke fail to recanalize. Elucidating biological factors which predict recanalization could identify therapeutic targets for increasing thrombolysis success. Hypothesis: We hypothesize that individual patient plasmin potential, as measured by in vitro response to recombinant tissue-type plasminogen activator (rt-PA), is a biomarker of rt-PA response, and that patients with greater plasmin response are more likely to recanalize early. Methods: This study will use historical samples from the Barcelona Stroke Thrombolysis Biobank, comprised of 350 pre-thrombolysis plasma samples from ischaemic stroke patients who received serial transcranial-Doppler (TCD) measurements before and after thrombolysis. The plasmin potential of each patient will be measured using the level of plasmin-antiplasmin complex (PAP) generated after in-vitro addition of rt-PA. Levels of antiplasmin, plasminogen, t-PA activity, and PAI-1 activity will also be determined. Association between plasmin potential variables and time to recanalization [assessed on serial TCD using the thrombolysis in brain ischemia (TIBI) score] will be assessed using Cox proportional hazards models, adjusted for potential confounders. Outcomes: The primary outcome will be time to recanalization detected by TCD (defined as TIBI ≥4). Secondary outcomes will be recanalization within 6-h and recanalization and/or haemorrhagic transformation at 24-h. This analysis will utilize an expanded cohort including ~120 patients from the Targeting Optimal Thrombolysis Outcomes (TOTO) study. Discussion: If association between proteolytic response to rt-PA and recanalization is confirmed, future clinical treatment may customize thrombolytic therapy to maximize outcomes and minimize adverse effects for individual patients.
Thomas Lillicrap … Timothy Kleinig … Simon Koblar, Monica Anne Hamilton-Bruce … et al