2,385 research outputs found

    220502

    Get PDF
    Energy-harvesting-powered sensors are increasingly deployed beyond the reach of terrestrial gateways, where there is often no persistent power supply. Making use of the internet of drones (IoD) for data aggregation in such environments is a promising paradigm to enhance network scalability and connectivity. The flexibility of the IoD and the favorable line-of-sight connections between the drones and ground nodes can be exploited to improve data reception at the drones. In this article, we discuss the challenges of online flight control of the IoD, where data-driven neural networks can be tailored to design the trajectories and patrol speeds of the drones and their communication schedules, preventing buffer overflows at the ground nodes. In a small-scale IoD, multi-agent deep reinforcement learning with long short-term memory can be developed to train the continuous flight control of the IoD and the data aggregation scheduling, where a joint action is generated for the IoD by sharing the flight control decisions among the drones. In a large-scale IoD, sharing the flight control decisions in real time can result in communication overheads and interference. In this case, deep reinforcement learning can be trained with second-hand visiting experiences, where the drones learn each other's actions from historical scheduling records maintained at the ground nodes. This work was supported in part by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit, under Grant UIDP/UIDB/04234/2020, and in part by National Funds through FCT, under the CMU Portugal Partnership, under Project CMU/TIC/0022/2019 (CRUAV).
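    As a rough sketch of the small-scale case, the snippet below shows one way an LSTM-based actor could map each drone's recent observations to continuous flight controls, with the drones' decisions concatenated into a joint team action. This is an illustration under assumed dimensions and names (OBS_DIM, DroneActor, a three-drone team), not the authors' implementation.

```python
# Minimal sketch (not the article's code): an LSTM-based actor maps a
# drone's recent observation window to bounded continuous flight controls.
# All dimensions and the action semantics below are assumptions.
import torch
import torch.nn as nn

OBS_DIM = 16   # assumed per-step observation size (positions, buffer states)
HIDDEN = 64
ACT_DIM = 3    # assumed: heading, patrol speed, node-selection score

class DroneActor(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACT_DIM)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, OBS_DIM) window of past observations
        out, _ = self.lstm(obs_seq)
        # tanh keeps the continuous actions bounded in [-1, 1]
        return torch.tanh(self.head(out[:, -1]))

# Small-scale IoD: each drone's decision is shared and concatenated
# into one joint action for the team.
actors = [DroneActor() for _ in range(3)]   # 3 drones (assumed)
obs = torch.randn(3, 10, OBS_DIM)           # 10-step history per drone
joint_action = torch.cat([a(obs[i:i+1]) for i, a in enumerate(actors)], dim=1)
print(joint_action.shape)                   # torch.Size([1, 9])
```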

    Utilising Assured Multi-Agent Reinforcement Learning within safety-critical scenarios

    Get PDF
    Multi-agent reinforcement learning allows a team of agents to learn how to work together to solve complex decision-making problems in a shared environment. However, this learning process utilises stochastic mechanisms, meaning that its use in safety-critical domains can be problematic. To overcome this issue, we propose an Assured Multi-Agent Reinforcement Learning (AMARL) approach that uses a model checking technique called quantitative verification to provide formal guarantees of agent compliance with safety, performance, and other non-functional requirements during and after the reinforcement learning process. We demonstrate the applicability of our AMARL approach in three different patrolling navigation domains in which multi-agent systems must learn to visit key areas using different types of reinforcement learning algorithms (temporal difference learning, game theory, and direct policy search). Furthermore, we compare the effectiveness of these algorithms when used with and without our approach. Our extensive experiments with both homogeneous and heterogeneous multi-agent systems of different sizes show that the use of AMARL leads to safety requirements being consistently satisfied and to better overall results than standard reinforcement learning.
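    The assurance step can be pictured as a gate between learning and deployment. The toy loop below substitutes a Monte Carlo estimate for the quantitative verification (formal model checking) that AMARL actually uses, on an invented five-state patrol world with an invented 0.05 threshold; it conveys only the accept/reject structure, not the formal guarantee.

```python
# Hedged sketch of the assurance gate: in the paper a quantitative model
# checker provides formal guarantees; here a Monte Carlo estimate stands
# in for that check. The environment and threshold are invented.
import random

UNSAFE, GOAL = "unsafe", "goal"

def rollout(policy, steps=50):
    """Simulate one patrol episode on a toy line world of states -3..3."""
    state = 0
    for _ in range(steps):
        state += policy.get(state, random.choice([-1, 1]))
        if state <= -3:
            return UNSAFE   # entered the forbidden region
        if state >= 3:
            return GOAL     # reached the patrol target
    return GOAL

def estimated_unsafe_prob(policy, n=2000):
    return sum(rollout(policy) == UNSAFE for _ in range(n)) / n

# Accept a learned policy only if the (estimated) probability of
# violating the safety requirement stays below the bound.
candidate = {s: 1 for s in range(-2, 3)}   # "always move right" policy
if estimated_unsafe_prob(candidate) < 0.05:
    print("policy meets the safety requirement; deploy")
else:
    print("policy rejected; continue constrained learning")
```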

    220401

    Get PDF
    Internet-of-Things (IoT) devices equipped with temperature and humidity sensors, and cameras are increasingly deployed to monitor remote and human-unfriendly areas, e.g., farmlands, forests, rural highways or electricity infrastructures. Aerial data aggregators, e.g., autonomous drones, provide a promising solution for collecting sensory data of the IoT devices in human-unfriendly environments, enhancing network scalability and connectivity. The flexibility of a drone and the favourable line-of-sight connection between the drone and IoT devices can be exploited to improve data reception at the drone. This article first discusses the challenges of drone-assisted data aggregation in IoT networks, such as incomplete network knowledge at the drone, limited buffers of the IoT devices, and lossy wireless channels. Next, we investigate the feasibility of onboard deep reinforcement learning-based solutions that allow a drone to learn its cruise control and data collection schedule online. For deep reinforcement learning in a continuous operation domain, the deep deterministic policy gradient (DDPG) is well suited to delivering effective joint cruise control and communication decisions, using its outdated knowledge of the IoT devices and network states. A case study shows that the DDPG-based framework can take advantage of the continuous actions to substantially outperform existing non-learning-based alternatives. This work was supported in part by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit, under Grant UIDP/UIDB/04234/2020, and in part by National Funds through FCT, under the CMU Portugal Partnership, under Project CMU/TIC/0022/2019 (CRUAV).
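    To make the DDPG ingredients concrete, here is a minimal actor-critic skeleton with target networks and Polyak (soft) updates; the state and action dimensions and network sizes are assumptions for illustration, not the article's configuration.

```python
# Minimal DDPG skeleton (a sketch, not the article's code): the actor maps
# the drone's possibly outdated network state to continuous cruise and
# scheduling actions; the critic scores state-action pairs.
import copy
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 12, 2   # assumed: buffer/channel state -> (speed, device choice)

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))

# Target networks trail the online networks to stabilize learning.
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)

def soft_update(tgt, src, tau=0.005):
    # Polyak averaging: tgt <- tau * src + (1 - tau) * tgt
    for t, s in zip(tgt.parameters(), src.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

state = torch.randn(1, STATE_DIM)
action = actor(state)                        # continuous action in [-1, 1]
q = critic(torch.cat([state, action], 1))    # critic's value estimate
soft_update(actor_tgt, actor)
soft_update(critic_tgt, critic)
print(action, float(q))
```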

    Multiagent Learning Through Indirect Encoding

    Get PDF
    Designing a system of multiple, heterogeneous agents that cooperate to achieve a common goal is a difficult task, but it is also a common real-world problem. Multiagent learning addresses this problem by training the team to cooperate through a learning algorithm. However, most traditional approaches treat multiagent learning as a combination of multiple single-agent learning problems. This perspective leads to many inefficiencies in learning, such as the problem of reinvention, whereby fundamental skills and policies that all agents should possess must be rediscovered independently for each team member. For example, in soccer, all the players know how to pass and kick the ball, but a traditional algorithm has no way to share such vital information because it has no way to relate the policies of agents to each other. In this dissertation a new approach to multiagent learning that seeks to address these issues is presented. This approach, called multiagent HyperNEAT, represents teams as a pattern of policies rather than individual agents. The main idea is that an agent's location within a canonical team layout (such as a soccer team at the start of a game) tends to dictate its role within that team, called the policy geometry. For example, as soccer positions move from goal to center they become more offensive and less defensive, a concept that is compactly represented as a pattern.

    The first major contribution of this dissertation is a new method for evolving neural network controllers called HyperNEAT, which forms the foundation of the second contribution and primary focus of this work, multiagent HyperNEAT. Multiagent learning in this dissertation is investigated in predator-prey, room-clearing, and patrol domains, providing a real-world context for the approach. Interestingly, because the teams in multiagent HyperNEAT are represented as patterns, they can scale up to an infinite number of multiagent policies that can be sampled from the policy geometry as needed. Thus the third contribution is a method for teams trained with multiagent HyperNEAT to dynamically scale their size without further learning. Fourth, the capabilities to both learn and scale in multiagent HyperNEAT are compared to the traditional multiagent SARSA(λ) approach in a comprehensive study. The fifth contribution is a method for efficiently learning and encoding multiple policies for each agent on a team to facilitate learning in multi-task domains. Finally, because there is significant interest in practical applications of multiagent learning, multiagent HyperNEAT is tested in a real-world military patrolling application with actual Khepera III robots. The ultimate goal is to provide a new perspective on multiagent learning and to demonstrate the practical benefits of training heterogeneous, scalable multiagent teams through generative encoding.
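    The scaling claim rests on the indirect encoding: one generative function, queried at different team positions, yields different but related policies. The toy below (plain NumPy with an invented weight rule, not HyperNEAT's evolved CPPNs) shows how a team of any size can be sampled from such a pattern without retraining.

```python
# Toy illustration (not HyperNEAT itself) of the indirect-encoding idea:
# a single generative function maps an agent's position in the canonical
# team layout to that agent's policy weights, so teams of any size can
# be sampled without further learning. All shapes are assumptions.
import numpy as np

def weight_pattern(agent_pos, n_in=4, n_out=2):
    """CPPN-like rule: each connection weight depends on the agent's
    location and the (input, output) coordinates, so roles vary smoothly
    across the team layout."""
    i = np.linspace(-1, 1, n_in)[:, None]
    o = np.linspace(-1, 1, n_out)[None, :]
    return np.sin(3 * i * agent_pos) * np.cos(2 * o) + 0.5 * agent_pos

def make_team(n_agents):
    # Sample one policy per position along the canonical layout.
    return [weight_pattern(p) for p in np.linspace(-1, 1, n_agents)]

small_team = make_team(3)    # trained-scale team
large_team = make_team(11)   # scaled up: no new learning required
print(small_team[0].shape, len(large_team))   # (4, 2) 11
```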

    SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams

    Full text link
    Learning from demonstration (LfD) and imitation learning offer new paradigms for transferring task behavior to robots. A class of methods that enables such online learning requires the robot to observe the task being performed and decompose the sensed streaming data into sequences of state-action pairs, which are then input to the methods. Thus, recognizing the state-action pairs correctly and quickly in sensed data is a crucial prerequisite for these methods. We present SA-Net, a deep neural network architecture that recognizes state-action pairs from RGB-D data streams. SA-Net performed well in two diverse robotic applications of LfD -- one involving mobile ground robots and another involving a robotic manipulator -- which demonstrates that the architecture generalizes well to differing contexts. Comprehensive evaluations, including deployment on a physical robot, show that SA-Net significantly improves on the accuracy of the previous method that utilizes traditional image processing and segmentation. Comment: (in press)
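    As a hedged sketch of the class of architecture the abstract describes (the paper's exact SA-Net layers are not reproduced here), a per-frame CNN over 4-channel RGB-D input, a recurrent layer over the stream, and two classification heads might look like this; all sizes and label spaces are assumptions.

```python
# Sketch of an RGB-D state-action recognizer: a small CNN encodes each
# 4-channel frame, an LSTM aggregates the stream, and two heads classify
# the state and the action. This is illustrative, not the paper's SA-Net.
import torch
import torch.nn as nn

N_STATES, N_ACTIONS = 10, 5   # assumed label-space sizes

class SANetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.LSTM(32, 64, batch_first=True)
        self.state_head = nn.Linear(64, N_STATES)
        self.action_head = nn.Linear(64, N_ACTIONS)

    def forward(self, frames):
        # frames: (batch, time, 4, H, W) RGB-D stream
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        h = out[:, -1]   # summary of the observed stream
        return self.state_head(h), self.action_head(h)

net = SANetSketch()
state_logits, action_logits = net(torch.randn(2, 8, 4, 64, 64))
print(state_logits.shape, action_logits.shape)   # (2, 10) (2, 5)
```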