
    Two steps Natural Actor Critic Learning for Underwater Cable Tracking

    Abstract—This paper proposes a field application of a high-level Reinforcement Learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The underwater vehicle ICTINEU AUV learns to perform a vision-based cable tracking task in a two-step learning process. First, a policy is computed by means of simulation, where a hydrodynamic model of the vehicle simulates the cable following task. Once the simulated results are accurate enough, in a second step, the learned-in-simulation policy is transferred to the vehicle, where the learning procedure continues in a real environment, improving the initial policy. The natural actor-critic (NAC) algorithm has been selected to solve the problem in both steps. This algorithm takes advantage of policy gradient and value function techniques for fast convergence: the actor's policy gradient gives convergence guarantees under function approximation and partial observability, while the critic's value function reduces the variance of the update estimates, improving the convergence process.
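    To make the NAC idea above concrete, the following is a minimal sketch of one natural actor-critic update for a linear-Gaussian policy, where the advantage weights fitted on the compatible features serve directly as the natural gradient. The function names, dimensions, and the single 1-D action (e.g., a yaw-rate command) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def nac_update(theta, trajectories, gamma=0.99, alpha=0.05):
    """One NAC step: fit advantage weights w on the compatible features
    grad_theta log pi(a|s); for a Gaussian policy, w is then the natural
    policy gradient (hypothetical simplified variant)."""
    feats, targets = [], []
    for states, actions, rewards in trajectories:
        # Discounted returns, computed backwards through the episode.
        G, returns = 0.0, np.zeros(len(rewards))
        for t in reversed(range(len(rewards))):
            G = rewards[t] + gamma * G
            returns[t] = G
        for s, a, R in zip(states, actions, returns):
            mu = theta @ s                # policy mean, unit variance assumed
            feats.append((a - mu) * s)    # compatible feature d log pi / d theta
            targets.append(R)
    X, y = np.array(feats), np.array(targets)
    # Least-squares advantage fit; the weights equal the natural gradient.
    w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return theta + alpha * w              # natural-gradient ascent step
```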

    Adaptive and learning-based formation control of swarm robots

    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., the BristleBot) for artistic creation. In particular, we combine bio-inspired (e.g., flocking, foraging) techniques with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for a UAV swarm using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collision among UAVs and guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation.
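    As an illustration of the composite reward described above (global flocking maintenance, mutual reward, and collision penalty), here is a hedged sketch of how such a per-UAV reward could be assembled in the leader-follower setting. All weights, distances, and the leader-following term are assumptions for illustration, not values from the thesis.

```python
import numpy as np

def flocking_reward(positions, i, leader_pos,
                    d_safe=1.0, w_flock=1.0, w_mutual=0.5, w_coll=10.0):
    """Reward for UAV i given all UAV positions (N x 3 array)."""
    p_i = positions[i]
    # Flocking maintenance: penalize distance to the leader.
    r_flock = -w_flock * np.linalg.norm(p_i - leader_pos)
    # Mutual reward: shared term encouraging a cohesive swarm.
    centroid = positions.mean(axis=0)
    r_mutual = -w_mutual * np.linalg.norm(positions - centroid, axis=1).mean()
    # Collision penalty: fires when any neighbour enters the safety radius.
    dists = np.linalg.norm(positions - p_i, axis=1)
    dists[i] = np.inf                       # ignore self-distance
    r_coll = -w_coll if dists.min() < d_safe else 0.0
    return r_flock + r_mutual + r_coll
```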

    Physics-based Machine Learning Methods for Control and Sensing in Fish-like Robots

    Underwater robots are important for the construction and maintenance of underwater infrastructure, underwater resource extraction, and defense. However, they currently fall far behind biological swimmers such as fish in agility, efficiency, and sensing capabilities. As a result, mimicking the capabilities of biological swimmers has become an area of significant research interest. In this work, we focus specifically on improving the control and sensing capabilities of fish-like robots. Our control work focuses on using the Chaplygin sleigh, a two-dimensional nonholonomic system which has been used to model fish-like swimming, as part of a curriculum to train a reinforcement learning agent to control a fish-like robot to track a prescribed path. The agent is first trained on the Chaplygin sleigh model, which is not an accurate model of the swimming robot but crucially has similar physics; having learned these physics, the agent is then trained on a simulated swimming robot, resulting in faster convergence compared to only training on the simulated swimming robot. Our sensing work separately considers using kinematic data (proprioceptive sensing) and using surface pressure sensors. The effect of a swimming body's internal dynamics on proprioceptive sensing is investigated by collecting time series of kinematic data of both a flexible and a rigid body in a water tunnel behind a moving obstacle performing different motions, and using machine learning to classify the motion of the upstream obstacle. This revealed that the flexible body could more effectively classify the motion of the obstacle, even if only one of its internal states is used. We also consider the problem of using time series data from a 'lateral line' of pressure sensors on a fish-like body to estimate the position of an upstream obstacle. Feature extraction from the pressure data is attempted with a state-of-the-art convolutional neural network (CNN), and this is compared with using the dominant modes of a Koopman operator constructed on the data as features. It is found that both sets of features achieve similar estimation performance using a dense neural network to perform the estimation. This highlights the potential of the Koopman modes as an interpretable alternative to CNNs for high-dimensional time series. This problem is also extended to inferring the time evolution of the flow field surrounding the body using the same surface measurements, which is performed by first estimating the dominant Koopman modes of the surrounding flow, and using those modes to perform a flow reconstruction. This strategy of mapping from surface to field modes is more interpretable than directly constructing a mapping of unsteady fluid states, and is found to be effective at reconstructing the flow. The sensing frameworks developed as a result of this work allow better awareness of obstacles and flow patterns, knowledge which can inform the generation of paths through the fluid that the developed controller can track, contributing to the autonomy of swimming robots in challenging environments.
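    As a concrete illustration of extracting dominant Koopman modes from the pressure time series, the sketch below uses dynamic mode decomposition (DMD), a standard finite-dimensional approximation of the Koopman operator. The sensor count, truncation rank, and data layout are assumptions for illustration.

```python
import numpy as np

def dmd_modes(P, rank=4):
    """P: (n_sensors, n_timesteps) pressure time series from the sensor
    array. Returns the rank-truncated DMD modes and eigenvalues."""
    X, Y = P[:, :-1], P[:, 1:]                 # snapshot pairs x_k -> x_{k+1}
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]
    # Reduced operator approximating the Koopman action on the data.
    A_tilde = U.conj().T @ Y @ Vh.conj().T / S
    evals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / S) @ W          # exact DMD modes
    return modes, evals

# Usage: dominant modes of a synthetic 8-sensor, 200-step pressure record.
P = np.random.randn(8, 200)
modes, evals = dmd_modes(P, rank=3)
```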

    Path Planning and Control of UAV using Machine Learning and Deep Reinforcement Learning Techniques

    Uncrewed Aerial Vehicles (UAVs) are playing an increasingly significant role in modern life. In the past decades, many commercial and scientific communities all over the world have been developing autonomous techniques for UAVs for a broad range of applications, such as forest fire monitoring, parcel delivery, disaster rescue, natural resource exploration, and surveillance. This brings a large number of opportunities and challenges for UAVs to improve their abilities in the path planning, motion control and fault-tolerant control (FTC) directions. Meanwhile, owing to the powerful decision-making, adaptive learning and pattern recognition capabilities of machine learning (ML) and deep reinforcement learning (DRL), the use of ML and DRL has been developing rapidly and has obtained major achievements in a variety of applications. However, there is not much research on ML and DRL in the field of motion control and real-time path planning of UAVs. This thesis focuses on the development of ML and DRL for the path planning, motion control and FTC of UAVs. A number of contributions pertaining to state space definition, reward function design and training method improvement have been made in this thesis, which improve the effectiveness and efficiency of applying DRL to UAV motion control problems. In addition to the control problems, this thesis also presents real-time path planning contributions, including a relative state space definition and a human-pedestrian-inspired reward function, which provide a reliable and effective solution for real-time path planning in a complex environment.
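    The two path-planning ideas named above, a relative state space and a human-pedestrian-inspired reward, could look roughly like the following sketch: the goal and nearest obstacle are expressed in the UAV's body frame, and the reward favors progress toward the goal while penalizing intrusion into a 'social' buffer around obstacles. All weights and distances are assumptions, not the thesis's actual design.

```python
import numpy as np

def relative_state(uav_pos, uav_heading, goal, obstacle):
    """Express goal/obstacle positions relative to the UAV's pose (2-D)."""
    c, s = np.cos(-uav_heading), np.sin(-uav_heading)
    R = np.array([[c, -s], [s, c]])              # world -> body rotation
    return np.concatenate([R @ (goal - uav_pos), R @ (obstacle - uav_pos)])

def pedestrian_reward(prev_dist, new_dist, obstacle_dist,
                      d_social=2.0, w_prog=1.0, w_social=0.5):
    r = w_prog * (prev_dist - new_dist)          # reward progress toward goal
    if obstacle_dist < d_social:                 # penalize intruding on the
        r -= w_social * (d_social - obstacle_dist)  # 'social' buffer zone
    return r
```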

    Mobile Robots Navigation

    Mobile robot navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation using the sensory information perceived; (iv) localization, as the strategy to estimate the robot's position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location, optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses these activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within seven categories, described next.

    Learning Search Strategies from Human Demonstrations

    Decision making and planning with partial state information is a problem faced by all forms of intelligent entities. The formulation of a problem under partial state information leads to an exorbitant set of choices with associated probabilistic outcomes, making its resolution difficult with traditional planning methods. Human beings have acquired the ability to act under uncertainty through education and self-learning. Transferring our know-how to artificial agents and robots will make it faster for them to learn and even improve upon us in tasks in which incomplete knowledge is available, which is the objective of this thesis. We model how humans reason with respect to their beliefs and transfer this knowledge in the form of a parameterised policy, following a Programming by Demonstration framework, to a robot apprentice for two spatial navigation tasks: the first task consists of localising a wooden block on a table, and in the second task a power socket must be found and connected. In both tasks the human teacher and robot apprentice rely only on haptic and tactile information. We model the human's and robot's beliefs by a probability density function which we update through recursive Bayesian state space estimation. To model the reasoning processes of human subjects performing the search tasks, we learn a generative joint distribution over beliefs and actions (end-effector velocities) recorded during executions of the task. For the first search task the direct mapping from beliefs to actions is learned, whilst for the second task we incorporate a cost function used to adapt the policy parameters in a Reinforcement Learning framework, and show a considerable improvement, in terms of the distance taken to accomplish the task, over solely learning the behaviour. Both search tasks above can be considered active localisation, as the uncertainty originates only from the position of the agent in the world. We also consider searches in which both the position of the robot and features of the environment are uncertain. Given the unstructured nature of the belief, a histogram parametrisation of the joint distribution of the robot's position and features is necessary. However, naively doing so quickly becomes intractable, as the space and time complexity is exponential. We demonstrate that by parametrising only the marginals and memorising the parameters of the measurement likelihood functions, we can recover exactly the same solution as the naive parametrisation at a cost which is linear in space and time complexity.
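    A minimal sketch of the recursive Bayesian state estimation underlying these search tasks is given below: a histogram filter over a discretized 1-D position, alternating a motion (prediction) step with a measurement (correction) step. The grid size and noise models are illustrative assumptions.

```python
import numpy as np

def predict(belief, kernel):
    """Convolve the belief with the motion model (prediction step)."""
    b = np.convolve(belief, kernel, mode='same')
    return b / b.sum()

def correct(belief, likelihood):
    """Multiply by the measurement likelihood and renormalize (Bayes rule)."""
    b = belief * likelihood
    return b / b.sum()

# Usage: uniform prior over 100 cells, a small motion blur, and a
# (hypothetical) haptic measurement peaked near cell 40.
belief = np.full(100, 1 / 100)
kernel = np.array([0.1, 0.8, 0.1])
belief = predict(belief, kernel)
cells = np.arange(100)
belief = correct(belief, np.exp(-0.5 * ((cells - 40) / 5.0) ** 2))
```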

    Towards Sensorimotor Coupling of a Spiking Neural Network and Deep Reinforcement Learning for Robotics Application

    Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the great achievements of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics and computer vision. Neural networks started with the multi-layer perceptron (1st generation), developed into deep neural networks (2nd generation), and are moving forward to spiking neural networks, known as the 3rd generation of neural networks. Spiking neural networks aim to bridge the gap between neuroscience and machine learning, using biologically realistic models of neurons to carry out computation. In this thesis, we first provide a comprehensive review of both spiking neural networks and deep reinforcement learning, with emphasis on robotic applications. Then we demonstrate how to develop a robotics application for context-aware scene understanding to perform sensorimotor coupling. Our system contains two modules corresponding to scene understanding and robotic navigation. The first module is implemented as a spiking neural network that carries out semantic segmentation to understand the scene in front of the robot. The second module provides a high-level navigation command to the robot, which is considered as an agent and implemented by online reinforcement learning. The module was implemented with a biologically plausible local learning rule that allows the agent to adapt quickly to the environment. To benchmark our system, we have tested the first module on the Oxford-IIIT Pet dataset and the second module on a custom-made Gym environment. Our experimental results show that our system achieves results competitive with deep neural networks on the segmentation task and adapts quickly to the environment.
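    As background for the spiking module above, here is a sketch of the leaky integrate-and-fire (LIF) neuron, the usual biologically plausible unit in spiking neural networks. The time constant and threshold are illustrative, not taken from the thesis.

```python
import numpy as np

def lif_simulate(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Integrate a current trace; emit a spike whenever v crosses threshold."""
    v, spikes = 0.0, []
    for I in input_current:
        v += dt / tau * (-v + I)            # leaky integration of input
        if v >= v_thresh:                   # threshold crossing -> spike
            spikes.append(1)
            v = v_reset                     # reset membrane potential
        else:
            spikes.append(0)
    return np.array(spikes)

# Usage: a constant supra-threshold current yields a regular spike train.
spike_train = lif_simulate(np.full(200, 1.5))
```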

    A Survey on Energy Optimization Techniques in UAV-Based Cellular Networks: From Conventional to Machine Learning Approaches

    Wireless communication networks have been witnessing an unprecedented demand due to the increasing number of connected devices and emerging bandwidth-hungry applications. Despite many competent technologies for capacity enhancement, such as millimeter wave communications and network densification, there is still room and need for further capacity enhancement in wireless communication networks, especially for cases of unusually large gatherings of people, such as sports competitions, musical concerts, etc. Unmanned aerial vehicles (UAVs) have been identified as one of the promising options to enhance capacity due to their easy implementation, pop-up style of operation, and cost-effective nature. The main idea is to deploy base stations on UAVs and operate them as flying base stations, thereby bringing additional capacity to where it is needed. However, because UAVs mostly have limited energy storage, their energy consumption must be optimized to increase flight time. In this survey, we investigate different energy optimization techniques with a top-level classification in terms of the optimization algorithm employed: conventional and machine learning (ML). Such classification helps understand the state of the art and the current trend in terms of methodology. In this regard, various optimization techniques are identified from the related literature, and they are presented under the above-mentioned classes of optimization methods. In addition, for the purpose of completeness, we include a brief tutorial on the optimization methods and the power supply and charging mechanisms of UAVs. Moreover, novel concepts, such as reflective intelligent surfaces and landing spot optimization, are also covered to capture the latest trend in the literature.
    Comment: 41 pages, 5 figures, 6 tables. Submitted to the Open Journal of the Communications Society (OJ-COMS).