
    Modified Structured Domain Randomization in a Synthetic Environment for Learning Algorithms

    Deep Reinforcement Learning (DRL) has the capability to solve many complex tasks in robotics, self-driving cars, smart grids, finance, healthcare, and intelligent autonomous systems. During training, DRL agents interact freely with the environment to arrive at an inference model. Under real-world conditions, this training raises safety, cost, and time concerns. Training in synthetic environments helps overcome these difficulties; however, a synthetic environment only approximates real-world conditions, resulting in a ‘reality gap’. The synthetic training of agents has proven advantageous but requires methods to bridge this gap. This work addressed the problem through a methodology that supports agent learning. A framework incorporating a modifiable synthetic environment integrated with an unmodified DRL algorithm was used to train, test, and evaluate agents under a modified Structured Domain Randomization (SDR+) technique. It was hypothesized that applying environment domain randomizations (DR) during the learning process would allow the agent to learn variability and adapt accordingly. Experiments using the SDR+ technique included naturalistic and physics-based DR while applying the concept of context-aware elements (CAE) to guide and speed up agent training. Drone racing served as the use case. The experimental framework workflow generated the following results. First, a baseline was established by training and validating an agent in a generic synthetic environment devoid of DR and CAE. The agent was then tested in environments with DR, which showed degraded performance. This validated the reality-gap phenomenon under synthetic conditions and established a metric for comparison. Second, an SDR+ agent was successfully trained and validated under various applications of DR and CAE. Ablation studies determined that most of the applied DR and CAE effects had equivalent impacts on agent performance.
Under comparison, the SDR+ agent’s performance exceeded that of the baseline agent in every test where single or combined DR effects were applied. These tests indicated that the SDR+ agent’s performance did improve in environments with applied DR of the same order as received during training. The last result came from testing the SDR+ agent’s inference model in a completely new synthetic environment with more extreme and additional DR effects applied. The SDR+ agent’s performance degraded to a point where it was inconclusive whether generalization had occurred in the form of learning to adapt to variations. If the agent’s navigational capabilities, the control/feedback from the DRL algorithm, and the use of visual sensing were improved, future work could be expected to exhibit indications of generalization using the SDR+ technique.
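The idea of domain randomization with context-aware constraints can be sketched in a few lines. The parameter names, ranges, and the context rule below are purely illustrative assumptions, not details of the SDR+ implementation described in the abstract:

```python
import random

def sample_randomized_scene(rng=None):
    """Draw each scene parameter from a range instead of fixing it,
    so the agent sees variability (DR) during training."""
    rng = rng or random.Random()
    return {
        "light_intensity": rng.uniform(0.3, 1.5),   # naturalistic DR
        "fog_density":     rng.uniform(0.0, 0.4),
        "gate_texture":    rng.choice(["brick", "metal", "foliage"]),
        "wind_gust_mps":   rng.uniform(0.0, 6.0),   # physics-based DR
        "gate_offset_m":   (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)),
    }

def apply_context(scene):
    """A context-aware element (CAE) keeps randomizations mutually
    consistent, e.g. heavy fog implies lower light intensity."""
    if scene["fog_density"] > 0.2:
        scene["light_intensity"] = min(scene["light_intensity"], 0.8)
    return scene

scene = apply_context(sample_randomized_scene(random.Random(0)))
```

A new scene would be sampled per training episode, so the policy never overfits to one fixed rendering of the racecourse.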

    Design of autonomous sustainable unmanned aerial vehicle - A novel approach to its dynamic wireless power transfer

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Electric UAVs are widely used in civilian duties such as security, surveillance, and disaster relief. The use of Unmanned Aerial Vehicles (UAVs) has increased dramatically over the past years in areas such as marine, mountain, and wilderness environments. Many electric UAVs now offer fast onboard computation, and autonomous flight has become a reality by fusing sensors such as camera tracking, obstacle avoidance, and radar. One main problem remains unsolved: the power required for continuous autonomous operation. Batteries can supply only 20 to 30 minutes of flight time, so the craft must land to recharge or replace its batteries whenever the operation is to continue, making such systems unreliable for long-term civilian use. Larger batteries add weight to the UAV, which is equally impractical. To overcome these obstacles, a wireless recharging station on the ground could transmit power to small UAVs for long-term operation. A camera attached to the drone detects, and allows it to hover above, the Wireless Power Transfer (WPT) device, which comprises transmitting and receiving stations and can be combined with deep learning and sensor fusion techniques for more reliable flight operations. This thesis explores dynamic wireless power transfer using a novel rotating WPT charging technique that gives the UAV improved range, endurance, and average speed through extra hours in the air. The resulting hypothesis has broad application beyond UAVs.
Autonomous charging was achieved by using deep neural vision to detect a rotating WPT receiver, connected to a mains power outlet, that served as the recharging platform. The purpose of the thesis was to provide an alternative to traditional self-charging systems, which rely purely on static WPT methods and require a small distance between the vehicle and the receiver. When the UAV camera detects the WPT receiving station, the drone aligns and hovers using onboard sensors for the best power-transfer efficiency. This strategy builds on traditional automatic drone-landing techniques, but because the target rotates continuously, it requires smart approaches such as deep learning and sensor fusion. A simulation environment was created and tested with the Robot Operating System (ROS) on Linux, using a model of the custom-made drone. Charging experiments confirmed that the intelligent dynamic wireless power transfer (DWPT) method worked successfully while the drone was in flight.
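One way to picture the alignment problem over a rotating receiver is to predict where the pickup point will be a short time ahead and steer toward the prediction. The geometry, gain, and function names below are hypothetical, shown only to illustrate the idea, not the thesis implementation:

```python
import math

def predict_receiver_position(center, radius, angle, omega, dt):
    """Pickup point on a receiver rotating at omega rad/s, dt seconds ahead."""
    a = angle + omega * dt
    return (center[0] + radius * math.cos(a),
            center[1] + radius * math.sin(a))

def alignment_command(drone_xy, target_xy, k_p=1.2):
    """Proportional velocity command steering the drone toward the
    predicted pickup point for best power-transfer efficiency."""
    return (k_p * (target_xy[0] - drone_xy[0]),
            k_p * (target_xy[1] - drone_xy[1]))

# Receiver of radius 0.3 m, rotating at 2 rad/s; look 0.1 s ahead.
target = predict_receiver_position(center=(0.0, 0.0), radius=0.3,
                                   angle=0.0, omega=2.0, dt=0.1)
vx, vy = alignment_command((0.5, 0.0), target)
```

In the thesis this tracking is done with deep vision and sensor fusion rather than a known analytic rotation, but the control objective, which is to keep the coil over a moving point, is the same.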

    Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

    In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm which can navigate non-holonomic robots with continuous control in an unknown dynamic environment with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot’s capability for temporal reasoning. Robots without it tend to suffer from a lack of rationality in the face of incomplete and noisy estimations of complex environments. Additionally, robots endowed with a certain memory ability by MK-A3C can avoid local-minimum traps by estimating the environmental model. Secondly, MK-A3C combines a domain-knowledge-based reward function with a transfer-learning-based training task architecture, which solves the policy non-convergence problems caused by sparse rewards. These improvements allow MK-A3C to efficiently navigate robots in unknown dynamic environments and to satisfy kinematic constraints while handling moving objects. Simulation experiments show that, compared with existing methods, MK-A3C can realize successful robotic navigation in unknown and challenging environments by outputting continuous acceleration commands.
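The memory component the abstract describes is a GRU, whose gating lets the hidden state carry information across noisy, partial observations. The following is a minimal NumPy sketch of a single GRU cell with arbitrary sizes and random weights; the actual MK-A3C architecture is not given in the abstract:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Single GRU cell: the hidden state h acts as the agent's memory."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        # One weight matrix per gate: update (z), reset (r), candidate (h).
        self.W = {g: rng.uniform(-scale, scale,
                                 (hidden_size, input_size + hidden_size))
                  for g in "zrh"}
        self.b = {g: np.zeros(hidden_size) for g in "zrh"}

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W["z"] @ xh + self.b["z"])   # update gate
        r = sigmoid(self.W["r"] @ xh + self.b["r"])   # reset gate
        h_cand = np.tanh(self.W["h"] @ np.concatenate([x, r * h]) + self.b["h"])
        return (1 - z) * h + z * h_cand               # blended new memory

cell = GRUCell(input_size=4, hidden_size=8)
h = np.zeros(8)
for obs in np.random.default_rng(1).normal(size=(5, 4)):  # 5 observations
    h = cell.step(obs, h)  # h summarizes the observation history
```

In an actor-critic agent, this hidden state would feed the policy and value heads, so actions can depend on the observation history rather than on a single noisy frame.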