
    Deep Reinforcement Learning Attitude Control of Fixed-Wing UAVs Using Proximal Policy Optimization

    Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelope than experienced human pilots, which restricts the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller to handle the nonlinear attitude control problem, enabling extended flight envelopes for fixed-wing UAVs. A proof-of-concept controller using the proximal policy optimization (PPO) algorithm is developed and is shown to be capable of stabilizing a fixed-wing UAV from a large set of initial conditions to reference roll, pitch and airspeed values. The training process is outlined and the key factors affecting its rate of progress are considered; the most important factor was found to be limiting the number of variables in the observation vector while including the values of these variables from several previous time steps. The trained reinforcement learning (RL) controller is compared with a proportional-integral-derivative (PID) controller and is found to converge in more cases than the PID controller, with comparable performance. Furthermore, the RL controller is shown to generalize well to unseen disturbances in the form of wind and turbulence, even in severe disturbance conditions. Comment: 11 pages, 3 figures, 2019 International Conference on Unmanned Aircraft Systems (ICUAS)
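The abstract singles out a compact observation vector that includes several past time steps as the most important factor for training progress. A minimal sketch of such a history buffer follows, assuming NumPy and a fixed set of per-step variables (for example roll, pitch and airspeed errors); the class name `ObservationHistory` and the history length are hypothetical illustrations, not the paper's settings.

```python
from collections import deque

import numpy as np


class ObservationHistory:
    """Keeps the last `n_steps` observations and returns them as one flat vector.

    Hypothetical helper: the choice of variables and the history length are
    illustrative, not the settings used in the paper.
    """

    def __init__(self, n_vars: int, n_steps: int):
        self.n_vars = n_vars
        self.n_steps = n_steps
        self.buffer = deque(maxlen=n_steps)  # oldest entry drops out automatically

    def _check(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        if obs.shape != (self.n_vars,):
            raise ValueError(f"expected {self.n_vars} variables, got shape {obs.shape}")
        return obs

    def reset(self, first_obs):
        # Pad the history with the initial observation so the policy input
        # has a fixed length from the very first step of an episode.
        first_obs = self._check(first_obs)
        self.buffer.clear()
        for _ in range(self.n_steps):
            self.buffer.append(first_obs.copy())
        return self.vector()

    def step(self, obs):
        # Append the newest observation; the deque discards the oldest one.
        self.buffer.append(self._check(obs))
        return self.vector()

    def vector(self):
        # Ordered oldest -> newest, flattened to shape (n_vars * n_steps,).
        return np.concatenate(list(self.buffer))
```

Each call to `step` returns a flat vector of `n_vars * n_steps` values that can be fed directly to a PPO policy network. Keeping `n_vars` small while stacking a few steps gives the policy rate-of-change information without inflating the input size.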

    Compliant aerial manipulation

    Aerial manipulation is a research field that proposes the integration of robotic manipulators into aerial platforms, typically multirotors – widely known as “drones” – or autonomous helicopters. The development of this technology is motivated by the need to reduce the time, cost and risk associated with certain operations or tasks in high-altitude areas or in workspaces that are difficult to access. Some illustrative applications are the detection and insulation of leaks in pipe structures in chemical plants, the repair of corrosion on wind turbine blades, the maintenance of power lines, and the installation and retrieval of sensor devices in polluted areas. Although a wide variety of commercial multirotor platforms are available today, with payloads from a few grams up to several kilograms and flight times of around thirty minutes, developing an aerial manipulator is still a technological challenge because of the demanding design requirements on the manipulator: very low weight, low inertia, dexterity, mechanical robustness and control. The main contribution of this thesis is the design, development and experimental validation of several prototypes of lightweight (<2 kg) compliant manipulators to be integrated into multirotor platforms, including human-size dual-arm systems, compliant joint arms equipped with human-like finger modules for grasping, and long-reach aerial manipulators. Since the aerial manipulator is expected to execute inspection and maintenance tasks in a similar way to a human operator, this thesis proposes a bioinspired design approach that tries to replicate the human arm in terms of size, kinematics, mass distribution and compliance. This last feature is one of the key concepts developed and exploited in this work. 
Introducing flexible elements such as springs or elastomers between the servos and the links extends the capabilities of the manipulator, allowing the estimation and control of torque/force, the detection of impacts and overloads, and the localization of obstacles by contact. It also improves the safety and efficiency of the manipulator, especially during in-flight operation and grasping, where impacts and contact forces may damage the manipulator or destabilize the aerial platform. Unlike most industrial manipulators, where force-torque control is possible at control rates above 1 kHz, the servo actuators typically employed in aerial manipulators have important technological limitations: no torque feedback or control, only position (and in some models, speed) references, low update rates (<100 Hz), and communication delays. However, these devices are still the best solution because of their high torque-to-weight ratio, low cost, compact design, and easy assembly and integration. To cope with these limitations, the compliant joint arms presented here estimate and control the wrenches from the deflection of the spring-lever transmission mechanism introduced in the joints, measured at joint level with encoders or potentiometers, or in Cartesian space using vision sensors. In the developed prototypes, the maximum joint deflection is around 25 degrees, which corresponds to an end-effector position deviation of around 20 cm for a human-size arm. The capabilities and functionalities of the manipulators were first evaluated on a fixed-base test bench and then in outdoor flight tests, integrating the arms into different commercial hexarotor platforms. 
Frequency characterization, position/force/impedance control, bimanual grasping, arm teleoperation, payload mass estimation, or contact-based obstacle localization are some of the experiments presented in this thesis that validate the developed prototypes.
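The compliant-joint principle described above, estimating torque from the deflection of an elastic transmission, can be sketched as follows. This assumes a linear torsional spring of hypothetical stiffness `k` between servo and link, which the thesis does not state; a real spring-lever mechanism may need a calibrated, nonlinear model. The second function checks the quoted 25 degree / 20 cm figure under the further assumption that the whole arm rotates rigidly about the deflected joint.

```python
import math


def joint_torque_from_deflection(theta_servo_rad, theta_link_rad, stiffness_nm_per_rad):
    """Estimate joint torque (N*m) from the deflection of a compliant transmission.

    Assumes a linear torsional spring (hypothetical stiffness k) between the
    servo output and the link: tau = k * (theta_servo - theta_link).
    """
    deflection = theta_servo_rad - theta_link_rad
    return stiffness_nm_per_rad * deflection


def tip_displacement(arm_length_m, deflection_rad):
    """Straight-line displacement (m) of the arm tip when the whole arm rotates
    rigidly about the deflected joint by `deflection_rad` (chord length)."""
    return 2.0 * arm_length_m * math.sin(deflection_rad / 2.0)
```

With a hypothetical effective arm length of 0.46 m, `tip_displacement(0.46, math.radians(25))` gives roughly 0.20 m, consistent with the deviation quoted for a human-size arm. Because the servos accept only position references, a controller built on this estimate would command servo angles that produce the desired deflection rather than commanding torque directly.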

    Coastal Eye: Monitoring Coastal Environments Using Lightweight Drones

    Monitoring coastal environments is a challenging task because of both the logistical demands of in-situ data collection and the dynamic nature of the coastal zone, where multiple processes operate over varying spatial and temporal scales. Remote sensing products derived from spaceborne and airborne platforms have proven highly useful for monitoring coastal ecosystems, but they often fail to capture fine-scale processes, and there remains a lack of cost-effective and flexible methods for coastal monitoring at these scales. Proximal sensing technologies such as lightweight drones and kites have greatly improved the ability to capture fine spatial resolution data at user-dictated visit times. These approaches are democratising, allowing researchers and managers to collect data themselves at locations and times of their choosing. In this thesis I develop our scientific understanding of the application of proximal sensing within coastal environments. Two critical reviews consolidate disparate information on the use of kites as a proximal sensing platform and on the often overlooked hurdles of conducting drone operations in challenging environments. The empirical work then tests this technology in three different coastal environments spanning the land-sea interface. Firstly, I use kite aerial photography and uncertainty-assessed structure-from-motion multi-view stereo (SfM-MVS) processing to track changes in coastal dunes over time, and report that sub-decimetre changes (both erosion and accretion) can be detected with this methodology. Secondly, I use lightweight drones to capture fine spatial resolution optical data of intertidal seagrass meadows, and find that estimates of plant cover agree more closely with in-situ measurements in sparsely populated than in densely populated meadows. 
Lastly, I develop a novel technique using lightweight drones and SfM-MVS to measure benthic structural complexity in tropical coral reefs. I find that structural complexity measures can be obtained from SfM-MVS-derived point clouds, but that the technique is affected by glint-type artefacts in the image data. Collectively, this work advances knowledge of proximal sensing in the coastal zone, identifying both the strengths and weaknesses of its application across several ecosystems. Natural Environment Research Council (NERC)
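Two of the measurements mentioned above can be illustrated on gridded surfaces. The sketch below is not the thesis's processing chain: it shows one common way of detecting change between two SfM-MVS elevation models (thresholding the DEM of difference with a level of detection propagated from each survey's vertical uncertainty) and one common structural-complexity metric, surface rugosity (3-D surface area divided by planar area). The uncertainty values and the 1.96 multiplier (about 95% confidence under a normal-error assumption) are illustrative assumptions.

```python
import numpy as np


def thresholded_dod(dem_old, dem_new, sigma_old, sigma_new, t=1.96):
    """DEM of difference with a propagated level of detection (LoD).

    sigma_old / sigma_new: vertical uncertainty of each survey (scalar or
    per-cell array, same units as the DEMs). Changes whose magnitude does
    not exceed LoD = t * sqrt(sigma_old^2 + sigma_new^2) are set to zero.
    """
    dem_old = np.asarray(dem_old, dtype=float)
    dem_new = np.asarray(dem_new, dtype=float)
    dod = dem_new - dem_old  # positive = accretion, negative = erosion
    lod = t * np.sqrt(np.square(sigma_old) + np.square(sigma_new))
    return np.where(np.abs(dod) > lod, dod, 0.0)


def surface_rugosity(dem, cell_size):
    """Ratio of 3-D surface area to planar area of a gridded surface (>= 1)."""
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    # Area of each cell's tangent-plane patch over its planar footprint.
    surface = np.sqrt(1.0 + dz_dx**2 + dz_dy**2) * cell_size**2
    planar = dem.size * cell_size**2
    return surface.sum() / planar
```

A flat surface gives a rugosity of exactly 1, and a plane with slope s gives sqrt(1 + s^2); rougher reef surfaces push the value higher. Glint artefacts would appear as spurious relief in the DEM and therefore bias this ratio upward.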

    Master of Science

    Autonomous and teleoperated flying robots capable of perch-and-stare are desirable for reconnaissance missions. Current perch-and-stare solutions use various methods to enable aircraft to land on a limited set of surfaces, typically horizontal or vertical planes. Motivated by the fact that songbirds can sleep in trees without requiring active muscle control to stay perched, the research presented here details a concept that allows rotorcraft to perch passively on a variety of surfaces. This thesis presents two prototype iterations in which perching is accomplished by integrating two components: a compliant, underactuated gripping foot and a collapsing leg mechanism that converts the aircraft's weight into tendon tension in order to passively actuate the foot. The design process and analysis of the mechanisms are presented. Additionally, stability tests performed on the second prototype, attached to a quadrotor, demonstrate the versatility of the system and its ability to support external moments. The results suggest that it is possible to passively perch a rotorcraft on multiple surfaces and withstand reasonable environmental disturbances.

    Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

    Recent successes have combined reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate whether the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how different human interaction modalities, namely task demonstration, intervention, and evaluation, are cycled and combined with reinforcement learning algorithms. Results presented in this work show that a reward signal learned from human interaction accelerates the learning rate of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning algorithms. Finally, Cycle-of-Learning develops an effective transition from policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths toward human-agent teaming scenarios in which autonomous agents learn from human teammates and adapt to mission performance metrics in real time and in real-world scenarios. Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). 
For more information, see https://vggoecks.com

    Spatial combination of sensor data deriving from mobile platforms for precision farming applications

    This thesis combines optical sensors on a ground platform and an aerial platform for field measurements in wheat, in order to identify nitrogen (N) levels, estimate biomass (BM) and predict yield. The Multiplex Research (MP) fluorescence sensor was used in wheat for the first time. The individual objectives were: (i) evaluation of available sensors and sensor platforms used in Precision Farming (PF) to quantify crop nutrition status, (ii) acquisition of ground and aerial sensor data with two ground spectrometers, an aerial spectrometer and a ground fluorescence sensor, (iii) development of effective post-processing methods for correcting the sensor data, (iv) analysis and evaluation of the sensors with regard to mapping biomass, yield and nitrogen content in the plant, and (v) yield simulation as a function of different sensor signals. This thesis contains three papers published in international peer-reviewed journals. The first publication is a literature review of sensor platforms used in agricultural research. Sensors and their applications were subdivided using a detailed categorization model; the review evaluates their strengths and weaknesses and discusses research results gathered with aerial and ground platforms carrying different sensors. Autonomous robots and swarm technologies suitable for PF tasks were also reviewed. The second publication focuses on spectral and fluorescence sensors for BM, yield and N detection. The ground sensors were mounted on the Hohenheim research sensor platform Sensicle, and a further spectrometer was installed in a fixed-wing Unmanned Aerial Vehicle (UAV). In this study, the sensors of the Sensicle and the UAV were used to determine plant characteristics and yield in three-year field trials at the research station Ihinger Hof, Renningen (Germany), an institution of the University of Hohenheim, Stuttgart (Germany). Winter wheat (Triticum aestivum L.) was sown on three research fields, with different N levels applied to each field.
The measurements in the field were geo-referenced and logged with an absolute GPS accuracy of ±2.5 cm. The UAV's GPS data were corrected based on the pitch and roll attitude of the UAV at each measurement. In the first step of the data analysis, the raw sensor data were post-processed and converted into indices and ratios relating to plant characteristics. The converted ground sensor data were analysed, and the correlation results were interpreted in relation to the dependent variables (DVs): BM weight, wheat yield and available N. The results showed significant positive correlations between the DVs and the Sensicle sensor data. For the third paper, the UAV sensor data were included in the evaluations. The UAV data analysis yielded only weakly significant results, and for only one field in 2011. A multirotor UAV was considered a more viable aerial platform, allowing more precision and a higher payload. The ground sensors, in contrast, benefited from their short measuring distance to the plants and smaller measurement footprint. The results of the two ground spectrometers showed significant positive correlations between yield and the CropSpec, NDVI (Normalised Difference Vegetation Index) and REIP (Red-Edge Inflection Point) indices. FERARI and SFR (Simple Fluorescence Ratio) of the MP fluorescence sensor were also chosen for the yield prediction model analysis. CropSpec and REIP correlated significantly with available N. BM weight correlated with REIP even at a very early growth stage (Z 31), and with SAVI (Soil-Adjusted Vegetation Index) at the ripening stage (Z 85). REIP, FERARI and SFR showed high correlations with available N, especially in June and July. The ratios and signals of the MP sensor correlated highly significantly with BM weight from Z 85 onward. 
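Three of the vegetation indices named above have standard published definitions, sketched below in plain Python (the functions also work element-wise on NumPy arrays). The band positions assume the common four-point linear interpolation for REIP (670, 700, 740 and 780 nm) and the usual soil factor L = 0.5 for SAVI; the exact bands and calibration used by the sensors in the thesis may differ. CropSpec, FERARI and SFR are sensor-specific and are not reproduced here.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index: (1 + L)(NIR - Red) / (NIR + Red + L).

    L = 0.5 is the commonly used default for intermediate vegetation cover.
    """
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def reip_linear(r670, r700, r740, r780):
    """Red-edge inflection point (nm) by four-point linear interpolation.

    The inflection reflectance is taken as the mean of the 670 nm and 780 nm
    reflectances, then located linearly between the 700 nm and 740 nm bands.
    """
    inflection_reflectance = (r670 + r780) / 2.0
    return 700.0 + 40.0 * (inflection_reflectance - r700) / (r740 - r700)
```

For example, `reip_linear(0.05, 0.10, 0.35, 0.45)` returns roughly 724 nm: the mean of the 670 nm and 780 nm reflectances (0.25) lies 60% of the way from the 700 nm to the 740 nm reflectance. A shift of REIP toward longer wavelengths is commonly associated with higher chlorophyll and N content, which is consistent with the REIP correlations reported above.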
Both ground spectrometers are suitable for data comparison and data combination with the active MP fluorescence sensor. By combining fluorescence ratios and spectrometer indices, linear models for predicting wheat yield were generated, which correlated significantly over the course of the vegetative period for research field Lammwirt (LW) in 2012. The best model for field LW in 2012 was selected for cross-validation against the measurements of the fields Inneres Täle (IT) and Riech (RI) in 2011 and 2012; however, this cross-validation was not significant. Exchanging just one spectral index for a fluorescence ratio in a similar linear model did yield significant correlations. This work demonstrates that combining different sensor ratios and indices is effective for detecting plant characteristics, offering better and more robust predictions and quantifications of field parameters without employing destructive methods. The MP sensor proved to be universally applicable, showing significant correlations with the investigated characteristics BM weight, wheat yield and available N.
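The modelling step above, fitting a linear yield model on one field and checking it on others, can be sketched with ordinary least squares. This is an assumption-based illustration: the feature columns (for example one spectral index and one fluorescence ratio) are hypothetical, NumPy's `lstsq` stands in for whatever statistics package was actually used, and the significance test on the validation correlation is omitted.

```python
import numpy as np


def _design_matrix(features):
    # Prepend a column of ones so the model has an intercept term.
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(features)), features])


def fit_linear_model(features, target):
    """Ordinary least squares: yield ~ b0 + b1*x1 + ... + bk*xk.

    features: array (n_plots, k), e.g. columns for one spectral index and
    one fluorescence ratio (hypothetical choice).
    """
    X = _design_matrix(features)
    coef, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return coef


def predict(coef, features):
    return _design_matrix(features) @ coef


def validation_r(coef, features, target):
    """Pearson correlation between predicted and observed yield on another field."""
    predicted = predict(coef, features)
    return np.corrcoef(predicted, np.asarray(target, dtype=float))[0, 1]
```

In this workflow the coefficients are fitted once on the training field and then applied unchanged to the other fields; swapping a feature column (for instance replacing a spectral index with a fluorescence ratio) means refitting and re-validating the model.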

    Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots

    We introduce Air Learning, an open-source simulator and gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and with Deep Q Network (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies' performance under various quality-of-flight (QoF) metrics, such as energy consumed, endurance, and average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (the onboard compute of the aerial robot). A latency randomly sampled from this distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (the discrepancy in the flight time metric is reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the choice of onboard compute affects the aerial robot's performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. 
The source code is available at http://bit.ly/2JNAVb6
Comment: To Appear in Springer Machine Learning Journal (Special Issue on Reinforcement Learning for Real Life)
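The latency-mitigation idea, sampling a delay from the latency distribution measured on the target hardware and injecting it into training, can be sketched for a step-based simulator. This is a hypothetical wrapper, not Air Learning's API: it assumes an environment with `reset()` and `step(action)` methods, converts each sampled latency into whole control steps, and holds the previously applied action until the delayed one arrives. Actions take effect in the order they were issued, as they would on a processor that runs inference sequentially.

```python
import random
from collections import deque


class LatencyInjectionWrapper:
    """Hypothetical wrapper that delays each action by a sampled hardware latency.

    env          -- object with reset() -> obs and step(action) -> step result
    latencies_s  -- latencies (seconds) measured on the target onboard computer
    dt_s         -- simulator control period (seconds)
    idle_action  -- action applied before any delayed action has arrived
    """

    def __init__(self, env, latencies_s, dt_s, idle_action, seed=None):
        self.latencies_s = list(latencies_s)
        if not self.latencies_s:
            raise ValueError("need at least one measured latency")
        self.env = env
        self.dt_s = dt_s
        self.idle_action = idle_action
        self.rng = random.Random(seed)
        self._clear_state()

    def _clear_state(self):
        self.t = 0
        self.applied_action = self.idle_action
        self.pending = deque()  # FIFO of (step at which action takes effect, action)

    def reset(self):
        self._clear_state()
        return self.env.reset()

    def step(self, action):
        # Sample one latency from the empirical distribution and convert it
        # to a whole number of simulator steps.
        delay_steps = round(self.rng.choice(self.latencies_s) / self.dt_s)
        self.pending.append((self.t + delay_steps, action))
        # Apply actions whose delay has elapsed, in issue order; until then
        # the previously applied (stale) action keeps being executed.
        while self.pending and self.pending[0][0] <= self.t:
            _, self.applied_action = self.pending.popleft()
        self.t += 1
        return self.env.step(self.applied_action)
```

With a constant 0.2 s latency and a 0.1 s control period, the first two simulator steps execute `idle_action` and every later step executes the action chosen two steps earlier, so the policy learns to act on observations that are already stale when its commands land.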

    Feature Papers of Drones - Volume I

    The present book is divided into two volumes (Volume I: articles 1–23, and Volume II: articles 24–54), which compile the articles and communications submitted to the Topical Collection “Feature Papers of Drones” during the years 2020 to 2022, describing new, cutting-edge designs, developments, and/or applications of unmanned vehicles (drones). Articles 1–8 are devoted to developments in drone design, introducing new concepts and modeling strategies as well as effective designs that improve drone stability and autonomy. Articles 9–16 focus on the communication aspects of drones, since effective strategies are required for smooth deployment and efficient functioning; several developments that aim to optimize performance and security are therefore presented. In this regard, one of the most directly related topics is drone swarms, not only in terms of communication but also in terms of human-swarm interaction and their applications to science missions, surveillance, and disaster rescue operations. Concluding Volume I, which relates to drone improvements, articles 17–23 discuss advancements in autonomous navigation, obstacle avoidance, and enhanced flight planning.

    Autonomous Drone Landings on an Unmanned Marine Vehicle using Deep Reinforcement Learning

    This thesis describes the integration of an Unmanned Surface Vehicle (USV) and an Unmanned Aerial Vehicle (UAV, also commonly known as a drone) in a single Multi-Agent System (MAS). In marine robotics, the advantage offered by a MAS consists of exploiting the key features of one robot to compensate for the shortcomings of the other. In this way, a USV can serve as a landing platform, alleviating the need for a UAV to be airborne for long periods of time, while the UAV can increase overall environmental awareness thanks to its ability to cover large portions of the surrounding environment with one or more cameras mounted on it. There are numerous potential applications for this system, such as search and rescue missions, water and coastal monitoring, and reconnaissance and force protection, to name but a few. The theory developed is of a general nature. The landing manoeuvre was accomplished mainly by identifying, through artificial vision techniques, a fiducial marker placed on a flat surface serving as the landing platform. The raison d'être of the thesis was to propose a new solution for autonomous landing that relies solely on onboard sensors, with minimal or no communication between the vehicles. To this end, initial work solved the problem using only data from the cameras mounted on the in-flight drone. When tracking of the marker is interrupted, the current position of the USV is estimated and integrated into the control commands. The limitations of the classic control theory used in this approach suggested the need for a new solution that exploited the flexibility of intelligent methods, such as fuzzy logic or artificial neural networks. 
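The abstract does not specify how the USV position is estimated while marker tracking is interrupted. One simple possibility, shown here purely as a hypothetical illustration, is constant-velocity extrapolation from the last two marker detections; a Kalman filter would be a more robust alternative.

```python
class MarkerPositionPredictor:
    """Hypothetical constant-velocity estimate of the landing platform position.

    While the fiducial marker is visible, the measured (x, y) position is used
    and a velocity is estimated by finite differences; while tracking is lost,
    the last estimate is extrapolated with that velocity.
    """

    def __init__(self):
        self.position = None
        self.velocity = (0.0, 0.0)

    def update(self, measurement, dt):
        """`measurement` is an (x, y) tuple, or None if the marker is not detected."""
        if measurement is None:
            if self.position is None:
                return None  # nothing observed yet, no estimate available
            self.position = (self.position[0] + self.velocity[0] * dt,
                             self.position[1] + self.velocity[1] * dt)
            return self.position
        if self.position is not None:
            # Finite-difference velocity from the previous estimate.
            self.velocity = ((measurement[0] - self.position[0]) / dt,
                             (measurement[1] - self.position[1]) / dt)
        self.position = measurement
        return self.position
```

The extrapolated position can then replace the missing marker measurement in the landing controller until the marker is re-acquired; the error grows with the length of the dropout, so long outages would call for a hover-and-search behaviour instead.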
The recent achievements of deep reinforcement learning (DRL) techniques in end-to-end control, such as playing the Atari video game suite, offered a fascinating yet challenging new way to view and address the landing problem. Novel architectures were therefore designed to approximate the action-value function of a Q-learning algorithm and used to map raw input observations to high-level navigation actions. In this way, the UAV learnt how to land from high altitude without any human supervision, using only low-resolution grey-scale images, with accuracy and robustness. Both approaches were implemented on a simulated test-bed based on the Gazebo simulator and a model of the Parrot AR-Drone. The DRL-based solution was further verified experimentally using the Parrot Bebop 2 in a series of trials. The outcomes demonstrate that both methods are feasible and practicable, not only in outdoor marine scenarios but also in indoor ones.
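The Q-learning component described above trains a network to approximate the action-value function over discrete navigation actions. The batch computation of its bootstrapped targets and squared TD error is sketched below in NumPy; the network itself, the replay buffer and the discount factor are omitted or assumed, and the squared loss is a simplification (the original Atari DQN clipped the error term, which is equivalent to a Huber loss).

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Q-learning targets y = r + gamma * max_a' Q_target(s', a') for a batch.

    next_q_values: array (batch, n_actions) from the target network.
    dones: 1.0 where the episode ended (e.g. landed or crashed), else 0.0.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    max_next = np.max(np.asarray(next_q_values, dtype=float), axis=1)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * max_next


def td_loss(q_values, actions, targets):
    """Mean squared TD error on the Q-values of the actions actually taken."""
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions)
    chosen = q_values[np.arange(len(actions)), actions]
    return np.mean((np.asarray(targets, dtype=float) - chosen) ** 2)
```

In a full agent, `next_q_values` would come from a periodically updated copy of the network evaluated on the next grey-scale image stack, and the gradient of `td_loss` would be taken with respect to the online network's parameters only.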