
    Fast and Accurate, Convolutional Neural Network Based Approach for Object Detection from UAV

    Unmanned Aerial Vehicles (UAVs) have attracted interest from people in all walks of life because of their pervasive computing capabilities. A UAV equipped with vision techniques can establish autonomous navigation control for itself. Object detection from UAVs also broadens the utility of drones, enabling ubiquitous surveillance and monitoring services for military operations, urban administration, and agricultural management. As data-driven technologies have evolved, machine learning algorithms, especially deep learning approaches, have been intensively applied to traditional computer vision research problems. Modern Convolutional Neural Network (CNN) based object detectors can be divided into two major categories: one-stage and two-stage detectors. In this study, we apply several representative CNN-based object detectors to the Stanford Drone Dataset (SDD). A RetinaNet-based approach built on the focal loss dense detector achieves state-of-the-art performance for fast and accurate object detection from UAVs. (arXiv admin note: substantial text overlap with arXiv:1803.0111)
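    At the core of the RetinaNet detector mentioned above is the focal loss, which down-weights well-classified examples so training focuses on hard ones. A minimal NumPy sketch of the binary focal loss follows; the gamma and alpha values are the commonly cited RetinaNet defaults, assumed here rather than taken from this abstract.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probabilities for the positive class, shape (N,)
    y: ground-truth labels in {0, 1}, shape (N,)
    gamma, alpha: standard RetinaNet defaults (assumed, not from this abstract).
    """
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

# A confidently correct easy example contributes far less than a hard one:
print(focal_loss(np.array([0.95, 0.3]), np.array([1, 1])))
```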

    Vision-based Learning for Drones: A Survey

    Drones as advanced cyber-physical systems are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities under various scenarios. We start by elucidating the fundamental principles of vision-based learning, highlighting how it significantly improves drones' visual perception and decision-making processes. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios, and underscore the challenges and innovations characterizing each area. Finally, we explore open questions and potential solutions, paving the way for ongoing research and development in this dynamic and rapidly evolving field. With the growth of large language models (LLMs) and embodied intelligence, vision-based learning for drones offers a promising but challenging road toward artificial general intelligence (AGI) in the 3D physical world.

    Vision-Based Navigation System for Unmanned Aerial Vehicles

    The main objective of this dissertation is to provide Unmanned Aerial Vehicles (UAVs) with a robust navigation system that allows them to perform complex tasks autonomously and in real time. The proposed algorithms solve the navigation problem for outdoor as well as indoor environments, based mainly on visual information captured by monocular cameras. The dissertation also presents the advantages of using visual sensors as the main source of data, or as a complement to other sensors, to improve the accuracy and robustness of sensing. It covers several research topics based on computer vision techniques: (I) Pose Estimation, which estimates the 6D pose of the UAV. The algorithm combines the SIFT detector with the FREAK descriptor, which maintains the quality of feature-point matching while decreasing computation time; the pose is then recovered by decomposing the world-to-frame and frame-to-frame homographies. (II) Obstacle Detection and Collision Avoidance, in which the UAV senses and detects frontal obstacles in its path. The detection algorithm mimics human behavior for detecting approaching obstacles: it analyzes the size changes of detected feature points, combined with the expansion ratios of the convex hull constructed around those points in consecutive frames. By comparing the area ratio of the obstacle with the position of the UAV, the method decides whether the detected obstacle may cause a collision. Finally, the algorithm extracts collision-free zones around the obstacle and, combined with the tracked waypoints, the UAV performs the avoidance maneuver. (III) Navigation Guidance, which generates waypoints that determine the flight path based on the environment and the detected obstacles, and provides a strategy to follow the path segments efficiently and perform flight maneuvers smoothly. (IV) Visual Servoing, which offers different control solutions (Fuzzy Logic Control (FLC) and PID) based on the obtained visual information, in order to achieve flight stability, perform the correct maneuvers, avoid possible collisions, and track the waypoints. All the proposed algorithms have been verified in real flights in both indoor and outdoor environments, taking visual conditions such as illumination and texture into consideration. The results have been validated against other systems, such as the VICON motion capture system and DGPS in the case of the pose estimation algorithm. In addition, the proposed algorithms have been compared with several previous works in the state of the art, and the results demonstrate improved accuracy and robustness. Finally, this dissertation concludes that visual sensors are lightweight, have low power consumption, and provide reliable information, making them a powerful tool in navigation systems for increasing the autonomy of UAVs in real-world applications.
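    Topic (I) pairs a SIFT detector with the binary FREAK descriptor to keep matching quality while cutting computation time. A minimal OpenCV sketch of that detector/descriptor split follows; the image paths are placeholders, and FREAK requires the opencv-contrib package.

```python
import cv2
import numpy as np

# Detect keypoints with SIFT, but describe them with the cheaper binary FREAK
# descriptor, matching the detector/descriptor split used in the dissertation.
img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
freak = cv2.xfeatures2d.FREAK_create()                     # needs opencv-contrib

kp1 = sift.detect(img1, None)
kp2 = sift.detect(img2, None)
kp1, des1 = freak.compute(img1, kp1)
kp2, des2 = freak.compute(img2, kp2)

# FREAK is a binary descriptor, so match with Hamming distance + ratio test.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# The frame-to-frame homography whose decomposition yields relative pose:
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```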

    Visual Guidance for Unmanned Aerial Vehicles with Deep Learning

    Unmanned Aerial Vehicles (UAVs) have been widely applied in the military and civilian domains. In recent years, the operation mode of UAVs has been evolving from teleoperation to autonomous flight. To fulfill the goal of autonomous flight, a reliable guidance system is essential. Since the combination of the Global Positioning System (GPS) and an Inertial Navigation System (INS) cannot sustain autonomous flight in situations where GPS is degraded or unavailable, computer vision has been widely explored as a primary method for UAV guidance. Moreover, GPS does not provide the robot with any information on the presence of obstacles. Stereo cameras have a complex architecture and need a minimum baseline to generate a disparity map. By contrast, monocular cameras are simple and require fewer hardware resources. Benefiting from state-of-the-art Deep Learning (DL) techniques, especially Convolutional Neural Networks (CNNs), a monocular camera is sufficient to extrapolate mid-level visual representations such as depth maps and optical flow (OF) maps from the environment. Therefore, the objective of this thesis is to develop a real-time visual guidance method for UAVs in cluttered environments using a monocular camera and DL. The three major tasks performed in this thesis are investigating the development of DL techniques and monocular depth estimation (MDE), developing real-time CNNs for MDE, and developing visual guidance methods on the basis of the developed MDE system. A comprehensive survey is conducted, covering Structure from Motion (SfM)-based methods, traditional handcrafted feature-based methods, and state-of-the-art DL-based methods; it also investigates the application of MDE in robotics. Based on the survey, two CNNs for MDE are developed. In addition to promising accuracy, these two CNNs run at high frame rates (126 fps and 90 fps respectively) on a single modest-power Graphics Processing Unit (GPU). For the third task, visual guidance for UAVs is first developed on top of the designed MDE networks. To improve the robustness of UAV guidance, OF maps are integrated into the developed visual guidance method. A cross-attention module is applied to fuse the features learned from the depth maps and OF maps. The fused features are then passed through a deep reinforcement learning (DRL) network to generate the policy for guiding the flight of the UAV. Additionally, a simulation framework is developed which integrates AirSim, Unreal Engine, and PyTorch. The effectiveness of the developed visual guidance method is validated through extensive experiments in this simulation framework.
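    The thesis builds guidance on top of CNN-estimated depth maps. As a deliberately simplified illustration of that idea (not the cross-attention DRL policy the thesis develops), the sketch below steers toward the image sector with the most free space in a depth map.

```python
import numpy as np

def steer_from_depth(depth, n_sectors=5, fov_deg=90.0):
    """Pick a heading from a monocular depth map (H x W, meters).

    Illustrative only: split the image into vertical sectors and head
    toward the sector with the largest mean depth (most free space).
    """
    sectors = np.array_split(depth, n_sectors, axis=1)
    mean_depths = [s.mean() for s in sectors]
    best = int(np.argmax(mean_depths))
    # Map the sector index to a yaw offset within the camera's field of view.
    yaw_deg = (best + 0.5) / n_sectors * fov_deg - fov_deg / 2
    return yaw_deg, mean_depths[best]

depth = np.random.uniform(1.0, 20.0, size=(240, 320))  # stand-in for a CNN depth map
print(steer_from_depth(depth))
```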

    3D Reconstruction of Civil Infrastructures from UAV Lidar Point Clouds

    Nowadays, infrastructures for transportation, communication, energy, industrial production, and social purposes are pillars of society, essential for its proper functioning. Coupled with this importance comes a need to ensure the safety and durability of these assets, so reliable techniques must be used to assess their condition. With technological advances and the development of new data acquisition methods, some civil construction tasks currently performed by humans, such as inspection and quality control, become inefficient given their danger and cost. In this context, 3D reconstruction of infrastructures appears as a possible solution: a first step toward infrastructure monitoring, as well as a tool for semi- or fully automated inspection processes. For this thesis, a Lidar sensor coupled to a UAV was used. With this equipment, it became possible to autonomously fly over real infrastructures and extract data from all their surfaces, regardless of how difficult those regions would be to reach from the ground. The data are extracted as point clouds with associated intensities, filtered, and fed to reconstruction and texturing algorithms, culminating in a virtual three-dimensional representation of the target infrastructure. With these representations, it is possible to evaluate the evolution of the infrastructure during its construction or repair, as well as the temporal evolution of defects present in the construction, simply by comparing models of the same scene obtained from data extracted on different occasions. This approach allows infrastructure monitoring to be carried out more efficiently, at lower cost, and without compromising worker safety.
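    A minimal sketch of the filter-then-reconstruct stage of such a pipeline, using the Open3D library; the file names, voxel size, and Poisson depth are illustrative assumptions rather than values from the thesis.

```python
import open3d as o3d

# Load a UAV Lidar scan (placeholder path), then filter and reconstruct it.
pcd = o3d.io.read_point_cloud("bridge_scan.ply")

# Filtering: thin the cloud and drop isolated outlier returns.
pcd = pcd.voxel_down_sample(voxel_size=0.05)
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Poisson surface reconstruction needs consistently oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(15)

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("bridge_mesh.ply", mesh)
```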

    A Survey on Aerial Swarm Robotics

    The use of aerial swarms to solve real-world problems has been increasing steadily, accompanied by falling prices and improving performance of communication, sensing, and processing hardware. The commoditization of hardware has reduced unit costs, thereby lowering the barriers to entry to the field of aerial swarm robotics. A key enabling technology for swarms is the family of algorithms that allow the individual members of the swarm to communicate and allocate tasks amongst themselves, plan their trajectories, and coordinate their flight in such a way that the overall objectives of the swarm are achieved efficiently. These algorithms, often organized in a hierarchical fashion, endow the swarm with autonomy at every level, and the role of a human operator can be reduced, in principle, to interactions at a higher level without direct intervention. This technology depends on the clever and innovative application of theoretical tools from control and estimation. This paper reviews the state of the art of these theoretical tools, specifically focusing on how they have been developed for, and applied to, aerial swarms. Aerial swarms differ from swarms of ground-based vehicles in two respects: they operate in a three-dimensional space, and the dynamics of individual vehicles add an extra layer of complexity. We review dynamic modeling and conditions for stability and controllability that are essential in order to achieve cooperative flight and distributed sensing. The main sections of this paper focus on major results covering trajectory generation, task allocation, adversarial control, distributed sensing, monitoring, and mapping. Wherever possible, we indicate how the physics and subsystem technologies of aerial robots are brought to bear on these individual areas.
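    Among the algorithmic areas the survey covers, task allocation has a classic centralized baseline: assign drones to tasks by minimizing total cost with the Hungarian algorithm. The sketch below is that generic baseline, not any specific method from the survey; positions are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Assign each drone to one task so that total flight distance is minimized.
drones = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 8.0]])   # illustrative positions
tasks  = np.array([[4.0, 4.0], [0.0, 9.0], [6.0, 0.0]])

# Cost matrix: Euclidean distance from every drone to every task.
cost = np.linalg.norm(drones[:, None, :] - tasks[None, :, :], axis=-1)

row, col = linear_sum_assignment(cost)                    # Hungarian algorithm
for d, t in zip(row, col):
    print(f"drone {d} -> task {t} (distance {cost[d, t]:.2f})")
```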

    Determining Where Individual Vehicles Should Not Drive in Semiarid Terrain in Virginia City, NV

    This thesis explored the elements involved in determining and mapping where a vehicle should not drive off-road in semiarid areas. Obstacles are anything that slows or obstructs progress (Meyer et al., 1977) or limits the space available for maneuvering (Spenko et al., 2006). This study identified the major factors relevant to determining which terrain features should be considered obstacles in off-road driving and thus avoided. These are elements relating to the vehicle itself and how it is driven, as well as the terrain factors of slope, vegetation, water, and soil. These factors were identified in the terrain using inferential methods of Terrain Pattern Recognition (TPR), analysis of remotely sensed data, and Digital Elevation Model (DEM) data analysis, and the analysis was further refined using other reference information about the area. Other factors such as weather, driving angle, and environmental impact are discussed. This information was applied to a section of Virginia City, Nevada as a case study. Analysis and mapping were deliberately done without fieldwork, to determine what could be assessed using only remote means. Not all findings from the literature review could be implemented in this trafficability study: some methods and trafficability knowledge had to be omitted because the data were unavailable, un-acquirable, or too coarsely mapped to be useful. Examples are Lidar mapping of the area, soil profiling of the terrain, and assessment of the plant species present for driven-over traction and tire punctures. The Virginia City section was analyzed and mapped using hyperspectral remotely sensed image data, and remote-sensor-derived DEM data were used in a Geographic Information System (GIS). Stereo-paired air photos of the study site were used in TPR, and other information on flora, historical weather, and a previous soil survey map was used in the GIS. Field validation was used to check the findings. The case study's trafficability assessment demonstrated terrain analysis methodologies that successfully classified many of the materials present and identified major areas where a vehicle should not drive. The methods used were: manual TPR of the stereo-paired air photos, using a stereo photo viewer, to conduct drainage tracing; and slope analysis of the DEM using automated methods in ArcMap. The SpecTIR hyperspectral data were analyzed using the manual Environment for Visualizing Images (ENVI) software hourglass procedure. Visual analysis of the hyperspectral data and air photos, along with known soil and vegetation characteristics, was used to refine the analyses. Processed data were georectified using the SpecTIR Geographic Lookup Table (GLT) input geometry, then exported to and analyzed in ArcMap with the other data listed above. Features were identified based on their spectral attributes, spatial properties, and visual analysis. Inaccuracies in mapping were attributable largely to the spatial resolution of the DEMs, which averaged out some non-drivable obstacles and parts of a drivable road; to subjective human and computer decisions during data processing; and to the grouping of spectral end-members during hyperspectral data analysis.
    Mapping and field validation found that several manmade and natural obstacles visible from the ground were too fine, thin, or small to be identified from the remote sensing data; examples are fences and some natural terrain surface roughness, where the terrain deviated from a smooth surface and exhibited micro-variations in elevation and/or texture. Slope analysis using the 10-meter and 30-meter resolution DEMs did not accurately depict some manmade features (e.g., some of the buildings, portions of roads, and fences), evident in a well-trafficked paved road appearing in the DEM analysis as having too steep a slope (beyond 15°) to be drivable. Some features were spectrally grouped together during analysis because of similar spectral properties. Spectral grouping is a process in which the spectral classes' pixel areas are reviewed and classes with too few occurrences are averaged into similar classes or dropped entirely; this reduces the number of spectrally unique material classes to those most relevant to the terrain mapped. These decisions are subjective, and in one case two similar spectral material classes were combined that, on later evaluation, should have remained separate. During field sample collection, some of the determined features (free-standing water and liquid tanks) proved inaccessible, being on private land and/or secured by fences; these had to be verified visually, and photos were taken. Further refinements to the mapping could have been made if fieldwork had been done during the mapping process. Determining and mapping where a vehicle should not drive in semiarid areas is a complex task involving many variables and reference data types. Processing, analyzing, and fusing these different references entails subjective manual and automated decisions that are subject to errors and/or inaccuracies at multiple levels, which can individually or collectively skew results and cause terrain trafficability to be depicted incorrectly. That said, a usable reference map can be created to assist decision makers in determining their route(s).
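    The slope-thresholding step described above is straightforward to reproduce on a DEM grid. The sketch below uses NumPy with the thesis's 15° cutoff; the synthetic terrain and the 10 m cell size are illustrative, not data from the study.

```python
import numpy as np

def undrivable_slope_mask(dem, cell_size=10.0, max_slope_deg=15.0):
    """Flag DEM cells whose terrain slope exceeds the drivable limit.

    dem: 2-D elevation grid in meters; cell_size: grid spacing in meters
    (10 m here, matching one of the DEM resolutions used in the thesis).
    """
    dz_dy, dz_dx = np.gradient(dem, cell_size)      # elevation change per meter
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    return slope_deg > max_slope_deg                # True = do not drive here

# Illustrative terrain: a gentle plane with one steep bump.
y, x = np.mgrid[0:50, 0:50]
dem = 0.5 * x + 40.0 * np.exp(-((x - 25) ** 2 + (y - 25) ** 2) / 20.0)
mask = undrivable_slope_mask(dem)
print(f"{mask.mean():.1%} of cells flagged as too steep")
```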

    Collision-Free Navigation of Small UAVs in Complex Urban Environment

    Small unmanned aerial vehicles (UAVs) are expected to become highly innovative solutions for all kinds of tasks, such as transport, surveillance, inspection, and guidance, and many commercial ideas already exist. Small multi-rotor UAVs are preferred since they are easy to construct and to fly, at least in wide open spaces. However, many UAV business cases are foreseen in complex urban environments, which are very challenging from the perspective of UAV flight. Our work focuses on autonomous flight and collision-free navigation in an urban environment, where GPS is still used for localization but where variations in the accuracy or temporary unavailability of GPS position data are explicitly considered. Urban environments are challenging because they require flight near large structures and near moving obstacles such as humans and other moving objects, at low altitudes or in very narrow spaces, and thus also in areas where GPS position data might temporarily be very inaccurate or even unavailable. We therefore designed a custom stereo camera with adjustable base length for perceiving potential obstacles in the unknown outdoor environment, and investigated the optimal design and sensitivity parameters in outdoor experiments. Using the stereo images, a graph-based SLAM approach performs online three-dimensional mapping of the static and dynamic environment. For memory efficiency, incremental online loop-closure detection using a bag-of-words method is implemented. Given the three-dimensional map, cell and transition costs are calculated in real time by a modified D* Lite algorithm, which searches for and generates a three-dimensional collision-free path. Experiments on 3D mapping and collision-free path planning were conducted with a small UAV in an outdoor scenario. The combined experimental results of real-time mapping and path planning demonstrate that the three-dimensional collision-free path planner can handle real-time computational constraints while maintaining a safe distance.
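    The adjustable base length of the custom stereo camera matters because depth follows Z = f·B/d: a longer baseline B improves far-range depth resolution. A minimal OpenCV sketch of disparity and depth from a rectified stereo pair follows; the focal length, baseline, and file names are assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Rectified stereo pair (placeholder paths).
left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching for the disparity map.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM is fixed-point

# Depth from disparity: Z = f * B / d. A longer baseline B improves
# far-range resolution, which is why an adjustable base length helps.
f_px, baseline_m = 700.0, 0.20            # assumed intrinsics, not from the paper
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f_px * baseline_m / disparity[valid]
```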

    Perception-aware receding horizon trajectory planning for multicopters with visual-inertial odometry

    Visual-inertial odometry (VIO) is widely used for the state estimation of multicopters, but it may function poorly in environments with few visual features or in overly aggressive flights. In this work, we propose a perception-aware collision avoidance trajectory planner for multicopters that may be used with any feature-based VIO algorithm. Our approach is able to fly the vehicle to a goal position at high speed, avoiding obstacles in an unknown stationary environment while achieving good VIO state estimation accuracy. The proposed planner samples a group of minimum jerk trajectories and finds collision-free trajectories among them, which are then evaluated based on their speed toward the goal and their perception quality. Both the motion blur of features and their locations are considered in the perception quality. Our novel consideration of the motion blur of features enables automatic adaptation of the trajectory's aggressiveness in environments with different light levels. The best trajectory from the evaluation is tracked by the vehicle and is updated in a receding horizon manner when new images are received from the camera. Only generic assumptions about the VIO are made, so the planner may be used with various existing systems. The proposed method runs in real time on a small on-board embedded computer. We validated the effectiveness of our proposed approach through experiments in both indoor and outdoor environments. Compared to a perception-agnostic planner, the proposed planner kept more features in the camera's view and made the flight less aggressive, making the VIO more accurate. It also eliminated the VIO failures that occurred for the perception-agnostic planner. The ability of the proposed planner to fly through dense obstacles was also validated. The experiment video can be found at https://youtu.be/qO3LZIrpwtQ.
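    For rest-to-rest boundary conditions, the minimum jerk profile on each axis is the quintic s(t) = s0 + (sf - s0)(10u^3 - 15u^4 + 6u^5) with u = t/T. The sketch below samples a group of such trajectories toward laterally offset endpoints; the offset scheme is a crude stand-in for the paper's sampler, and the speed and perception-quality scoring are intentionally omitted.

```python
import numpy as np

def min_jerk_1d(s0, sf, T, n=50):
    """Rest-to-rest minimum jerk profile: s = s0 + (sf-s0)(10u^3 - 15u^4 + 6u^5)."""
    u = np.linspace(0.0, 1.0, n)
    return s0 + (sf - s0) * (10 * u**3 - 15 * u**4 + 6 * u**5)

def sample_candidates(start, goal, T=2.0, n_lateral=7, spread=2.0):
    """Sample minimum jerk trajectories toward laterally offset endpoints.

    start, goal: (x, y, z) positions. Offsets are applied along y as a crude
    stand-in for the paper's sampling scheme; collision checking and the
    speed/perception-quality evaluation are omitted here.
    """
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    candidates = []
    for dy in np.linspace(-spread, spread, n_lateral):
        end = goal + np.array([0.0, dy, 0.0])
        traj = np.stack([min_jerk_1d(s, e, T) for s, e in zip(start, end)], axis=1)
        candidates.append(traj)                 # (n, 3) array of waypoints
    return candidates

trajs = sample_candidates([0, 0, 1], [5, 0, 1])
print(len(trajs), trajs[0].shape)
```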