412 research outputs found
Fast and Accurate, Convolutional Neural Network Based Approach for Object Detection from UAV
Unmanned Aerial Vehicles (UAVs) have intrigued people from all walks of life
because of their pervasive computing capabilities. A UAV equipped with vision
techniques can establish autonomous navigation control for itself. Object
detection from UAVs can also broaden the use of drones, providing ubiquitous
surveillance and monitoring services for military operations, urban
administration, and agriculture management. As data-driven technologies have
evolved, machine learning algorithms, especially deep learning approaches, have
been intensively applied to traditional computer vision research problems.
Modern Convolutional Neural Network (CNN) based object detectors can be divided
into two major categories: one-stage and two-stage detectors. In this study, we
apply several representative CNN-based object detectors to the Stanford Drone
Dataset (SDD). State-of-the-art performance is achieved using the focal-loss
dense detector RetinaNet for fast and accurate object detection from UAVs.
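The focal-loss idea behind RetinaNet can be illustrated with a short sketch. This is not the paper's implementation; the function name and the α = 0.25, γ = 2 defaults are the commonly cited values, used here only for illustration:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class, y: 0/1 label.
    The (1 - p_t)**gamma factor down-weights easy, well-classified
    examples so training focuses on hard ones (e.g. small objects in
    aerial imagery).
    """
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, confidently correct example contributes far less loss than a hard one.
easy = focal_loss(np.array(0.95), np.array(1))
hard = focal_loss(np.array(0.10), np.array(1))
```

With γ = 0 and α = 0.5 this reduces (up to a constant) to ordinary cross-entropy; increasing γ sharpens the down-weighting of easy examples.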
Vision-based Learning for Drones: A Survey
Drones as advanced cyber-physical systems are undergoing a transformative
shift with the advent of vision-based learning, a field that is rapidly gaining
prominence due to its profound impact on drone autonomy and functionality.
Different from existing task-specific surveys, this review offers a
comprehensive overview of vision-based learning in drones, emphasizing its
pivotal role in enhancing their operational capabilities under various
scenarios. We start by elucidating the fundamental principles of vision-based
learning, highlighting how it significantly improves drones' visual perception
and decision-making processes. We then categorize vision-based control methods
into indirect, semi-direct, and end-to-end approaches from the
perception-control perspective. We further explore various applications of
vision-based drones with learning capabilities, ranging from single-agent
systems to more complex multi-agent and heterogeneous system scenarios, and
underscore the challenges and innovations characterizing each area. Finally, we
explore open questions and potential solutions, paving the way for ongoing
research and development in this dynamic and rapidly evolving field. With the
growth of large language models (LLMs) and embodied intelligence, vision-based
learning for drones offers a promising but challenging road towards
artificial general intelligence (AGI) in the 3D physical world.
Vision-Based navigation system for unmanned aerial vehicles
Mención Internacional en el título de doctor. The main objective of this dissertation is to provide Unmanned Aerial Vehicles
(UAVs) with a robust navigation system that allows them to perform complex
tasks autonomously and in real time. The proposed algorithms address the
navigation problem in outdoor as well as indoor environments, based mainly on
visual information captured by monocular cameras. In addition, this
dissertation presents the advantages of using visual sensors as the main source
of data, or as a complement to other sensors, in order to improve the accuracy
and robustness of sensing.
The dissertation covers several research topics based on computer vision
techniques: (I) Pose Estimation, which provides a solution for estimating the
6D pose of the UAV. This algorithm combines the SIFT detector with the FREAK
descriptor, which maintains the performance of feature-point matching while
decreasing the computational time. The pose estimation problem is then solved
by decomposing the world-to-frame and frame-to-frame homographies.
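The kind of planar-homography decomposition described above can be sketched as follows, assuming a known camera intrinsic matrix K and a world plane at z = 0. The function name and the synthetic example are illustrative, not the dissertation's implementation:

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R and translation t from a world-to-frame homography
    H ~ K [r1 r2 t] induced by the world plane z = 0 (defined up to scale)."""
    Hn = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(Hn[:, 0])   # fix the scale so that ||r1|| = 1
    r1 = lam * Hn[:, 0]
    r2 = lam * Hn[:, 1]
    r3 = np.cross(r1, r2)                  # complete the right-handed basis
    t = lam * Hn[:, 2]
    return np.column_stack([r1, r2, r3]), t

# Synthetic check: build H from a known pose and recover that pose.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])  # yaw of 0.1 rad
t_true = np.array([0.2, -0.1, 2.0])
H = K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = pose_from_homography(H, K)
```

In practice H is estimated from noisy feature matches, so the recovered r1, r2 are only approximately orthonormal and are usually re-orthogonalized (e.g. via SVD) before use.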
(II) Obstacle Detection and Collision Avoidance, in which the UAV senses and
detects frontal obstacles situated in its path. The detection algorithm mimics
human behavior for detecting approaching obstacles: it analyzes the size
changes of the detected feature points, combined with the expansion ratios of
the convex hull constructed around those feature points in consecutive frames.
Then, by comparing the area ratio of the obstacle with the position of the UAV,
the method decides whether the detected obstacle may cause a collision.
Finally, the algorithm extracts the collision-free zones around the obstacle
and, combining these with the tracked waypoints, the UAV performs the avoidance maneuver.
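The convex-hull expansion cue described above can be sketched in a few lines. The hull construction, the area test, and the 1.2 expansion threshold are illustrative assumptions, not the dissertation's exact implementation:

```python
import numpy as np

def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone-chain hull of an (N, 2) point array."""
    pts = sorted(map(tuple, points))
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half(pts), half(pts[::-1])
    return np.array(lower[:-1] + upper[:-1])

def hull_area(points):
    """Shoelace area of the convex hull around tracked feature points."""
    h = convex_hull(points)
    x, y = h[:, 0], h[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def approaching(prev_pts, cur_pts, ratio_thresh=1.2):
    """Flag an approaching obstacle when the feature hull expands fast enough
    between consecutive frames."""
    return hull_area(cur_pts) / hull_area(prev_pts) > ratio_thresh

prev_pts = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]], float)
cur_pts = prev_pts * 1.5  # features spread apart as the obstacle gets closer
```

The geometric intuition is that an obstacle on a collision course grows in the image while its bearing stays roughly constant, so the hull of its feature points expands frame over frame.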
(III) Navigation Guidance, which generates the waypoints that determine the
flight path based on the environment and the situated obstacles, then provides
a strategy to follow the path segments efficiently and perform the flight
maneuver smoothly. (IV) Visual Servoing, which offers different control
solutions (Fuzzy Logic Control (FLC) and PID) based on the obtained visual
information, in order to achieve flight stability, perform the correct
maneuvers, avoid possible collisions, and track the waypoints.
All the proposed algorithms have been verified in real flights in both indoor
and outdoor environments, taking into consideration visual conditions such as
illumination and texture. The obtained results have been validated against
other systems, such as the VICON motion capture system and DGPS in the case of
the pose estimation algorithm. In addition, the proposed algorithms have been
compared with several previous works in the state of the art, and the results
prove the improvement in accuracy and robustness of the proposed algorithms.
Finally, this dissertation concludes that visual sensors offer the advantages
of light weight, low power consumption, and reliable information, making them a
powerful tool in navigation systems for increasing the autonomy
of the UAVs for real-world applications.
Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y Automática. Presidente: Carlo Regazzoni. Secretario: Fernando García Fernández. Vocal: Pascual Campoy Cerver
Visual Guidance for Unmanned Aerial Vehicles with Deep Learning
Unmanned Aerial Vehicles (UAVs) have been widely applied in the military and civilian domains. In recent years, the operation mode of UAVs has been evolving from teleoperation to autonomous flight. In order to fulfill the goal of autonomous flight, a reliable guidance system is essential. Since the combination of the Global Positioning System (GPS) and an Inertial Navigation System (INS) cannot sustain autonomous flight in some situations where GPS can be degraded or unavailable, using computer vision as a primary method for UAV guidance has been widely explored. Moreover, GPS does not provide the robot with any information on the presence of obstacles.
Stereo cameras have a complex architecture and need a minimum baseline to generate a disparity map. By contrast, monocular cameras are simple and require fewer hardware resources. Benefiting from state-of-the-art Deep Learning (DL) techniques, especially Convolutional Neural Networks (CNNs), a monocular camera is sufficient to extrapolate mid-level visual representations such as depth maps and optical flow (OF) maps from the environment. Therefore, the objective of this thesis is to develop a real-time visual guidance method for UAVs in cluttered environments using a monocular camera and DL.
The three major tasks performed in this thesis are investigating the development of DL techniques and monocular depth estimation (MDE), developing real-time CNNs for MDE, and developing visual guidance methods on the basis of the developed MDE system. A comprehensive survey is conducted, which covers Structure from Motion (SfM)-based methods, traditional handcrafted feature-based methods, and state-of-the-art DL-based methods. More importantly, it also investigates the application of MDE in robotics. Based on the survey, two CNNs for MDE are developed. In addition to promising accuracy, these two CNNs run at high frame rates (126 fps and 90 fps respectively) on a single modest-power Graphics Processing Unit (GPU).
As regards the third task, the visual guidance for UAVs is first developed on top of the designed MDE networks. To improve the robustness of UAV guidance, OF maps are integrated into the developed visual guidance method. A cross-attention module is applied to fuse the features learned from the depth maps and OF maps. The fused features are then passed through a deep reinforcement learning (DRL) network to generate the policy for guiding the flight of the UAV. Additionally, a simulation framework is developed which integrates AirSim, Unreal Engine and PyTorch. The effectiveness of the developed visual guidance method is validated through extensive experiments in the simulation framework.
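The depth/optical-flow fusion step can be illustrated with a single-head cross-attention sketch in plain NumPy. The shapes, variable names, and random projection matrices here are illustrative assumptions; the thesis uses learned weights inside a CNN/DRL pipeline:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(depth_feats, flow_feats, Wq, Wk, Wv):
    """Depth features (queries) attend over optical-flow features (keys/values).

    depth_feats: (N, d) array; flow_feats: (M, d) array.
    Returns an (N, d) fused representation.
    """
    Q, K, V = depth_feats @ Wq, flow_feats @ Wk, flow_feats @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # scaled dot-product attention
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 16
depth_feats = rng.standard_normal((8, d))   # e.g. pooled depth-map features
flow_feats = rng.standard_normal((10, d))   # e.g. pooled optical-flow features
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
fused = cross_attention(depth_feats, flow_feats, Wq, Wk, Wv)
```

Each fused row is a convex combination of flow-feature values, weighted by how strongly the corresponding depth feature matches them, which is what lets one modality compensate when the other is unreliable.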
3D Reconstruction of Civil Infrastructures from UAV Lidar point clouds
Nowadays, infrastructures for transportation, communication, energy, industrial production and social purpose are presented as pillars of society, being essential for its proper functioning. Coupled with this great importance within society, there is a need to ensure the safety and durability of these assets. Thus, reliable techniques should be used to assess their condition.
With technological advances and the development of new methods of data acquisition, some tasks related to civil construction, currently performed by human beings, such as inspection and quality control, become inefficient due to their danger and cost. In this context, 3D reconstruction of infrastructures appears as a possible solution, presenting itself as a first step for the monitoring of infrastructures, as well as a tool for semi or completely automated inspection processes.
For the development of this thesis, a Lidar sensor coupled to a UAV was used. With this equipment, it became possible to autonomously fly over real infrastructures, extracting data from all their surfaces, regardless of the difficulties that could arise in reaching such regions from the ground. The data is extracted in the form of point clouds with respective intensities, filtered, and used in reconstruction and texturing algorithms, culminating in a virtual, three-dimensional representation of the target infrastructure. With these representations, it is possible to evaluate the evolution of the infrastructure during its construction or repair, as well as the temporal evolution of certain defects present in the construction, by comparing models of the same scenario obtained from data extracted on different occasions.
This approach allows the process of monitoring infrastructures to be carried out more efficiently, with lower costs, while ensuring the safety of workers.
A Survey on Aerial Swarm Robotics
The use of aerial swarms to solve real-world problems has been increasing steadily, accompanied by falling prices and improving performance of communication, sensing, and processing hardware. The commoditization of hardware has reduced unit costs, thereby lowering the barriers to entry to the field of aerial swarm robotics. A key enabling technology for swarms is the family of algorithms that allow the individual members of the swarm to communicate and allocate tasks amongst themselves, plan their trajectories, and coordinate their flight in such a way that the overall objectives of the swarm are achieved efficiently. These algorithms, often organized in a hierarchical fashion, endow the swarm with autonomy at every level, and the role of a human operator can be reduced, in principle, to interactions at a higher level without direct intervention. This technology depends on the clever and innovative application of theoretical tools from control and estimation. This paper reviews the state of the art of these theoretical tools, specifically focusing on how they have been developed for, and applied to, aerial swarms. Aerial swarms differ from swarms of ground-based vehicles in two respects: they operate in a three-dimensional space, and the dynamics of individual vehicles adds an extra layer of complexity. We review dynamic modeling and conditions for stability and controllability that are essential in order to achieve cooperative flight and distributed sensing. The main sections of this paper focus on major results covering trajectory generation, task allocation, adversarial control, distributed sensing, monitoring, and mapping. Wherever possible, we indicate how the physics and subsystem technologies of aerial robots are brought to bear on these individual areas.
Determining Where Individual Vehicles Should Not Drive in Semiarid Terrain in Virginia City, NV
This thesis explored the elements involved in determining and mapping where a vehicle should not drive off-road in semiarid areas. Obstacles are anything that slows or obstructs progress (Meyer et al., 1977) or limits the space available for maneuvering (Spenko et al., 2006). This study identified the major factors relevant to determining which terrain features should be considered obstacles during off-road driving and thus should be avoided. These are elements relating to the vehicle itself and how it is driven, as well as the terrain factors of slope, vegetation, water, and soil. These were identified in the terrain using inferential methods of Terrain Pattern Recognition (TPR), analysis of remotely sensed data, and Digital Elevation Model (DEM) data analysis. The analysis was further refined using other reference information about the area. Other factors such as weather, driving angle, and environmental impact are discussed. This information was applied to a section of Virginia City, Nevada as a case study. Analysis and mapping were done purposely without field work prior to mapping, to determine what could be assessed using only remote means. Not all findings from the literature review could be implemented in this trafficability study. Some methods and trafficability knowledge could not be implemented and were omitted because the data were unavailable, un-acquirable, or too coarsely mapped to be useful. Examples are Lidar mapping of the area, soil profiling of the terrain, and assessment of plant species present in the area for driven-over traction and tire punctures. The Virginia City section was analyzed and mapped using hyperspectral remotely sensed image data and remote-sensor-derived DEM data in a Geographic Information System (GIS). Stereo-paired air photos of the study site were used in TPR. Other information on flora, historical weather, and a previous soil survey map was also used in the GIS.
Field validation was used to check findings. The case study's trafficability assessment demonstrated methodologies of terrain analysis which successfully classified many of the materials present and identified major areas where a vehicle should not drive. The methods used were: manual TPR of the stereo-paired air photos, using a stereo photo viewer, to conduct drainage tracing; and slope analysis of the DEM using automated methods in ArcMap. The SpecTIR hyperspectral data was analyzed using the manual Environment for Visualizing Images (ENVI) software hourglass procedure. Visual analysis of the hyperspectral data and air photos, along with known soil and vegetation characteristics, was used to refine the analyses. Processed data was georectified using SpecTIR Geographic Lookup Table (GLT) input geometry, then exported to and analyzed in ArcMap with the other data previously listed. Features were identified based on their spectral attributes, spatial properties, and visual analysis. Inaccuracies in mapping were largely attributable to: the spatial resolution of the Digital Elevation Models (DEMs), which averaged out some non-drivable obstacles and parts of a drivable road; subjective human and computer decisions during the processing of the data; and the grouping of spectral end-members during hyperspectral data analysis.
Mapping and field validation found that several manmade and natural obstacles were visible from the ground but were too fine, thin, or small to be identified from the remote sensing data. Examples are fences and some natural terrain surface roughness, where the terrain's surface deviated from being smooth, exhibiting micro-variations in surface elevation and/or texture. Slope analysis using the 10-meter and 30-meter resolution DEMs did not accurately depict some manmade features (e.g., some of the buildings, portions of roads, and fences); this was evident in a well-trafficked paved road showing in the DEM analysis as having too steep a slope (beyond 15°) to be drivable. Some features had been spectrally grouped together during analysis due to similar spectral properties. Spectral grouping is a process in which the spectral classes' pixel areas are reviewed, and classes with too few occurrences are averaged into similar classes or dropped entirely; this is done to reduce the number of spectrally unique material classes to those most relevant to the terrain mapped. These decisions are subjective, and in one case two similar spectral material classes were combined that, in later evaluation, should have remained separate. During field sample collection, some of the determined features (free-standing water and liquid tanks) were found to be inaccessible because they were on private land and/or secured by fences; these had to be verified visually, and photos were taken. Further refinements to the mapping could have been made if fieldwork had been done during the mapping process. Determining and mapping where a vehicle should not drive in semiarid areas is a complex task which involves many variables and reference data types. Processing, analyzing, and fusing these different references entails subjective manual and automated decisions which are subject to errors and/or inaccuracies at multiple levels that can individually or collectively skew results, causing terrain trafficability to be depicted incorrectly. That said, a usable reference map is creatable which can assist decision makers when determining their route(s).
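The DEM slope screening described above can be sketched with NumPy. The cell size, the 15° drivability limit from the text, and the toy elevation grid are illustrative; the thesis performed this step with automated methods in ArcMap:

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Per-cell slope from finite-difference gradients of a DEM.

    dem: 2-D array of elevations in the same units as cell_size.
    """
    dz_dy, dz_dx = np.gradient(dem, cell_size)  # axis 0 = rows (y), axis 1 = cols (x)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def too_steep(dem, cell_size, max_deg=15.0):
    """Boolean mask of cells a vehicle should avoid (slope beyond max_deg)."""
    return slope_degrees(dem, cell_size) > max_deg

# Toy 10 m resolution DEM: flat ground next to a ramp rising 5 m per cell.
dem = np.array([[0, 0, 0, 5, 10],
                [0, 0, 0, 5, 10],
                [0, 0, 0, 5, 10]], dtype=float)
mask = too_steep(dem, cell_size=10.0)
```

As the abstract notes, coarse cell sizes average out small obstacles: a 30-meter DEM smooths a narrow steep bank into an apparently gentle slope, while a paved road cut can register as impassably steep.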
Collision-Free Navigation of Small UAVs in Complex Urban Environment
Small unmanned aerial vehicles (UAVs) are expected to become highly innovative solutions for all kinds of tasks, such as transport, surveillance, inspection, or guidance, and many commercial ideas already exist. Small multi-rotor UAVs are preferred here since they are easy to construct and to fly, at least in wide open spaces. However, many UAV business cases are foreseen in complex urban environments, which are very challenging from the perspective of UAV flight. Our work focuses on autonomous flight and collision-free navigation in an urban environment, where GPS (global positioning system) is still used for localization but where variations in the accuracy, or the temporary unavailability, of GPS position data are explicitly considered. Urban environments are challenging because they require flight near large structures and near moving obstacles such as humans and other moving objects, at low altitudes or in very narrow spaces, and thus in areas where GPS position data might temporarily be very inaccurate or even unavailable. We therefore designed a custom stereo camera with adjustable base length for the perception of potential obstacles in the unknown outdoor environment. The optimal design and sensitivity parameters are investigated in outdoor experiments. Using the stereo images, a graph-based SLAM approach is used for online three-dimensional mapping of the static and dynamic environment. For memory efficiency, incremental online loop-closure detection using a bag-of-words method is implemented. Given the three-dimensional map, the cost of each cell and its transitions is calculated in real time by a modified D* Lite, which searches for and generates a three-dimensional collision-free path. Experiments on 3D mapping and collision-free path planning were conducted with a small UAV in an outdoor scenario. The combined experimental results of real-time mapping and path planning demonstrate that the three-dimensional collision-free path planning can handle the real-time computational constraints while maintaining a safety distance.
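The occupancy-grid search can be illustrated with a simple stand-in for the modified D* Lite mentioned above. D* Lite itself incrementally repairs its search as the map changes; this sketch instead replans from scratch with breadth-first search on a toy 3-D grid with uniform cell costs:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest 6-connected path through a 3-D occupancy grid.

    grid[z][y][x] is 1 for an occupied (obstacle) cell, 0 for free space.
    Returns the path as a list of (z, y, x) cells, or None if unreachable.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                 # reconstruct by walking parents back
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        z, y, x = cell
        for dz, dy, dx in moves:
            nzc, nyc, nxc = z + dz, y + dy, x + dx
            nxt = (nzc, nyc, nxc)
            if (0 <= nzc < nz and 0 <= nyc < ny and 0 <= nxc < nx
                    and grid[nzc][nyc][nxc] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None

# A 2x3x3 grid with a wall (1s) blocking the direct route on the lower level,
# so the shortest path climbs over it.
grid = [[[0, 1, 0],
         [0, 1, 0],
         [0, 0, 0]],
        [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]]
path = bfs_path(grid, (0, 0, 0), (0, 0, 2))
```

The third dimension is what distinguishes the UAV case from ground vehicles: here the planner avoids the wall by flying over it rather than detouring around it.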
Perception-aware receding horizon trajectory planning for multicopters with visual-inertial odometry
Visual inertial odometry (VIO) is widely used for the state estimation of
multicopters, but it may function poorly in environments with few visual
features or in overly aggressive flights. In this work, we propose a
perception-aware collision avoidance trajectory planner for multicopters that
may be used with any feature-based VIO algorithm. Our approach is able to fly
the vehicle to a goal position at fast speed, avoiding obstacles in an unknown
stationary environment while achieving good VIO state estimation accuracy. The
proposed planner samples a group of minimum jerk trajectories and finds
collision-free trajectories among them, which are then evaluated based on their
speed to the goal and perception quality. Both the motion blur of features and
their locations are considered for the perception quality. Our novel
consideration of the motion blur of features enables automatic adaptation of
the trajectory's aggressiveness under environments with different light levels.
The best trajectory from the evaluation is tracked by the vehicle and is
updated in a receding horizon manner when new images are received from the
camera. Only generic assumptions about the VIO are made, so that the planner
may be used with various existing systems. The proposed method can run in
real-time on a small embedded computer on board. We validated the effectiveness
of our proposed approach through experiments in both indoor and outdoor
environments. Compared to a perception-agnostic planner, the proposed planner
kept more features in the camera's view and made the flight less aggressive,
making the VIO more accurate. It also reduced VIO failures, which occurred for
the perception-agnostic planner but not for the proposed planner. The ability
of the proposed planner to fly through dense obstacles was also validated. The
experiment video can be found at https://youtu.be/qO3LZIrpwtQ.
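The evaluation step, choosing among sampled collision-free candidates by speed to the goal and perception quality, could be sketched as below. The cost weighting and the blur model (blur in pixels approximated as feature image-plane speed times exposure time) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def perception_score(feat_px_vel, exposure_s, max_blur_px=2.0):
    """Fraction of tracked features whose motion blur stays acceptable.

    feat_px_vel: image-plane speeds of features (px/s) that a candidate
    trajectory would induce. In dim light the camera needs a longer
    exposure, so the same speed produces more blur and a lower score.
    """
    blur = np.asarray(feat_px_vel) * exposure_s
    return float(np.mean(blur < max_blur_px))

def trajectory_cost(speed_to_goal, feat_px_vel, exposure_s, w_perc=1.0):
    """Lower is better: reward progress, penalize poor perception quality."""
    return -speed_to_goal + w_perc * (1.0 - perception_score(feat_px_vel, exposure_s))

# Two candidates: fast but blurry vs. slightly slower with crisp features.
aggressive = trajectory_cost(4.0, [300.0, 280.0, 260.0], exposure_s=0.01)
gentle = trajectory_cost(3.5, [80.0, 90.0, 100.0], exposure_s=0.01)
```

Because the exposure time enters the blur term, lowering the light level automatically shifts the cost toward gentler trajectories, mirroring the paper's observation that the planner adapts its aggressiveness to the environment's light level.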