123 research outputs found

    Effective image enhancement and fast object detection for improved UAV applications

    As an emerging field, unmanned aerial vehicles (UAVs) draw on interdisciplinary techniques from science, engineering and industry. Their applications range from remote sensing, precision agriculture, marine inspection, coast guarding, environmental monitoring, natural resource monitoring (e.g. forest, land and river) and disaster assessment to smart cities, intelligent transportation, and logistics and delivery. With fast-growing demand across these sectors, improving the efficiency and efficacy of UAVs in operation remains a bottleneck. Smart decisions often have to be made from captured footage in real time, yet this is severely hampered by poor image quality, ineffective object detection and recognition models, and a lack of robust, lightweight models to support edge computing and real deployment. This thesis develops several pieces of work to tackle some of these issues. First, although various approaches and models have been proposed to meet the quality requirements of UAV images, they focus on different aspects and produce inconsistent results. The work in this thesis is therefore organised into denoising-focused and dehazing-focused parts, each followed by a comprehensive qualitative and quantitative evaluation, to provide insights and guidance for end users and the research community. For fast and effective object detection and recognition, deep learning based models, especially the YOLO series, are widely used. However, with YOLOv7 as the baseline, performance is strongly affected by factors such as the low quality of UAV images and high resource demands, leading to unsatisfactory accuracy and processing speed. Three major improvements, namely a transformer, CIoULoss and the GhostBottleneck module, are therefore introduced to improve feature extraction, decision making in detection and recognition, and running efficiency. Comprehensive experiments on both publicly available and self-collected datasets have validated the efficiency and efficacy of the proposed algorithm. In addition, to facilitate real deployment in scenarios such as edge computing, an embedded implementation of the key algorithm modules is introduced. This includes an implementation on the Xavier NX platform, compared against standard workstation settings with NVIDIA GPUs. The results are promising: CPU/GPU resource consumption is reduced and the frame rate of real-time processing is improved, benefiting real-time deployment without compromising edge computing. Through this investigation and development, a better understanding has been established of the key challenges associated with UAV and Simultaneous Localisation and Mapping (SLAM) based applications, and possible solutions are presented. Keywords: Unmanned aerial vehicles (UAV); Simultaneous Localisation and Mapping (SLAM); denoising; dehazing; object detection; object recognition; deep learning; YOLOv7; transformer; GhostBottleneck; scene matching; embedded implementation; Xavier NX; edge computing.
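
The abstract names CIoULoss as one of its three improvements but does not show how it is computed. The sketch below is the commonly used Complete-IoU formulation (IoU minus a centre-distance penalty and an aspect-ratio penalty), written for a single pair of axis-aligned boxes. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout and the `eps` stabiliser are assumptions made for illustration; the thesis's own implementation may differ.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Complete-IoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: name, box layout and eps are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU from the overlap rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Squared distance between the two box centres.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # near zero for identical boxes
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # above 1 for disjoint boxes
```

Unlike plain IoU, the loss keeps growing as disjoint boxes move apart, which is why it still gives a useful training signal when a prediction does not overlap the target at all.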

    DEEP LEARNING FOR IMAGE RESTORATION AND ROBOTIC VISION

    Traditional model-based approaches require the formulation of a mathematical model, and such models often have limited performance. The quality of an image may degrade for a variety of reasons: the scene may be affected by weather conditions such as haze, rain, and snow, or noise may be introduced during image processing or transmission (e.g., artifacts generated during compression). The goal of image restoration is to restore the image to a desirable quality, both subjectively and objectively. Agricultural robotics is gaining interest because most agricultural work is lengthy and repetitive, and computer vision is crucial to robots, especially autonomous ones. However, it is challenging to build a precise mathematical model that describes these problems. Compared with the traditional approach, the learning-based approach has an edge because it does not require an explicit model of the problem. Moreover, learning-based approaches now achieve best-in-class performance on most vision problems, such as image dehazing, super-resolution, and image recognition. In this dissertation, we address the problems of image restoration and robotic vision with deep learning. The two problems are closely related from a network architecture perspective: it is essential to select appropriate networks when dealing with different problems. Specifically, we address single image dehazing, High Efficiency Video Coding (HEVC) loop filtering and super-resolution, and computer vision for an autonomous robot. Our technical contributions are threefold. First, we propose to reformulate haze as a signal-dependent noise, which allows us to uncover it by learning a structural residual. Based on this reformulation, we solve dehazing with a recursive deep residual network and a generative adversarial network, which emphasize objective and perceptual quality, respectively. Second, we replace the traditional filters in HEVC with a Convolutional Neural Network (CNN) filter. We show that our CNN filter can achieve a 7% BD-rate saving compared with traditional filters such as the bilateral and deblocking filters. We also propose to incorporate a multi-scale CNN super-resolution module into HEVC; this post-processing module can improve visual quality under extremely low bandwidth. Third, a transfer learning technique is implemented to support the vision and autonomous decision making of a precision pollination robot. Good experimental results are reported with real-world data.
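
One way to read "haze as a signal-dependent noise" is through the standard atmospheric scattering model, rearranged so that the hazy image is the clean image plus a residual. The symbols below are assumptions taken from that standard model and not from the dissertation itself: $I$ is the observed hazy image, $J$ the haze-free scene, $t$ the transmission map and $A$ the global atmospheric light.

```latex
\begin{align}
I(x) &= J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \\
     &= J(x) + \underbrace{\bigl(A - J(x)\bigr)\bigl(1 - t(x)\bigr)}_{r(x)}
\end{align}
```

Under this reading the residual $r(x)$ depends on the signal $J(x)$ itself, and a network that predicts $r$ recovers the clean image as $J = I - r$. The dissertation's exact formulation may differ.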

    Intelligent Transportation Related Complex Systems and Sensors

    Built around innovative services related to different modes of transport and traffic management, intelligent transport systems (ITS) are being widely adopted worldwide to improve the efficiency and safety of the transportation system. They enable users to be better informed and make safer, more coordinated, and smarter decisions on the use of transport networks. Current ITSs are complex systems, made up of several components/sub-systems characterized by time-dependent interactions among themselves. Some examples of these transportation-related complex systems include: road traffic sensors, autonomous/automated cars, smart cities, smart sensors, virtual sensors, traffic control systems, smart roads, logistics systems, smart mobility systems, and many others that are emerging from niche areas. The efficient operation of these complex systems requires: i) efficient solutions to the issues of sensors/actuators used to capture and control the physical parameters of these systems, as well as the quality of data collected from these systems; ii) tackling complexities using simulations and analytical modelling techniques; and iii) applying optimization techniques to improve the performance of these systems. It includes twenty-four papers, which cover scientific concepts, frameworks, architectures and various other ideas on analytics, trends and applications of transportation-related data.

    PromptIR: Prompting for All-in-One Blind Image Restoration

    Image restoration involves recovering a high-quality clean image from its degraded version. Deep learning-based methods have significantly improved image restoration performance; however, they have limited ability to generalize to different degradation types and levels. This restricts their real-world application, since it requires training an individual model for each specific degradation and knowing the input degradation type in order to apply the relevant model. We present a prompt-based learning approach, PromptIR, for All-In-One image restoration that can effectively restore images from various types and levels of degradation. In particular, our method uses prompts to encode degradation-specific information, which is then used to dynamically guide the restoration network. This allows our method to generalize to different degradation types and levels, while still achieving state-of-the-art results on image denoising, deraining, and dehazing. Overall, PromptIR offers a generic and efficient plugin module with a few lightweight prompts that can be used to restore images of various types and levels of degradation with no prior information on the corruptions present in the image. Our code and pretrained models are available here: https://github.com/va1shn9v/PromptI

    Visibility in underwater robotics: Benchmarking and single image dehazing

    Dealing with underwater visibility is one of the most important challenges in autonomous underwater robotics. Light transmission in water degrades images, making the scene difficult to interpret and consequently compromising the whole intervention. This thesis analyses the impact of underwater image degradation on commonly used vision algorithms through benchmarking. An online framework for underwater research is presented that makes it possible to analyse results under different conditions. Finally, motivated by the results of experiments with this framework, a deep learning solution is proposed that dehazes a degraded image in real time, as a preliminary step that restores the original colors of the image.

    Underwater image and video dehazing with pure haze region segmentation

    © 2017 The Authors. Underwater scenes captured by cameras are plagued with poor contrast and a spectral distortion, which are the result of the scattering and absorptive properties of water. In this paper we present a novel dehazing method that improves visibility in images and videos by detecting and segmenting image regions that contain only water. The colour of these regions, which we refer to as pure haze regions, is similar to the haze that is removed during the dehazing process. Moreover, we propose a semantic white balancing approach for illuminant estimation that uses the dominant colour of the water to address the spectral distortion present in underwater scenes. To validate the results of our method and compare them to those obtained with state-of-the-art approaches, we perform extensive subjective evaluation tests using images captured in a variety of water types and underwater videos captured onboard an underwater vehicle.

    Deep Bilateral Learning for Real-Time Image Enhancement

    Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher. Comment: 12 pages, 14 figures, Siggraph 201
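
The runtime steps described above (look up an affine transform in a coarse grid, then apply it to each full-resolution pixel) can be illustrated with a toy sketch. It is heavily simplified and is not the paper's method: the grid here is hand-written instead of predicted by a network, the lookup is nearest-neighbour where the paper uses an edge-preserving slicing node, and the guide is a plain mean intensity. The names `apply_affine`, `slice_nearest` and `enhance`, and the `grid[gy][gx][gz]` layout, are all hypothetical.

```python
def apply_affine(coeffs, rgb):
    """Apply a 3x4 affine colour transform (rows of [r, g, b, offset]) to one pixel."""
    r, g, b = rgb
    return tuple(row[0] * r + row[1] * g + row[2] * b + row[3] for row in coeffs)


def slice_nearest(grid, x, y, guide, width, height):
    """Pick the grid cell for pixel (x, y) with guide value in [0, 1].

    Nearest-neighbour lookup only; a simplification assumed for this sketch.
    """
    grid_h, grid_w, grid_d = len(grid), len(grid[0]), len(grid[0][0])
    gx = min(grid_w - 1, int(x / width * grid_w))
    gy = min(grid_h - 1, int(y / height * grid_h))
    gz = min(grid_d - 1, int(guide * grid_d))
    return grid[gy][gx][gz]


def enhance(image, grid):
    """Apply per-pixel affine transforms looked up from a coarse grid.

    `image` is a list of rows of (r, g, b) tuples with values in [0, 1].
    """
    height, width = len(image), len(image[0])
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, rgb in enumerate(row):
            guide = sum(rgb) / 3.0  # hypothetical guide: mean intensity
            coeffs = slice_nearest(grid, x, y, guide, width, height)
            out_row.append(apply_affine(coeffs, rgb))
        out.append(out_row)
    return out


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
LIFT = [[1, 0, 0, 0.1], [0, 1, 0, 0.1], [0, 0, 1, 0.1]]

if __name__ == "__main__":
    # One spatial cell, two intensity bins: dark pixels unchanged, bright pixels lifted.
    toy_grid = [[[IDENTITY, LIFT]]]
    print(enhance([[(0.2, 0.2, 0.2), (0.8, 0.8, 0.8)]], toy_grid))
```

Because the grid is indexed by intensity as well as position, two neighbouring pixels on opposite sides of an edge can receive different transforms, which is the property that makes the upsampling edge-preserving.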