
    Recent Advances in Image Restoration with Applications to Real World Problems

    In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling the widespread use of images in many diverse applications, from Earth observation to planetary missions. However, practical issues associated with image acquisition still affect image quality. Issues such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution can seriously affect the accuracy of these applications. This book aims to give the reader a glimpse of the latest developments and recent advances in image restoration, including image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.
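
    As a brief, hedged illustration (this framing is an assumption, not taken from the book), most of the degradations listed above can be posed as instances of the standard linear observation model

        y = H x + n,

    where x is the unknown clean image, H models blurring, downsampling, or mosaicing, and n is measurement noise; restoration then amounts to estimating x from the observed y, with super-resolution and fusion differing mainly in the form of H and in how many observations y are available.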

    ABC: Adaptive, Biomimetic, Configurable Robots for Smart Farms - From Cereal Phenotyping to Soft Fruit Harvesting

    Currently, numerous factors, such as demographics, migration patterns, and economics, are leading to critical labour shortages in low-skilled and physically demanding parts of agriculture. Robotics can therefore be developed for the agricultural sector to address these shortages. This study aims to develop an adaptive, biomimetic, and configurable modular robotics architecture that can be applied to multiple tasks (e.g., phenotyping, cutting, and picking), various crop varieties (e.g., wheat, strawberry, and tomato) and growing conditions. These robotic solutions cover the entire perception–action–decision-making loop, targeting the phenotyping of cereals and the harvesting of fruit in a natural environment. The primary contributions of this thesis are as follows.

    a) A high-throughput method for imaging field-grown wheat in three dimensions, along with an accompanying unsupervised measuring method for obtaining individual wheat spike data, are presented. The unsupervised method analyses the 3D point cloud of each trial plot, containing hundreds of wheat spikes, and calculates the average spike size and total spike volume per plot (a minimal sketch follows this abstract). Experimental results reveal that the proposed algorithm can effectively identify spikes in wheat crops, down to individual spikes.

    b) Unlike cereal, soft fruit is typically harvested by manual selection and picking. To enable robotic harvesting, the initial perception system uses conditional generative adversarial networks to identify ripe fruits using synthetic data. To determine whether a strawberry is surrounded by obstacles, a cluster-complexity-based perception system is further developed to classify the harvesting complexity of ripe strawberries.

    c) Once the harvest-ready fruit is localised using point cloud data generated by a stereo camera, the platform's action system can coordinate the arm to reach and cut the stem using the passive motion paradigm framework, inspired by studies on the neural control of movement in the brain. Results from field trials for strawberry detection, reaching/cutting the stem of the fruit with a mean error of less than 3 mm, and extensions to analysing complex canopy structures and bimanual coordination (searching/picking) are presented.

    Although this thesis focuses on strawberry harvesting, ongoing research is heading toward adapting the architecture to other crops. The agricultural food industry remains a labour-intensive sector with low margins and a cost- and time-efficiency-driven business model. The concepts presented herein can serve as a reference for future agricultural robots that are adaptive, biomimetic, and configurable.
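
    The thesis's unsupervised measuring method is not reproduced here; as a rough, hedged sketch of the general idea (plot-level point cloud in, per-spike size statistics out), one could cluster the cloud into candidate spikes and take convex-hull volumes. The clustering choice (DBSCAN) and every parameter value below are illustrative assumptions, not the author's method.

        import numpy as np
        from sklearn.cluster import DBSCAN
        from scipy.spatial import ConvexHull

        def spike_statistics(points, eps=0.02, min_samples=30):
            """Cluster a plot-level point cloud (N x 3, metres) into candidate spikes
            and return the mean and total spike volume in cubic metres."""
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
            volumes = []
            for label in set(labels) - {-1}:      # -1 marks DBSCAN noise points
                cluster = points[labels == label]
                if len(cluster) >= 4:             # a 3D hull needs at least 4 (non-coplanar) points
                    volumes.append(ConvexHull(cluster).volume)
            if not volumes:
                return 0.0, 0.0
            return float(np.mean(volumes)), float(np.sum(volumes))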

    Photo-realistic face synthesis and reenactment with deep generative models

    The advent of Deep Learning has led to numerous breakthroughs in the field of Computer Vision. Over the last decade, a significant amount of research has been undertaken towards designing neural networks for visual data analysis. At the same time, rapid advancements have been made in deep generative modelling, especially after the introduction of Generative Adversarial Networks (GANs), which have shown particularly promising results when it comes to synthesising visual data. Since then, considerable attention has been devoted to the problem of photo-realistic human face animation due to its wide range of applications, including image and video editing, virtual assistance, social media, teleconferencing, and augmented reality. The objective of this thesis is to make progress towards generating photo-realistic videos of human faces. To that end, we propose novel generative algorithms that provide explicit control over the facial expression and head pose of synthesised subjects. Despite major advances in face reenactment and motion transfer, current methods struggle to generate video portraits that are indistinguishable from real data. In this work, we aim to overcome the limitations of existing approaches by combining concepts from deep generative networks and video-to-video translation with 3D face modelling, and more specifically by capitalising on the prior knowledge of faces enclosed within statistical models such as 3D Morphable Models (3DMMs). In the first part of this thesis, we introduce a person-specific system that performs full head reenactment using ideas from video-to-video translation. Subsequently, we propose a novel approach to controllable video portrait synthesis, inspired by Implicit Neural Representations (INRs). In the second part of the thesis, we focus on person-agnostic methods and present a GAN-based framework that performs video portrait reconstruction, full head reenactment, expression editing, novel pose synthesis and face frontalisation.
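
    None of the thesis's architectures are reproduced here; purely as a hedged reminder of the adversarial objective underlying GAN-based synthesis (the discriminator, generated frames and real frames below are placeholders, not the author's models), a minimal sketch of the two losses:

        import torch
        import torch.nn.functional as F

        def gan_losses(discriminator, real, fake):
            """Non-saturating GAN objective: the discriminator learns to separate
            real from generated frames; the generator learns to fool it."""
            d_real = discriminator(real)
            d_fake = discriminator(fake.detach())      # stop gradients into the generator
            d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                      + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
            g_out = discriminator(fake)                # generator update sees gradients
            g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
            return d_loss, g_loss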

    On-the-fly dense 3D surface reconstruction for geometry-aware augmented reality.

    Augmented Reality (AR) is an emerging technology that makes seamless connections between virtual space and the real world by superimposing computer-generated information onto the real-world environment. AR can provide additional information in a more intuitive and natural way than any other information-delivery method that humans have ever invented. Camera tracking is the enabling technology for AR and has been well studied over the last few decades. Beyond tracking, sensing and perceiving the surrounding environment are also very important and challenging problems. Although existing hardware solutions such as the Microsoft Kinect and HoloLens can sense and build the environmental structure, they are either too bulky or too expensive for AR. In this thesis, challenging real-time dense 3D surface reconstruction technologies are studied and reformulated to move basic position-aware AR towards geometry-aware AR, with an outlook to context-aware AR. We initially propose to reconstruct the dense environmental surface from the sparse points produced by Simultaneous Localisation and Mapping (SLAM), but this approach is prone to failure in challenging Minimally Invasive Surgery (MIS) scenes, for example in the presence of deformation and surgical smoke. We subsequently adopt stereo vision together with SLAM for more accurate and robust results. Building on the success of deep learning in recent years, we present learning-based single-image reconstruction and achieve state-of-the-art results. Moreover, we propose context-aware AR, one step further from purely geometry-aware AR towards high-level conceptual interaction modelling in complex AR environments for an enhanced user experience. Finally, a learning-based smoke removal method is proposed to ensure accurate and robust reconstruction under extreme conditions such as the presence of surgical smoke.
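
    The SLAM- and learning-based pipelines of the thesis are not reproduced here; as a hedged sketch of the classical stereo step they build on (disparity from a rectified image pair converted to metric depth), using OpenCV's semi-global matcher with purely illustrative parameters:

        import cv2
        import numpy as np

        def stereo_depth(left_gray, right_gray, fx, baseline_m):
            """Dense depth (metres) from a rectified grayscale stereo pair.
            fx is the focal length in pixels, baseline_m the stereo baseline in metres."""
            matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
            disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0  # SGBM output is fixed-point
            depth = np.zeros_like(disparity)
            valid = disparity > 0
            depth[valid] = fx * baseline_m / disparity[valid]
            return depth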

    A Survey on Physical Adversarial Attack in Computer Vision

    Over the past decade, deep learning has revolutionized conventional tasks that rely on hand-crafted feature extraction with its strong feature learning capability, leading to substantial improvements on traditional tasks. However, deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples crafted with tiny, malicious perturbations that are imperceptible to human observers but can make DNNs output wrong results. Existing adversarial attacks can be categorized into digital and physical adversarial attacks. The former are designed to achieve strong attack performance in laboratory environments but hardly remain effective when applied to the physical world. In contrast, the latter focus on developing physically deployable attacks and thus exhibit more robustness under complex physical environmental conditions. With the increasing deployment of DNN-based systems in the real world, strengthening the robustness of these systems has become urgent, and exhaustively exploring physical adversarial attacks is a precondition for doing so. To this end, this paper reviews the evolution of physical adversarial attacks against DNN-based computer vision tasks, with the aim of providing useful information for developing stronger physical adversarial attacks. Specifically, we first propose a taxonomy to categorize and group current physical adversarial attacks. We then discuss existing physical attacks, focusing on techniques for improving their robustness under complex physical environmental conditions. Finally, we discuss the open issues of current physical adversarial attacks and suggest promising research directions.
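
    The survey itself concerns physical attacks; purely as a hedged, digital-domain baseline illustrating what "tiny, imperceptible noise" means in practice (the model, the [0, 1] input range and the epsilon value are illustrative assumptions), a minimal Fast Gradient Sign Method sketch:

        import torch
        import torch.nn.functional as F

        def fgsm_attack(model, image, label, epsilon=8 / 255):
            """Fast Gradient Sign Method: perturb the input in the direction that
            increases the classification loss, bounded by epsilon per pixel."""
            image = image.clone().detach().requires_grad_(True)
            loss = F.cross_entropy(model(image), label)
            loss.backward()
            adversarial = image + epsilon * image.grad.sign()
            return adversarial.clamp(0.0, 1.0).detach()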

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on applications that combine synthetic aperture radar (SAR) with deep learning, aiming to further promote the development of intelligent SAR image interpretation technology. A synthetic aperture radar is an important active microwave imaging sensor whose all-day, all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in remote sensing, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to address these significant challenges and present their innovative and cutting-edge research results on applying deep learning to SAR, in various manuscript types, e.g., articles, letters, reviews and technical reports.
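
    The reprint collects many different architectures, none of which are reproduced here; purely as a hedged, minimal illustration of what "multiple processing layers learning representations at multiple levels of abstraction" can look like for single-channel SAR amplitude patches (all layer sizes and the number of classes are placeholder assumptions):

        import torch
        import torch.nn as nn

        class TinySARClassifier(nn.Module):
            """Toy CNN for single-channel SAR patch classification (illustrative only)."""
            def __init__(self, num_classes=10):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                )
                self.classifier = nn.Linear(32, num_classes)

            def forward(self, x):            # x: (batch, 1, H, W) amplitude patches
                return self.classifier(self.features(x).flatten(1))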

    3D data fusion by depth refinement and pose recovery

    Refining depth maps from different sources into a single, more accurate depth map, and aligning rigid point clouds captured from different views, are two core techniques of 3D data fusion. Existing depth fusion algorithms do not provide a general framework for obtaining a highly accurate depth map. Furthermore, existing rigid point cloud registration algorithms do not always align noisy point clouds robustly and accurately, especially when there are many outliers and large occlusions. In this thesis, we present a general depth fusion framework based on supervised, semi-supervised, and unsupervised adversarial network approaches, and show that the refined depth maps are more accurate than the source depth maps. We develop a new rigid point cloud registration algorithm that aligns two uncertainty-based Gaussian mixture models representing the structures of the two point clouds, and show that it registers rigid point clouds more accurately over a larger range of perturbations. The new supervised depth fusion algorithm and the new rigid point cloud registration algorithm are then integrated into the ROS system of a real gardening robot (called TrimBot) for practical use in real environments. All the proposed algorithms have been evaluated on multiple existing datasets to show their superiority over prior work in the field.
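
    The thesis's registration method aligns uncertainty-based Gaussian mixture models and is not reproduced here; as a hedged sketch of the basic building block that such methods (and ICP-style pipelines) refine iteratively, the closed-form least-squares rigid fit between two already-corresponded point sets:

        import numpy as np

        def rigid_transform(source, target):
            """Kabsch/Umeyama rigid alignment of corresponded N x 3 point sets:
            returns R, t such that target ~ source @ R.T + t."""
            src_c = source - source.mean(axis=0)
            tgt_c = target - target.mean(axis=0)
            u, _, vt = np.linalg.svd(src_c.T @ tgt_c)   # SVD of the cross-covariance
            d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
            R = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
            t = target.mean(axis=0) - R @ source.mean(axis=0)
            return R, t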

    Generation of realistic human behaviour

    As the use of computers and robots in our everyday lives increases, so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expressions such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities. The framework is made up of two networks performing translations in opposite directions and can be successfully applied to diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.
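
    The exact networks of the thesis are not reproduced here; as a hedged sketch of the perceptual-loss idea it relies on (distances measured in the feature space of a frozen pre-trained model rather than on raw waveforms or pixels), where feature_extractor stands in for any such pre-trained model:

        import torch
        import torch.nn.functional as F

        def perceptual_loss(feature_extractor, generated, reference):
            """Compare generated and reference signals in the feature space of a
            frozen, pre-trained network instead of sample-by-sample."""
            with torch.no_grad():
                target_feats = feature_extractor(reference)   # no gradients through the reference path
            return F.l1_loss(feature_extractor(generated), target_feats)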

    Proceedings of the 2021 DigitalFUTURES

    This open access book is a compilation of selected papers from 2021 DigitalFUTURES, the 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021). The work focuses on novel techniques for computational design and robotic fabrication. The contents will be valuable to academic researchers, designers, and engineers in industry. Readers will also encounter new ideas about understanding material intelligence in architecture.