
    Context-aware Facial Inpainting with GANs

    Facial inpainting is a difficult problem due to the complex structural patterns of face images. Using irregular hole masks to generate contextualised features in a face image is becoming increasingly important in image inpainting. Existing methods generate images using deep learning models, but aberrations persist. This is because key operations required for disseminating feature information, such as feature extraction mechanisms, feature propagation, and feature regularizers, are frequently overlooked during the design stage. A comprehensive review is conducted to examine existing methods and identify the research gaps that form the foundation of this thesis. The aim of this thesis is to develop novel facial inpainting algorithms capable of extracting contextualised features. First, the Symmetric Skip Connection Wasserstein GAN (SWGAN) is proposed to inpaint high-resolution face images that are perceptually consistent with the rest of the image. Second, a perceptual adversarial network (RMNet) is proposed that incorporates feature extraction and feature propagation mechanisms targeting missing regions while preserving visible ones. Third, a foreground-guided facial inpainting method with occlusion reasoning capability is proposed, which guides the model toward learning contextualised feature extraction and propagation while maintaining fidelity. Fourth, V-LinkNet is proposed, which takes into account the critical operations for information dissemination. Additionally, a standard protocol is introduced to prevent potential biases in the performance evaluation of facial inpainting algorithms. The experimental results show that V-LinkNet achieved the best results, with an SSIM of 0.96 under the standard protocol. In conclusion, generating facial images with contextualised features is important for achieving realistic results in inpainted regions. It is also critical to follow a standard procedure when comparing different approaches. Finally, this thesis outlines new insights and future directions for image inpainting.
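
The SSIM figure quoted above refers to the structural similarity index. As an illustration only, here is a simplified, global SSIM computed over two equally sized grayscale images flattened into lists of pixel values. The function name `global_ssim`, the use of population statistics, and the single-window simplification are assumptions for this sketch, not the thesis's evaluation code; the standard metric averages the same expression over local (typically Gaussian-weighted) windows.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Simplified SSIM treating the whole image as a single window (illustrative only)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    # Stabilising constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Means (luminance terms).
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance (contrast and structure terms).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1.0, and the score drops as luminance, contrast, or structure diverge; strongly anti-correlated images can even score below zero.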

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, such as augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive, up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery.
    Comment: 145 pages with 32 figures
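
Normalization and chipping are only named in the abstract. A minimal pure-Python sketch follows, assuming a single-band image stored as a list of rows, min-max scaling to [0, 1], and square chips that drop incomplete tiles at the edges. The helper names `normalize_minmax` and `chip` and these design choices are illustrative assumptions; real pipelines often use per-band or dataset-wide statistics, padding, or overlapping strides instead.

```python
from typing import List, Optional

Image = List[List[float]]  # single band, row-major


def normalize_minmax(img: Image) -> Image:
    """Rescale pixel values to [0, 1] using this image's own min and max."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = hi - lo
    if span == 0:
        # Constant image: map every pixel to 0 to avoid division by zero.
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / span for v in row] for row in img]


def chip(img: Image, size: int, stride: Optional[int] = None) -> List[Image]:
    """Cut an image into size x size chips; partial chips at the edges are dropped."""
    if stride is None:
        stride = size  # non-overlapping tiles by default
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    h, w = len(img), len(img[0])
    chips: List[Image] = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            chips.append([row[left:left + size] for row in img[top:top + size]])
    return chips
```

With a stride smaller than `size`, neighbouring chips overlap, which yields more training samples from the same scene at the cost of correlated examples.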

    Generic Object Detection and Segmentation for Real-World Environments


    IST Austria Thesis

    Modern computer vision systems rely heavily on statistical machine learning models, which typically require large amounts of labeled data to be learned reliably. Moreover, computer vision research has very recently adopted representation learning techniques on a wide scale, which further increases the demand for labeled data. However, for many important practical problems only a relatively small amount of labeled data is available, which makes it difficult to leverage the full potential of representation learning methods. One way to overcome this obstacle is to invest substantial resources into producing large labeled datasets. Unfortunately, this can be prohibitively expensive in practice. In this thesis we focus on an alternative way of tackling this issue: methods that make use of weakly labeled or even unlabeled data. Specifically, the first half of the thesis is dedicated to the semantic image segmentation task. We develop a technique that achieves competitive segmentation performance while requiring only global image-level labels instead of dense segmentation masks. Subsequently, we present a new methodology that further improves segmentation performance by leveraging a small amount of additional feedback from a human annotator. Using our methods, practitioners can greatly reduce the data annotation effort required to learn modern image segmentation models. In the second half of the thesis we focus on methods for learning from unlabeled visual data. We study a family of autoregressive models for modeling the structure of natural images and discuss potential applications of these models. Moreover, we conduct an in-depth study of one of these applications, developing a state-of-the-art model for the probabilistic image colorization task.
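
For context, autoregressive image models generally factorize the joint distribution of an image's pixels with the chain rule over a fixed ordering, commonly raster-scan order. The generic form is shown first; the abstract does not give the specific ordering or architecture used in the thesis. The second line is an illustrative assumption of how a colorization model might condition each colour factor $c_i$ on a grayscale input $g$.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p\left(x_i \mid x_1, \dots, x_{i-1}\right)

p(\mathbf{c} \mid g) = \prod_{i=1}^{n} p\left(c_i \mid c_1, \dots, c_{i-1}, g\right)
```

Sampling proceeds one pixel at a time in the chosen order, which is what makes such models naturally probabilistic: drawing several samples for the same $g$ yields several plausible colorizations.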