9 research outputs found

    Deep image prior inpainting of ancient frescoes in the Mediterranean Alpine arc

    Full text link
    The unprecedented success of image reconstruction approaches based on deep neural networks has revolutionised both the processing and the analysis paradigms in several applied disciplines. In the field of digital humanities, the task of digital reconstruction of ancient frescoes is particularly challenging due to the scarce amount of available training data caused by ageing, wear, tear and retouching over time. To overcome these difficulties, we consider the Deep Image Prior (DIP) inpainting approach which computes appropriate reconstructions by relying on the progressive updating of an untrained convolutional neural network so as to match the reliable piece of information in the image at hand while promoting regularisation elsewhere. In comparison with state-of-the-art approaches (based on variational/PDEs and patch-based methods), DIP-based inpainting reduces artefacts and better adapts to contextual/non-local information, thus providing a valuable and effective tool for art historians. As a case study, we apply such approach to reconstruct missing image contents in a dataset of highly damaged digital images of medieval paintings located into several chapels in the Mediterranean Alpine Arc and provide a detailed description on how visible and invisible (e.g., infrared) information can be integrated for identifying and reconstructing damaged image regions.Comment: 26 page

    Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal

    Get PDF
    The purpose of automated video object removal is to not only detect and remove the object of interest automatically, but also to utilize background context to inpaint the foreground area. Video inpainting requires to fill spatiotemporal gaps in a video with convincing material, necessitating both temporal and spatial consistency; the inpainted part must seamlessly integrate into the background in a variety of scenes, and it must maintain a consistent appearance in subsequent frames even if its surroundings change noticeably. We introduce deep learning-based methodology for removing unwanted human-like shapes in videos. The method uses Pareto-optimized Generative Adversarial Networks (GANs) technology, which is a novel contribution. The system automatically selects the Region of Interest (ROI) for each humanoid shape and uses a skeleton detection module to determine which humanoid shape to retain. The semantic masks of human like shapes are created using a semantic-aware occlusion-robust model that has four primary components: feature extraction, and local, global, and semantic branches. The global branch encodes occlusion-aware information to make the extracted features resistant to occlusion, while the local branch retrieves fine-grained local characteristics. A modified big mask inpainting approach is employed to eliminate a person from the image, leveraging Fast Fourier convolutions and utilizing polygonal chains and rectangles with unpredictable aspect ratios. The inpainter network takes the input image and the mask to create an output image excluding the background humanoid shapes. The generator uses an encoder-decoder structure with included skip connections to recover spatial information and dilated convolution and squeeze and excitation blocks to make the regions behind the humanoid shapes consistent with their surroundings. The discriminator avoids dissimilar structure at the patch scale, and the refiner network catches features around the boundaries of each background humanoid shape. The efficiency was assessed using the Structural Learned Perceptual Image Patch Similarity, Frechet Inception Distance, and Similarity Index Measure metrics and showed promising results in fully automated background person removal task. The method is evaluated on two video object segmentation datasets (DAVIS indicating respective values of 0.02, FID of 5.01 and SSIM of 0.79 and YouTube-VOS, resulting in 0.03, 6.22, 0.78 respectively) as well a database of 66 distinct video sequences of people behind a desk in an office environment (0.02, 4.01, and 0.78 respectively).publishedVersio

    Application of Multi-Sensor Fusion Technology in Target Detection and Recognition

    Get PDF
    Application of multi-sensor fusion technology has drawn a lot of industrial and academic interest in recent years. The multi-sensor fusion methods are widely used in many applications, such as autonomous systems, remote sensing, video surveillance, and the military. These methods can obtain the complementary properties of targets by considering multiple sensors. On the other hand, they can achieve a detailed environment description and accurate detection of interest targets based on the information from different sensors.This book collects novel developments in the field of multi-sensor, multi-source, and multi-process information fusion. Articles are expected to emphasize one or more of the three facets: architectures, algorithms, and applications. Published papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems

    Multisensory Imagery Cues for Object Separation, Specularity Detection and Deep Learning based Inpainting

    Full text link
    Multisensory imagery cues have been actively investigated in diverse applications in the computer vision community to provide additional geometric information that is either absent or difficult to capture from mainstream two-dimensional imaging. The inherent features of multispectral polarimetric light field imagery (MSPLFI) include object distribution over spectra, surface properties, shape, shading and pixel flow in light space. The aim of this dissertation is to explore these inherent properties to exploit new structures and methodologies for the tasks of object separation, specularity detection and deep learning-based inpainting in MSPLFI. In the first part of this research, an application to separate foreground objects from the background in both outdoor and indoor scenes using multispectral polarimetric imagery (MSPI) cues is examined. Based on the pixel neighbourhood relationship, an on-demand clustering technique is proposed and implemented to separate artificial objects from natural background in a complex outdoor scene. However, due to indoor scenes only containing artificial objects, with vast variations in energy levels among spectra, a multiband fusion technique followed by a background segmentation algorithm is proposed to separate the foreground from the background. In this regard, first, each spectrum is decomposed into low and high frequencies using the fast Fourier transform (FFT) method. Second, principal component analysis (PCA) is applied on both frequency images of the individual spectrum and then combined with the first principal components as a fused image. Finally, a polarimetric background segmentation (BS) algorithm based on the Stokes vector is proposed and implemented on the fused image. The performance of the proposed approaches are evaluated and compared using publicly available MSPI datasets and the dice similarity coefficient (DSC). The proposed multiband fusion and BS methods demonstrate better fusion quality and higher segmentation accuracy compared with other studies for several metrics, including mean absolute percentage error (MAPE), peak signal-to-noise ratio (PSNR), Pearson correlation coefficient (PCOR) mutual information (MI), accuracy, Geometric Mean (G-mean), precision, recall and F1-score. In the second part of this work, a twofold framework for specular reflection detection (SRD) and specular reflection inpainting (SRI) in transparent objects is proposed. The SRD algorithm is based on the mean, the covariance and the Mahalanobis distance for predicting anomalous pixels in MSPLFI. The SRI algorithm first selects four-connected neighbouring pixels from sub-aperture images and then replaces the SRD pixel with the closest matched pixel. For both algorithms, a 6D MSPLFI transparent object dataset is captured from multisensory imagery cues due to the unavailability of this kind of dataset. The experimental results demonstrate that the proposed algorithms predict higher SRD accuracy and better SRI quality than the existing approaches reported in this part in terms of F1-score, G-mean, accuracy, the structural similarity index (SSIM), the PSNR, the mean squared error (IMMSE) and the mean absolute deviation (MAD). However, due to synthesising SRD pixels based on the pixel neighbourhood relationship, the proposed inpainting method in this research produces artefacts and errors when inpainting large specularity areas with irregular holes. Therefore, in the last part of this research, the emphasis is on inpainting large specularity areas with irregular holes based on the deep feature extraction from multisensory imagery cues. The proposed six-stage deep learning inpainting (DLI) framework is based on the generative adversarial network (GAN) architecture and consists of a generator network and a discriminator network. First, pixels’ global flow in the sub-aperture images is calculated by applying the large displacement optical flow (LDOF) method. The proposed training algorithm combines global flow with local flow and coarse inpainting results predicted from the baseline method. The generator attempts to generate best-matched features, while the discriminator seeks to predict the maximum difference between the predicted results and the actual results. The experimental results demonstrate that in terms of the PSNR, MSSIM, IMMSE and MAD, the proposed DLI framework predicts superior inpainting quality to the baseline method and the previous part of this research

    Inpainting de modèles 3D pour la réalité diminuée : "couper/coller" réaliste pour l'aménagement d'intérieur

    Get PDF
    Par opposition à la Réalité Augmentée qui consiste à ajouter des éléments virtuels à un environnement réel, la Réalité Diminuée consiste à supprimer des éléments réels d'un environnement. Le but est d'effectuer un rendu visuel d'une scène 3D où les éléments "effacés" ne sont plus présents : la difficulté consiste à créer une image de sorte que la diminution ne soit pas perceptible par l'utilisateur. Il faut donc venir compléter la scène initialement cachée par ces éléments, en effectuant une opération d'inpainting qui prenne en compte la géométrie de la pièce, sa texture (structurée ou non), et la luminosité ambiante de l'environnement. Par exemple, l’œil humain est sensible à la régularité d'une texture. L'un des objectifs d'Innersense, entreprise spécialisée dans l'aménagement virtuel d’intérieurs, est de développer un produit capable d'enlever des éléments présents dans une pièce d'intérieur. Une fois la suppression virtuelle des meubles existants effectuée , il sera alors possible d'ajouter des meubles virtuels dans l'espace laissé vacant. L'objectif de cette thèse CIFRE est donc de mettre en place un scénario de réalité diminuée pouvant être exécuté sur un système mobile (tablette IOS ou Android) qui génère des images photo-réalistes de la scène diminuée. Pour cela, à partir d’un modèle géométrique de la pièce d'intérieur que l'on veut altérer, nous adaptons et améliorons des procédures d'effacement d'éléments d'une image appelées inpainting dans une image 2D. Ensuite, nous appliquons ces techniques dans le contexte 3D intérieur pour tenir compte de la géométrie de la scène. Enfin, nous analysons la luminosité pour augmenter le réalisme des zones complétées.Dans cette thèse, nous rappelons d'abord les différents travaux académiques et les solutions industrielles existantes. Nous évoquons leurs avantages et leurs limites. Nous abordons ensuite les différentes techniques d'inpainting existantes pour introduire notre première contribution qui propose d'adapter une des méthodes de l’état de l’art pour prendre en compte de la structure du motif de la texture. La problématique de la luminosité est ensuite abordée en proposant un processus qui traite séparément la texture et la variation de la luminosité. Nous présentons ensuite une troisième contribution qui propose un critère de confiance basé sur des considérations radiométriques pour sélectionner une information selon sa qualité dans le processus d'inpainting. Nous proposons une dernière contribution basée sur la complétion de texture de modèles 3D non planaires reconstruits à partir de peu d’images et donc présentant une texture incomplète. Enfin, nous montrons les applications développées grâce à ces travaux dans le contexte des scènes d'intérieur considérées par Innersens

    Non-Local Patch-Based Image Inpainting

    No full text
    corecore