128 research outputs found

    Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal

    Get PDF
    The purpose of automated video object removal is to not only detect and remove the object of interest automatically, but also to utilize background context to inpaint the foreground area. Video inpainting requires to fill spatiotemporal gaps in a video with convincing material, necessitating both temporal and spatial consistency; the inpainted part must seamlessly integrate into the background in a variety of scenes, and it must maintain a consistent appearance in subsequent frames even if its surroundings change noticeably. We introduce deep learning-based methodology for removing unwanted human-like shapes in videos. The method uses Pareto-optimized Generative Adversarial Networks (GANs) technology, which is a novel contribution. The system automatically selects the Region of Interest (ROI) for each humanoid shape and uses a skeleton detection module to determine which humanoid shape to retain. The semantic masks of human like shapes are created using a semantic-aware occlusion-robust model that has four primary components: feature extraction, and local, global, and semantic branches. The global branch encodes occlusion-aware information to make the extracted features resistant to occlusion, while the local branch retrieves fine-grained local characteristics. A modified big mask inpainting approach is employed to eliminate a person from the image, leveraging Fast Fourier convolutions and utilizing polygonal chains and rectangles with unpredictable aspect ratios. The inpainter network takes the input image and the mask to create an output image excluding the background humanoid shapes. The generator uses an encoder-decoder structure with included skip connections to recover spatial information and dilated convolution and squeeze and excitation blocks to make the regions behind the humanoid shapes consistent with their surroundings. The discriminator avoids dissimilar structure at the patch scale, and the refiner network catches features around the boundaries of each background humanoid shape. The efficiency was assessed using the Structural Learned Perceptual Image Patch Similarity, Frechet Inception Distance, and Similarity Index Measure metrics and showed promising results in fully automated background person removal task. The method is evaluated on two video object segmentation datasets (DAVIS indicating respective values of 0.02, FID of 5.01 and SSIM of 0.79 and YouTube-VOS, resulting in 0.03, 6.22, 0.78 respectively) as well a database of 66 distinct video sequences of people behind a desk in an office environment (0.02, 4.01, and 0.78 respectively).publishedVersio

    DIGITAL INPAINTING ALGORITHMS AND EVALUATION

    Get PDF
    Digital inpainting is the technique of filling in the missing regions of an image or a video using information from surrounding area. This technique has found widespread use in applications such as restoration, error recovery, multimedia editing, and video privacy protection. This dissertation addresses three significant challenges associated with the existing and emerging inpainting algorithms and applications. The three key areas of impact are 1) Structure completion for image inpainting algorithms, 2) Fast and efficient object based video inpainting framework and 3) Perceptual evaluation of large area image inpainting algorithms. One of the main approach of existing image inpainting algorithms in completing the missing information is to follow a two stage process. A structure completion step, to complete the boundaries of regions in the hole area, followed by texture completion process using advanced texture synthesis methods. While the texture synthesis stage is important, it can be argued that structure completion aspect is a vital component in improving the perceptual image inpainting quality. To this end, we introduce a global structure completion algorithm for completion of missing boundaries using symmetry as the key feature. While existing methods for symmetry completion require a-priori information, our method takes a non-parametric approach by utilizing the invariant nature of curvature to complete missing boundaries. Turning our attention from image to video inpainting, we readily observe that existing video inpainting techniques have evolved as an extension of image inpainting techniques. As a result, they suffer from various shortcoming including, among others, inability to handle large missing spatio-temporal regions, significantly slow execution time making it impractical for interactive use and presence of temporal and spatial artifacts. To address these major challenges, we propose a fundamentally different method based on object based framework for improving the performance of video inpainting algorithms. We introduce a modular inpainting scheme in which we first segment the video into constituent objects by using acquired background models followed by inpainting of static background regions and dynamic foreground regions. For static background region inpainting, we use a simple background replacement and occasional image inpainting. To inpaint dynamic moving foreground regions, we introduce a novel sliding-window based dissimilarity measure in a dynamic programming framework. This technique can effectively inpaint large regions of occlusions, inpaint objects that are completely missing for several frames, change in size and pose and has minimal blurring and motion artifacts. Finally we direct our focus on experimental studies related to perceptual quality evaluation of large area image inpainting algorithms. The perceptual quality of large area inpainting technique is inherently a subjective process and yet no previous research has been carried out by taking the subjective nature of the Human Visual System (HVS). We perform subjective experiments using eye-tracking device involving 24 subjects to analyze the effect of inpainting on human gaze. We experimentally show that the presence of inpainting artifacts directly impacts the gaze of an unbiased observer and this in effect has a direct bearing on the subjective rating of the observer. Specifically, we show that the gaze energy in the hole regions of an inpainted image show marked deviations from normal behavior when the inpainting artifacts are readily apparent

    3D exemplar-based image inpainting in electron microscopy

    Get PDF
    In electron microscopy (EM) a common problem is the non-availability of data, which causes artefacts in reconstructions. In this thesis the goal is to generate artificial data where missing in EM by using exemplar-based inpainting (EBI). We implement an accelerated 3D version tailored to applications in EM, which reduces reconstruction times from days to minutes. We develop intelligent sampling strategies to find optimal data as input for reconstruction methods. Further, we investigate approaches to reduce electron dose and acquisition time. Sparse sampling followed by inpainting is the most promising approach. As common evaluation measures may lead to misinterpretation of results in EM and falsify a subsequent analysis, we propose to use application driven metrics and demonstrate this in a segmentation task. A further application of our technique is the artificial generation of projections in tiltbased EM. EBI is used to generate missing projections, such that the full angular range is covered. Subsequent reconstructions are significantly enhanced in terms of resolution, which facilitates further analysis of samples. In conclusion, EBI proves promising when used as an additional data generation step to tackle the non-availability of data in EM, which is evaluated in selected applications. Enhancing adaptive sampling methods and refining EBI, especially considering the mutual influence, promotes higher throughput in EM using less electron dose while not lessening quality.Ein häufig vorkommendes Problem in der Elektronenmikroskopie (EM) ist die Nichtverfügbarkeit von Daten, was zu Artefakten in Rekonstruktionen führt. In dieser Arbeit ist es das Ziel fehlende Daten in der EM künstlich zu erzeugen, was durch Exemplar-basiertes Inpainting (EBI) realisiert wird. Wir implementieren eine auf EM zugeschnittene beschleunigte 3D Version, welche es ermöglicht, Rekonstruktionszeiten von Tagen auf Minuten zu reduzieren. Wir entwickeln intelligente Abtaststrategien, um optimale Datenpunkte für die Rekonstruktion zu erhalten. Ansätze zur Reduzierung von Elektronendosis und Aufnahmezeit werden untersucht. Unterabtastung gefolgt von Inpainting führt zu den besten Resultaten. Evaluationsmaße zur Beurteilung der Rekonstruktionsqualität helfen in der EM oft nicht und können zu falschen Schlüssen führen, weswegen anwendungsbasierte Metriken die bessere Wahl darstellen. Dies demonstrieren wir anhand eines Beispiels. Die künstliche Erzeugung von Projektionen in der neigungsbasierten Elektronentomographie ist eine weitere Anwendung. EBI wird verwendet um fehlende Projektionen zu generieren. Daraus resultierende Rekonstruktionen weisen eine deutlich erhöhte Auflösung auf. EBI ist ein vielversprechender Ansatz, um nicht verfügbare Daten in der EM zu generieren. Dies wird auf Basis verschiedener Anwendungen gezeigt und evaluiert. Adaptive Aufnahmestrategien und EBI können also zu einem höheren Durchsatz in der EM führen, ohne die Bildqualität merklich zu verschlechtern

    Novel Video Completion Approaches and Their Applications

    Get PDF
    Video completion refers to automatically restoring damaged or removed objects in a video sequence, with applications ranging from sophisticated video removal of undesired static or dynamic objects to correction of missing or corrupted video frames in old movies and synthesis of new video frames to add, modify, or generate a new visual story. The video completion problem can be solved using texture synthesis and/or data interpolation to fill-in the holes of the sequence inward. This thesis makes a distinction between still image completion and video completion. The latter requires visually pleasing consistency by taking into account the temporal information. Based on their applied concepts, video completion techniques are categorized as inpainting and texture synthesis. We present a bandlet transform-based technique for each of these categories of video completion techniques. The proposed inpainting-based technique is a 3D volume regularization scheme that takes advantage of bandlet bases for exploiting the anisotropic regularities to reconstruct a damaged video. The proposed exemplar-based approach, on the other hand, performs video completion using a precise patch fusion in the bandlet domain instead of patch replacement. The video completion task is extended to two important applications in video restoration. First, we develop an automatic video text detection and removal that benefits from the proposed inpainting scheme and a novel video text detector. Second, we propose a novel video super-resolution technique that employs the inpainting algorithm spatially in conjunction with an effective structure tensor, generated using bandlet geometry. The experimental results show a good performance of the proposed video inpainting method and demonstrate the effectiveness of bandlets in video completion tasks. The proposed video text detector and the video super resolution scheme also show a high performance in comparison with existing methods

    Edge-Based Automated Facial Blemish Removal

    Get PDF
    This thesis presents an end-to-end approach for taking a an image of a face and seamlessly isolating and filling in any blemishes contained therein. This consists of detecting the face within a larger image, building an accurate mask of the facial features so as not to mistake them as blemishes, detecting the blemishes themselves and painting over them with accurate skin tones. We devote the first part of the thesis to detailing our algorithm for extracting facial features. This is done by first improving the image through histogram equal- ization and illumination compensation followed by finding the features themselves from a computed edge map. Geometric knowledge of general feature positioning and blemish shapes is used to determine which edge clusters belong to correspond- ing facial features. Color and reflectance thresholding is then used to build a skin map. In the second part of the thesis we identify the blemishes themselves. A Lapla- cian of Gaussian blob detector is used to identify potential candidates. Thresholding and dilating operations are then performed to trim this candidate list down followed by the use of various morphological properties to reject regions likely to not be blem- ishes. Finally, in the third part, we examine four possible techniques for inpainting blemish regions once found. We settle on using a technique that fills in pixels based on finding a patch in the nearby image region with the most similar surrounding texture to the target pixel. Priority in the pixel fill-order is given to strong edges and contours

    Persistent Homology Tools for Image Analysis

    Get PDF
    Topological Data Analysis (TDA) is a new field of mathematics emerged rapidly since the first decade of the century from various works of algebraic topology and geometry. The goal of TDA and its main tool of persistent homology (PH) is to provide topological insight into complex and high dimensional datasets. We take this premise onboard to get more topological insight from digital image analysis and quantify tiny low-level distortion that are undetectable except possibly by highly trained persons. Such image distortion could be caused intentionally (e.g. by morphing and steganography) or naturally in abnormal human tissue/organ scan images as a result of onset of cancer or other diseases. The main objective of this thesis is to design new image analysis tools based on persistent homological invariants representing simplicial complexes on sets of pixel landmarks over a sequence of distance resolutions. We first start by proposing innovative automatic techniques to select image pixel landmarks to build a variety of simplicial topologies from a single image. Effectiveness of each image landmark selection demonstrated by testing on different image tampering problems such as morphed face detection, steganalysis and breast tumour detection. Vietoris-Rips simplicial complexes constructed based on the image landmarks at an increasing distance threshold and topological (homological) features computed at each threshold and summarized in a form known as persistent barcodes. We vectorise the space of persistent barcodes using a technique known as persistent binning where we demonstrated the strength of it for various image analysis purposes. Different machine learning approaches are adopted to develop automatic detection of tiny texture distortion in many image analysis applications. Homological invariants used in this thesis are the 0 and 1 dimensional Betti numbers. We developed an innovative approach to design persistent homology (PH) based algorithms for automatic detection of the above described types of image distortion. In particular, we developed the first PH-detector of morphing attacks on passport face biometric images. We shall demonstrate significant accuracy of 2 such morph detection algorithms with 4 types of automatically extracted image landmarks: Local Binary patterns (LBP), 8-neighbour super-pixels (8NSP), Radial-LBP (R-LBP) and centre-symmetric LBP (CS-LBP). Using any of these techniques yields several persistent barcodes that summarise persistent topological features that help gaining insights into complex hidden structures not amenable by other image analysis methods. We shall also demonstrate significant success of a similarly developed PH-based universal steganalysis tool capable for the detection of secret messages hidden inside digital images. We also argue through a pilot study that building PH records from digital images can differentiate breast malignant tumours from benign tumours using digital mammographic images. The research presented in this thesis creates new opportunities to build real applications based on TDA and demonstrate many research challenges in a variety of image processing/analysis tasks. For example, we describe a TDA-based exemplar image inpainting technique (TEBI), superior to existing exemplar algorithm, for the reconstruction of missing image regions

    Patch-based methods for variational image processing problems

    Get PDF
    Image Processing problems are notoriously difficult. To name a few of these difficulties, they are usually ill-posed, involve a huge number of unknowns (from one to several per pixel!), and images cannot be considered as the linear superposition of a few physical sources as they contain many different scales and non-linearities. However, if one considers instead of images as a whole small blocks (or patches) inside the pictures, many of these hurdles vanish and problems become much easier to solve, at the cost of increasing again the dimensionality of the data to process. Following the seminal NL-means algorithm in 2005-2006, methods that consider only the visual correlation between patches and ignore their spatial relationship are called non-local methods. While powerful, it is an arduous task to define non-local methods without using heuristic formulations or complex mathematical frameworks. On the other hand, another powerful property has brought global image processing algorithms one step further: it is the sparsity of images in well chosen representation basis. However, this property is difficult to embed naturally in non-local methods, yielding algorithms that are usually inefficient or circonvoluted. In this thesis, we explore alternative approaches to non-locality, with the goals of i) developing universal approaches that can handle local and non-local constraints and ii) leveraging the qualities of both non-locality and sparsity. For the first point, we will see that embedding the patches of an image into a graph-based framework can yield a simple algorithm that can switch from local to non-local diffusion, which we will apply to the problem of large area image inpainting. For the second point, we will first study a fast patch preselection process that is able to group patches according to their visual content. This preselection operator will then serve as input to a social sparsity enforcing operator that will create sparse groups of jointly sparse patches, thus exploiting all the redundancies present in the data, in a simple mathematical framework. Finally, we will study the problem of reconstructing plausible patches from a few binarized measurements. We will show that this task can be achieved in the case of popular binarized image keypoints descriptors, thus demonstrating a potential privacy issue in mobile visual recognition applications, but also opening a promising way to the design and the construction of a new generation of smart cameras

    Medical image synthesis using generative adversarial networks: towards photo-realistic image synthesis

    Full text link
    This proposed work addresses the photo-realism for synthetic images. We introduced a modified generative adversarial network: StencilGAN. It is a perceptually-aware generative adversarial network that synthesizes images based on overlaid labelled masks. This technique can be a prominent solution for the scarcity of the resources in the healthcare sector

    Super-resolução em vídeos de baixa qualidade para aplicações forenses, de vigilância e móveis

    Get PDF
    Orientadores: Siome Klein Goldenstein, Anderson de Rezende RochaTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Algoritmos de super-resolução (SR) são métodos para obter um aumento da resolução de imagens compostas por pixels. Na super-resolução por múltiplas imagens, um conjunto de imagens de baixa resolução de uma cena é combinado para construir uma imagem de resolução superior. Super-resolução é uma solução barata para superar as limitações dos sistemas de aquisição de imagens, e pode ser útil em diversos casos em que o dispositivo não pode ser melhorado ou substituído - mas em que é possível obter diversas capturas da mesma cena. Neste trabalho, é explorada a super-resolução por múltiplas imagens para imagens naturais, em cenários nos quais é possível obter diversas imagens de uma cena. São propostas cinco variações de um método que explora propriedades geométricas de múltiplas imagens de baixa resolução para combiná-las em uma imagem de resolução superior; duas variações de um método que combina técnicas de inpainting e super-resolução; e mais três variações de um método que utiliza filtros adaptativos e regularização para resolver um problema de mínimos quadrados. Super-resolução por múltiplas imagens é possível quando existe movimento e informações não redundantes entre as imagens de baixa resolução. Entretanto, combiná-las em uma imagem de resolução superior pode não ser computacionalmente viável por técnicas complexas de super-resolução. A primeira aplicação dos métodos propostos é para um conjunto de imagens capturadas pelos dispositivos móveis mais recentes. Este tipo de ambiente requer algoritmos eficazes que sejam executados rapidamente e utilizando baixo consumo de memória. A segunda aplicação é na Ciência Forense. Câmeras de vigilância espalhadas pelas cidades poderiam fornecer dicas importantes para identificar um suspeito, por exemplo, em uma cena de crime. Entretanto, o reconhecimento dos caracteres de placas veiculares é especialmente difícil quando a resolução das imagens é baixa. Por isso, este trabalho também propõe um arcabouço que realiza a super-resolução de placas veiculares em vídeos reais de vigilância, capturados por câmeras de baixa qualidade e não projetadas especificamente para esta tarefa, ajudando o especialista forense a compreender um evento de interesse. O arcabouço realiza todas as etapas necessárias para rastrear, alinhar, reconstruir e reconhecer automaticamente os caracteres de uma placa suspeita. O usuário recebe, como saída, a imagem de alta resolução reconstruída, mais rica em detalhes, e também a sequência de caracteres reconhecida automaticamente nesta imagem. São apresentadas validações quantitativas e qualitativas dos algoritmos propostos e de suas aplicações. Os experimentos mostram, por exemplo, que é possível aumentar o número de caracteres reconhecidos corretamente, colocando o arcabouço proposto como uma ferramenta importante para fornecer aos peritos uma solução para o reconhecimento de placas veiculares sob condições adversas de aquisição. Por fim, também é sugerido o número mínimo de imagens a ser utilizada como entrada em cada aplicaçãoAbstract: Super-resolution (SR) algorithms are methods for achieving high-resolution (HR) enlargements of pixel-based images. In multi-frame super resolution, a set of low-resolution (LR) images of a scene are combined to construct an image with higher resolution. Super resolution is an inexpensive solution to overcome the limitations of image acquisition hardware systems, and can be useful in several cases in which the device cannot be upgraded or replaced, but multiple frames of the same scene can be obtained. In this work, we explore SR possibilities for natural images, in scenarios wherein we have multiple frames of a same scene. We design and develop five variations of an algorithm which rely on exploring geometric properties in order to combine pixels from LR observations into an HR grid; two variations of a method that combines inpainting techniques to multi-frame super resolution; and three variations of an algorithm that uses adaptive filtering and Tikhonov regularization to solve a least-square problem. Multi-frame super resolution is possible when there is motion and non-redundant information from LR observations. However, combining a large number of frames into a higher resolution image may not be computationally feasible by complex super-resolution techniques. The first application of the proposed methods is in consumer-grade photography with a setup in which several low-resolution images gathered by recent mobile devices can be combined to create a much higher resolution image. Such always-on low-power environment requires effective high-performance algorithms, that run fastly and with a low-memory footprint. The second application is in Digital Forensic, with a setup in which low-quality surveillance cameras throughout the cities could provide important cues to identify a suspect vehicle, for example, in a crime scene. However, license-plate recognition is especially difficult under poor image resolutions. Hence, we design and develop a novel, free and open-source framework underpinned by SR and Automatic License-Plate Recognition (ALPR) techniques to identify license-plate characters in low-quality real-world traffic videos, captured by cameras not designed for the ALPR task, aiding forensic analysts in understanding an event of interest. The framework handles the necessary conditions to identify a target license plate, using a novel methodology to locate, track, align, super resolve, and recognize its alphanumerics. The user receives as outputs the rectified and super-resolved license-plate, richer in details, and also the sequence of license-plates characters that have been automatically recognized in the super-resolved image. We present quantitative and qualitative validations of the proposed algorithms and its applications. Our experiments show, for example, that SR can increase the number of correctly recognized characters posing the framework as an important step toward providing forensic experts and practitioners with a solution for the license-plate recognition problem under difficult acquisition conditions. Finally, we also suggest a minimum number of images to use as input in each applicationDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação1197478,146886153996/3-2015CAPESCNP
    corecore