
    Near-Infrared Fusion for Photorealistic Image Dehazing

    Scattering of light by aerosol particles along the path of radiation causes atmospheric haze in images. This scattering is significantly less severe in longer wavelength bands than in shorter ones, hence the importance of near-infrared (NIR) information for dehazing color images. This paper first presents an adaptive hyperspectral algorithm that analyzes intensity inconsistencies across spectral bands. It then leverages the algorithm's results to preserve the photorealism of the visible color image during dehazing. The color images are dehazed through a hyperspectral fusion of the color and NIR images, taking into account any inconsistencies that could affect photorealism. Our dehazing results on real images contain no halo or aliasing artifacts in hazy regions and successfully preserve the color image elsewhere.
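
    The core intuition is that haze suppresses high-frequency detail in the visible band far more than in NIR. A minimal sketch of NIR-guided detail transfer along those lines is shown below; it illustrates the idea only, not the paper's hyperspectral fusion algorithm, and the haze weight used here is an assumed heuristic.

```python
# Illustrative sketch only: transfer high-frequency NIR detail into the
# visible luminance where a crude haze estimate says the scene is hazy.
# The haze weight below is a stand-in heuristic, not the paper's method.
import numpy as np
from scipy.ndimage import gaussian_filter

def nir_detail_transfer(visible, nir, sigma=5.0):
    """visible: HxWx3 float image in [0, 1]; nir: HxW float image in [0, 1]."""
    luma = visible.mean(axis=2)                   # crude luminance
    base_v = gaussian_filter(luma, sigma)         # low-frequency base layer
    detail_n = nir - gaussian_filter(nir, sigma)  # NIR high-frequency detail
    # Heuristic: bright, low-contrast visible regions are treated as hazier.
    haze = np.clip(base_v - 4.0 * np.abs(luma - base_v), 0.0, 1.0)
    fused = luma + haze * detail_n                # inject NIR detail where hazy
    scale = fused / np.maximum(luma, 1e-6)        # keep original chrominance
    return np.clip(visible * scale[..., None], 0.0, 1.0)
```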

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human-centric: where it is designed to augment, rather than replace, human creativity.

    AAM: An Assessment Metric of Axial Chromatic Aberration

    Knowledge of lens specifications is important for identifying the best lens for a given capture scenario and application. Lens manufacturers provide many specifications in their data sheets, and multiple initiatives for testing and comparing different lenses can be found online. However, due to the lack of a suitable metric or technique, no evaluation of axial chromatic aberration is available. In this paper, we propose a metric, the Axial Aberration Magnitude (AAM), that assesses the degree of axial chromatic aberration of a given lens. Our metric generalizes to multispectral acquisition systems and is very simple and cheap to compute. We present the entire procedure and algorithm for computing the AAM metric and evaluate it for two spectral systems and two consumer lenses.
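
    Axial chromatic aberration means the lens focuses different wavelengths at different distances. One conceptual way to observe it, sketched below, is to sweep focus, measure per-channel sharpness, and compare where each color channel peaks; this is only an illustration of the phenomenon, not the AAM computation defined in the paper.

```python
# Conceptual illustration: the spread between per-channel best-focus
# positions in a focus sweep indicates axial chromatic aberration.
# This is not the paper's AAM metric, whose definition is given there.
import numpy as np

def sharpness(img):
    """Variance of a discrete Laplacian as a simple focus measure."""
    lap = (-4.0 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return lap.var()

def axial_ca_spread(focus_stack):
    """focus_stack: list of HxWx3 images taken at increasing focus settings.
    Returns the index spread between the best-focus positions of the three
    color channels; 0 means no measurable axial chromatic aberration."""
    peaks = [int(np.argmax([sharpness(frame[..., c]) for frame in focus_stack]))
             for c in range(3)]
    return max(peaks) - min(peaks)
```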

    Sparse Gradient Optimization and its Applications in Image Processing

    Millions of digital images are captured by imaging devices on a daily basis. Imaging devices operate through an integral acquisition process, from which the information of the original scene must be estimated. The estimation is done by inverting the integral process with optimization techniques. This linear inverse problem, the inversion of the integral acquisition process, is at the heart of several image processing applications such as denoising, deblurring, inpainting, and super-resolution. We describe in detail the use of linear inverse problems in these applications, and we review and compare several state-of-the-art optimization algorithms that invert this integral process.

    Linear inverse problems are usually very difficult to solve, so additional prior assumptions need to be introduced to successfully estimate the output signal. Several priors have been suggested in the research literature, with Total Variation (TV) being one of the most prominent. In this thesis, we review another prior, the l0 pseudo-norm over the gradient domain. This prior allows full control over how many non-zero gradients are retained to approximate prominent structures of the image. We show the superiority of the l0 gradient prior over the TV prior in recovering genuinely piecewise-constant signals. The l0 gradient prior has been shown to produce state-of-the-art results in edge-preserving image smoothing, and it applies to several other tasks, such as edge extraction, clip-art JPEG artifact removal, non-photorealistic image rendering, detail magnification, and tone mapping. We review and evaluate several state-of-the-art algorithms that solve the optimization problem based on the l0 gradient prior.

    We then apply the l0 gradient prior to two applications in which we obtain results superior to the current state-of-the-art. The first application is single-image reflection removal, where existing solutions have shown limited success because of the highly ill-posed nature of the problem. We show that the standard l0 gradient prior with a modified data-fidelity term based on the Laplacian operator sufficiently removes unwanted reflections from images in many realistic scenarios, and extensive experiments show that our method outperforms the state-of-the-art. In the second application, haze removal from visible-NIR image pairs, we propose a novel optimization framework whose prior term penalizes the number of non-zero gradients of the difference between the output and the NIR image. Because of the longer wavelengths of NIR, an image taken in the NIR spectrum suffers significantly less from haze artifacts, and this prior term allows us to transfer details from the haze-free NIR image to the final result. We show that our formulation provides state-of-the-art results compared both with haze removal methods that use a single image and with those based on visible-NIR image pairs.
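
    The l0-gradient objective is typically minimized by half-quadratic splitting: an auxiliary gradient variable is updated with a closed-form hard threshold, and the signal update becomes a quadratic problem solvable in the Fourier domain. Below is a minimal 1-D sketch in that spirit (following the alternating scheme popularized by Xu et al. for image smoothing); the parameter defaults are illustrative, not the thesis's settings.

```python
# Minimal 1-D l0-gradient smoothing via half-quadratic splitting.
# Objective: min_s ||s - f||^2 + lam * ||grad(s)||_0, relaxed with an
# auxiliary variable h and an increasing penalty weight beta.
import numpy as np

def l0_smooth_1d(f, lam=0.02, beta_max=1e5, kappa=2.0):
    f = np.asarray(f, dtype=float)
    n = len(f)
    s = f.copy()
    # Fourier symbol of the circular forward-difference operator D.
    d = np.zeros(n); d[0] = -1.0; d[-1] = 1.0
    denom = np.abs(np.fft.fft(d)) ** 2
    F_f = np.fft.fft(f)
    beta = 2.0 * lam                        # common starting penalty
    while beta < beta_max:
        # h-subproblem: per-element hard threshold (closed form for l0).
        g = np.roll(s, -1) - s              # forward differences D s
        h = np.where(g ** 2 >= lam / beta, g, 0.0)
        # s-subproblem: (I + beta D^T D) s = f + beta D^T h, solved by FFT.
        dt_h = np.roll(h, 1) - h            # D^T h
        s = np.real(np.fft.ifft((F_f + beta * np.fft.fft(dt_h))
                                / (1.0 + beta * denom)))
        beta *= kappa                       # anneal toward the l0 objective
    return s
```

    On a noisy step signal, this recovers the genuinely piecewise-constant structure the abstract contrasts with TV regularization, which shrinks gradient magnitudes rather than counting them.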

    Robotic Manipulation under Transparency and Translucency from Light-field Sensing

    From frosted windows to plastic containers to refractive fluids, transparency and translucency are prevalent in human environments. The material properties of translucent objects challenge many of our assumptions in robotic perception. For example, the most common RGB-D sensors require the sensing of an infrared structured pattern from a Lambertian reflectance of surfaces. As such, transparent and translucent objects often remain invisible to robot perception, and methods that enable robots to correctly perceive and interact with such environments would be highly beneficial. Light-field (or plenoptic) cameras, which capture light direction as well as intensity, make it possible to perceive visual cues on transparent and translucent objects. In this dissertation, we explore the inference of transparent and translucent objects from plenoptic observations for robotic perception and manipulation.

    We propose a novel plenoptic descriptor, the Depth Likelihood Volume (DLV), that incorporates plenoptic observations to represent the depth of a pixel as a distribution rather than a single value. Building on the DLV, we present the Plenoptic Monte Carlo Localization algorithm, PMCL, a generative method to infer 6-DoF poses of objects in settings with translucency. PMCL is able to localize both isolated transparent objects and opaque objects behind translucent objects using a DLV computed from a single-view plenoptic observation.

    The uncertainty induced by transparency and translucency for pose estimation increases greatly as scenes become more cluttered. For this scenario, we propose GlassLoc, which localizes feasible grasp poses directly from local DLV features. In GlassLoc, a convolutional neural network learns DLV features to classify grasp poses with grasping confidence. GlassLoc also suppresses reflectance across multi-view plenoptic observations, which leads to a more stable DLV representation. We evaluate GlassLoc on a pick-and-place task for transparent tableware in a cluttered tabletop environment.

    We further observe that transparent and translucent objects generate distinguishable features in the light-field epipolar image plane. With this insight, we propose Light-field Inference of Transparency, LIT, a two-stage generative-discriminative refractive object localization approach. In the discriminative stage, LIT uses convolutional neural networks to learn reflection and distortion features from photorealistically rendered light-field images. The learned features guide generative object location inference through local depth estimation and particle optimization. We compare LIT with four state-of-the-art pose estimators to show our efficacy on the transparent object localization task, and we perform a robot demonstration by building a champagne tower using the LIT pipeline.

    PhD dissertation, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169707/1/zhezhou_1.pd
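
    The central representational idea, depth as a per-pixel distribution rather than a point estimate, can be sketched compactly. The matching-cost input below is a placeholder, not the dissertation's plenoptic cost, and the class interface is assumed for illustration.

```python
# Hedged sketch of the depth-as-distribution idea behind a Depth
# Likelihood Volume: each pixel keeps a likelihood over candidate depths.
import numpy as np

class DepthLikelihoodVolume:
    def __init__(self, costs, depths):
        """costs: (D, H, W) matching cost per depth hypothesis and pixel
        (placeholder for a plenoptic matching cost); depths: (D,) values."""
        costs = np.asarray(costs, dtype=float)
        self.depths = np.asarray(depths, dtype=float)
        stable = costs - costs.min(axis=0, keepdims=True)  # numerical safety
        w = np.exp(-stable)                 # softmin: low cost -> high weight
        self.likelihood = w / w.sum(axis=0, keepdims=True)

    def pixel_distribution(self, y, x):
        """Full depth distribution for one pixel, instead of a single value."""
        return self.depths, self.likelihood[:, y, x]

    def weight_hypothesis(self, y, x, depth):
        """Likelihood that pixel (y, x) lies at `depth`; usable, e.g., to
        weight particles in a Monte Carlo pose estimator."""
        i = int(np.argmin(np.abs(self.depths - depth)))
        return self.likelihood[i, y, x]
```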

    Generative Adversarial Network and Its Application in Aerial Vehicle Detection and Biometric Identification System

    In recent years, generative adversarial networks (GANs) have shown great potential in advancing the state-of-the-art in many areas of computer vision, most notably in image synthesis and manipulation tasks. A GAN is a generative model that simultaneously trains a generator and a discriminator in an adversarial manner to produce realistic-looking synthetic data by capturing the underlying data distribution. Due to its powerful ability to generate high-quality and visually pleasing results, we apply it to super-resolution and image-to-image translation techniques to address vehicle detection in low-resolution aerial images and cross-spectral cross-resolution iris recognition.

    First, we develop a Multi-scale GAN (MsGAN) with multiple intermediate outputs, which progressively learns the details and features of high-resolution aerial images at different scales. The upscaled super-resolved aerial images are then fed to a You Only Look Once version 3 (YOLO-v3) object detector, and the detection loss is jointly optimized along with a super-resolution loss to emphasize target vehicles that are sensitive to the super-resolution process. Detection at night or in dark environments remains an unsolved problem and requires an infrared (IR) detector, but training such a detector needs many IR images. To address these challenges, we develop a GAN-based joint cross-modal super-resolution framework where low-resolution (LR) IR images are translated and super-resolved to high-resolution (HR) visible (VIS) images before applying detection. This approach significantly improves the accuracy of aerial vehicle detection by leveraging the benefits of super-resolution techniques in a cross-modal domain.

    Second, to increase the performance and reliability of deep learning-based biometric identification systems, we focus on developing conditional GAN (cGAN) based cross-spectral cross-resolution iris recognition and offer two different frameworks. The first approach trains a cGAN to jointly translate and super-resolve LR near-infrared (NIR) iris images to HR VIS iris images, so that cross-spectral cross-resolution iris matching is performed at the same resolution and within the same spectrum. In the second approach, we design a coupled GAN (cpGAN) architecture to project both VIS and NIR iris images into a low-dimensional embedding domain, with the goal of ensuring maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject.

    We have also proposed a pose attention-guided coupled profile-to-frontal face recognition network to learn discriminative and pose-invariant features in an embedding subspace. To show that the feature vectors learned by this deep subspace can be used for tasks beyond recognition, we implement a GAN architecture that reconstructs a frontal face from its corresponding profile face. This capability can be used in various face analysis tasks, such as emotion detection and expression tracking, where having a frontal face image improves accuracy and reliability. Overall, extensive experiments on publicly available datasets show that our methods achieve new state-of-the-art results.
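
    The jointly optimized objective described above, a super-resolution loss plus a downstream detection loss on the super-resolved image, can be sketched as a single training-step loss. The module interfaces and loss weighting below are assumptions for illustration; in particular, the detector is assumed to return its training loss directly.

```python
# Hedged sketch of a joint super-resolution + detection objective.
# `sr_net` and `detector` are assumed interfaces, not the thesis's models.
import torch.nn.functional as F

def joint_sr_detection_loss(sr_net, detector, lr_img, hr_img, targets,
                            alpha=1.0, beta=0.1):
    sr = sr_net(lr_img)                       # super-resolve the LR aerial image
    sr_loss = F.l1_loss(sr, hr_img)           # pixel-level reconstruction loss
    det_loss = detector(sr, targets)          # assumed: detector returns its loss
    return alpha * sr_loss + beta * det_loss  # detection gradients reach sr_net
```

    Backpropagating the combined loss through the detector into the super-resolution network is what makes the upscaling detection-aware: regions that matter for finding vehicles are emphasized during reconstruction.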

    Deep Learning Based Face Image Synthesis

    Face image synthesis is an important problem in the biometrics and computer vision communities due to its applications in law enforcement and entertainment. In this thesis, we develop novel deep neural network models and associated loss functions for two face image synthesis problems, namely thermal-to-visible face synthesis and visual-attribute-to-face synthesis.

    For thermal-to-visible face synthesis, we propose a model that makes use of facial attributes to obtain better synthesis. We use attributes extracted from visible images to synthesize attribute-preserved visible images from thermal imagery. A pre-trained attribute predictor network extracts attributes from the visible image, and a novel multi-scale generator then synthesizes the visible image from the thermal image guided by the extracted attributes. Finally, a pre-trained VGG-Face network extracts features from the synthesized image and the input visible image for verification. In addition, we propose another thermal-to-visible face synthesis method based on a self-attention generative adversarial network (SAGAN), which allows efficient attention-guided image synthesis. Rather than focusing only on synthesizing visible faces from thermal faces, we also propose to synthesize thermal faces from visible faces, based on the intuition that thermal images also contain discriminative information about the person. Deep features from a pre-trained Convolutional Neural Network (CNN) are extracted from the original as well as the synthesized images, and these features are fused to generate a template that is then used for cross-modal face verification.

    Regarding attribute-to-face synthesis, we propose the Att2SK2Face model for face image synthesis from visual attributes via sketch. In this approach, we first synthesize a facial sketch corresponding to the visual attributes and then generate the face image from the synthesized sketch. The framework combines two different Generative Adversarial Networks (GANs): (1) a sketch generator network, which synthesizes a realistic sketch from the input attributes, and (2) a face generator network, which synthesizes facial images from the synthesized sketch with the help of facial attributes. Finally, we propose another synthesis model, called Att2MFace, which can simultaneously synthesize multimodal faces from visual attributes without requiring paired data in different domains for training. We introduce a novel generator with multimodal stretch-out modules to simultaneously synthesize multimodal face images, and multimodal stretch-in modules in the discriminator to discriminate between real and fake images.
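
    The fusion-based verification step described above (deep features from the original and the synthesized image combined into one template) can be sketched as follows; the feature extractor here is a stand-in for the pre-trained CNN, and the interface and threshold are assumptions.

```python
# Hedged sketch of template fusion for cross-modal face verification.
# `feat_net` stands in for a pre-trained CNN feature extractor.
import torch
import torch.nn.functional as F

def fused_template(feat_net, original, synthesized):
    """Concatenate and L2-normalize features from both images."""
    f = torch.cat([feat_net(original), feat_net(synthesized)], dim=1)
    return F.normalize(f, dim=1)

def verify(template_a, template_b, threshold=0.5):
    """Cosine similarity between normalized templates; threshold is assumed."""
    score = (template_a * template_b).sum(dim=1)
    return score, score > threshold
```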

    Biometric Systems

    Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to provide readers with the life cycle of different biometric authentication systems, from their design and development to qualification and final application. The major systems discussed in this book include fingerprint identification, face recognition, iris segmentation and classification, signature verification, and other miscellaneous systems covering management policies of biometrics, reliability measures, pressure-based typing and signature verification, bio-chemical systems, and behavioral characteristics. In summary, this book provides students and researchers with different approaches to developing biometric authentication systems while including state-of-the-art approaches to their design and development. The approaches have been thoroughly tested on standard databases and in real-world applications.