78 research outputs found

    Graph Spectral Image Processing

    Full text link
    Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image / video processing. The topics covered include image compression, image restoration, image filtering and image segmentation

    Recent Advances in Image Restoration with Applications to Real World Problems

    Get PDF
    In the past few decades, imaging hardware has improved tremendously in terms of resolution, making widespread usage of images in many diverse applications on Earth and planetary missions. However, practical issues associated with image acquisition are still affecting image quality. Some of these issues such as blurring, measurement noise, mosaicing artifacts, low spatial or spectral resolution, etc. can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which includes image super-resolution, image fusion to enhance spatial, spectral resolution, and temporal resolutions, and the generation of synthetic images using deep learning techniques. Some practical applications are also included

    Robust density modelling using the student's t-distribution for human action recognition

    Full text link
    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model since it is highly sensitive to outliers. The Gaussian distribution is also often used as base component of graphical models for recognising human actions in the videos (hidden Markov model and others) and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show how experiments over two well-known datasets (Weizmann, MuHAVi) reported a remarkable improvement in classification accuracy. © 2011 IEEE

    Super Resolution of Wavelet-Encoded Images and Videos

    Get PDF
    In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e. in the wavelet-domain) image registration; then, investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend in wavelet-encoded imaging, and wavelet-encoding for image/video compression. Due to drawbacks of widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. In order to overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for image registration and super resolution. We then employ our devised in-band methodology, in a motion compensated video compression framework, to demonstrate the effective usage of wavelet subbands. Super resolution can also be used as a post-processing step in video compression in order to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that utilizes super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, due to the fact that it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the usage of our devised methods in enhancing resolution of pansharpened multispectral images

    Learning Multimodal Structures in Computer Vision

    Get PDF
    A phenomenon or event can be received from various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Since several modalities report on the same event introduces new challenges comparing to the case of exploiting each modality separately. We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding which is a favorite methodology in signal processing, machine learning and statistics to represent data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities like natural images. We will consider situations where we are not only interested in support of the model to be sparse, but also to reflect a-priorily known knowledge about the application in hand. Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting, the modality-shared which is the information shared by various modalities, and modality-specific which is the information content of each modality individually. Plus, it automatically learns the weights for various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities, while at the same time leverage the modality-specific character of each modality and change their corresponding weights for different parts of the feature in recognition

    Video event detection and visual data pro cessing for multimedia applications

    Get PDF
    Cette thèse (i) décrit une procédure automatique pour estimer la condition d'arrêt des méthodes de déconvolution itératives basées sur un critère d'orthogonalité du signal estimé et de son gradient à une itération donnée; (ii) présente une méthode qui décompose l'image en une partie géométrique (ou "cartoon") et une partie "texture" en utilisation une estimation de paramètre et une condition d'arrêt basées sur la diffusion anisotropique avec orthogonalité, en utilisant le fait que ces deux composantes. "cartoon" et "texture", doivent être indépendantes; (iii) décrit une méthode pour extraire d'une séquence vidéo obtenue à partir de caméra portable les objets de premier plan en mouvement. Cette méthode augmente la compensation de mouvement de la caméra par une nouvelle estimation basée noyau de la fonction de probabilité de densité des pixels d'arrière-plan. Les méthodes présentées ont été testées et comparées aux algorithmes de l'état de l'art.This dissertation (i) describes an automatic procedure for estimating the stopping condition of non-regularized iterative deconvolution methods based on an orthogonality criterion of the estimated signal and its gradient at a given iteration; (ii) presents a decomposition method that splits the image into geometric (or cartoon) and texture parts using anisotropic diffusion with orthogonality based parameter estimation and stopping condition, utilizing the theory that the cartoon and the texture components of an image should be independent of each other; (iii) describes a method for moving foreground object extraction in sequences taken by wearable camera, with strong motion, where the camera motion compensated frame differencing is enhanced with a novel kernel-based estimation of the probability density function of the background pixels. The presented methods have been thoroughly tested and compared to other similar algorithms from the state-of-the-art.BORDEAUX1-Bib.electronique (335229901) / SudocSudocFranceF

    Learning Representations for Controllable Image Restoration

    Get PDF
    Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial details combined with smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tuning these representations towards a specific task, and finally manipulating latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features
    • …
    corecore