66 research outputs found

    Graph Spectral Image Processing

    Full text link
    Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image / video processing. The topics covered include image compression, image restoration, image filtering and image segmentation

    DEEP INFERENCE ON MULTI-SENSOR DATA

    Get PDF
    Computer vision-based intelligent autonomous systems engage various types of sensors to perceive the world they navigate in. Vision systems perceive their environments through inferences on entities (structures, humans) and their attributes (pose, shape, materials) that are sensed using RGB and Near-InfraRed (NIR) cameras, LAser Detection And Ranging (LADAR), radar and so on. This leads to challenging and interesting problems in efficient data-capture, feature extraction, and attribute estimation, not only for RGB but various other sensors. In some cases, we encounter very limited amounts of labeled training data. In certain other scenarios we have sufficient data, but annotations are unavailable for supervised learning. This dissertation explores two approaches to learning under conditions of minimal to no ground truth. The first approach applies projections on training data that make learning efficient by improving training dynamics. The first and second topics in this dissertation belong to this category. The second approach makes learning without ground-truth possible via knowledge transfer from a labeled source domain to an unlabeled target domain through projections to domain-invariant shared latent spaces. The third and fourth topics in this dissertation belong to this category. For the first topic we study the feasibility and efficacy of identifying shapes in LADAR data in several measurement modes. We present results on efficient parameter learning with less data (for both traditional machine learning as well as deep models) on LADAR images. We use a LADAR apparatus to obtain range information from a 3-D scene by emitting laser beams and collecting the reflected rays from target objects in the region of interest. The Agile Beam LADAR concept makes the measurement and interpretation process more efficient using a software-defined architecture that leverages computational imaging principles. Using these techniques, we show that object identification and scene understanding can be accurately performed in the LADARmeasurement domain thereby rendering the efforts of pixel-based scene reconstruction superfluous. Next, we explore the effectiveness of deep features extracted by Convolutional Neural Networks (CNNs) in the Discrete Cosine Transform (DCT) domain for various image classification tasks such as pedestrian and face detection, material identification and object recognition. We perform the DCT operation on the feature maps generated by convolutional layers in CNNs. We compare the performance of the same network with the same hyper-parameters with or without the DCT step. Our results indicate that a DCT operation incorporated into the network after the first convolution layer can have certain advantages such as convergence over fewer training epochs and sparser weight matrices that are more conducive to pruning and hashing techniques. Next, we present an adversarial deep domain adaptation (ADA)-based approach for training deep neural networks that fit 3Dmeshes on humans in monocular RGB input images. Estimating a 3D mesh from a 2D image is helpful in harvesting complete 3Dinformation about body pose and shape. However, learning such an estimation task in a supervised way is challenging owing to the fact that ground truth 3D mesh parameters for real humans do not exist. We propose a domain adaptation based single-shot (no re-projection, no iterative refinement), end-to-end training approach with joint optimization on real and synthetic images on a shared common task. Through joint inference on real and synthetic data, the network extracts domain invariant features that are further used to estimate the 3D mesh parameters in a single shot with no supervision on real samples. While we compute regression loss on synthetic samples with ground truth mesh parameters, knowledge is transferred from synthetic to real data through ADA without direct ground truth for supervision. Finally, we propose a partially supervised method for satellite image super-resolution by learning a unified representation of samples from different domains (captured by different sensors) in a shared latent space. The training samples are drawn from two datasets which we refer to as source and target domains. The source domain consists of fewer samples which are of higher resolution and contain very detailed and accurate annotations. In contrast, samples from the target domain are low-resolution and available ground truth is sparse. The pipeline consists of a feature extractor and a super-resolving module which are trained end-to-end. Using a deep feature extractor, we jointly learn (on two datasets) a common embedding space for all samples. Partial supervision is available for the samples in the source domain which have high-resolution ground truth. Adversarial supervision is used to successfully super-resolve low-resolution RGB satellite imagery from target domain without direct paired supervision from high resolution counterparts

    Sparse representation based hyperspectral image compression and classification

    Get PDF
    Abstract This thesis presents a research work on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries distinguished by the terms sparse representation spectral dictionary (SRSD) and multi-scale spectral dictionary (MSSD), respectively. The former is learnt in the spectral domain to exploit the spectral correlations, and the latter in wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we utilize the sparse coefficients for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach could outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for the exploitation of spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification.EM201

    Image restoration with group sparse representation and low‐rank group residual learning

    Get PDF
    Image restoration, as a fundamental research topic of image processing, is to reconstruct the original image from degraded signal using the prior knowledge of image. Group sparse representation (GSR) is powerful for image restoration; it however often leads to undesirable sparse solutions in practice. In order to improve the quality of image restoration based on GSR, the sparsity residual model expects the representation learned from degraded images to be as close as possible to the true representation. In this article, a group residual learning based on low-rank self-representation is proposed to automatically estimate the true group sparse representation. It makes full use of the relation among patches and explores the subgroup structures within the same group, which makes the sparse residual model have better interpretation furthermore, results in high-quality restored images. Extensive experimental results on two typical image restoration tasks (image denoising and deblocking) demonstrate that the proposed algorithm outperforms many other popular or state-of-the-art image restoration methods

    Image restoration with group sparse representation and low‐rank group residual learning

    Get PDF

    Learning to compress and search visual data in large-scale systems

    Full text link
    The problem of high-dimensional and large-scale representation of visual data is addressed from an unsupervised learning perspective. The emphasis is put on discrete representations, where the description length can be measured in bits and hence the model capacity can be controlled. The algorithmic infrastructure is developed based on the synthesis and analysis prior models whose rate-distortion properties, as well as capacity vs. sample complexity trade-offs are carefully optimized. These models are then extended to multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is further evolved as a powerful deep neural network architecture with fast and sample-efficient training and discrete representations. For the developed algorithms, three important applications are developed. First, the problem of large-scale similarity search in retrieval systems is addressed, where a double-stage solution is proposed leading to faster query times and shorter database storage. Second, the problem of learned image compression is targeted, where the proposed models can capture more redundancies from the training images than the conventional compression codecs. Finally, the proposed algorithms are used to solve ill-posed inverse problems. In particular, the problems of image denoising and compressive sensing are addressed with promising results.Comment: PhD thesis dissertatio

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Get PDF
    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to the variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well-understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality

    Discrete Wavelet Transforms

    Get PDF
    The discrete wavelet transform (DWT) algorithms have a firm position in processing of signals in several areas of research and industry. As DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book: Discrete Wavelet Transforms: Algorithms and Applications reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA-implementation, lifting based algorithm for VLSI implementation, comparison between DWT and FFT based OFDM and modified SPIHT codec. Part II addresses image processing algorithms such as multiresolution approach for edge detection, low bit rate image compression, low complexity implementation of CQF wavelets and compression of multi-component images. Part III focuses watermaking DWT algorithms. Finally, Part IV describes shift invariant DWTs, DC lossless property, DWT based analysis and estimation of colored noise and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications
    corecore