6 research outputs found

    Learning-based Wavelet-like Transforms For Fully Scalable and Accessible Image Compression

    Full text link
    The goal of this thesis is to improve the existing wavelet transform with the aid of machine learning techniques, so as to enhance coding efficiency of wavelet-based image compression frameworks, such as JPEG 2000. In this thesis, we first propose to augment the conventional base wavelet transform with two additional learned lifting steps -- a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands by using the corresponding low-pass band. These two additional steps reduce redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. To train these two networks in an end-to-end fashion, we develop a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the two additional networks share a common architecture, named a proposal-opacity topology, which is inspired and guided by a specific theoretical argument related to geometric flow. This particular network topology is compact and with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters are applied for all levels of decomposition and for all bit-rates of interest. By employing the additional lifting networks within the JPEG2000 image coding standard, we can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining the quality and resolution scalability features of JPEG2000. Built upon the success of the high-to-low and low-to-high steps, we then study more broadly the extension of neural networks to all lifting steps that correspond to the base wavelet transform. The purpose of this comprehensive study is to understand what is the most effective way to develop learned wavelet-like transforms for highly scalable and accessible image compression. Specifically, we examine the impact of the number of learned lifting steps, the number of layers and the number of channels in each learned lifting network, and kernel support in each layer. To facilitate the study, we develop a generic training methodology that is simultaneously appropriate to all lifting structures considered. Experimental results ultimately suggest that to improve the existing wavelet transform, it is more profitable to augment a larger wavelet transform with more diverse high-to-low and low-to-high steps, rather than developing deep fully learned lifting structures

    Novi algoritam za kompresiju seizmičkih podataka velike amplitudske rezolucije

    Get PDF
    Renewable sources cannot meet energy demand of a growing global market. Therefore, it is expected that oil & gas will remain a substantial sources of energy in a coming years. To find a new oil & gas deposits that would satisfy growing global energy demands, significant efforts are constantly involved in finding ways to increase efficiency of a seismic surveys. It is commonly considered that, in an initial phase of exploration and production of a new fields, high-resolution and high-quality images of the subsurface are of the great importance. As one part in the seismic data processing chain, efficient managing and delivering of a large data sets, that are vastly produced by the industry during seismic surveys, becomes extremely important in order to facilitate further seismic data processing and interpretation. In this respect, efficiency to a large extent relies on the efficiency of the compression scheme, which is often required to enable faster transfer and access to data, as well as efficient data storage. Motivated by the superior performance of High Efficiency Video Coding (HEVC), and driven by the rapid growth in data volume produced by seismic surveys, this work explores a 32 bits per pixel (b/p) extension of the HEVC codec for compression of seismic data. It is proposed to reassemble seismic slices in a format that corresponds to video signal and benefit from the coding gain achieved by HEVC inter mode, besides the possible advantages of the (still image) HEVC intra mode. To this end, this work modifies almost all components of the original HEVC codec to cater for high bit-depth coding of seismic data: Lagrange multiplier used in optimization of the coding parameters has been adapted to the new data statistics, core transform and quantization have been reimplemented to handle the increased bit-depth range, and modified adaptive binary arithmetic coder has been employed for efficient entropy coding. In addition, optimized block selection, reduced intra prediction modes, and flexible motion estimation are tested to adapt to the structure of seismic data. Even though the new codec after implementation of the proposed modifications goes beyond the standardized HEVC, it still maintains a generic HEVC structure, and it is developed under the general HEVC framework. There is no similar work in the field of the seismic data compression that uses the HEVC as a base codec setting. Thus, a specific codec design has been tailored which, when compared to the JPEG-XR and commercial wavelet-based codec, significantly improves the peak-signal-tonoise- ratio (PSNR) vs. compression ratio performance for 32 b/p seismic data. Depending on a proposed configurations, PSNR gain goes from 3.39 dB up to 9.48 dB. Also, relying on the specific characteristics of seismic data, an optimized encoder is proposed in this work. It reduces encoding time by 67.17% for All-I configuration on trace image dataset, and 67.39% for All-I, 97.96% for P2-configuration and 98.64% for B-configuration on 3D wavefield dataset, with negligible coding performance losses. As a side contribution of this work, HEVC is analyzed within all of its functional units, so that the presented work itself can serve as a specific overview of methods incorporated into the standard

    Distributed Compressed Representation of Correlated Image Sets

    Get PDF
    Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through displacement of scene objects. Efficient compression techniques have to exploit this correlation in order to efficiently communicate the 3D scene information. Instead of joint encoding that requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently, but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks relies in estimation of the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient correlation estimation algorithms and joint representation of multiple correlated images captured by various sensing methodologies, e.g., planar, omnidirectional and compressive sensing (CS) sensors. The geometry of the 2D visual representation and the acquisition complexity vary for each sensor type. Therefore, we need to carefully consider the specific geometric nature of the captured images while developing distributed representation algorithms. In this thesis we propose robust algorithms in different scene analysis and reconstruction scenarios. We first concentrate on the distributed representation of omnidirectional images captured by catadioptric sensors. The omnidirectional images are captured from different viewpoints and encoded independently with a balanced rate distribution among the different cameras. They are mapped on the sphere which captures the plenoptic function in its radial form without Euclidean discrepancies. We propose a transform-based distributed coding algorithm, where the spherical images initially undergo a multi-resolution decomposition. The visual information is then split into two correlated partitions. The encoder transmits one partition after entropy coding, as well as the syndrome bits resulting from the Slepian-Wolf encoding of the other partition. The joint decoder estimates a disparity image to take benefit of the correlation between views and uses the syndrome bits to decode the missing information. Such a strategy proves to be beneficial with respect to the independent processing of images and shows only a small performance loss compared to the joint encoding of different views. The encoding complexity in the previous approach is non-negligible due to the visual information processing based on Slepian-Wolf coding and its associated rate parameter estimation. We therefore discard the Slepian-Wolf encoding and propose a distributed coding solution, where the correlated images are encoded independently using transform-based coding solutions (e.g., SPIHT). The central decoder now builds a correlation model from the compressed images, which is used to jointly decode a pair of images. Experimental results demonstrate that the proposed distributed coding solution improves the rate-distortion performance of the separate coding results for both planar and omnidirectional images. However, this improvement is significant only at medium to high bit rates. We therefore propose a rate allocation scheme that identifies and transmits the necessary visual information from each image to improve the correlation estimation accuracy at low bit rate. Experimental results show that for a given bit budget the proposed encoding scheme permits to compute an accurate correlation estimation comparing to the one obtained with SPIHT, JPEG 2000 or JPEG coding schemes. We show however that the improvement in the correlation estimation comes at the price of penalizing the image reconstruction quality; therefore there exists an interesting trade-off between the accurate correlation estimation and image reconstruction as encoding optimization objectives are different in both cases. Next, we further simplify the encoding complexity by replacing the classical imaging sensors with the simple CS sensors, that directly acquire the compressed images in the form of quantized linear measurements. We now concentrate on the particular problem, where one image is selected as the reference and it is used as a side information for the correlation estimation. We propose a geometry-based model to describe the correlation between the visual information in a pair of images. The joint decoder first captures the most prominent visual features in the reconstructed reference image using geometric functions. Since the images are correlated, these features are likely to be present in the other images too, possibly with geometric transformations. Hence, we propose to estimate the correlation model with a regularized optimization problem that locates these features in the compressed images. The regularization terms enforce smoothness of the transformation field, and consistency between the estimated images and the quantized measurements. Experimental results show that the proposed scheme is able to efficiently estimate the correlation between images for several multi-view and video datasets. The proposed scheme is finally shown to outperform DSC schemes based on unsupervised disparity (or motion) learning, as well as independent coding solutions based on JPEG 2000. We then extend the previous scenario to a symmetric decoding problem, where we are interested to estimate the correlation model directly from the quantized linear measurements without explicitly reconstructing the reference images. We first show that the motion field that represents the main source of correlation between images can be described as a linear operator. We further derive a linear relationship between the correlated measurements in the compressed domain. We then derive a regularized cost function to estimate the correlation model directly in the compressed domain using graph-based optimization algorithms. Experimental results show that the proposed scheme estimates an accurate correlation model among images in both multi-view and video imaging scenarios. We then propose a robust data fidelity term that improves the quality of the correlation estimation when the measurements are quantized. Finally, we show by experiments that the proposed compressed correlation estimation scheme is able to compete the solution of a scheme that estimates a correlation model from the reconstructed images without the complexity of image reconstruction. Finally, we study the benefit of using the correlation information while jointly reconstructing the images from the compressed linear measurements. We consider both the asymmetric and symmetric scenarios described previously. We propose joint reconstruction methodologies based on a constrained optimization problem which is solved using effective proximal splitting methods. The constraints included in our framework enforce the reconstructed images to satisfy both the correlation and the quantized measurements consistency objectives. Experimental results demonstrate that the proposed joint reconstruction scheme improves the quality of the decoded images, when compared to a scheme where the images are handled independently. In this thesis we build efficient distributed scene representation algorithms for the multiple correlated images captured in planar, omnidirectional and CS cameras. The coding rate in our symmetric distributed coding solution stays balanced between the encoders and stays close to the joint encoding solutions. Our novel algorithms lead to effective correlation estimation in different sensing and coding scenarios. In addition, we provide innovative solutions for robust correlation estimation from highly compressed images in simple sensing frameworks. Our CS-based joint reconstruction frameworks effectively exploit the inter-view correlation, that permits to achieve high compression gains compared to state-of-the-art independent and distributed coding solutions

    Multiterminal Video Coding: From Theory to Application

    Get PDF
    Multiterminal (MT) video coding is a practical application of the MT source coding theory. For MT source coding theory, two problems associated with achievable rate regions are well investigated into in this thesis: a new sufficient condition for BT sum-rate tightness, and the sum-rate loss for quadratic Gaussian MT source coding. Practical code design for ideal Gaussian sources with quadratic distortion measure is also achieved for cases more than two sources with minor rate loss compared to theoretical limits. However, when the theory is applied to practical applications, the performance of MT video coding has been unsatisfactory due to the difficulty to explore the correlation between different camera views. In this dissertation, we present an MT video coding scheme under the H.264/AVC framework. In this scheme, depth camera information can be optionally sent to the decoder separately as another source sequence. With the help of depth information at the decoder end, inter-view correlation can be largely improved and thus so is the compression performance. With the depth information, joint estimation from decoded frames and side information at the decoder also becomes available to improve the quality of reconstructed video frames. Experimental result shows that compared to separate encoding, up to 9.53% of the bit rate can be saved by the proposed MT scheme using decoder depth information, while up to 5.65% can be saved by the scheme without depth camera information. Comparisons to joint video coding schemes are also provided

    Sparse representation based hyperspectral image compression and classification

    Get PDF
    Abstract This thesis presents a research work on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries distinguished by the terms sparse representation spectral dictionary (SRSD) and multi-scale spectral dictionary (MSSD), respectively. The former is learnt in the spectral domain to exploit the spectral correlations, and the latter in wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we utilize the sparse coefficients for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach could outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for the exploitation of spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification.EM201
    corecore