
    Wavelet compression of digital holograms: Towards a view-dependent framework

    We analyze and discuss the relevance of various wavelet schemes for hologram compression and reconstruction when the rendering configuration allows selective refinement, i.e., a degraded reconstruction restricted to the current viewpoint. We observe that Gabor wavelet bases have better time-frequency localization than Fresnelet bases and are therefore well suited to view-dependent compression techniques for hologram reconstruction.
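    As a rough illustration of the idea (not the paper's code), the sketch below builds a bank of 1-D Gabor atoms, projects a hologram row onto it, and synthesizes only from atoms whose spatial center falls inside the support visible from the current viewpoint. The grid parameters and the visibility window are assumptions.

```python
import numpy as np

def gabor_atom(n, center, freq, sigma):
    """Complex 1-D Gabor atom: Gaussian envelope times a complex carrier."""
    t = np.arange(n)
    envelope = np.exp(-0.5 * ((t - center) / sigma) ** 2)
    return envelope * np.exp(2j * np.pi * freq * (t - center))

n = 1024
h = np.random.randn(n)                        # stand-in for one hologram row
grid = [(c, f) for c in range(0, n, 32) for f in (0.05, 0.1, 0.2)]
atoms = [gabor_atom(n, c, f, sigma=16.0) for c, f in grid]
coeffs = [np.vdot(a, h) for a in atoms]

# View-dependent refinement: keep only atoms centered inside the spatial
# support seen from the current (assumed) viewpoint.
visible = lambda c: 256 <= c < 512
h_view = sum(coeff * a for (c, f), coeff, a in zip(grid, coeffs, atoms)
             if visible(c))
# (Exact reconstruction would use the dual frame; this only illustrates
# restricting synthesis to viewpoint-relevant, well-localized atoms.)
```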

    A content based method for perceptually driven joint color/depth compression

    Multi-view Video plus Depth (MVD) data consist of a set of conventional color video sequences and an associated set of depth video sequences, all acquired from slightly different viewpoints. This huge amount of data calls for reliable compression, yet no compression method has been standardized for MVD sequences. H.264/MVC, standardized for Multi-View Video (MVV) representation, has been adapted to MVD in many ways, but it has been shown to be poorly suited to encoding multi-view depth data. We propose a novel method for compressing MVD data whose main purpose is to preserve joint color/depth consistency. Its originality lies in using the decoded color data as a prior for the associated depth compression, so as to ensure consistency between both types of data after decoding. Our strategy is motivated by previous studies of artifacts in synthesized views: the most annoying distortions are located around strong depth discontinuities, and they are due to misalignment of depth and color edges in the decoded images. The method is therefore designed to preserve edges and to keep color and depth edges consistently located. For compatibility, the color sequences are encoded with H.264. Depth map compression is based on a 2D still-image codec, LAR (Locally Adaptive Resolution), which relies on a quad-tree representation of the images; this quad-tree representation contributes to preserving edges in both color and depth data. The adopted strategy is meant to be more perceptually driven than state-of-the-art methods. The proposed approach is compared with H.264 encoding of depth images: objective metric scores are similar for both, while the visual quality of synthesized views is improved with the proposed approach.
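    A minimal sketch of the color-guided quad-tree principle, under assumed inputs: `color_edges` would come from an edge detector run on the decoded color view, and each depth leaf is coarsely approximated by its mean. This is not the LAR codec itself, only the split-where-color-edges-are idea that keeps depth discontinuities aligned with color edges.

```python
import numpy as np

def build_quadtree(color_edges, x, y, size, min_size=4, thresh=0.1):
    """Return leaf blocks (x, y, size); split while color-edge activity is high."""
    if size <= min_size or color_edges[y:y+size, x:x+size].mean() < thresh:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += build_quadtree(color_edges, x+dx, y+dy, half,
                                     min_size, thresh)
    return leaves

def encode_depth(depth, leaves):
    """Flat approximation: one mean depth value per leaf block."""
    out = np.zeros_like(depth, dtype=float)
    for x, y, s in leaves:
        out[y:y+s, x:x+s] = depth[y:y+s, x:x+s].mean()
    return out

# Example (random stand-ins for decoded data):
# depth = np.random.rand(64, 64); edges = np.random.rand(64, 64)
# approx = encode_depth(depth, build_quadtree(edges, 0, 0, 64))
```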

    Morlet Wavelet transformed holograms for numerical adaptive view-based reconstruction

    We present an efficient method for transforming a hologram with Morlet wavelets and reconstructing parts of a scene, based on the viewer's position, from a sparse set of Morlet coefficients. We describe the design of a Morlet wavelet and an efficient discretization suited to view-dependent representation systems. Results based on numerical reconstruction show that view-dependent representation combined with Morlet wavelets is a promising starting point for compressing holographic data for next-generation 3DTV applications.
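    A minimal sketch of a 2-D Morlet atom with the standard zero-mean correction; the parameterization (`k0`, `theta`, `sigma`) and the orientation sampling are assumptions for illustration, not the paper's exact design or discretization.

```python
import numpy as np

def morlet_2d(size, k0=5.0, theta=0.0, sigma=None):
    """2-D Morlet atom: Gaussian envelope times an oriented complex carrier."""
    sigma = sigma or size / 8.0
    y, x = np.mgrid[-size//2:size//2, -size//2:size//2]
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * k0 * (x*np.cos(theta) + y*np.sin(theta)) / sigma)
    return envelope * (carrier - np.exp(-0.5 * k0**2))  # zero-mean correction

# A discrete bank over orientations (scales would be sampled similarly):
bank = [morlet_2d(64, theta=t)
        for t in np.linspace(0, np.pi, 8, endpoint=False)]
```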

    Incremental-LDI for Multi-View Coding

    This paper describes an incremental algorithm for Layered Depth Image construction (I-LDI) from multi-view plus depth data sets. We propose a solution to sampling artifacts based on pixel interpolation (inpainting) restricted to isolated unknown pixels, and a solution to ghosting artifacts based on depth-discontinuity detection followed by a local foreground/background classification. We also formulate the warping equations in a way that reduces computation time, specifically for LDI warping. Tests on the Breakdancers and Ballet MVD data sets show that the extra layers of an I-LDI contain only 10% of the first-layer pixels, compared to 50% for an LDI. I-LDI layers are also more compact, with a less spread-out pixel distribution, and are thus easier to compress than LDI layers. Visual rendering quality is similar for I-LDI and LDI.
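    The sketch below illustrates the first fix, restricted inpainting, under simple assumptions: an unknown pixel is filled from its neighbors only when nearly all of them are known, so larger disocclusion holes are left untouched. The threshold and the helper itself are hypothetical, not the paper's implementation.

```python
import numpy as np

def fill_isolated_holes(img, known):
    """img: HxW float image; known: HxW bool mask of valid (warped) pixels.
    Fills only isolated unknown pixels, by averaging known 8-neighbors."""
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if known[y, x]:
                continue
            nb = known[y-1:y+2, x-1:x+2]
            if nb.sum() >= 7:   # at least 7 of 8 neighbors known: isolated hole
                out[y, x] = img[y-1:y+2, x-1:x+2][nb].mean()
    return out
```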

    Focus on visual rendering quality through content-based depth map coding

    Multi-view video plus depth (MVD) data is a set of multiple sequences capturing the same scene from different viewpoints, together with their associated per-pixel depth values. Handling this large amount of data requires an effective coding framework. Yet a simple but essential question concerns the means of assessing the proposed coding methods. While the challenge in compression is optimizing the rate-distortion trade-off, the Peak Signal-to-Noise Ratio (PSNR) is the widely used objective distortion metric, because of its simplicity and mathematical tractability. This paper points out the reliability problem of this metric when estimating 3D video codec performance. We investigated the visual performance of two methods, H.264/MVC and the Locally Adaptive Resolution (LAR) method, by encoding depth maps and reconstructing existing views from the degraded depth images. The experiments revealed that lower coding efficiency in terms of PSNR does not imply lower rendering visual quality, and that the LAR method correctly preserves the properties of the depth maps.
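    For reference, the PSNR the paper questions is computed as below; the paper's point is that this number, whether computed on decoded depth maps or on synthesized views, can rank codecs differently from perceived quality.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(peak**2 / mse)
```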

    ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression

    Over the last few years, neural image compression has gained wide attention from research and industry, yielding promising end-to-end deep neural codecs that outperform their conventional counterparts in rate-distortion performance. Despite significant advances, current methods, including attention-based transform coding, still need to reduce the coding rate while preserving reconstruction fidelity, especially in non-homogeneous textured image areas; they also require many parameters and long decoding times. To tackle these challenges, we propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework paired with a compute-efficient channel-wise auto-regressive prior that captures both global and local context from the hyper and quantized latent representations. The proposed architecture can be optimized end-to-end to fully exploit context information and extract a compact latent representation while reconstructing higher-quality images. Experimental results on four widely used datasets show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, estimated on average at 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to verify the computational efficiency of our approach, and we conduct several objective and subjective analyses to highlight the performance gap between the next-generation ConvNet, namely ConvNeXt, and the Swin Transformer.
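    A minimal sketch of a channel-wise auto-regressive (ChARM-style) prior, not the paper's exact architecture: latent channels are split into slices, and each slice's entropy parameters (mean, scale) are predicted from the hyperprior features plus the previously decoded slices. Channel counts and the small conv stacks are assumptions.

```python
import torch
import torch.nn as nn

class ChannelARPrior(nn.Module):
    def __init__(self, channels=320, slices=8, hyper_ch=192):
        super().__init__()
        self.slice_ch = channels // slices
        self.param_nets = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.slice_ch, 128, 3, padding=1),
                nn.GELU(),
                nn.Conv2d(128, 2 * self.slice_ch, 3, padding=1),  # mean, scale
            )
            for i in range(slices)
        )

    def forward(self, y_hat, hyper):
        # y_hat: quantized latents (B, channels, H, W); hyper: (B, hyper_ch, H, W)
        means, scales, decoded = [], [], []
        for i, net in enumerate(self.param_nets):
            # Context = hyperprior features + already-decoded slices.
            ctx = torch.cat([hyper] + decoded, dim=1)
            m, s = net(ctx).chunk(2, dim=1)
            means.append(m)
            scales.append(s)
            decoded.append(y_hat[:, i * self.slice_ch:(i + 1) * self.slice_ch])
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)
```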

    Representation and coding of 3D video data

    Deliverable D4.1 of the ANR PERSEE project. This report was produced within the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D4.1 of the project.

    Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression

    Recently, the performance of neural image compression (NIC) has steadily improved thanks to a sustained line of research, reaching or outperforming state-of-the-art conventional codecs. Despite significant progress, current NIC methods still rely on ConvNet-based entropy coding, which is limited in modeling long-range dependencies because of its local connectivity and an increasing number of architectural biases and priors, resulting in complex, underperforming models with high decoding latency. Motivated by an efficiency investigation of the Transformer-based transform coding framework SwinT-ChARM, we propose to enhance it, first with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Through the proposed ICT we capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to accurately extract more compact latent codes while reconstructing higher-quality images. Extensive experimental results on benchmark datasets show that the proposed framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM. Moreover, we provide model scaling studies to verify the computational efficiency of our approach, and we conduct several objective and subjective analyses to highlight the performance gap between the adaptive image compression transformer (AICT) and the neural codec SwinT-ChARM.
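    One plausible reading of the learnable scaling module, sketched under stated assumptions: a single learned factor that downscales the input before analysis and upscales it back after synthesis. Note that `.item()` detaches the factor, so training it end-to-end would require a differentiable resampler; this is a structural sketch only, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableScaler(nn.Module):
    """Adaptive spatial-resolution step: learned scale in [0.5, 1.0]."""
    def __init__(self, init_scale=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x, inverse=False):
        s = torch.clamp(self.scale, 0.5, 1.0)
        factor = (1.0 / s if inverse else s).item()  # detached; see note above
        return F.interpolate(x, scale_factor=factor, mode='bicubic',
                             align_corners=False)
```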

    NERV++: An Enhanced Implicit Neural Video Representation

    Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability for representing, generating, and manipulating various data types, allowing continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and they require a huge number of parameters and long training iterations to capture high-frequency details, limiting their wider applicability. Resolving this problem remains quite challenging, yet it would make INRs far more accessible for compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation: a more straightforward yet effective enhancement of the original NeRV decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be represented directly as a function approximated by a neural network and significantly enhances representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs. This narrows the gap to autoencoder-based video coding and marks a significant stride in INR-based video compression research.
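    A sketch of one decoder stage as the abstract describes it, with assumed layer sizes: a separable-conv residual block (SCRB) before and after the upsampling block, plus a bilinear-interpolation skip path. A NeRV-style PixelShuffle is assumed for the UB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCRB(nn.Module):
    """Separable conv2d residual block: depthwise 3x3 then pointwise 1x1."""
    def __init__(self, ch):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return x + self.pw(F.gelu(self.dw(x)))

class DecoderStage(nn.Module):
    """SCRBs sandwiching an upsampling block, with a bilinear skip layer."""
    def __init__(self, c_in, c_out, up=2):
        super().__init__()
        self.pre = SCRB(c_in)
        self.up = nn.Sequential(
            nn.Conv2d(c_in, c_out * up * up, 3, padding=1),
            nn.PixelShuffle(up),          # assumed NeRV-style upsampling block
        )
        self.post = SCRB(c_out)
        self.skip = nn.Conv2d(c_in, c_out, 1)
        self.factor = up

    def forward(self, x):
        y = self.post(self.up(self.pre(x)))
        s = F.interpolate(self.skip(x), scale_factor=self.factor,
                          mode='bilinear', align_corners=False)
        return y + s
```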

    Objective View Synthesis Quality Assessment

    View synthesis introduces geometric distortions that are not handled efficiently by existing image quality assessment metrics. Despite the widespread adoption of 3D technology, notably 3D television (3DTV) and free-viewpoint television (FTV), the field of view synthesis quality assessment has not yet been widely investigated, and new quality metrics are required. In this study, we propose a new full-reference objective quality assessment metric: the View Synthesis Quality Assessment (VSQA) metric. Our method is dedicated to detecting artifacts in synthesized viewpoints and aims to handle areas where disparity estimation may fail: thin objects, object borders, transparency, variations of illumination or color between the left and right views, periodic objects, and so on. The key feature of the proposed method is the use of three visibility maps, which characterize complexity in terms of texture, diversity of gradient orientations, and presence of high contrast. Moreover, the VSQA metric can be defined as an extension of any existing 2D image quality assessment metric. Experimental tests have shown the effectiveness of the proposed method.
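    A schematic sketch of the weighting idea (the actual map definitions and pooling in VSQA are more elaborate): a per-pixel distortion map from any 2-D metric is modulated by the three visibility maps before pooling. The weighting formula below is an illustrative assumption.

```python
import numpy as np

def vsqa_score(dist_map, texture_map, orient_map, contrast_map):
    """dist_map: per-pixel distortion from any 2-D metric (e.g. squared error).
    The three maps characterize local texture complexity, diversity of
    gradient orientations, and presence of high contrast, each in [0, 1]."""
    weight = (1 + texture_map) * (1 + orient_map) * (1 + contrast_map)
    return float(np.mean(dist_map * weight))
```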